George Joseph [Tue, 12 Jul 2016 01:07:20 +0000 (19:07 -0600)]
rest_api/channels: Fix multiple issues with create and dial
* We weren't properly subscribing to the channel and it's originator
on create.
* We weren't doing a publish_dial after calling ast_call on dial.
* We weren't calling depart_bridge when a channel left the dial bridge.
The first 2 issues were causing events to not be generated and the third
was actually causing channels to not get properly destroyed when hung up.
Together these 3 issues were causing the new
rest_apichannels/create_dial_bridge tests to fail.
As a result of the fixes, the cdr state machine had to be slightly
tweaked to allow bridge leave events without asserting and the tests
themselves had to be updated to account for the channels now cleaning
themselves up.
REF_DEBUG: Prevent logging of container node objects.
Using AO2_CONTAINER_ALLOC_OPT_DUPS_REPLACE can result in an unref being
recorded to the refs log for the node being replaced. This prevents
logging of those unrefs since they would produce errors in
refcounter.py.
chan_sip/res_pjsip_t38: Handle a request to negotiate T.38 after it is enabled.
Some T.38 implementations may send another re-invite after the initial
one which adds additional negotiation details (such as the max bitrate).
Currently this will fail when passthrough is being done in chan_sip as we
do nothing if T.38 is already active.
Other handlers of T.38 inside of Asterisk (such as res_fax) handle this
scenario so this change adds support for it to chan_sip and res_pjsip_t38.
If a request to negotiate is received while T.38 is already enabled a
new re-INVITE is sent and negotiation is done again.
When using TCP transport with chan_pjsip, the TCP_NODELAY
option value was allocated on the stack, then passed as a
pointer to the tcp transport configuration structure, and
later re-used on subsequently created sockets when it was
no longer valid. This patch changes the allocation to be
a static.
ASTERISK-26180 #close
Reported by: Scott Griepentrog
Richard Mudgett [Wed, 22 Jun 2016 22:26:38 +0000 (17:26 -0500)]
res_pjsip_session.c: Don't send extra BYE if SDP invalid.
When an answer SDP is invalid we were disconnecting the outgoing call and
sending two BYE requests. The first BYE was sent by PJPROJECT because of
the invalid SDP answer. The second BYE was sent by Asterisk because it
thought the canceled call was the result of the RFC5407 section 3.1.2 race
condition.
* Made not send the BYE on a canceled session if the SDP negotiation is
incomplete because PJPROJECT has already sent a BYE for the failed
negotiation.
Richard Mudgett [Mon, 27 Jun 2016 22:19:08 +0000 (17:19 -0500)]
res_pjsip_session.c: End call on initial invalid SDP negotiation.
When an incoming call defers SDP negotiation and then sends us an invalid
SDP in the ACK, we need to send a BYE to disconnect the call. In this
case SDP negotiation has failed and we don't have valid media streams
negotiated.
Richard Mudgett [Thu, 30 Jun 2016 20:17:02 +0000 (15:17 -0500)]
features: Fix channel datastore access.
Found as a result of the testsuite tests/callparking test crashing.
Several calls to ast_get_chan_featuremap_config() and
ast_get_chan_features_xfer_config() did not lock the channel before
calling so the channel's datastore list was accessed without the lock's
protection. Apparently another thread deleted a datastore on the
channel's list while the crashing thread was walking the list. Crash at
0xdeaddead due to MALLOC_DEBUG's memory filler value as a result.
* Add missing channel locks to calls that were not already protected
as the doxygen for those calls indicates.
Matt Jordan [Wed, 29 Jun 2016 20:31:30 +0000 (15:31 -0500)]
hep.conf.sample: Default 'enabled' to 'no'
Following the principle of least surprise, we should not be sending
massive numbers of PJSIP and RTCP HEP packets out into the ether to some
only-slightly-random IP address. Having 'enabled' set to 'no' in the
sample configuration file should prevent this from happening for those
who run 'make samples'.
Matt Jordan [Wed, 29 Jun 2016 20:09:02 +0000 (15:09 -0500)]
pjproject/patches/config_site: Increase the max number of ICE candidates
When negotiating ICE candidates with WebRTC capable endpoints, many
networks will result in a browser offering ICE candidates that exceeds
the default number of max candidates, 16. This patch bumps the max
candidates to 32, with the max checks at twice the number of candidates.
In practice, this has shown to be sufficient for browser/WebRTC
negotiation.
George Joseph [Tue, 28 Jun 2016 14:00:32 +0000 (08:00 -0600)]
codecs: Fix ABI incompatibility created by adding format_name to ast_codec
Adding format_name even to the end of ast_codec caused issued with
binary codec modules because the pointer would be garbage in asterisk
when they registered. So, the ast_codec structure was reverted and an
internal_ast_codec structure was created just for use in codec.c. A new
internal-only API was also added (__ast_codec_register_with_format) so
that codec_builtin could register codecs with the format_name in a
separate parameter rather than in the ast_codec structure.
This patch removes the following modules:
- pbx_functions: It never existed.
- res_pjsip_log_forwarder: It no longer exists.
- res_hep_pjsip: The base HEP module wasn't loaded, and most basic PBXs
aren't going to be installing HOMER
- res_pjsip_phoneprov_provider: The basic res_phoneprov module isn't
loaded, and we aren't configured to make use of the
module
Joshua Colp [Wed, 22 Jun 2016 16:19:32 +0000 (13:19 -0300)]
siren: Add format attribute modules for Siren7 and Siren14.
This change removes hardcoded SDP parsing and generation for
Siren7 and Siren14 from chan_sip and moves it to format attribute
modules so it can also be used by chan_pjsip.
With this the fmtp lines for both are added with the bitrate
information.
Corey Farrell [Wed, 22 Jun 2016 20:04:54 +0000 (16:04 -0400)]
res_fax: Fix reference leak in fax_v21_session_new.
fax_v21_session_new created a session details object but only released
the allocation reference during error conditions. fax_session_new adds
it's own reference to details if needed so the caller is always
responsible for cleaning it's own reference.
Alexei Gradinari [Wed, 22 Jun 2016 19:25:23 +0000 (15:25 -0400)]
res_pjsip: improve realtime performance #2
The patch removes updating all Endpoints' status on startup.
Instead, only non-qualified aors with static contact
and non-qualified non-expired contacts are retrieved from the realtime to
update the endpoint status to ONLINE.
The endpoint name was added to the contact object to simply find the endpoint
that created this contact.
The status of endpoints with qualified aors will be updated by 'qualify'
functions.
George Joseph [Wed, 22 Jun 2016 15:37:23 +0000 (09:37 -0600)]
chan_unistim: Fix memcpy in get_to_address
A code block only enabled when HAVE_PKTINFO is not defined (FreeBSD)
was using a pointer to a pointer as the destination of a memcpy and a
'&' instead of '*' in the sizeof.
Mark Michelson [Mon, 20 Jun 2016 18:21:52 +0000 (13:21 -0500)]
Fix Alembic upgrades.
A non-existent constraint was being referenced in the upgrade script.
This patch corrects the problem by removing the reference.
In addition, the head of the alembic branch referred to a non-existent
revision. This has been fixed by referring to the proper revision.
This patch fixes another realtime problem as well. Our Alembic scripts
store booleans as yes or no values. However, Sorcery tries to insert
"true" or "false" instead. This patch introduces a new boolean type that
translates to "yes" or "no" instead.
George Joseph [Wed, 22 Jun 2016 15:51:14 +0000 (09:51 -0600)]
test_res_pjsip_scheduler: Add 'depends' on pjproject in MODULEINFO
Since the file was missing the depends on pjproject, it wasn't
picking up the pjproject related include path. If there was no
system installed pjproject and pjproject-bundled was used, a compile
would fail because pjsip.h wasn't found.
George Joseph [Sun, 12 Jun 2016 16:19:27 +0000 (10:19 -0600)]
res_pjsip_pubsub: Address SEGV when attempting to terminate a subscription
Occasionally under load we'll attempt to send a final NOTIFY on a
subscription that's already been terminated and a SEGV will occur
down in pjproject's evsub_destroy function. This is a result of a
race condition between all the paths that can generate a notify
and/or destroy the underlying pjproject evsub object:
* The client can send a SUBSCRIBE with Expires: 0.
* The client can send a SUBSCRIBE/refresh.
* The subscription timer can expire.
* An extension state can change.
* An MWI event can be generated.
* The pjproject transaction timer (timer_b) can expire.
Normally when our pubsub_on_evsub_state is called with a terminate,
we push a task to the serializer and return at which point the dialog
is unlocked. This is usually not a problem because the task runs
immediately and locks the dialog again. When the system is heavily
loaded though, there may be a delay between the unlock and relock
during which another event may occur such as the subscription timer
or timer_b expiring, an extension state change, etc. These may also
cause a terminate to be processed and if so, we could cause pjproject
to try to destroy the evsub structure twice. There's no way for us to
tell that the evsub was already destroyed and the evsub's group lock
can't tolerate this and SEGVs.
The remedy is twofold.
* A patch has been submitted to Teluu and added to the bundled
pjproject which adds add/decrement operations on evsub's group lock.
* In res_pjsip_pubsub:
* configure.ac and pjproject-bundled's configure.m4 were updated
to check for the new evsub group lock APIs.
* We now add a reference to the evsub group lock when we create
the subscription and remove the reference when we clean up the
subscription. This prevents evsub from being destroyed before
we're done with it.
* A state has been added to the subscription tree structure so
termination progress can be tracked through the asyncronous tasks.
* The pubsub_on_evsub_state callback has been split so it's not doing
double duty. It now only handles the final cleanup of the
subscription tree. pubsub_on_rx_refresh now handles both client
refreshes and client terminates. It was always being called for
both anyway.
* The serialized_on_server_timeout task was removed since
serialized_pubsub_on_rx_refresh was almost identical.
* Missing state checks and ao2_cleanups were added.
* Some debug levels were adjusted to make seeing only off-nominal
things at level 1 and nominal or progress things at level 2+.
ASTERISK-26099 #close Reported-by: Ross Beer.
Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
Alexander Traud [Tue, 21 Jun 2016 12:05:30 +0000 (14:05 +0200)]
res_rtp_asterisk: Use latest DTLS version available by underlying platform.
Do not use DTLSv1_method() but DTLS_method() when available in OpenSSL of the
underlying platform. This change enables DTLS 1.2 since OpenSSL 1.0.2, for
WebRTC (DTLS-SRTP via SIP-over-WebSockets). This change enables AEAD-based
cipher-suites.
PJSIP: provide transport type with received messages
The receipt of a SIP MESSAGE may occur over any transport including TCP
and TLS. When the message is received, the original URI is added to the
message in the field PJSIP_RECVADDR, but this is insufficient to ensure
a reply message can reach the originating endpoint. This patch adds the
PJSIP_TRANSPORT field populated with the transport type.
Alexander Traud [Tue, 21 Jun 2016 13:01:40 +0000 (15:01 +0200)]
BuildSystem: Avoid obsolete warning with HELP_STRING on autoconf.
Some configure scripts used both AC_HELP_STRING and its replacement
AS_HELP_STRING. For consistency and to avoid obsolete warnings, those were
changed to AS_HELP_STRING.
Joshua Colp [Mon, 20 Jun 2016 15:29:13 +0000 (12:29 -0300)]
res_pjsip_session: Handle race condition at shutdown with timer.
When shutting down res_pjsip_session will get unloaded before res_pjsip.
The act of unloading unregisters all the PJSIP services and sets
their module IDs to -1. In some cases it is possible for a timer to
occur after this happens which calls into res_pjsip_session. The
res_pjsip_session module can then try to get the session from the
INVITE session using the module ID. Since the module ID is now -1
this fails.
This change stores a copy of the module ID and uses it for the timer
callback scenario. If the module ID is -1 the callback immediately
returns but if the module ID is valid then it continues as normal.
This works as the original ID of the module is guaranteed to still
be valid when used with the INVITE session.
Mark Michelson [Mon, 13 Jun 2016 22:40:07 +0000 (17:40 -0500)]
ARI: Ensure announcer channels are destroyed.
Announcer channels were not being destroyed because the
stasis_app_control structure that referenced them was not being
destroyed. The control structure was not being destroyed because it was
not being unlinked from its container. It was not being unlinked from
its container because the after bridge callback for the announcer
channel was not being run. The after bridge callback was not being run
because the after bridge datastore was not being removed from the
channel on destruction. The channel was not being destroyed because the
hangup that used to destroy the channel was now only reducing the
reference count to one. The reference count of the channel was only
being reduced to one because the stasis_app_control structure was
holding the final reference...
The control structure used to not keep a reference to the channel, so
that loop described above did not happen.
The solution is to manually remove the control structure from its
container when the playback on a bridge is complete.
Alexander Traud [Mon, 20 Jun 2016 13:05:09 +0000 (15:05 +0200)]
http: leverage 'bindaddr' for TLS in http.conf
The internal HTTP/WebSocket server supports both TCP and TLS, which can be
activated separately via the file http.conf. The source code intends to re-use
the TCP parameter 'bindaddr' for TLS, even if 'tlsbindaddr' is not specified
explicitly. This did not work because of a typo. This change resolves this typo.
Richard Mudgett [Wed, 18 May 2016 22:37:27 +0000 (17:37 -0500)]
res_pjsip_transport_management.c: Misc cleanups to survive shutdown.
* In unload_module(), reordered destroying things to minimize the window
that the global transports container could be used by other threads on
shutdown. When shutting down you need to stop things in the opposite
order of creation.
* Put the global transports container into an AO2_GLOBAL_OBJ_STATIC to
eliminate the crash potential by other threads using the container on
shutdown.
* Made struct monitored_transport.sip_received not use
ast_atomic_fetchadd_int() since it is used as a boolean value that is only
set TRUE. It was previously incremented for every received SIP message
and could theoretically overflow.
* In monitored_transport_state_callback(), allocated the monitored
transport object without a lock since the lock was unused.
* In keepalive_global_loaded(), removed releasing the transports container
if the keepalive_thread could not be started. I set it up to be tried
again if the user reloads the configuration.
Alexander Traud [Wed, 8 Jun 2016 11:15:15 +0000 (13:15 +0200)]
core: Not the configured but granted number of possible file descriptors.
With CLI "core show settings", simply the parameter maxfiles of the file
asterisk.conf was shown. If that parameter was not set, nothing was displayed
although the environment might have set a default number itself. Or if maxfiles
were not granted (completely), still maxfiles was shown. Now, the maximum number
of possible file descriptors in the environment is shown.
Alexander Traud [Wed, 8 Jun 2016 11:05:22 +0000 (13:05 +0200)]
astfd: With RLIMIT_NOFILE only the current value is sensible.
With menuselect "DEBUG_FD_LEAKS" and CLI "core show fd", both the maximum max
and current max of possible file descriptors were shown. Both show the same
value always. Not to confuse users, just the current maximum is shown now.
Joshua Colp [Tue, 7 Jun 2016 23:45:37 +0000 (20:45 -0300)]
cel: Ensure only one dial status per channel exists.
CEL wrongly assumed that a channel would only have a single dial
event on it. This is incorrect. Particularly in a queue each
call attempt to a member will result in a dial event, adding
a new dial status in CEL without removing the old one. This
would cause the container to grow with only one dial status
being removed when the channel went away. The other dial status
entries would remain leaking memory.
This change fixes the memory leak by ensuring that only one dial
status will only ever exist for each channel.
The behavior during the scenario where multiple events are received
has also been improved. For failure cases the first failure will
be the dial status. If an answer dial status is received, though,
it will take priority and the dial status for the channel will be
answer.
Memory usage has also been decreased by storing the minimal
amount of information and the code has been cleaned up slightly.
Mark Michelson [Wed, 1 Jun 2016 18:48:00 +0000 (13:48 -0500)]
ARI: Ensure proper channel state on operations.
ARI was recently outfitted with operations to create and dial channels.
This leads to the ability to try funny stuff. You could create a channel
and then immediately try to play back media on it. You could create a
channel, dial it, and while it is ringing attempt to make it continue in
the dialplan.
This commit attempts to fix this by adding a channel state check to
operations that should not be able to operate on outbound channels that
have not yet answered. If a channel is in an invalid state, we will send
a 412 response.
Mark Michelson [Wed, 8 Jun 2016 16:27:41 +0000 (11:27 -0500)]
test_http_media_cache: Fix failing test.
The retrieve_cache_control_directives test has been failing occasionally
in Jenkins. The apparent failure occurs when attempting to validate the
expiration of the retrieved file.
After reproducing, the problem was pretty clear. At the beginning of the
test, the current time is retrieved. The seconds value of this timestamp
is X. When the file is retrieved, res_http_media_cache calculates the
expiration and in doing so retrieves the current time. In most cases,
since the test executes quickly, it will also retrieve a timestamp with
X seconds. However, if the test starts very near to when the timestamp
seconds are set to increment, res_http_media_cache may retrieve a
timestamp with X+1 seconds instead.
The test attempted to account for this by allowing a tolerance of 1
second when validating the expiration. However, the problem was that the
comparisons being used in the validation used > and < operations. This
meant that values that fell within the tolerance (because they equaled
the upper bound of the tolerance) would fail.
The solution is to use >= and <= operators in the expiration validation.
However, I estimated that while the one second tolerance should be
fine on most machines, it would still be possible on a very slow machine
to end up falling outside the one second tolerance. So I have also
relaxed the tolerance of expiration validation to be three seconds
instead.
The final change here is to add a debug message when validating
expiration so that we can see what values are being compared.
George Joseph [Thu, 9 Jun 2016 15:33:48 +0000 (09:33 -0600)]
cdr.c: Remove assert in base_process_dial_end
Scenario: Caller blonde transfer
Bob calls Charlie who answers.
Bob puts Charlie on hold and calls Alice.
Before Alice answers, Bob transfers Charlie to Alice.
Charlie's channel triggers an assert because he gets an "ANSWERED"
event even though he never dialed anything. With recent changes to dial
events, this is now a valid scenario so the assert needed to be removed.
Mark Michelson [Thu, 9 Jun 2016 15:37:53 +0000 (10:37 -0500)]
chan_pjsip: Lock channel when checking for RTP changes.
bridge_native_rtp can call into an RTP-capable channel driver in order
for the driver to update information about who the channel is
communicating with. For SIP channel drivers, this means deactivating
RTCP and sending a reinvite so that the endpoints can communicate
directly.
bridge_native_rtp does the right thing and has the channel locked when
calling into the channel driver. chan_pjsip can't alter session
properties in this thread, though. chan_pjsip queues a task on the
session serializer in order to update properties there.
The problem is that this queued task was not locking the channel. This
meant that the queued task could attempt to deactivate RTCP at the same
time that the channel thread was attempting to process an incoming RTCP
packet. This could lead to a crash.
This patch fixes the issue by locking the channel in the queued task
when altering RTP properties.
This patch fixes a race condition processing received REGISTER requests
and their retransmissions caused by REGISTER requests being processed by
two threads. The "sip_transaction Unable to register REGISTER transaction
(key exists)" message is a notable symptom of this issue.
This issue was more likely to happen before the pjsip/distributor
serializers were created. Instead of steps one and two below placing the
REGISTER messages into the same pjsip/distributor they were placed in
random pjsip/default serializers.
1) REGISTER requests come in and get placed on the pjsip/distributor
serializer.
2) Before the first request is processed a retransmission comes in and is
placed on the same pjsip/distributor serializer.
3) The first request goes up the pjsip stack and is then shunted off to
the pjsip/aor/<aor> serializer.
4) Before the first request is completed processing in the pjsip/aor/<aor>
serializer, the second request goes up the pjsip stack and is also shunted
off to the pjsip/aor/<aor> serializer.
5) The first request completes processing and sends out its response.
6) The second request completes processing and tries to send out its
response but pjlib complains that the REGISTER transaction key already
exists.
7) Sadness ensues.
* The race is eliminated by removing the pjsip/aor/<aor> serializer and
continuing the processing in the pjsip/distributor serializer. Now any
retransmissions queued in the pjsip/distributor serializer will be
processed after the first message is completely processed.
ASTERISK-26088 #close
Reported by: Richard Mudgett
Stasis subscriptions and message routers create taskprocessors to process
the event messages. API calls are needed to be able to set the congestion
levels of these taskprocessors for selected subscriptions and message
routers.
* Updated CDR, CEL, and manager's stasis subscription congestion levels
based upon stress testing. Increased the congestion levels to reduce the
potential for bursty call setup/teardown activity from triggering the
taskprocessor overload alert. CDRs in particular need an extra high
congestion level because they can take awhile to process the stasis
messages.
Richard Mudgett [Thu, 2 Jun 2016 23:19:13 +0000 (18:19 -0500)]
sorcery: Add setting object type congestion levels.
Sorcery creates taskprocessors for object types to process object observer
callbacks. An API call is needed to be able to set the congestion levels
of these taskprocessors for selected object types.
* Updated PJSIP's contact and contact_status sorcery object type observer
default congestion levels based upon stress testing. Increased the
congestion levels to reduce the potential for bursty register/unregister
and subscribe/unsubscribe activity from triggering the taskprocessor
overload alert.