Benjamin Ford [Tue, 14 Jul 2015 13:55:14 +0000 (08:55 -0500)]
ARI: Added new functionality to unload a single module.
An http request can be sent to unload an Asterisk module. If the
module can not be unloaded or is already unloaded, an error response
will be returned.
The command "curl -v -u user:pass -X DELETE 'http://localhost:8088
/ari/asterisk/modules/{moduleName}'" (or something similar, depending
on configuration) can be run in the terminal to access this new
functionality.
For more information, see:
https://wiki.asterisk.org/wiki.display/~bford/Asterisk+ARI+Resource
* Added new ARI functionality
* Asterisk modules can be unloaded through http requests
Benjamin Ford [Mon, 13 Jul 2015 21:00:19 +0000 (16:00 -0500)]
ARI: Added new functionality to load a single module.
An http request can be sent to load an Asterisk module. If the
module can not be loaded or is loaded already, an error response
will be returned.
The command curl -v -u user:pass -X POST 'http://localhost:8088/ari
/asterisk/modules/{moduleName}'" (or something similar, depending on
configuration) can be run in the terminal to access this new
functionality.
For more information, see:
https://wiki.asterisk.org/wiki.display/~bford/Asterisk+ARI+Resource
* Added new ARI functionality
* Asterisk modules can be loaded through http requests
Benjamin Ford [Mon, 13 Jul 2015 15:54:51 +0000 (10:54 -0500)]
ARI: Added new functionality to get information on a single module.
An http request can be sent to retrieve information on a single
module, including the resource name, description, use count, status,
and support level.
The command "curl -v -u user:pass -X GET 'http://localhost:8088/ari
/asterisk/modules/{moduleName}'" (or something similar, depending on
configuration) can be run in the terminal to access this new
functionality.
For more information, see:
https://wiki.asterisk.org/wiki.display/~bford/Asterisk+ARI+Resource
* Added new ARI functionality
* Information on a single module can now be retrieved
Kevin Harwell [Wed, 8 Jul 2015 19:56:10 +0000 (14:56 -0500)]
bridge.c: Fixed race condition during attended transfer
During an attended transfer a thread is started that handles imparting the
bridge channel. From the start of the thread to when the bridge channel is
ready exists a gap that can potentially cause problems (for instance, the
channel being swapped is hung up before the replacement channel enters the
bridge thus stopping the transfer). This patch adds a condition that waits
for the impart thread to get to a point of acceptable readiness before
allowing the initiating thread to continue.
Matt Jordan [Wed, 8 Jul 2015 21:28:13 +0000 (16:28 -0500)]
main/sorcery: Don't fail object set creation from JSON if field fails
Some individual fields may fail their conversion due to their default
values being invalid for their custom handlers. In particular,
configuration values that depend on others being enabled (and thus have
an empty default value) are notorious for tripping this routine up. An
example of this are any of the DTLS options for endpoints. Any of the
DTLS options will fail to be applied (as DTLS is not enabled), causing
the entire object set to be aborted.
This patch makes it so that we log a debug message when skipping a
field, and rumble on anyway.
Matt Jordan [Wed, 8 Jul 2015 21:21:09 +0000 (16:21 -0500)]
main/format_cap: Parse capabilities generated by ast_format_cap_get_names
We have a strange relationship between the parsing of format
capabilities from a string and their representation as a string. We
expect the format capabilities to be expressed as a string in the
following format:
allow = !all,ulaw,alaw
disallow = g722
While we would generate the string representation of those formats as:
When the configuration framework needs to store values as a string, it
generates the format capabilities using the second representation; this
representation however cannot be parsed when the entry is rehydrated.
This patch fixes that by updating
ast_format_cap_update_by_allow_disallow to parse an entry as if it were
in the generated format if it has a leading '(' and a trailing ')'.
Matt Jordan [Sat, 27 Jun 2015 22:53:37 +0000 (17:53 -0500)]
tests/test_devicestate: Add additional tests for the device state API
This patch adds more tests that exercise the device state API. This includes:
* Tests that cover adding a device state provider, as well as deleting a
device state provider. This also verifies that you cannot add an
already added device state provider, and cannot delete an already
deleted device state provider.
* A test that covers changing device state and receiving said updates
from a device state subscriber. This also covers hitting both the
device state cache as well as a custom device state provider.
* A test that covers converting device state to channel state and device
state values to a string representation and back.
* A test that covers obtaining device state from an active channel and a
channel driver that provides its own device state.
Matt Jordan [Sat, 27 Jun 2015 22:51:43 +0000 (17:51 -0500)]
main/devicestate: Prevent duplicate registration of device state providers
Currently, the device state provider API will allow you to register a
device state provider with the same case insensitive name more than
once. This could cause strange issues, as the duplicate device state
providers will not be queried when a device's state has to be polled.
This patch updates the API such that a device state provider with the
same name as one that has already registered will be rejected.
Matt Jordan [Sat, 11 Jul 2015 03:25:00 +0000 (22:25 -0500)]
res/res_sorcery_memory_cache: Fix test registration issues
Again, tests now need to not end with a newline. This patch makes it so
the tests can register again, unit tests will actually pass, and we can
stop wasting time trying to figure out why builds are failing when they
really aren't failing.
Matt Jordan [Sat, 11 Jul 2015 02:42:58 +0000 (21:42 -0500)]
tests/test_sorcery_memory_cache_thrash: Fix test loading problems
Because unit tests now want descriptions to not end with a newline, the
sorcery memory cache thrash tests failed to register. This patch
corrects their descriptions.
Benjamin Ford [Fri, 26 Jun 2015 15:57:15 +0000 (10:57 -0500)]
ARI: Added new functionality to get all module information.
An http request can be sent to retrieve a list of all existing modules,
including the resource name, description, use count, status, and
support level.
The command "curl -v -u user:pass -X GET 'http://localhost:8088/ari/
asterisk/modules" (or something similar, depending on configuration)
can be run in the terminal to access this new functionality.
For more information, see:
https://wiki.asterisk.org/wiki.display/~bford/Asterisk+ARI+Resource
* Added new ARI functionality
* Information on modules can now be retrieved
bridge_native_rtp.c: Don't start native RTP bridging after attended transfer.
The bridge_native_rtp module adds a frame hook to channels which are in
a native RTP bridge. This frame hook is used to intercept when a hold
or unhold frame traverses the bridge so native RTP can be stopped or
started as appropriate. This is expected but exposes a specific bug
when attended transfers are involved.
Upon completion of an attended transfer an unhold frame is queued up
to take one of the channels involved off hold. After this is done
the channel is moved between bridges.
When the frame hook is involved in this case for the unhold it
releases the channel lock and acquires the bridge lock. This
allows the bridge core to step in and move the channel
(potentially changing the bridging techology) from another thread.
Once completed the bridge lock is released by the bridge core.
The frame hook is then able to acquire the bridge lock and
wrongfully starts native RTP again, despite the channel no longer
being in the bridge or needing to start native RTP. In fact at
this point the frame hook is no longer attached to the channel.
This change makes it so the native RTP bridge data is available to
the frame hook when it is invoked. Whether the frame hook has
been detached or not is stored on the native RTP bridge data and
is checked by the frame hook before starting or stopping native
RTP bridging. If the frame hook has been detached it does nothing.
Joshua Colp [Sat, 16 May 2015 22:02:50 +0000 (19:02 -0300)]
res_sorcery_memory_cache: Backport to 13
Gerrit is complaining of conflicts when trying to create a patch series
of all of the cherry-picked master commits, so I have instead squashed
it all into one commit.
res_rtp_asterisk: Prevent simultaneous access to DTLS SSL context.
This change moves logic for setting up the DTLS SSL contexts to
when the SDP is done being processed instead of when ICE negotiation
completes. It also stops handshakes from being initiated when we
are acting as a server.
Manipulating the SSL context when ICE negotiation has completed
is problematic as the SSL context is not protected and if acting
as a client the remote side may have started DTLS negotiation
already.
The retransmission timeout timer code has also been split up
and simplified some. Both RTP and RTCP now have their own timers
and the points at which the timer is stopped and started is now
more specific. When a packet is sent the timer is started. When
a response is received but before it is processed the timer is
stopped. This provides a guarantee that the timeout is not
occurring while the response is processed.
MWI subscriptions can crash or corrupt memory when using the subscription
datastore to access the MWI subscription object because the datastore is
not holding a reference to the object.
* Give the subscription datastore a ref to the MWI subscription object.
It is unfortunate that the ref causes a circular ref chain that must be
explicitly broken to allow the memory to get released. The loop is broken
when the subscription is shutdown and if the subscription setup fails.
When res_pjsip body generator modules were generating XML or XPIDF
response bodies, there was a chance that the generated body would be the
exact size of the supplied buffer. Adding the nul string terminator would
then write beyond the end of the buffer and potentially corrupt memory.
* Fix MALLOC_DEBUG high fence violations caused by adding a nul string
terminator on the end of a buffer for XML or XPIDF response bodies.
* Made calls to pj_xml_print() safer if the XML prolog is requested. Due
to a bug in pjproject, the return value could be -1 _or_
AST_PJSIP_XML_PROLOG_LEN if the supplied buffer is not large enough.
* Updated the doxygen comment of AST_PJSIP_XML_PROLOG_LEN to describe the
return value of pj_xml_print() when the supplied buffer is not large
enough.
When a caller calls a FAX number and then hangs up right after the call is
answered then the T.38 re-INVITE automatic reject timer may still be
running after the channel goes away.
* Added session NULL channel checks on the code paths that get executed by
t38_automatic_reject() to prevent a crash when the T.38 re-INVITE
automatic reject timer expires.
Richard Mudgett [Fri, 5 Jun 2015 20:37:33 +0000 (15:37 -0500)]
res_pjsip: Need to use the same serializer for a pjproject SIP transaction.
All send/receive processing for a SIP transaction needs to be done under
the same threadpool serializer to prevent reentrancy problems inside
pjproject and res_pjsip.
* Add threadpool API call to get the current serializer associated with
the worker thread.
* Pick a serializer from a pool of default serializers if the caller of
res_pjsip.c:ast_sip_push_task() does not provide one.
This is a simple way to ensure that all outgoing SIP request messages are
processed under a serializer. Otherwise, any place where a pushed task is
done that would result in an outgoing out-of-dialog request would need to
be modified to supply a serializer. Serializers from the default
serializer pool are picked in a round robin sequence for simplicity.
A side effect is that the default serializer pool will limit the growth of
the thread pool from random tasks. This is not necessarily a bad thing.
* Made pjsip_distributor.c save the thread's serializer name on the
outgoing request tdata struct so the response can be processed under the
same serializer.
NOTE: session_inv_on_state_changed() is disassociating the dialog from the
session when the invite dialog becomes PJSIP_INV_STATE_DISCONNECTED.
Unfortunately this is a tad too soon because our BYE request transaction
has not completed yet.
This change makes it so that when accepting a WebSocket
connection the HTTP response is sent as one packet instead of
fragmented. Browsers don't like it when you send it fragmented.
Matt Jordan [Sat, 27 Jun 2015 23:47:19 +0000 (18:47 -0500)]
Makefile: Remove coverage files on 'make clean'
This patch updates a variety of Makefiles in Asterisk's build system to
remove .gcda and .gcno files when 'make clean' is executed. These files
are generated when '--enable-coverage' is passed to the Asterisk
configure script.
Walter Doekes [Thu, 2 Jul 2015 14:08:12 +0000 (16:08 +0200)]
chan_sip: Fix early call pickup channel leak.
When handle_invite_replaces() was called, and either ast_bridge_impart()
failed or there was no bridge (because the channel we're picking up was
still ringing), chan_sip would leak a channel.
Thanks Matt and Corey for checking the bridge path.
Walter Doekes [Thu, 2 Jul 2015 11:10:59 +0000 (13:10 +0200)]
rtp_engine: Skip useless self-assignment in ast_rtp_engine_unload_format.
When running valgrind on Asterisk, it complained about:
==32423== Source and destination overlap in memcpy(0x85a920, 0x85a920, 304)
==32423== at 0x4C2F71C: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/...)
==32423== by 0x55BA91: ast_rtp_engine_unload_format (rtp_engine.c:2292)
==32423== by 0x4EEFB7: ast_format_attr_unreg_interface (format.c:1437)
The code in question is a struct assignment, which may be performed by
memcpy as a compiler optimization. It is changed to only copy the struct
contents if source and destination are different.
Walter Doekes [Thu, 2 Jul 2015 10:16:26 +0000 (12:16 +0200)]
astfd: Fix buffer overflow in DEBUG_FD_LEAKS.
If DEBUG_FD_LEAKS was used and more file descriptors than the default of
1024 were available, some DEBUG_FD_LEAKS-patched functions would
overwrite memory past the fixed-size (1024) fdleaks buffer.
This change:
- adds bounds checks to __ast_fdleak_fopen and __ast_fdleak_pipe
- consistently uses ARRAY_LEN() instead of sizeof() or 1023 or 1024
- stores pointers to constants instead of copying the contents
- reorders the fdleaks struct for possibly tighter packing
- adds a tiny bit of documentation
Walter Doekes [Thu, 2 Jul 2015 09:57:44 +0000 (11:57 +0200)]
res_timing: Don't close FD 0 when out of open files.
This fixes so a failure to get a timer file descriptor does not cascade
to closing FD 0.
On error, both res_timing_kqueue and res_timing_timerfd would call the
destructor before setting the file handle. The file handle had been
initialized to 0, causing FD 0 to be closed. This in turn, resulted in
floods of "CLI>" messages and an unusable terminal.
ASTERISK-19277 #close
Reported by: Barry Chern
For the 13 branch, this was already fixed. This patch only ensures that
we do not attempt to close a negative file descriptor.
Matt Jordan [Wed, 1 Jul 2015 21:04:50 +0000 (16:04 -0500)]
sorcery/realtime: Add a bit of debug and warning messages for bad configs
When a mapping does not exist between a sorcery.conf defined object and
a realtime mapping in extconf, currently, the user will receive a slew
of ERROR messages that don't really tell what is happening. Some ERROR
messages may even be misleading, as they occur after the sorcery API has
already given up on the attempt to load and create the sorcery object.
This patch adds a bit of debug and a useful WARNING message for when a
wizard's open callback fails for a particular object type. In the bad
configurations that resulted in this patch, this provided a 'root cause'
WARNING message that pointed in the right direction of the configuration
problem.
Mark Michelson [Mon, 29 Jun 2015 17:45:02 +0000 (12:45 -0500)]
res_sorcery_realtime: Fix leak of sorcery object type.
This prevents a leak of a sorcery object type when realtime sorcery
objects are retrieved by fields or when multiple objects are retrieved.
The extent of this leak is that sorcery object types would be leaked.
These are allocated whenever an object type is registered with sorcery,
meaning that on module shutdown, these objects would be leaked. This
could be problematic if many reloads were performed, but it is not as
severe as if every sorcery object retrieved from realtime were being
leaked.
Matt Jordan [Sat, 27 Jun 2015 03:02:42 +0000 (22:02 -0500)]
res/res_corosync: Always decline module load, instead of failing
Returns a 'failure' from the module load routine indicates to Asterisk
that it should abort loading completely. This is rarely - in fact,
really, never - a good option. Aborting load of Asterisk from a dynamic
module implies that the core, and the rest of the dynamic modules, don't
matter: we should abandon all processing.
res_corosync is really not that important.
This patch updates the module such that, if it fails to load, it
politely declines (emitting ERROR messages along the way), and allows
Asterisk to continue to function.
Note that this issue was keeping Asterisk unit tests from running on
certain build agents.
Matt Jordan [Sat, 27 Jun 2015 01:38:58 +0000 (20:38 -0500)]
main/pbx: Resolve case sensitivity regression in PBX hints
When 8297136f was merged for ASTERISK-25040, a regression was introduced
surrounding the case sensitivity of device names within hints.
Previously, device names - such as 'sip/foo' - were compared in a case
insensitive fashion. Thus, 'sip/foo' was equivalent to 'SIP/foo'. After
that patch, only the case sensitive name would match, i.e., 'SIP/foo'.
As a result, some dialplan hints stopped working.
This patch re-introduces case insensitive matching for device names in
hints.
Mark Michelson [Fri, 26 Jun 2015 21:12:33 +0000 (16:12 -0500)]
res_pjsip_nat: Adjust when contact should be rewritten.
A previous change made the contact only get rewritten if the dialog's
route set was not marked frozen. Unfortunately, while the intent of this
is correct, the dialog's route set actually gets marked as frozen
earlier than expected, especially for UAS dialogs.
Instead, the idea is that the contact needs to not be rewritten if there
is a pre-existing route set on the dialog. This is now accomplished by
checking the dialog's route set list instead of checking if the route
set is frozen.
Doing this causes some broken tests to begin passing again.
Richard Mudgett [Fri, 19 Jun 2015 23:27:24 +0000 (18:27 -0500)]
res_pjsip_outbound_registration.c: Add a serializer shutdown group.
The client_state objects contain a serializer used to send the outbound
REGISTER messages. Once all those message transactions are complete then
the module can shutdown.
res_pjsip_refer will attempt to add Referred-By or Replaces headers to
outbound INVITEs at times. If the INVITE gets challenged for
authentication, then we will resend the INVITE. Prior to this patch, the
Referred-By or Replaces header would be re-added to the outbound INVITE,
resulting in duplicated headers.
Mark Michelson [Tue, 23 Jun 2015 22:43:31 +0000 (17:43 -0500)]
res_pjsip_nat: Rewrite route set when required.
When performing some provider testing, the rewrite_contact option was
interfering with proper construction of a route set when sending an ACK
after receiving a 200 OK response to an INVITE.
The initial INVITE was sent to address sip:foo. The 200 OK had a Contact
header with URI sip:bar. In addition, the 200 OK had Record-Route
headers for sip:baz and sip:foo, in that order. Since the Record-Route
headers had the lr parameter, the result should have been:
* Set R-URI of the ACK to sip:bar.
* Add Route headers for sip:foo and sip:baz, in that order.
However, the rewrite_contact option resulted in our rewriting the
Contact header on the 200 OK to sip:foo. The result was:
* R-URI remained sip:foo.
* We added Route headers for sip:foo and sip:baz, in that order.
The result was that sip:bar was not indicated in the ACK at all, so the
far end never received our ACK. The call eventually dropped.
The intention of rewrite_contact is to rewrite the most immediate
destination of our SIP request to be the same address on which we
received a request or response. In the case of processing a SIP response
with Record-Route headers, this means that instead of rewriting the
Contact header, we should instead rewrite the bottom-most Record-Route
header. In the case of processing a SIP request with Record-Route
headers, this means we rewrite the top-most Record-route header.
Like when we rewrite the Contact header, we also ensure to update
the dialog's route set if it exists.
* handle_client_state_destruction() must always be passed a ref to
client_state because it will always unref client_state.
handle_registration_response() was not passing a client_state ref.
* Made the final un-REGISTER message get sent normally using the pjproject
register control structure in handle_client_state_destruction(). The
previous code attempted to short circuit the response handling for the
module to unload. That doesn't work for a couple reasons. One,
pjsip_regc_send() may call the registered callback before it returns and
unbalance the client_state ref count. Two, the registered callback
handles any authentication for the un-REGISTER message.
* Made the distinction between internal registration state and external
registration status with sip_outbound_registration_status_str(). This is
necessary to avoid altering documented AMI messages with internal
changes.
* Removed references to client_state->client outside of the serializer
thread. When handle_client_state_destruction() destroys the pjproject
register control structure that memory is freed and cannot be referenced
anymore. These accesses were to provide information for debug and
off-nominal warning messages.
* In sip_outbound_registration_timer_cb() you should not access entry->id
after unrefing client_state because the passed in entry is normally
pointing to the timer entry in the client_state object.
Richard Mudgett [Mon, 15 Jun 2015 20:28:41 +0000 (15:28 -0500)]
res_pjsip_outbound_registration.c: Use ast_sorcery_object_unregister() API
The sorcery pjsip 'registration' config object needs to be destroyed on
module unload. Otherwise, a reload of res_pjsip could try to use
callbacks for a previously unloaded instance of the module provided by
ast_sorcery_object_register() or one of the variants. Also, if
res_pjsip_outbound_registration were subsequently reloaded, the sorcery
config field objects would be registered in sorcery twice.
Joshua Colp [Thu, 25 Jun 2015 11:42:46 +0000 (08:42 -0300)]
channel: Remove ignore of answer on non-outgoing channels.
Due to the way that channels can now be moved around inside of
Asterisk it is possible for the outgoing flag of a channel to get
cleared before it has been answered. This results in the bridge
not receiving notification that the outgoing leg has been answered.
This most easily exhibits itself with DTMF based blond transfers.
Since the answer of the outgoing leg is ignored the other party
continues to receive both a locally generated ringing and the
media stream of the outgoing leg upon its answer. This results
in no media being heard.
This change removes the ignore of the answer and allows it
to pass through.
Richard Mudgett [Mon, 15 Jun 2015 20:28:00 +0000 (15:28 -0500)]
sorcery: Add ast_sorcery_object_unregister() API call.
Find and unlink the specified sorcery object type to complement
ast_sorcery_object_register(). Without this function you cannot
completely unload individual modules that use sorcery for configuration.
Richard Mudgett [Mon, 15 Jun 2015 18:38:58 +0000 (13:38 -0500)]
res_pjsip_outbound_registration.c: Reorder load_module() and unload_module().
It is best if the loading code creates and initializes the module's
infrastructure before letting the system know of its existence. The
unloading code needs to reverse the actions of the loading code and in the
reverse order.
Richard Mudgett [Wed, 24 Jun 2015 19:39:01 +0000 (14:39 -0500)]
Unit tests: Fix unit test description strings.
Analyzing the code shows that the unit test summary and description
strings should not end with a new-line character. Where these strings are
used in the code a new-line is provided for output.
Joshua Colp [Tue, 23 Jun 2015 16:21:41 +0000 (13:21 -0300)]
app_dial: Hold reference to calling channel formats when dialing outbound.
Currently when requesting a channel the native formats of the
calling channel are provided to the core for usage when dialing
the outbound channel. This occurs without holding the channel lock
or keeping a reference to the formats. This is problematic as
the channel driver may end up changing the formats during this time.
In the case of chan_sip this happens when an SDP negotiation
completes.
This change makes it so app_dial keeps a reference to the native
formats of the calling channel which guarantees that they will
remain valid for the period of time needed.
Joshua Colp [Wed, 17 Jun 2015 10:04:39 +0000 (07:04 -0300)]
res_pjsip_mwi: Set up unsolicited MWI upon registration.
The res_pjsip_mwi previously required a reload to set up the proper
subscriptions to allow unsolicited MWI to work. This change
makes it so the act of registering will also cause this to occur.
This is particularly useful if realtime is involved as no reload
needs to occur within Asterisk to cause the MWI information
to get sent.
Kevin Harwell [Mon, 22 Jun 2015 20:11:18 +0000 (15:11 -0500)]
bridge.c: Hangup attended transfer target if bridged
After completing an attended transfer the transfer target channel was not being
hung up after leaving the bridge. Added an explicit softhangup to hangup said
channel, but only if it was previously bridged.
Alexander Traud [Mon, 22 Jun 2015 14:26:48 +0000 (16:26 +0200)]
chan_sip: Reload peer without its old capabilities.
On reload, previously allowed codecs were not removed. Therefore, it was not
possible to remove codecs while Asterisk was running. Furthermore, newly added
codecs got appended behind the previous codecs. Therefore, it was not possible
to add a codec with a priority of #1. This change removes the old capabilities
before the current ones are added.
ASTERISK-25182 #close
Reported by: Alexander Traud
patches:
asterisk_13_allow_codec_reload.patch uploaded by Alexander Traud (License 6520)
Joshua Colp [Sun, 21 Jun 2015 00:38:02 +0000 (21:38 -0300)]
chan_sip: Destroy peers without holding peers container lock.
Due to the use of stasis_unsubscribe_and_join in the peer destructor
it is possible for a deadlock to occur when an event callback is
occurring at the same time.
This happens because the peer may be destroyed while holding the
peers container lock. If this occurs the event callback will never
be able to acquire the container lock and the unsubscribe will
never complete.
This change makes it so the peers that have been removed from the
peers container are not destroyed with the container lock held.
Mark Michelson [Thu, 18 Jun 2015 18:16:29 +0000 (13:16 -0500)]
Resolve race conditions involving Stasis bridges.
This resolves two observed race conditions.
First, a bit of background on what the Stasis application does:
1a Creates a stasis_app_control structure. This structure is linked into
a global container and can be looked up using a channel's unique ID.
2a Puts the channel in an event loop. The event loop can exit either
because the stasis_app_control structure has been marked done, or
because of some other factor, such as a hangup. In the event loop, the
stasis_app_control determines if any specific ARI commands need to be
run on the channel and will run them from this thread.
3a Checks if the channel is bridged. If the channel is bridged, then
ast_bridge_depart() is called since channels that are added to Stasis
bridges are always imparted as departable.
4a Unlink the stasis_app_control from the container.
When an ARI command is received by Asterisk, the following occurs
1b A thread is spawned to handle the HTTP request
2b The stasis_app_control(s) that corresponds to the channel(s) in the
request is/are retrieved. If the stasis_app_control cannot be
retrieved, then it is assumed that the channel in question has exited
the Stasis app or perhaps was never in Stasis in the first place.
3b A command is queued onto the stasis_app_control, and the channel's
event loop thread is signaled to run the command.
4b While most ARI commands do nothing further, some, such as adding or
removing channels from a bridge, will block until the command they
issued has been completed by the channel's event loop.
The first race condition that is solved by this patch involves a crash
that can occur due to faulty detection of the channel's bridged status
in step 3a. What can happen is that in step 2a, the event loop may run
the ast_bridge_impart() function to asynchronously place the channel
into a bridge, then immediately exit the event loop because the channel
has hung up. In step 3a, we would detect that the channel was not
bridged and would not call ast_bridge_depart(). The reason that the
channel did not appear to be bridged was that the depart_thread that is
spawned by ast_bridge_impart() had not yet started. That is the thread
where the channel is marked as being bridged. Since we did not call
ast_bridge_depart(), the Stasis application would exit, and then the
channel would be destroyed Then the depart_thread would start up and
try to manipulate the destroyed channel, causing a crash.
The fix for this is to switch from using ast_channel_is_bridged() to
checking the NULLity of ast_channel_internal_bridge_channel() to
determine if ast_bridge_depart() needs to be called. The channel's
internal bridge_channel is set when ast_bridge_impart() is called and
is NULLed by the call to ast_bridge_depart(). If the channel's internal
bridge_channel is non-NULL, then the channel must have been imparted
into the bridge and needs to be departed, even if the actual bridging
operation has not yet started. By departing the channel when necessary,
the thread that is running the Stasis application will block until the
bridge gives the okay that the depart_thread has exited.
The second race condition that is solved by this patch involves a leak
of HTTP handler threads. The problem was that step 2b would successfully
retrieve a stasis_app_control structure. Then step 2a would exit the
channel from the event loop due to a hangup. Steps 3a and 4a would
execute, and then finally steps 3b and 4b would. The problem is that at
step 4b, when attempting to add a channel to a bridge, the thread would
block forever since the channel would never execute the queued command
since it was finished with the event loop. This meant that the HTTP
handling thread would be leaked, along with any references that thread
may have owned (in my case, I was seeing bridges leaked).
The fix for this is to hone in better on when the channel has exited the
event loop. The stasis_app_control structure has an is_done field that
is now set at each point where the channel may exit the event loop. If
step 2b retrieves a valid stasis_app_control structure but the control
is marked as done, then the attempted operation exits immediately since
there will be nothing to service the attempted command.