Richard Mudgett [Fri, 27 May 2016 22:31:52 +0000 (17:31 -0500)]
res_pjsip_session: Use distributor serializer for incoming calls.
We must continue using the serializer that the original INVITE came in on
for the dialog. There may be retransmissions already enqueued in the
original serializer that can result in reentrancy and message sequencing
problems.
Outgoing call legs create the pjsip/outsess/<endpoint> serializers for
their dialogs.
Richard Mudgett [Fri, 27 May 2016 17:50:14 +0000 (12:50 -0500)]
res_pjsip_pubsub.c: Use distributor serializer for incoming subscriptions.
We must continue using the serializer that the original SUBSCRIBE came in
on for the dialog. There may be retransmissions already enqueued in the
original serializer that can result in reentrancy and message sequencing
problems. The "sip_transaction Unable to register SUBSCRIBE transaction
(key exists)" message is a notable symptom of this issue.
Outgoing subscriptions still create the pjsip/pubsub/<endpoint>
serializers for their dialogs.
Richard Mudgett [Thu, 26 May 2016 22:35:04 +0000 (17:35 -0500)]
pjsip_distributor.c: Consistently pick a serializer for messages.
Incoming messages that are not part of a dialog or a recognized response
to one of our requests need to be sent to a consistent serializer. Under
load we may be queueing retransmissions before we can process the original
message. We don't need to throw these messages onto random serializers
and cause reentrancy and message sequencing problems.
* Created a pool of pjsip/distributor serializers that get picked by
hashing the call-id and remote tag strings of the received messages.
* Made ast_sip_destroy_distributor() destroy items in the reverse order of
creation.
George Joseph [Fri, 1 Apr 2016 18:30:56 +0000 (12:30 -0600)]
res_pjsip contact: Lock expiration/addition of contacts
Contact expiration can occur in several places: res_pjsip_registrar,
res_pjsip_registrar_expire, and automatically when anyone calls
ast_sip_location_retrieve_aor_contact. At the same time, res_pjsip_registrar
may also be attempting to renew or add a contact. Since none of this was locked
it was possible for one thread to be renewing a contact and another thread to
expire it immediately because it was working off of stale data. This was the
casue of intermittent registration/inbound/nominal/multiple_contacts test
failures.
Now, the new named lock functionality is used to lock the aor during contact
expire and add operations and res_pjsip_registrar_expire now checks the
expiration with the lock held before deleting the contact.
George Joseph [Fri, 1 Apr 2016 01:04:29 +0000 (19:04 -0600)]
lock: Add named lock capability
Locking some objects like sorcery objects can be tricky because the underlying
ao2 object may not be the same for all callers. For instance, two threads that
call ast_sorcery_retrieve_by_id on the same aor name might actually get 2
different ao2 objects if the underlying wizard had to rehydrate the aor from a
database. Locking one ao2 object doesn't have any effect on the other even if
those objects had locks in the first place.
Named locks allow access control by keyspace and key strings. Now an "aor"
named "1000" can be locked and any other thread attempting to lock "aor" "1000"
will wait regardless of whether the underlying ao2 object is the same or not.
Mutex and rwlocks are supported.
This capability will initially be used to lock an aor when multiple threads may
be attempting to prune expired contacts from it.
Joshua Colp [Thu, 2 Jun 2016 17:04:45 +0000 (14:04 -0300)]
res_odbc: Implement a connection pool.
Testing has shown that our usage of UnixODBC is problematic
due to bugs within UnixODBC itself as well as the heavy weight
cost of connecting and disconnecting database connections, even
when pooling is enabled.
For users of UnixODBC 2.3.1 and earlier crashes would occur due
to insufficient protection of the disconnect operation. This was
fixed in UnixODBC 2.3.2 and above.
For users of UnixODBC 2.3.3 and higher a slow-down would occur
under heavy database use due to repeated connection establishment.
A regression is present where on each connection the database
configuration is cached again, with the cache growing out of
control.
The connection pool implementation present in this change helps
to mitigate these issues by reducing how much we connect and
disconnect database connections. We also solve the issue of
crashes under UnixODBC 2.3.1 by defaulting the maximum number of
connections to 1, returning us to the previous working behavior.
For users who may have a fixed version the maximum concurrent
connection limit can be increased helping with performance.
The connection pool works by keeping a list of active connections.
If the connection limit has not been reached a new connection is
established. If the connection limit has been reached then the
request waits until a connection becomes available before
continuing.
George Joseph [Mon, 30 May 2016 15:58:35 +0000 (09:58 -0600)]
pjproject_bundled: Move to pjproject 2.5
Although all the patches we had against 2.4.5 were applied by Teluu,
a new bug was introduced preventing re-use of tcp and tls transports
This patch removes all the previous patches against 2.4.5, updates
the version to 2.5, and adds a new patch to correct the transport
re-use problem.
* Local fax starts rtp call to remote fax
* Remote fax starts t38 call back to local fax.
* Local fax sends t38 no-signal to Asterisk before sending an OK.
* udptl processes the frame and increments the expected sequence number.
* chan_sip drops the frame because the call isn't up so nothing goes out
the external interface to open the port for incoming packets.
* Local fax sends OK and Asterisk sends OK to the remote fax.
* Remote fax sends t38 packets which are dropped by the firewall.
* Local fax re-sends t38 no-signal with the same sequence number.
* udptl drops the frame because it thinks it's a dup.
* Still no outgoing packets to open the firewall.
* t38 negotiation fails.
The patch drops frames t38 received before udptl sequence processing
when the call hasn't been answered yet. The second no-signal frame
is then seen as new and is relayed out the external interface which
opens the port and allows negotiation to continue.
George Joseph [Tue, 17 May 2016 16:14:51 +0000 (10:14 -0600)]
chan_sip: Prevent extra Session-Expires headers from being added
When chan_sip does a re-INVITE to refresh a session and authentication
is required, the INVITE with the Authorization header containes a
second Session-Expires header without the ";refersher=" parameter.
This is causing some proxies to return a 400. Also, when Asterisk is
the uas and the refresher, it is including the Session-Expires and
Min-SE headers in OPTIONS messages which is not allowed per RFC4028.
This patch (based on the reporter's) Checks to see if a Session-Expires
header is already in the message before adding another one. It also
checks that the method is INVITE or UPDATE.
George Joseph [Sat, 7 May 2016 19:39:25 +0000 (13:39 -0600)]
config_transport: Tell pjproject to allow all SSL/TLS protocols
The default tls settings for pjproject only allow TLS 1, TLS 1.1 and TLS 1.2.
SSL is not allowed. So, even if you specify "sslv3" for a transport method,
it's silently ignored and one of the TLS protocols is used. This was a new
behavior of pjsip_tls_setting_default() in 2.4 (when tls.proto was added) that
we never caught.
Now we need to set tls.proto = 0 after we call pjsip_tls_setting_default().
This tells pjproject to set the socket protocol to match the method.
Kevin Harwell [Thu, 5 May 2016 16:37:37 +0000 (11:37 -0500)]
res_pjsip_authenticator_digest: Don't use source port in nonce verification
From the issue reporter:
"res_pjsip_outbound_authenticator_digest builds a nonce that is a hash of
the timestamp, the source address, the source port, a server UUID that is
calculated at startup, and the authentication realm.
Rather than caching nonces that we create, we instead attempt to re-calculate
the nonce when receiving an incoming request with authentication. We then
compare the re-calculated nonce to the incoming nonce, and if they don't match,
then authentication has failed early.
The problem is that it is possible, especially when using TCP, to receive two
requests from the same endpoint but have differing source ports for those
requests. Asterisk itself commonly will use different source ports for
outbound TCP requests."
This patch removes the source port dependency when building the nonce.
Joshua Colp [Thu, 5 May 2016 10:07:50 +0000 (07:07 -0300)]
file: Ensure nativeformats remains valid for lifetime of use.
It is possible for the nativeformats of a channel to change
throughout its lifetime. As a result a user of it needs to either
ensure the channel is locked when accessing the formats or keep
a reference to the nativeformats themselves.
This change fixes the file playback support so it keeps a
reference to the nativeformats when accessing things.
res_pjsip: disable multi domain to improve realtime performace
This patch added new global pjsip option 'disable_multi_domain'.
Disabling Multi Domain can improve Realtime performance by reducing
number of database requests.
When unloading the app_queue module the members in each queue are
destroyed and as part of this they are removed from the pending
members container. Unfortunately a crash would occur as the container
was destroyed before the members were removed.
This change tweaks ordering so the container destruction occurs
after the members are destroyed.
Kevin Harwell [Thu, 21 Apr 2016 19:23:21 +0000 (14:23 -0500)]
app_queue: queue members can receive multiple calls
It was possible for a queue member that is a member of at least 2 or more
queues to receive mulitiple calls at the same time. This happened because
of a race between when a member was being rung and when the device state
notified the other queue(s) member object of the state change.
This patch makes it so when a queue member is being rung it gets added to
a global pool of queue members. If that same member is tried again, e.g.
from another queue, and it is found to already exist in the pending member
container then it will not ring that member.
George Joseph [Fri, 22 Apr 2016 22:53:23 +0000 (16:53 -0600)]
res_agi: Prevent run_agi from eating frames it shouldn't
The run_agi function is eating control frames when it shouldn't be. This is
causing issues when an AGI is run from CONNECTED_LINE_SEND_SUB in a blond
transfer.
Alice calls Bob. Bob attended transfers to Charlie but hangs up before Charlie
answers.
Alice gets the COLP UPDATE indicating Charlie but Charlie never gets an UPDATE
and is left thinking he's connected to Bob.
In this case, when CONNECTED_LINE_SEND_SUB runs on Alice's channel and it calls
an AGI, the extra eaten frames prevent CONNECTED_LINE_SEND_SUB from running on
Charlie's channel.
The fix was to accumulate deferrable frames in the "forever" loop instead of
dropping them, and re-queue them just before running the actual agi command
or exiting.
Richard Mudgett [Fri, 15 Apr 2016 19:36:59 +0000 (14:36 -0500)]
res_stasis: Handle re-enter stasis bridge with swap channel.
We lose the fact that there is a swap channel if there is one. We
currently wind up rejoining the stasis bridge as a normal join after the
swap channel has already been kicked from the bridge.
This patch preserves the swap channel so the AMI/ARI events can note that
the channel joining the bridge is swapping with another channel. Another
benefit to swaqpping in one operation is if there are any channels that
get lonely (MOH, bridge playback, and bridge record channels). The lonely
channels won't leave before the joining channel has a chance to come back
in under stasis if the swap channel is the only reason the lonely channels
are staying in the bridge.
ASTERISK-25947 #close
Reported by: Richard Mudgett
Richard Mudgett [Tue, 19 Apr 2016 21:58:32 +0000 (16:58 -0500)]
bridge: Hold off more than one imparting channel at a time.
An earlier patch blocked the ast_bridge_impart() call until the channel
either entered the target bridge or it failed. Unfortuantely, if the
target bridge is stasis and the imprted channel is not a stasis channel,
stasis bounces the channel out of the bridge to come back into the bridge
as a proper stasis channel. When the channel is bounced out, that
released the block on ast_bridge_impart() to continue. If the impart was
a result of a transfer, then it became a race to see if the swap channel
would get hung up before the imparted channel could come back into the
stasis bridge. If the imparted channel won then everything is fine. If
the swap channel gets hung up first then the transfer will fail because
the swap channel is leaving the bridge.
* Allow a chain of ast_bridge_impart()'s to happen before any are
unblocked to prevent the race condition described above. When the channel
finally joins the bridge or completely fails to join the bridge then the
ast_bridge_impart() instances are unblocked.
George Joseph [Tue, 19 Apr 2016 22:52:15 +0000 (16:52 -0600)]
res_pjsip_callerid: Clear out display name if id->name is not valid
When create_new_id_hdr creates a new RPID or PAI header, it starts by cloning
the From header, then it overwrites the display name and uri from the channel's
connected.id. If the connected.id.name wasn't valid, create_new_id_hdr was
leaving the display name from the From header in the new RPID or PAI header.
On an attended transfer where the originator had a caller id number set but not
a display name, the re-INVITE to the final transferee had the number of the
originator but the display name of the transferer.
Added a check to clear out the display name in the new header if
connected.id.name was invalid.
Mark Michelson [Mon, 18 Apr 2016 17:12:37 +0000 (12:12 -0500)]
PJSIP: Remove PJSIP parsing functions from uri length validation.
The PJSIP parsing functions provide a nice concise way to check the
length of a hostname in a SIP URI. The problem is that in order to use
those parsing functions, it's required to use them from a thread that
has registered with PJLib.
On startup, when parsing AOR configuration, the permanent URI handler
may not be run from a PJLib-registered thread. Specifically, this could
happen when Asterisk was started in daemon mode rather than
console-mode. If PJProject were compiled with assertions enabled, then
this would cause Asterisk to crash on startup.
The solution presented here is to do our own parsing of the contact URI
in order to ensure that the hostname in the URI is not too long. The
parsing does not attempt to perform a full SIP URI parse/validation,
since the hostname in the URI is what is important.
Mark Michelson [Mon, 18 Apr 2016 22:00:42 +0000 (17:00 -0500)]
res_pjsip_registrar: Fix bad memory-ness with user_agent.
Recent changes to the PJSIP registrar resulted in tests failing due to
missing AOR_CONTACT_ADDED test events. The reason for this was that the
user_agent string had junk values in it, resulting in being unable to
generate the event.
I'm going to be honest here, I have no idea why this was happening. Here
are the steps needed for the user_agent variable to get messed up:
* REGISTER is received
* First contact in the REGISTER results in a contact being removed
* Second contact in the REGISTER results in a contact being added
* The contact, AOR, expiration, and user agent all have to be passed as
format parameters to the creation of a string. Any subset of those
parameters would not be enough to cause the problem.
Looking into what was happening, the thing that struck me as odd was
that the user_agent variable was meant to be set to the value of the
User-Agent SIP header in the incoming REGISTER. However, when removing a
contact, the user_agent variable would be set (via ast_strdupa inside a
loop) to the stored contact's user_agent. This means that the
user_agent's value would be incorrect when attempting to process further
contacts in the incoming REGISTER.
The fix here is to use a different variable for the stored user agent
when removing a contact. Correcting the behavior to be correct also
means the memory usage is less weird, and the issue no longer occurs.
res_pjsip_transport_management: Allow unload to occur.
At shutdown it is possible for modules to be unloaded that wouldn't
normally be unloaded. This allows the environment to be cleaned up.
The res_pjsip_transport_management module did not have the unload
logic in it to clean itself up causing the res_pjsip module to not
get unloaded. As a result the res_pjsip monitor thread kept going
processing traffic and timers when it shouldn't.
Mark Michelson [Thu, 14 Apr 2016 18:49:35 +0000 (13:49 -0500)]
transport management: Register thread with PJProject.
The scheduler thread that kills idle TCP connections was not registering
with PJProject properly and causing assertions if PJProject was built in
debug mode.
This change registers the thread with PJProject the first time that the
scheduler callback executes.
"Idle" here means that someone connects to us and does not send a SIP
request. PJProject will not automatically time out such connections, so
it's up to Asterisk to do it instead.
When we receive an incoming TCP connection, we will start a timer
(equivalent to transaction timer D) waiting to receive an incoming
request. If we do not receive a request in that timeframe, then we will
shut down the TCP connection.
Some SIP devices indicate hold/unhold using deferred SDP reinvites. In
other words, they provide no SDP in the reinvite.
A typical transaction that starts hold might look something like this:
* Device sends reinvite with no SDP
* Asterisk sends 200 OK with SDP indicating sendrecv on streams.
* Device sends ACK with SDP indicating sendonly on streams.
At this point, PJMedia's SDP negotiator saves Asterisk's local state as
being recvonly.
Now, when the device attempts to unhold, it again uses a deferred SDP
reinvite, so we end up doing the following:
* Device sends reinvite with no SDP
* Asterisk sends 200 OK with SDP indicating recvonly on streams
* Device sends ACK with SDP indicating sendonly on streams
The problem here is that Asterisk offered recvonly, and by RFC 3264's
rules, if an offer is recvonly, the answer has to be sendonly. The
result is that the device is not taken off hold.
What is supposed to happen is that Asterisk should indicate sendrecv in
the 200 OK that it sends. This way, the device has the freedom to
indicate sendrecv if it wants the stream taken off hold, or it can
continue to respond with sendonly if the purpose of the reinvite was
something else (like a session timer refresher).
The fix here is to alter the SDP negotiator's state when we receive a
reinvite with no SDP. If the negotiator's state is currently in the
recvonly or inactive state, then we alter our local state to be
sendrecv. This way, we allow the device to indicate the stream state as
desired.
ASTERISK-25854 #close
Reported by Robert McGilvray
Richard Mudgett [Mon, 28 Mar 2016 23:10:40 +0000 (18:10 -0500)]
res_stasis: Fix crash on a hanging up channel.
* Give the struct stasis_app_control ao2 object a ref to the channel held
in the object. Now the channel will still be around if a thread needs to
post a stasis message instead of crash because the topic was destroyed.
* Moved stopping any lingering silence generator out of the struct
stasis_app_control destructor and made it a part of exiting the Stasis
application. Who knows which thread the destructor will be called under
so it cannot affect the channel's silence generator. Not only was the
channel unprotected when the silence generator was stopped, stasis may no
longer even control the channel.
Kevin Harwell [Wed, 16 Mar 2016 17:37:01 +0000 (12:37 -0500)]
chan_pjsip: transfers with direct media reinvite has wrong address/port
During a transfer involving direct media a race occurs between when the
transferer channel is swapped out, initiating rtp changes/updates, and the
subsequent reinvites.
When Alice, after speaking with Charlie (Bob is on hold), connects Bob and
Charlie invites are sent to each in order to establish the call between them.
Bob is taken off hold and Charlie is told to have his media flow through
Asterisk. However, if before those invites go out the bridge updates Bob's
and/or Charlie's rtp information with direct media data (i.e. address, port)
then the invite(s) will contain the remote data in the SDP instead of the
Asterisk data.
The race occurs in the native bridge glue code when updating the peer. The
direct_media_address can get set twice before sending out the first invite
during call connection. This can happen because the checking/setting of the
direct_media_address happened in one thread while the sending of the invite(s)
happened in another thread.
This fix removes the race condition by moving the checking/setting of the
direct_media_address to be in the same thread as the sending of the invites(s).
This serializes the checking/setting and sending so they can no longer happen
out of order.
install_prereq: Update repositories before install on Debian systems
When to install packages the indexed local is more old of the
version of software on the repository they have been upgraded by security
update then get the package will give 404 not found.
The patch prevent by update local index to repository for aptitude before
install.
install_prereq: Check if is installed aptitude otherwise to install.
If in Debian or system based, dont have aptitude installed the script do
nothing. This patch checked if aptitude installed, if not installed.
Also, if execute script with all packages installed yet, the script not show
nothing and return exit 1 because the command 'grep' get nothing from pipe from
'awk'.
res_pjsip_refer.c: Fix seg fault in process of Refer-to header.
The "Refer-to" header of an incoming REFER request is parsed by
pjsip_parse_uri(). That function requires the URI parameter to be NULL
terminated. Unfortunately, the previous code added the NULL terminator by
overwriting memory that may not be safe. The overwritten memory results
could be benign, memory corruption, or a segmentation fault. Now the URI
is NULL terminated safely by copying the URI to a new chunk of memory with
the correct size to be NULL terminated.
Richard Mudgett [Fri, 11 Mar 2016 18:22:48 +0000 (12:22 -0600)]
chan_sip.c: Fix mwi resub deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Thu, 10 Mar 2016 23:01:12 +0000 (17:01 -0600)]
chan_sip.c: Fix registration timeout and expire deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Thu, 10 Mar 2016 18:17:09 +0000 (12:17 -0600)]
chan_sip.c: Fix t38id deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Wed, 9 Mar 2016 22:34:53 +0000 (16:34 -0600)]
chan_sip.c: Fix reinviteid deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
* Fix retrans_pkt() to call check_pendings() with both the owner channel
and the private objects locked as required.
* Refactor dialog retransmission packet list to safely remove packet
nodes. The list nodes are now ao2 objects. The list has a ref and the
scheduled entry has a ref.
Richard Mudgett [Wed, 9 Mar 2016 22:26:26 +0000 (16:26 -0600)]
chan_sip.c: Fix waitid deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
* Made always run check_pendings() under the scheduler thread so scheduler
ids can be checked safely.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Mon, 7 Mar 2016 19:21:44 +0000 (13:21 -0600)]
chan_sip.c: Fix autokillid deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
* Fix clearing autokillid in __sip_autodestruct() even though we could
reschedule.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Fri, 11 Mar 2016 03:54:03 +0000 (21:54 -0600)]
chan_sip.c: Clear scheduled immediate events on unload.
This patch is part of a series to resolve deadlocks in chan_sip.c.
The reordering of chan_sip's shutdown is to handle any immediate events
that get put onto the scheduler so resources aren't leaked. The typical
immediate events at this time are going to be concerned with stopping
other scheduled events.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Delaying destruction of the chan_sip sip_pvt structures caused the
/channels/chan_sip/test_sip_rtpqos unit test to crash. That test
registers a special test ast_rtp_engine with the rtp engine module. When
the unit test completes it cleans up by unregistering the test
ast_rtp_engine and exits. Since the delayed destruction of the sip_pvt
happens after the unit test returns, the destructor tries to call the rtp
engine destroy callback of the test ast_rtp_engine auto variable which no
longer exists on the stack.
* Change the test ast_rtp_engine auto variable to a static variable. Now
the variable can still exist after the unit test exits so the delayed
sip_pvt destruction can complete successfully.