Walter Doekes [Wed, 25 Nov 2015 19:29:42 +0000 (20:29 +0100)]
main: Slight refactor of main. Improve color situation.
Several issues are addressed here:
- main() is large, and half of it is only used if we're not rasterisk;
fixed by splitting up the daemon part into a separate function.
- Call ast_term_init from rasterisk as well.
- Remove duplicate code reading/writing asterisk history file.
- Attempt to tackle background color issues and color changes that
occur. Tested by starting asterisk -c until the colors stopped
changing at odd locations.
Corey Farrell [Tue, 24 Nov 2015 19:07:12 +0000 (14:07 -0500)]
res_pjsip_notify: Fix CLI usage info
The usage info for 'pjsip send notify' previously referenced the
chan_sip configuration sip_notify.conf. Fix this to reference
the correct configuration pjsip_notify.conf.
Richard Mudgett [Mon, 23 Nov 2015 20:27:27 +0000 (14:27 -0600)]
res_sorcery_realtime.c: Fix crash from NULL sorcery object type.
If the sorcery object type is not found a NULL is returned.
Unfortunately, sorcery_realtime_filter_objectset() will crash after
complaining about not finding the object type and saying to expect errors.
* Use ao2_cleanup() instead of ao2_ref() to prevent the crash.
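The contrast the fix relies on can be sketched with a stand-in reference-counted object. This is not the astobj2 implementation: struct obj, obj_alloc(), obj_ref() and obj_cleanup() are hypothetical names that only mirror the behaviour described above, where the plain ref call requires a valid object and the cleanup call tolerates NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object. */
struct obj {
    int refcount;
};

/* Allocate an object holding one reference; returns NULL on failure. */
static struct obj *obj_alloc(void)
{
    struct obj *o = malloc(sizeof(*o));

    if (o) {
        o->refcount = 1;
    }
    return o;
}

/* Adjust the count by delta and free the object when it reaches zero.
 * The caller must pass a valid object; NULL is a programming error here. */
static int obj_ref(struct obj *o, int delta)
{
    int remaining;

    assert(o != NULL);
    o->refcount += delta;
    remaining = o->refcount;
    if (remaining == 0) {
        free(o);
    }
    return remaining;
}

/* NULL-tolerant release, safe to call after a failed lookup.
 * Returns 1 if a reference was dropped, 0 if there was nothing to do. */
static int obj_cleanup(struct obj *o)
{
    if (!o) {
        return 0;
    }
    obj_ref(o, -1);
    return 1;
}
```

With this shape, an error path can call obj_cleanup() on the lookup result unconditionally, whether or not the lookup succeeded.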
Matt Jordan [Sat, 21 Nov 2015 03:08:49 +0000 (21:08 -0600)]
chan_pjsip: Handle T.38 faxes with direct media bridges
When a channel is in a direct media bridge, a re-INVITE may arrive that forces
Asterisk to re-negotiate the media to a T.38 fax. When this occurs, the bridge
must change its technology to a simple bridge, and re-INVITE the media back
to Asterisk.
Generally, this logic mostly already exists in Asterisk. However, prior to this
patch, there were a few bugs:
(1) The T.38 framehook currently prevents a channel capable of T.38 faxes from
ever entering into a direct media bridge. This applies even when the only
media being passed over the channel is audio. This patch fixes this bug
by having the framehook specify that it defers caring about any frame type.
This allows the channels to enter into a direct media bridge, which will
be broken when a re-INVITE is received.
(2) When a re-INVITE is received, nothing instructed the bridging layer to
re-inspect the allowed bridging technology. This now occurs when either
a re-INVITE is received from a peer, or when a response is received from
the far end (that is, when the T.38 state changes to either
T38_PEER_REINVITE or T38_LOCAL_REINVITE).
(3) chan_pjsip needs to do a small amount of work to prevent a direct media
bridge from being chosen when a T.38 session is in progress. When a T.38
session supplement has a t38 datastore - which is added when we detect
we should start thinking about T.38 on a channel - we now refuse a native
RTP bridge.
(4) When a BYE request is received, we don't terminate the T.38 session. If
the other side of a T.38 fax survives the hangup (due to the 'g' flag
in Dial, for example), we don't currently re-INVITE the media on the
other channel back to audio. This patch now has res_pjsip_t38 intercept
BYE requests and inform the far side that the T.38 session is terminated.
This naturally causes the correct re-INVITEs to be sent.
Matt Jordan [Sat, 21 Nov 2015 03:07:27 +0000 (21:07 -0600)]
res/res_pjsip_t38: Add debug statements
This patch adds some debug statements to res_pjsip_t38. These statements help
to determine which SDP negotiation callbacks are being executed, and, when
a particular callback exits, why a callback may not have applied its logic
to the local or remote SDP.
Matt Jordan [Thu, 22 Oct 2015 14:44:43 +0000 (09:44 -0500)]
main/cli: Use proper string methods to check existence of context/exten/app
Because the context, extension, and application are stored in stringfields,
checking for them being NULL doesn't work so well. This patch uses the
appropriate string library call, ast_strlen_zero, to see if there is a value
in the context/exten/app values.
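A minimal sketch of why the NULL test fails, assuming stringfield-style members that point at an empty string when unset. strlen_zero() below is a local stand-in with the same meaning as the check named above; struct dial_target and has_dialplan_location() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in: true when the string is NULL or empty. */
static int strlen_zero(const char *s)
{
    return !s || *s == '\0';
}

/* Hypothetical record. Unset members point at "", never at NULL,
 * so a plain NULL comparison reports every field as present. */
struct dial_target {
    const char *context;
    const char *exten;
    const char *app;
};

/* Returns 1 if the target carries a usable context and extension. */
static int has_dialplan_location(const struct dial_target *t)
{
    assert(t != NULL);
    return !strlen_zero(t->context) && !strlen_zero(t->exten);
}
```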
Matt Jordan [Wed, 18 Nov 2015 15:43:08 +0000 (09:43 -0600)]
res/res_endpoint_stats: Add module to emit endpoint StatsD statistics
This patch adds a module that emits StatsD statistics about Asterisk
endpoints. This includes:
* A GAUGE statistic for endpoint states, tracking how many endpoints are in
a particular state.
* A GAUGE statistic for each endpoint, counting the number of channels
currently associated with an endpoint.
Matt Jordan [Wed, 18 Nov 2015 16:07:09 +0000 (10:07 -0600)]
res_pjsip/pjsip_options: Add StatsD statistics for PJSIP contacts
This patch adds the ability to send StatsD statistics related to the
state of PJSIP contacts. This includes:
* A GAUGE statistic measuring the count of contacts in a particular state.
This measures how many contacts are reachable, unreachable, etc.
* The RTT time for each contact, if those contacts are qualified. This
provides StatsD engines useful time-based data about each contact.
Matt Jordan [Fri, 13 Nov 2015 16:34:03 +0000 (10:34 -0600)]
res/res_pjsip_outbound_registration: Add registration statistics for StatsD
This patch adds outbound registration statistics for StatsD. This includes
the following:
* A GAUGE metric for the overall count of outbound registrations.
* A GAUGE metric for each state an outbound registration can be in. As the
outbound registrations change state, the overall count of how many
outbound registrations are in the particular state is changed.
These statistics are particularly useful for systems with a large number of
SIP trunks, and where measuring the change in state of the trunks is useful
for monitoring.
Matt Jordan [Thu, 19 Nov 2015 15:40:24 +0000 (09:40 -0600)]
res/res_pjsip_outbound_registration: Apply configuration on object type load
When Asterisk is configured to use a dynamic sorcery backend (such as
res_sorcery_astdb) with 'registration' objects, it will fail to create the
internal state objects associated with the registration objects on module
load. This is due to nothing actually querying for the specific objects
and calling their sorcery apply handler during module load.
This patch fixes that by calling get_registrations in the sorcery observer's
object_type_loaded handler. Doing this causes the sorcery backends to be
asked for the current state of all registration objects, which causes the
apply handler to be called and the internal run-time state to be created.
Alexander Traud [Wed, 11 Nov 2015 17:51:17 +0000 (18:51 +0100)]
translate: Provide translation modules the result of SDP negotiation.
Previously, a transcoding module did not have access to the joint but cached
format. Therefore, the module did not have access to the attributes negotiated
via SDP (line fmtp). Now, a translation module receives the joint format.
Alexander Traud [Thu, 19 Nov 2015 07:14:33 +0000 (08:14 +0100)]
res_format_attr_h264: Do not reset string buffer.
When no parameter is present, Asterisk does not generate the line fmtp, as
expected. However, because a buffer was reset, even rtpmap and fmtp of previous
media codecs got removed. Now, Asterisk does not reset other codecs in case of
no parameter for H.264.
Matt Jordan [Wed, 18 Nov 2015 16:05:07 +0000 (10:05 -0600)]
res_statsd: Add functions that support variable arguments
Often, the metric names of statistics we are generating for StatsD have some
dynamic component to them. This can be the name of a particular resource, or
some internal status label in Asterisk. With the current set of functions,
callers of the statsd API must first build the metric name themselves, then
pass this to the API functions. This results in a large amount of boilerplate
code and usage of either fixed length static buffers or dynamic memory
allocation, neither of which is desirable.
This patch adds two new functions to the StatsD API that support a printf
style format specifier for constructing the metric name. A dynamic string,
allocated in threadstorage, is used to build the metric name. This eases
the burden on users of the StatsD API.
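The idea can be sketched as a variadic wrapper that formats the metric name and then calls the existing fixed-name function. Everything named here is an assumption for illustration: emit_gauge() is a hypothetical sink that only records its arguments, emit_gauge_va() is not the name of either new API function, and a heap buffer sized by a first vsnprintf() pass replaces the thread-storage string the patch uses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink standing in for the fixed-name API call.
 * It only records the last metric name and value it was given. */
static char last_metric[256];
static long last_value;

static void emit_gauge(const char *name, long value)
{
    snprintf(last_metric, sizeof(last_metric), "%s", name);
    last_value = value;
}

/* Build the metric name from a printf-style format, then emit it.
 * Returns 0 on success, -1 on a formatting or allocation failure. */
static int emit_gauge_va(long value, const char *fmt, ...)
{
    va_list ap;
    va_list ap2;
    int len;
    char *name;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }
    name = malloc((size_t)len + 1);
    if (!name) {
        va_end(ap2);
        return -1;
    }
    vsnprintf(name, (size_t)len + 1, fmt, ap2);   /* second pass: format */
    va_end(ap2);
    emit_gauge(name, value);
    free(name);
    return 0;
}
```

A caller can then write emit_gauge_va(3, "endpoints.%s.%s.channels", tech, resource) instead of building the name by hand; the metric name in that call is made up for the example.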
Richard Mudgett [Tue, 17 Nov 2015 20:53:57 +0000 (14:53 -0600)]
res_pjsip_outbound_registration.c: Fix 423 response handling.
Receiving a 423 Interval Too Brief response after authentication for an
outbound registration attempt results in assuming that the registrar has
rejected the registration permanently. If there are no configured retries
for fatal responses then the outbound registration is stopped for that
endpoint.
For registrations, PJSIP/PJPROJECT intercepts the handling of 423
responses and does not include any authentication in the updated
registration request. When the updated request is challenged then the
Asterisk code assumes that we were challenged again because the peer
rejected the authentication we sent earlier.
* Made registration challenges keep track of the CSeq number to determine
if the received challenge response was for the request we thought we sent.
If the response's CSeq number differs from the CSeq number we last sent
with authentication then authenticate again because it is a challenge to a
different request.
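The CSeq comparison described above can be sketched roughly as follows. struct reg_state, enum challenge_action and on_challenge() are hypothetical names, and the sketch assumes the caller knows the CSeq that the next authenticated request will carry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-registration state. */
struct reg_state {
    int auth_attempted;       /* nonzero once credentials have been sent */
    unsigned int auth_cseq;   /* CSeq of the last request sent with credentials */
};

enum challenge_action { SEND_AUTH, GIVE_UP };

/* Decide what to do with a challenge response.
 * response_cseq: CSeq of the request being challenged.
 * next_cseq: CSeq the authenticated retry would use (assumed known). */
static enum challenge_action on_challenge(struct reg_state *st,
    unsigned int response_cseq, unsigned int next_cseq)
{
    assert(st != NULL);
    if (st->auth_attempted && response_cseq == st->auth_cseq) {
        /* The request that carried our credentials was challenged again,
         * so the credentials themselves were rejected. */
        return GIVE_UP;
    }
    /* The challenge is for a different request, such as the
     * unauthenticated retry sent after a 423: authenticate again. */
    st->auth_attempted = 1;
    st->auth_cseq = next_cseq;
    return SEND_AUTH;
}
```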
tcambron [Tue, 3 Nov 2015 20:36:43 +0000 (14:36 -0600)]
StatsD: Add res_statsd compatibility
Added a new api to res_statsd.c to allow it to receive a
character pointer for the value argument. This allows for a
'+' and a '-' to easily be sent with the value.
Matt Jordan [Mon, 16 Nov 2015 19:56:49 +0000 (13:56 -0600)]
res/res_pjsip: Fix off nominal crash with requests that fail and have a timer
When a request is sent using pjsip_endpt_send_request and fails, a condition
exists where the request wrapper, which is an AO2 object, may be de-ref'd
more times than it should. This occurs when the request's callback is called,
and, in the callback, the timer on the PJSIP heap is cancelled. When that
occurs, the request wrapper's lifetime is decremented. When
pjsip_endpt_send_request fails, we unilaterally decrement the lifetime of
the request wrapper again, even though we've already cancelled the reference
associated with the timer.
This patch checks the return result of pj_timer_heap_cancel_if_active before
removing the reference associated with the timer. We now only decrement it
in this case if a timer is cancelled as a result of the function call.
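A minimal sketch of the pattern, using stand-ins rather than the PJSIP timer heap: timer_cancel_if_active() is a hypothetical function that, like the call named above, reports how many timers it actually cancelled, and struct req_wrapper models only the two fields that matter here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request wrapper: one reference is owned by the timer. */
struct req_wrapper {
    int refcount;
    int timer_active;
};

/* Stand-in for a cancel-if-active call.
 * Returns the number of timers actually cancelled (0 or 1). */
static int timer_cancel_if_active(struct req_wrapper *w)
{
    assert(w != NULL);
    if (!w->timer_active) {
        return 0;
    }
    w->timer_active = 0;
    return 1;
}

/* Drop the timer's reference only when this call cancelled the timer. */
static void release_timer_ref(struct req_wrapper *w)
{
    if (timer_cancel_if_active(w) > 0) {
        w->refcount--;
    }
}
```

Calling release_timer_ref() from both the callback and the failure path is then harmless, because the second call finds no active timer and leaves the count alone.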
Mark Michelson [Fri, 13 Nov 2015 20:03:35 +0000 (14:03 -0600)]
Confbridge: Add a user timeout option
This option adds the ability to specify a timeout, in seconds, for a
participant in a ConfBridge. When the user's timeout has been reached,
the user is ejected from the conference with the CONFBRIDGE_RESULT
channel variable set to "TIMEOUT".
The rationale for this change is that there have been times where we
have seen channels get "stuck" in ConfBridge because a network issue
results in a SIP BYE not being received by Asterisk. While these
channels can be hung up manually via CLI/AMI/ARI, adding some sort of
automatic cleanup of the channels is a nice feature to have.
Joshua Colp [Sat, 14 Nov 2015 13:02:10 +0000 (09:02 -0400)]
hashtab: Add NULL check when destroying iterator.
The hashtab API is pretty NULL tolerant which has resulted
in remaining callers not doing many checks themselves.
Unfortunately the function to destroy an iterator does not
do a NULL check and will result in a crash if passed NULL.
This change fixes that.
Richard Mudgett [Fri, 13 Nov 2015 20:32:10 +0000 (14:32 -0600)]
res_pjsip_rfc3326.c: Fix crash when channel goes away.
If an authenticated incoming caller does not respond to our 200 OK INVITE
response with an ACK then PJSIP will hangup the call. Unfortunately,
there is a chance that the session's channel will go away between one use
of the channel pointer and another when building the BYE request because
the BYE is being built by the monitor thread and not the call's serializer
thread.
* Added a check to ensure that the thread trying to add the Reason header
is the call's serializer thread. This ensures that the channel will not
go away on us.
Mark Michelson [Fri, 13 Nov 2015 20:19:35 +0000 (14:19 -0600)]
Taskprocessors: Increase high-water mark
In practical tests, we have seen certain taskprocessors, specifically
Stasis subscription taskprocessors, cross the recently-added high-water
mark and emit a warning. This high-water mark warning is only intended
to be emitted when things have tanked on the system and things are
heading south quickly. In the practical tests, the Stasis taskprocessors
sometimes had a max depth of 180 tasks in them, and Asterisk wasn't in
any danger at all.
As such, this ups the high-water mark to 500 tasks instead. It also
redefines the SIP threadpool request denial number to be a multiple of
the taskprocessor high-water mark.
Alexander Traud [Wed, 11 Nov 2015 17:46:28 +0000 (18:46 +0100)]
format: Register format-attribute module with cached formats.
In Asterisk 13, cached formats are created before their corresponding format-
attribute module is registered. Cached formats are involved when a local
extension is called. Therefore, ast_format_generate_sdp_fmtp did not work
on local extensions. This change affects the Opus Codec, H.263 (Plus), H.264,
and format-attribute modules provided externally.
Mark Michelson [Thu, 12 Nov 2015 17:17:51 +0000 (11:17 -0600)]
res_pjsip distributor: Don't send 503 response to responses.
When the SIP threadpool is backed up with tasks, we send 503 responses
to ensure that we don't try to overload ourselves. The problem is that
we were not ensuring that we were not trying to send a 503 to an
incoming SIP response.
This change makes it so that we only send the 503 on incoming requests.
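The decision can be sketched as a small predicate. The names and the threshold parameter are assumptions; the sketch only states when a 503 is generated and does not model what else the distributor does with a message it declines to answer.

```c
#include <assert.h>

enum msg_type { MSG_REQUEST, MSG_RESPONSE };

/* Returns 1 if a 503 should be sent for this incoming message.
 * queued: tasks currently waiting in the threadpool queue.
 * limit: hypothetical denial threshold; must be positive. */
static int should_send_503(enum msg_type type, long queued, long limit)
{
    assert(limit > 0);
    if (queued <= limit) {
        return 0;   /* not backed up: process normally */
    }
    /* Backed up: only a request can be answered with a 503.
     * A response must never itself be answered with a response. */
    return type == MSG_REQUEST;
}
```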
Mark Michelson [Wed, 11 Nov 2015 23:11:53 +0000 (17:11 -0600)]
res_pjsip: Deny requests when threadpool queue is backed up.
We have observed situations where the SIP threadpool may become
deadlocked. However, because incoming traffic is still arriving, the SIP
threadpool's queue can continue to grow, eventually running the system
out of memory.
This change makes it so that incoming traffic gets rejected with a 503
response if the queue is backed up too much.
Joshua Colp [Thu, 12 Nov 2015 12:24:06 +0000 (08:24 -0400)]
format_cap: Don't append the 'none' format when appending all.
When appending all formats of a type all the codecs are iterated
and added. This operation was incorrectly adding the ast_format_none
format which is special in that it is supposed to be used when no
format is present. It shouldn't be appended.
Steve Davies [Wed, 11 Nov 2015 10:16:22 +0000 (10:16 +0000)]
Further fixes to improper usage of scheduler
When ASTERISK-25449 was closed, a number of scheduler issues mentioned in
the comments were missed. These have since been raised in ASTERISK-25476
and elsewhere.
This patch attempts to collect all of the scheduler issues discovered so
far and address them sensibly.
Joshua Colp [Wed, 11 Nov 2015 17:04:08 +0000 (13:04 -0400)]
threadpool: Handle worker thread transitioning to dead when going active.
This change adds handling of dead worker threads when moving them
to be active. When this happens the worker thread is removed from
both the active and idle threads container. If no threads are able
to be moved to active then the pool grows as configured.
A unit test has also been added which thrashes the idle timeout
and thread activation to exploit any race conditions between the
two.
Alexander Traud [Tue, 10 Nov 2015 15:27:57 +0000 (16:27 +0100)]
rtp_engine: Init a format-attribute module to its RFC defaults.
Previously, format-attribute modules relied on an existing fmtp line in SDP
negotiation. However, fmtp is optional for several formats like the Opus Codec.
Now, the format-attribute module is called with an empty fmtp, which allows the
module to initialise itself to RFC defaults. Furthermore now, Asterisk is able
to differentiate between internally and externally created formats.
Alexander Traud [Mon, 9 Nov 2015 13:04:43 +0000 (14:04 +0100)]
ast_format_cap: Avoid format creation on module load, use cache instead.
Since Asterisk 13, formats are immutable and cached. However while loading a
module like chan_sip, some formats were created instead of using cached ones.
Walter Doekes [Fri, 6 Nov 2015 13:54:59 +0000 (14:54 +0100)]
func_callerid: Document that CALLERID(pres) is available.
CALLERPRES() says that it's deprecated in favor of CALLERID(num-pres)
and CALLERID(name-pres). But for channel drivers that don't make a
distinction between the two (e.g. SIP), it makes more sense to get/set
both at once. This change reveals the availability of CALLERID(pres),
CONNECTEDLINE(pres), REDIRECTING(orig-pres), REDIRECTING(to-pres) and
REDIRECTING(from-pres).
Walter Doekes [Fri, 6 Nov 2015 13:36:40 +0000 (14:36 +0100)]
xmldoc: Improve xmldoc wrapping of 'core show ...' output.
Previously, the wrapping did both lookahead and lookback, which,
together with color escape sequences, caused some lines to be wrapped
way earlier than other lines. This led to inconsistent output.
This simplifies the wrapping code and makes it more sane: if maxcolumns
is hit, we simply jump back to the last space and wrap there.
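The simplified rule can be sketched as below. This is not the xmldoc code: wrap_in_place() is a hypothetical helper, it counts every byte as one column (so it ignores the color escape sequences mentioned above), and it leaves words longer than maxcol unbroken.

```c
#include <assert.h>
#include <string.h>

/* Wrap text in place: once a line exceeds maxcol columns, replace the
 * last space seen on that line with a newline.
 * Returns the number of line breaks inserted. */
static int wrap_in_place(char *text, int maxcol)
{
    size_t len = strlen(text);
    size_t i;
    long last_space = -1;   /* index of the last space on the current line */
    int col = 0;            /* columns used on the current line */
    int breaks = 0;

    assert(maxcol > 0);
    for (i = 0; i < len; i++) {
        if (text[i] == '\n') {
            /* An existing newline starts a fresh line. */
            col = 0;
            last_space = -1;
            continue;
        }
        if (text[i] == ' ') {
            last_space = (long)i;
        }
        col++;
        if (col > maxcol && last_space >= 0) {
            /* Jump back to the last space and wrap there. */
            text[last_space] = '\n';
            col = (int)((long)i - last_space);
            last_space = -1;
            breaks++;
        }
    }
    return breaks;
}
```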
Alexander Traud [Fri, 6 Nov 2015 12:57:15 +0000 (13:57 +0100)]
res_pjsip_sdp_rtp: Enable Opus to be negotiated via SIP/SDP.
In SIP/SDP, Opus has two channels always (see RFC 7587 section 7). The actual
number of channels is negotiated in-band. Therefore now, the Opus codec and its
attribute rtpmap are registered with two channels.
ASTERISK-24779 #close
Reported by: PowerPBX
Tested by: Alexander Traud
patches:
asterisk-24779.patch submitted by Sean Bright (license #5060)
Jonathan Rose [Tue, 3 Nov 2015 22:19:43 +0000 (16:19 -0600)]
taskprocessor: Add high water mark warnings
If a taskprocessor's queue grows large, this can indicate that there
may be a problem with tasks not leaving the processor or else that
the number of available task processors for a given type of task is
too low. This patch makes it so that if a taskprocessor's task queue
grows above 100 queued tasks, it will emit a warning message.
Warning messages are emitted only once per task processor.
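The once-per-taskprocessor behaviour can be sketched with a latch flag. struct tps and tps_push() are hypothetical; only the threshold of 100 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define HIGH_WATER_MARK 100

/* Hypothetical taskprocessor state: queue depth plus a one-shot flag. */
struct tps {
    long queued;
    int warned;
};

/* Queue one task. Returns 1 if the caller should log the high-water
 * warning for this push, 0 otherwise. The warning fires at most once. */
static int tps_push(struct tps *t)
{
    assert(t != NULL);
    t->queued++;
    if (t->queued > HIGH_WATER_MARK && !t->warned) {
        t->warned = 1;
        return 1;
    }
    return 0;
}
```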
Matt Jordan [Wed, 4 Nov 2015 20:31:28 +0000 (14:31 -0600)]
main/dial: Protect access to the format_cap structure of the requesting channel
When a dial attempt is made that involves a requesting channel, we previously
were not:
a) Protecting access to the native format capabilities structure on the
requesting channel. That is inherently unsafe.
b) Reference bumping the lifetime of the format capabilities structure.
In both cases, something else could sneak in, blow away the format
capabilities, and we'd be holding onto an invalid format_cap structure. When
the newly created channel attempts to construct its format capabilities, things
go poorly.
This patch:
a) Ensures that we get a reference to the native format capabilities while
the requesting channel is locked
b) Holds a reference to the native format capabilities during the creation
of the new channel.
Corey Farrell [Sat, 31 Oct 2015 03:57:58 +0000 (23:57 -0400)]
Fix cli display of build options.
A previous commit reduced the AST_BUILDOPTS compiler define to
only include options that affected ABI. This included some options
that were previously displayed by cli "core show settings". This
change corrects the CLI display while still restricting buildopts.h
to ABI-affecting options only.
Matt Jordan [Tue, 3 Nov 2015 17:15:09 +0000 (11:15 -0600)]
res_pjsip/location: Destroy contact_status objects on contact deletion
The contact_status Sorcery objects are currently not destroyed when a contact
is deleted. This causes the contact's last known RTT/status to be 'sticky'
when the contact itself may no longer exist. This patch causes the
contact_status objects associated with both dynamic and static contacts to
be destroyed if the AoR holding those contacts is also destroyed (or via
other paths where a contact may be deleted.)
Matt Jordan [Tue, 3 Nov 2015 16:58:47 +0000 (10:58 -0600)]
pjsip_configuration: On delete, remove the persistent version of an endpoint
When an endpoint is deleted (such as through an API), the persistent endpoint
currently continues to lurk around. While this isn't harmful from a memory
consumption perspective - as all persistent endpoints are reclaimed on
shutdown - it does cause Stasis endpoint related operations to continue
to believe that the endpoint may or may not exist.
This patch causes the persistent endpoint related to a PJSIP endpoint to be
destroyed if the PJSIP endpoint is deleted.
Matt Jordan [Tue, 3 Nov 2015 14:15:16 +0000 (08:15 -0600)]
main/stasis_endpoints: Fix ContactStatusChange JSON for roundtrip_usec field
The JSON packing for the ContactStatusChange event forgot to include the
roundtrip_usec field. As a result, the field never showed up in any event,
even when the data was available. This patch corrects that error by properly
packing the JSON blob with the data.
Corey Farrell [Tue, 3 Nov 2015 02:24:58 +0000 (21:24 -0500)]
chan_sip: Allow websockets to be disabled.
This patch adds a new setting "websockets_enabled" to sip.conf.
Setting this to false allows chan_sip to be used without causing
conflicts with res_pjsip_transport_websocket.
Mark Michelson [Mon, 2 Nov 2015 23:19:21 +0000 (17:19 -0600)]
res_pjsip: Set threadpool max size default to 50.
During a stress test of subscriptions, a huge blast of
subscription-related traffic resulted in the threadpool expanding to a
ridiculous number of threads. The ballooning of threads resulted in an
increase of memory, which led to a crash due to being out of memory.
An easy fix for the particular test was to limit the size of the
threadpool, thus reining in the amount of memory that would be used. It
was decided that there really is no downside to having a non-infinite
default value for the maximum size of the threadpool, so this change
introduces 50 threads as the maximum threadpool size for the SIP
threadpool.
Matt Jordan [Mon, 2 Nov 2015 12:57:22 +0000 (06:57 -0600)]
pjsip_options: Schedule/unschedule qualifies on AoR creation/destruction
When an AoR is created or destroyed dynamically, the scheduled OPTIONS
requests that qualify the contacts on the AoR are not necessarily started
or destroyed, particularly for persistent contacts created for that AoR.
This patch adds create/update/delete sorcery observers for an AoR, which
schedule/unschedule the qualifies as expected.
Matt Jordan [Fri, 30 Oct 2015 18:22:23 +0000 (13:22 -0500)]
Makefile: Add a rule 'basic-pbx' that installs the Basic PBX configs
This patch adds a rule for installing the Super Awesome Company based 'Basic
PBX' configuration files. As part of adding this rule, a bit of the content
that makes up installing the configuration files under the 'samples' target
was refactored into a make subroutine for usage by additional later config
make targets.
Joshua Colp [Thu, 29 Oct 2015 13:28:33 +0000 (10:28 -0300)]
res_pjsip_pubsub: Fix assertion when UAS dialog creation fails.
When compiled with assertions enabled one will occur when destroying
the subscription tree when UAS dialog creation fails. This is because
the code assumes that a dialog will always exist on a subscription
tree when in reality during this specific scenario it won't.
This change makes it so a dialog is not removed from the subscription
tree if it is not present.
Alexander Traud [Mon, 26 Oct 2015 16:42:03 +0000 (17:42 +0100)]
chan_sip: Do not send all codecs on INVITE.
Since version 13, Asterisk sent all allowed codecs as callee, even when the
caller did not request/support them. In case of dynamic RTP payloads, this led
to the same ID for different codecs, which is not allowed by SIP/SDP. Now, the
intersection between the requested and the supported codecs is sent again.
George Joseph [Tue, 20 Oct 2015 21:02:30 +0000 (15:02 -0600)]
res_pjsip: Add "like" processing to pjsip list and show commands
Add the ability to filter output from pjsip list and show commands
using the "like" predicate like chan_sip.
For endpoints, aors, auths, registrations, identifies and transports,
the modification was a simple change of an ast_sorcery_retrieve_by_fields
call to ast_sorcery_retrieve_by_regex. For channels and contacts a
little more work had to be done because neither of those objects are
true sorcery objects. That was just removing the non-matching object
from the final container. Of course, a little extra plumbing in the
common pjsip_cli code was needed to parse the "like" and pass the regex
to the get_container callbacks.
Some of the get_container code in res_pjsip_endpoint_identifier was also
refactored for simplicity.
ASTERISK-25477 #close
Reported by: Bryant Zimmerman
Tested by: George Joseph
Kevin Harwell [Wed, 21 Oct 2015 16:51:13 +0000 (11:51 -0500)]
res_pjsip_outbound_registration: registration stops due to fatal 4xx response
During outbound registration it is possible to receive a fatal (any permanent/
non-temporary 4xx, 5xx, 6xx) response from the registrar that is simply due
to a problem with the registrar itself. Upon receiving the failure response
Asterisk terminates outbound registration for the given endpoint.
This patch adds an option, 'fatal_retry_interval', that when set continues
outbound registration at the given interval up to 'max_retries' upon receiving
a fatal response.
Mark Michelson [Thu, 22 Oct 2015 22:07:55 +0000 (17:07 -0500)]
format_cap: Detect vector allocation failures.
A crash was seen on a system that ran out of memory due to Asterisk not
checking for vector allocation failures in format_cap.c. With this
change, if either of the AST_VECTOR_INIT calls fail, we will return a
value indicating failure.
Mark Michelson [Fri, 2 Oct 2015 20:32:09 +0000 (15:32 -0500)]
res_pjsip_pubsub: Prevent sending NOTIFY on destroyed dialog.
A certain situation can result in our attempting to send a NOTIFY on a
destroyed dialog. Say we attempt to send a NOTIFY to a subscriber, but
that subscriber has dropped off the network. We end up retransmitting
that NOTIFY until the appropriate SIP timer says to destroy the NOTIFY
transaction. When the pjsip evsub code is told that the transaction has
been terminated, it responds in kind by alerting us that the
subscription has been terminated, destroying the subscription, and then
removing its reference to the dialog, thus destroying the dialog.
The problem is that when we get told that the subscription is being
terminated, we detect that we have not sent a terminating NOTIFY
request, so we queue up such a NOTIFY to be sent out. By the time that
queued NOTIFY gets sent, the dialog has been destroyed, so attempting to
send that NOTIFY can result in a crash.
The fix being introduced here is actually a reintroduction of something
the pubsub code used to employ. We hold a reference to the dialog and
wait to decrement our reference to the dialog until our subscription
tree object is destroyed. This way, we can send messages on the dialog
even if the PJSIP evsub code wants to terminate earlier than we would
like.
In doing this, some NULL checks for subscription tree dialogs have been
removed since NULL dialogs are no longer actually possible.
Mark Michelson [Tue, 29 Sep 2015 19:53:22 +0000 (14:53 -0500)]
res_pjsip_pubsub: Ensure dialog lock balance.
When sending a NOTIFY, we lock the dialog and then unlock the dialog
when finished. A recent change made it so that the subscription tree's
dialog pointer will be set NULL when sending the final NOTIFY request
out. This means that when we attempt to unlock the dialog, we pass a
NULL pointer to pjsip_dlg_dec_lock(). The result is that the dialog
remains locked after we think we have unlocked it. When a response to
the NOTIFY arrives, the monitor thread attempts to lock the dialog, but
it cannot because we never released the dialog lock. This results in
Asterisk being unable to process incoming SIP traffic any longer.
The fix in this patch is to use a local pointer to save off the pointer
value of the subscription tree's dialog when locking and unlocking the
dialog. This way, if the subscription tree's dialog pointer is NULLed
out, the local pointer will still point to the proper place and the
dialog lock will be unlocked as we expect.
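The imbalance and the fix can be sketched with a lock counter in place of the real dialog lock. All names are hypothetical stand-ins; dlg_unlock() silently ignores NULL here to reproduce the "remains locked" outcome described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dialog with a counter standing in for its lock. */
struct dialog {
    int lock_depth;
};

struct sub_tree {
    struct dialog *dlg;
};

static void dlg_lock(struct dialog *d)
{
    if (d) {
        d->lock_depth++;
    }
}

/* A NULL argument unlocks nothing, which is how the lock leaks. */
static void dlg_unlock(struct dialog *d)
{
    if (d) {
        d->lock_depth--;
    }
}

/* Sending the final NOTIFY clears the tree's dialog pointer. */
static void send_final_notify(struct sub_tree *t)
{
    t->dlg = NULL;
}

/* Before: the unlock re-reads a pointer that is NULL by then. */
static void notify_unbalanced(struct sub_tree *t)
{
    dlg_lock(t->dlg);
    send_final_notify(t);
    dlg_unlock(t->dlg);
}

/* After: lock and unlock through the same saved local pointer. */
static void notify_balanced(struct sub_tree *t)
{
    struct dialog *dlg = t->dlg;

    assert(dlg != NULL);
    dlg_lock(dlg);
    send_final_notify(t);
    dlg_unlock(dlg);
}
```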
Mark Michelson [Mon, 28 Sep 2015 21:36:25 +0000 (16:36 -0500)]
res_pjsip_pubsub: Prevent crashes on final NOTIFY.
The SIP dialog is removed from the subscription tree when the final
NOTIFY is sent. However, after the final NOTIFY is sent, the persistence
update function still attempts to access the cseq from the dialog,
resulting in a crash.
This fix removes the subscription persistence at the same time that the
dialog is removed from the subscription tree. This way, there is no
attempt to update persistence when the subscription is being destroyed.
Mark Michelson [Thu, 17 Sep 2015 22:28:30 +0000 (17:28 -0500)]
res_pjsip_pubsub: Remove serializer when sending final NOTIFY.
There have been crashes seen where a taskprocessor's listener is NULL
unexpectedly.
Looking at backtraces, the problem was specifically seen in PJSIP
serializers.
Subscriptions make the mistake of removing a serializer from a dialog
during subscription tree destruction. Since subscription trees are
reference-counted, guaranteeing the circumstances behind the destruction
is not possible. This makes it so that the dialog serializer can be
removed while not holding the dialog lock. This makes it possible for
the distributor to get a pointer to the dialog serializer and have that
serializer get freed out from under it.
The fix for this is to remove the serializer from a subscription dialog
when sending the final NOTIFY. This guarantees that the serializer is
removed with the dialog lock held. By doing this, we guarantee that if
the distributor gains access to the dialog's serializer, it will not be
possible for the serializer to get freed by another thread.
Mark Michelson [Wed, 2 Sep 2015 14:14:19 +0000 (09:14 -0500)]
res_pjsip_pubsub: Fix crash on destruction of empty subscription tree.
If an old persistent subscription is recreated but then immediately
destroyed because it is out of date, the subscription tree will have no
leaf subscriptions on it. This was resulting in a crash when attempting
to destroy the subscription tree.