George Joseph [Mon, 28 Mar 2016 04:33:29 +0000 (22:33 -0600)]
config: Allow filters when appending to a category
In sorcery based config files where there are multiple categories with the same
name, you can't use the (+) operator to reliably append to a category because
config.c stops looking when it finds the first one with the same name.
Example:
[1000]
type = endpoint
[1000]
type = aor
[1000](+)
authenticate_qualify = yes
This config will fail because config.c appends authenticate_qualify to the
first category it finds, the endpoint, and that's not valid for endpoint.
Solution:
The capability to find a category that contains a certain variable already
exists so the only real change was to parse anything after the '+' that's not a
comma, as a filter string.
[1000]
type = endpoint
[1000]
type = aor
[1000](+type=aor)
authenticate_qualify = yes
This now works as expected.
Although the following example doesn't make any sense for pjsip, you can even
specify multiple filters:
[1000](+type=aor&qualify_frequency=10)
ASTERISK-25868 #close Reported-by: Nick Repin
Change-Id: I10773da4c79db36fbf1993961992af63d3441580
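The filter matching described above can be illustrated with a minimal, self-contained C sketch. This is not the config.c implementation: struct var, struct category and find_category() are hypothetical stand-ins, and only exact name=value terms joined by '&' are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a config variable and category. */
struct var { const char *name; const char *value; };
struct category { const char *name; const struct var *vars; size_t nvars; };

/* Return 1 if the category holds a variable with this exact name and value. */
static int category_has(const struct category *cat, const char *name, const char *value)
{
    size_t i;

    for (i = 0; i < cat->nvars; i++) {
        if (!strcmp(cat->vars[i].name, name) && !strcmp(cat->vars[i].value, value)) {
            return 1;
        }
    }
    return 0;
}

/* Return 1 if every "name=value" term of an '&'-separated filter matches.
 * An empty filter matches any category, like a plain (+). */
static int category_matches_filter(const struct category *cat, const char *filter)
{
    char buf[256];
    char *term = buf;

    if (strlen(filter) >= sizeof(buf)) {
        return 0;
    }
    strcpy(buf, filter);

    while (term && *term) {
        char *next = strchr(term, '&');
        char *eq;

        if (next) {
            *next++ = '\0';
        }
        eq = strchr(term, '=');
        if (!eq) {
            return 0; /* malformed term */
        }
        *eq = '\0';
        if (!category_has(cat, term, eq + 1)) {
            return 0;
        }
        term = next;
    }
    return 1;
}

/* Find the first category with the given name that also passes the filter. */
const struct category *find_category(const struct category *cats, size_t ncats,
    const char *name, const char *filter)
{
    size_t i;

    assert(cats != NULL && name != NULL && filter != NULL);
    for (i = 0; i < ncats; i++) {
        if (!strcmp(cats[i].name, name) && category_matches_filter(&cats[i], filter)) {
            return &cats[i];
        }
    }
    return NULL;
}
```

With two categories named 1000, an empty filter selects the first one (the endpoint), while "type=aor" skips it and selects the aor.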
check_installed_debs wasn't handling virtual packages like libsrtp-dev and
libresample-dev and on multiarch systems it was accidentally filtering out all
packages if any :i386 packages were found instead of just filtering out the
:i386 packages themselves.
George Joseph [Wed, 30 Mar 2016 23:34:42 +0000 (17:34 -0600)]
pjproject_bundled: Fix use of LDCONFIG for shared library link creation
LDCONFIG apparently isn't set to something sane on all systems so the creation
of the shared library links fails. Instead of just testing for non-blank,
main/Makefile now checks that LDCONFIG is actually executable and reverts to
LN if it isn't.
This applies to both libasteriskpj and libasteriskssl.
Thanks to 'abelbeck' for pointing out that the issue was LDCONFIG.
ASTERISK-25873 #close Reported-by: Hans van Eijsden
Change-Id: I25b76379bc637726ec044b2c0e709b56b3701729
Richard Mudgett [Tue, 29 Mar 2016 23:06:24 +0000 (18:06 -0500)]
res_ari: Cannot get control also means channel is unavailable.
The only caller of ari_bridges_play_found() has this note:
If ari_bridges_play_found fails because the channel is unavailable for
playback, the channel will be removed from the playback list soon. We can
keep trying to get channels from the list until we either get one that
will work or else there isn't a channel for this bridge anymore, in which
case we'll revert to ari_bridges_play_new.
Richard Mudgett [Tue, 29 Mar 2016 18:47:08 +0000 (13:47 -0500)]
res_stasis: Add control ref to playback and recording structs.
The stasis_app_playback and stasis_app_recording structs need to have a
struct stasis_app_control ref. Other threads can get a reference to the
playback and recording structs from their respective global container.
These other threads can then use the control pointer they contain after
the control struct has gone.
* Add control ref to stasis_app_playback and stasis_app_recording structs.
With the refs added, the control command queue can now have a circular
control reference which will cause the control struct to never get
released if the control's command queue is not flushed when the channel
leaves the Stasis application. Also the command queue needs better
protection from adding commands if the control->is_done flag is set.
Richard Mudgett [Mon, 28 Mar 2016 23:10:40 +0000 (18:10 -0500)]
res_stasis: Fix crash on a hanging up channel.
* Give the struct stasis_app_control ao2 object a ref to the channel held
in the object. Now the channel will still be around if a thread needs to
post a stasis message instead of crashing because the topic was destroyed.
* Moved stopping any lingering silence generator out of the struct
stasis_app_control destructor and made it a part of exiting the Stasis
application. Who knows which thread the destructor will be called under
so it cannot affect the channel's silence generator. Not only was the
channel unprotected when the silence generator was stopped, stasis may no
longer even control the channel.
George Joseph [Wed, 30 Mar 2016 17:38:47 +0000 (11:38 -0600)]
res_pjsip_mwi: Allow subscribe to vm access extension as an alias
Background:
If your extension is 1000 and the voicemail access extension is 1571 and you
dial 1571, usually a dialplan rule calls voicemailmain with your extension and
you are placed directly in your mailbox. Therefore most admins program the
voicemail (or other speed dial) button on their phones to the access extension.
Some phones (Snom at least) use whatever is programmed there to also subscribe
for MWI and so can't dial one number and subscribe to another. This works fine
in chan_sip because chan_sip completely ignores the user portion of the
SUBSCRIBE message request URI. If it can match the peer, it subscribes to the
peer's mailbox. The user could be set to anything or nothing and you'd still
get subscribed to your mailbox.
Issue:
chan_pjsip actually uses the user portion of the URI to find an aor and its
mailboxes. Therefore a subscribe to 1571 results in a 404. Sure, you can
create an aor for 1571 but you certainly can't add your entire voicemail
system's mailboxes to it and everyone would get notified of every MWI.
Solution:
When an MWI subscribe comes in and an aor can't be found that matches the
resource directly, check the resource against the endpoint's aors. If an aor
is found that has a voicemail_extension that matches the resource, use it.
ASTERISK-25865 Reported-by: Ross Beer
Change-Id: I770ea185f751f1ada888fafb4b452115f1c06e9e
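The two-step lookup from the Solution paragraph can be sketched in plain C. The struct aor and struct endpoint types and resolve_mwi_resource() are hypothetical reductions made up for this example, and for simplicity the direct match is also limited to the endpoint's own aors, which the real module may not do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of an aor: its id and optional voicemail_extension. */
struct aor { const char *id; const char *voicemail_extension; };

/* Hypothetical, reduced view of an endpoint: just the aors configured on it. */
struct endpoint { const struct aor *aors; size_t naors; };

/* Resolve the user part of an MWI SUBSCRIBE to an aor.
 * Returns NULL when nothing matches (the caller would answer 404). */
const struct aor *resolve_mwi_resource(const struct endpoint *ep, const char *resource)
{
    size_t i;

    assert(ep != NULL && resource != NULL);

    /* Pass 1: the resource names an aor directly. */
    for (i = 0; i < ep->naors; i++) {
        if (!strcmp(ep->aors[i].id, resource)) {
            return &ep->aors[i];
        }
    }

    /* Pass 2: treat the resource as an alias and look for an aor whose
     * voicemail_extension matches it. */
    for (i = 0; i < ep->naors; i++) {
        const char *vm = ep->aors[i].voicemail_extension;

        if (vm && !strcmp(vm, resource)) {
            return &ep->aors[i];
        }
    }
    return NULL;
}
```

With an aor 1000 whose voicemail_extension is 1571, a subscribe to either 1000 or 1571 resolves to the same aor.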
George Joseph [Fri, 25 Mar 2016 03:55:03 +0000 (21:55 -0600)]
res_pjsip_mwi: Add voicemail extension and mwi_subscribe_replaces_unsolicited
res_pjsip_mwi was missing the chan_sip "vmexten" functionality which adds
the Message-Account header to the MWI NOTIFY. Also, specifying mailboxes
on endpoints for unsolicited mwi and on aors for subscriptions required
that the admin know in advance which the client wanted. If you specified
mailboxes on the endpoint, subscriptions were rejected even if you also
specified mailboxes on the aor.
Voicemail extension:
* Added a global default_voicemail_extension which defaults to "".
* Added voicemail_extension to both endpoint and aor.
* Added ast_sip_subscription_get_dialog for support.
* Added ast_sip_subscription_get_sip_uri for support.
When an unsolicited NOTIFY is constructed, the From header is parsed, the
voicemail extension from the endpoint is substituted for the user, and the
result placed in the Message-Account field in the body.
When a subscribed NOTIFY is constructed, the subscription dialog local uri
is parsed, the voicemail_extension from the aor (looked up from the
subscription resource name) is substituted for the user, and the result
placed in the Message-Account field in the body.
If no voicemail extension was defined, the Message-Account field is not added
to the NOTIFY body.
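The user substitution used to build the Message-Account value can be sketched as follows. This is only an illustration on plain strings: substitute_user() is a hypothetical helper, it assumes a simple "scheme:user@host" URI, and it does not use the pjsip URI parser that the real code relies on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the user part of a "scheme:user@host" URI with new_user.
 * Returns 0 on success, -1 if the URI has no user part or out is too small. */
int substitute_user(const char *uri, const char *new_user, char *out, size_t outlen)
{
    const char *colon;
    const char *at;
    int n;

    assert(uri != NULL && new_user != NULL && out != NULL);

    colon = strchr(uri, ':');
    at = colon ? strchr(colon, '@') : NULL;
    if (!at) {
        return -1;
    }

    /* Scheme and ':' from the original, then the new user, then "@host...". */
    n = snprintf(out, outlen, "%.*s%s%s", (int) (colon - uri + 1), uri, new_user, at);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

For example, substituting 1571 into sip:1000@pbx.example.com yields sip:1571@pbx.example.com, which is the kind of value that would follow "Message-Account:" in the body.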
mwi_subscribe_replaces_unsolicited:
* Added mwi_subscribe_replaces_unsolicited to endpoint.
The previous behavior was to reject a subscribe if a previous internal
subscription for unsolicited MWI was found for the mailbox. That remains the
default. However, if there are mailboxes also set on the aor and the client
subscribes and mwi_subscribe_replaces_unsolicited is set, the existing internal
subscription is removed and replaced with the external subscription. This
allows an admin to configure mailboxes on both the endpoint and aor and allows
the client to select which to use.
ASTERISK-25865 #close Reported-by: Ross Beer
Change-Id: Ic15a9415091760539c7134a5ba3dc4a6a1217cea
George Joseph [Wed, 30 Mar 2016 14:46:32 +0000 (08:46 -0600)]
res_rtp_asterisk: Fix placement of txcount increment
Commit 1bce690ccb36a4744a327c07af23a9a3a0fa20cd was incrementing txcount
for rtcp packets as well as rtp packets and that was causing sender reports
to be generated instead of receiver reports in cases where no rtp was actually
being sent.
Moved the txcount increment from __rtp_sendto, which handles both rtp and rtcp,
to rtp_sendto which only handles rtp packets.
George Joseph [Sun, 27 Mar 2016 03:33:14 +0000 (21:33 -0600)]
chan_pjsip: Add 'pjsip show channelstats'
Added the ability to show channel statistics to chan_pjsip (cli_functions.c)
Moved the existing 'pjsip show channel(s)' functionality from
pjsip_configuration to cli_functions.c. The stats needed chan_pjsip's
private header so it made sense to move the existing channel commands as well.
Now using stasis_cache_dump to get the channel snapshots rather than retrieving
all endpoints, then getting each one's channel snapshots. Much more efficient.
George Joseph [Fri, 11 Mar 2016 01:52:14 +0000 (18:52 -0700)]
res_pjsip/pjsip_options: Fix From generation on outgoing OPTIONS
No one seemed to notice but every time an OPTIONS goes out, it goes
out with a From of "asterisk" (or whatever the default from_user is set to),
even if you specify an endpoint.
The issue had several causes...
qualify_contact is only called with an endpoint if called from the CLI.
If the endpoint is NULL, qualify_contact only looks up the endpoint if
authenticate_qualify=yes. Even then, it never passes it on to
ast_sip_create_request where the From header is set. Therefore From
is always "asterisk" (or whatever the default from_user is set to).
Even if ast_sip_create_request were to get an endpoint, it only sets
the From if endpoint->from_user is set.
The fix is 4 parts...
First, create_out_of_dialog_request was modified to use the endpoint id
if endpoint was specified and from_user is not set.
Second, qualify_contact was modified to always look up an endpoint if
one wasn't specified regardless of authenticate_qualify. It then passes
the endpoint on to create_out_of_dialog_request.
Third (and most importantly), find_an_endpoint was modified to find
an endpoint by using an "aors LIKE %contact->aor%" predicate with
ast_sorcery_retrieve_by_fields. As such, this patch will only work
if the sorcery realtime optimizations patch goes in. Otherwise we'd
be pulling the entire endpoints database every time we send an OPTIONS.
Since we already know the contact's aor, the on_endpoint callback was also
modified to just check if the contact->aor is an exact match to one of
the endpoint's.
Finally, since we now have an endpoint for every OPTIONS request,
res_pjsip/endpt_send_request (which handles out-of-dialog requests) was
updated to get the transport from the endpoint and set it on tdata.
Now the correct transport is used.
Jacek Konieczny [Fri, 25 Mar 2016 15:42:12 +0000 (16:42 +0100)]
app_echo: forward and generate VIDUPDATE frames
When using app_echo via WebRTC with VP8 video the video would appear
only after a few minutes, because there would be nothing to request
a full reference frame.
This fixes the problem in two ways:
- echoes any VIDUPDATE frames received on the channel
- sends one such frame when the first video frame is to be forwarded
This makes the echo work with the Firefox and Chrome WebRTC implementations.
George Joseph [Sun, 27 Mar 2016 17:53:16 +0000 (11:53 -0600)]
res_rtp_asterisk: Fix packet stats on bridged connection
rxcount, txcount, rxoctetcount and txoctetcount weren't being calculated
for bridged streams because the calculations were being done after the
bridged short-circuit. Actually, rxoctetcount wasn't ever being calculated.
Moved the calculations so they occur for all valid received packets and
all transmitted packets. Also added rxoctetcount and txoctetcount to
ast_rtp_instance_stat.
Richard Mudgett [Sat, 26 Mar 2016 04:19:22 +0000 (23:19 -0500)]
res_parking: Fix blind transfer dynamic lots creation.
Blind transfers to a recognized parking extension need to use the parker's
channel variable values to create the dynamic parking lot. This is
because there is always only one parker while the parkee may actually be a
multi-party bridge. A multi-party bridge can never supply the needed
channel variables to create the dynamic parking lot. In the multi-party
bridge blind transfer scenario, the parker's CHANNEL(parkinglot) value and
channel variables are inherited by the local channel used to park the
bridge.
* In park_common_setup(), use the parker instead of the parkee to
supply the dynamic parking lot channel variable values. In all but one
case, the parkee is the same as the parker. However, in the recognized
parking extension blind transfer scenario for a two party bridge they are
different channels. For consistency, we need to use the parker channel.
* In park_local_transfer(), pass the CHANNEL(parkinglot) value to the
local channel when blind transferring a multi-party bridge to a recognized
parking extension.
* When a local channel starts a call, the Local;2 side needs to inherit
the CHANNEL(parkinglot) value from Local;1.
The DTMF one-touch parking case wasn't even trying to create dynamic
parking lots before it aborted the attempt.
* In parking_park_call(), add missing code to create a dynamic parking
lot.
A DTMF bridge hook is documented as returning -1 to remove the hook,
though the hook caller is really coded to accept any non-zero value. See the
ast_bridge_hook_callback typedef.
* In feature_park_call(), don't remove the DTMF one-touch parking hook
because of an error.
ASTERISK-24605 #close
Reported by: Philip Correia
Patches:
call_park.patch (license #6672) patch uploaded by Philip Correia
George Joseph [Tue, 8 Mar 2016 21:55:30 +0000 (14:55 -0700)]
sorcery/res_pjsip: Refactor for realtime performance
There were a number of places in the res_pjsip stack that were getting
all endpoints or all aors, and then filtering them locally.
A good example is pjsip_options which, on startup, retrieves all
endpoints, then the aors for those endpoints, then tests the aors to see
if the qualify_frequency is > 0. One issue was that it never did
anything with the endpoints other than retrieve the aors so we probably
could have skipped a step and just retrieved all aors. But nevermind.
This worked reasonably well with local config files but with a realtime
backend and thousands of objects, this was a nightmare. The issue
really boiled down to the fact that while realtime supports predicates
that are passed to the database engine, the non-realtime sorcery
backends didn't.
They do now.
The realtime engines have a scheme for doing simple comparisons. They
take in an ast_variable (or list) for matching, and the name of each
variable can contain an operator. For instance, a name of
"qualify_frequency >" and a value of "0" would create a SQL predicate
that looks like "where qualify_frequency > '0'". If there's no operator
after the name, the engines add an '=' so a simple name of
"qualify_frequency" and a value of "10" would return exact matches.
The non-realtime backends decide whether to include an object in a
result set by calling ast_sorcery_changeset_create on every object in
the internal container. However, ast_sorcery_changeset_create only does
exact string matches, so a name of "qualify_frequency >" and a
value of "0" returns nothing because the literal "qualify_frequency >"
doesn't match any name in the object set.
So, the real task was to create a generic string matcher that can take a
left value, operator and a right value and perform the match. To that
end, strings.c has a new ast_strings_match(left, operator, right)
function. Left and right are the strings to operate on and the operator
can be a string containing any of the following: = (or NULL or ""), !=,
>, >=, <, <=, like or regex. If the operator is like or regex, the
right string should be a %-pattern or a regex expression. If both left
and right can be converted to float, then a numeric comparison is
performed, otherwise a string comparison is performed.
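The comparison rules in the paragraph above can be sketched in standalone C. This is not the code in strings.c: strings_match() is a hypothetical reduction that handles only =, !=, >, >=, <, and <=, leaves out like and regex, and uses strtod to decide whether both sides are numeric.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 and store the value if s parses completely as a number. */
static int to_number(const char *s, double *out)
{
    char *end;

    if (*s == '\0') {
        return 0;
    }
    *out = strtod(s, &end);
    return *end == '\0';
}

/* Compare left and right using op. A NULL or empty op means "=".
 * If both sides are numeric they are compared as numbers, otherwise as
 * strings. Unknown operators (including like and regex here) return 0. */
int strings_match(const char *left, const char *op, const char *right)
{
    double l, r;
    int cmp;

    assert(left != NULL && right != NULL);
    if (op == NULL || *op == '\0') {
        op = "=";
    }

    if (to_number(left, &l) && to_number(right, &r)) {
        cmp = (l > r) - (l < r);
    } else {
        cmp = strcmp(left, right);
    }

    if (!strcmp(op, "=")) {
        return cmp == 0;
    } else if (!strcmp(op, "!=")) {
        return cmp != 0;
    } else if (!strcmp(op, ">")) {
        return cmp > 0;
    } else if (!strcmp(op, ">=")) {
        return cmp >= 0;
    } else if (!strcmp(op, "<")) {
        return cmp < 0;
    } else if (!strcmp(op, "<=")) {
        return cmp <= 0;
    }
    return 0;
}
```

The numeric path is what makes "10" > "9" true; a pure string comparison would order "10" before "9".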
To use this new function on ast_variables, 2 new functions were added to
config.c. One that compares 2 ast_variables, and one that compares 2
ast_variable lists. The former is useful when you want to compare 2
ast_variables that happen to be in a list but don't want to traverse the
list. The latter will traverse the right list and return true if all
the variables in it match the left list.
Now, the backends' fields_cmp functions call ast_variable_lists_match
instead of ast_sorcery_changeset_create and they can now process the
same syntax as the realtime engines. The realtime backend just passes
the variable list unaltered to the engine. The only gotcha is that
there's no common realtime engine support for regex so that's been noted
in the api docs for ast_sorcery_retrieve_by_fields.
Only one more change to sorcery was done... A new config flag
"allow_unqualified_fetch" was added to res_sorcery_realtime.
"no": ignore fetches if no predicate fields were supplied.
"error": same as no but emit an error. (good for testing)
"yes": allow (the default).
"warn": allow but emit a warning. (good for testing)
Now on to res_pjsip...
pjsip_options was modified to retrieve aors with qualify_frequency > 0
rather than all endpoints then all aors. Not only was this a big
improvement in realtime retrieval but even for config files there's an
improvement because we're not going through endpoints anymore.
res_pjsip_mwi was modified to retrieve only endpoints with something in
the mailboxes field instead of all endpoints then testing mailboxes.
res_pjsip_registrar_expire was completely refactored. It was retrieving
all contacts then setting up scheduler entries to check for expiration.
Now, it's a single thread (like keepalive) that periodically retrieves
only contacts whose expiration time is < now and deletes them. A new
contact_expiration_check_interval was added to global with a default of
30 seconds.
Ross Beer reports that with this patch, his Asterisk startup time dropped
from around an hour to under 30 seconds.
There are still objects that can't be filtered at the database like
identifies, transports, and registrations. These are not going to be
anywhere near as numerous as endpoints, aors, auths, contacts however.
Back to allow_unqualified_fetch. If this is set to yes and you have a
very large number of objects in the database, the pjsip CLI commands
will attempt to retrieve ALL of them if not qualified with a LIKE.
Worse, if you type "pjsip show endpoint <tab>" guess what's going to
happen? :) Having a cache helps but all the objects will have to be
retrieved at least once to fill the cache. Setting
allow_unqualified_fetch=no prevents the mass retrieve and should be used
on endpoints, auths, aors, and contacts. It should NOT be used for
identifies, registrations and transports since these MUST be
retrieved in bulk.
Richard Mudgett [Fri, 18 Mar 2016 19:01:02 +0000 (14:01 -0500)]
res_parking: Misc fixes.
res/parking/parking_applications.c:
* Add malloc fail checks in setup_park_common_datastore().
* Fix playing parking failed announcement to only happen on non-blind
transfers in park_app_exec(). It could never go out before because a test
was provably always false.
res/parking/parking_bridge.c:
* Fix NULL tolerance in generate_parked_user() because
bridge_parking_push() can theoretically pass a NULL parker channel if the
parker channel went away for some reason.
* Clarify some weird code dealing with blind_transfer in
bridge_parking_push().
res/parking/parking_bridge_features.c:
* Made park_local_transfer() set BLINDTRANSFER on the Local;1 channel
which will be bulk copied to the Local;2 channel on the subsequent
ast_call(). The additional advantage is if the parker channel has the
BLINDTRANSFER and ATTENDEDTRANSFER variables set they are now guaranteed
to be overridden.
res/parking/parking_manager.c:
* Fix AMI Park action input range checking of the Timeout header in
manager_park().
* Reduced locking scope to where needed in manager_park().
res/res_parking.c:
* Fix some off nominal missing unlocks by eliminating the returns.
Alexander Traud [Thu, 24 Mar 2016 19:08:10 +0000 (20:08 +0100)]
chan_sip: Do not send all codecs on INVITE. Do not break on Session-Timers.
Asterisk 13.7.0 included a fix for ASTERISK-24543, not to send all those
codecs, which the caller did not request/support. That fix was not complete
because on the second Session Timer all codecs were sent again. Some VoIP/SIP
clients interpreted that complete codec-list as a change in the SIP session.
Because of that, Asterisk did not send the RTP audio via NAT anymore which
created a non-audio scenario after the second Session Timer fired.
Gianluca Merlo [Sat, 19 Mar 2016 12:34:26 +0000 (13:34 +0100)]
config: fix flags in uint option handler
The configuration unsigned integer option handler sets flags for the
parser as if the option should be a signed integer (PARSE_INT32),
leading to errors on "out of range" values. Fix flags (PARSE_UINT32).
A fix to res_pjsip is also present which stops invalid flags from
being passed when registering sorcery object fields for qualify
status.
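The effect of parsing an unsigned option with signed range checks can be shown with a small standalone sketch. parse_int32() and parse_uint32() are hypothetical helpers written for this example; they only mimic the range behavior and are not the Asterisk parser or its PARSE_INT32 and PARSE_UINT32 flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse s as a signed 32-bit integer. Returns 0 on success, -1 on a
 * malformed or out-of-range value. */
int parse_int32(const char *s, int32_t *out)
{
    char *end;
    long long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno || end == s || *end != '\0' || v < INT32_MIN || v > INT32_MAX) {
        return -1;
    }
    *out = (int32_t) v;
    return 0;
}

/* Parse s as an unsigned 32-bit integer. Returns 0 on success, -1 on a
 * malformed, negative or out-of-range value. */
int parse_uint32(const char *s, uint32_t *out)
{
    char *end;
    long long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno || end == s || *end != '\0' || v < 0 || v > UINT32_MAX) {
        return -1;
    }
    *out = (uint32_t) v;
    return 0;
}
```

A value such as 4000000000 is valid for an unsigned 32-bit option but is rejected as out of range by the signed check, which is the kind of error the commit describes.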
Mark Michelson [Thu, 10 Mar 2016 22:58:49 +0000 (16:58 -0600)]
Restrict CLI/AMI commands on shutdown.
During stress testing, we have frequently seen crashes occur because a
CLI or AMI command attempts to access information that is in the process
of being destroyed.
When addressing how to fix this issue, we initially considered fixing
individual crashes we observed. However, the changes required to fix
those problems would introduce considerable overhead to the nominal
case. This is not reasonable in order to prevent a crash from occurring
while Asterisk is already shutting down.
Instead, this change makes it so AMI and CLI commands cannot be executed
if Asterisk is being shut down. For AMI, this is absolute. For CLI,
though, certain commands can be registered so that they may be run
during Asterisk shutdown.
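The gating rule can be sketched as follows. All names here (struct cli_command, allow_at_shutdown, begin_shutdown and the two permission checks) are hypothetical and chosen only to show the rule: AMI is refused outright during shutdown, while a CLI command runs only if it was registered as allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CLI command record with a per-command shutdown flag. */
struct cli_command {
    const char *name;
    int allow_at_shutdown;
};

/* Set once shutdown starts; a real system would need proper synchronization. */
static int shutting_down;

void begin_shutdown(void)
{
    shutting_down = 1;
}

/* CLI: permitted normally, and during shutdown only if flagged as allowed. */
int cli_command_permitted(const struct cli_command *cmd)
{
    assert(cmd != NULL);
    return !shutting_down || cmd->allow_at_shutdown;
}

/* AMI: never permitted once shutdown has begun. */
int ami_action_permitted(void)
{
    return !shutting_down;
}
```

The check is a single flag test in front of command dispatch, so it adds very little cost to the nominal case.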
Walter Doekes [Thu, 24 Mar 2016 12:45:06 +0000 (13:45 +0100)]
musiconhold: Only warn if music class is not found in memory and database.
The log message when a MusicOnHold music class was not found was changed
from debug level to WARNING level in Asterisk 11.19 and 13.5. For those
using realtime musiconhold, this message is wrong because it warns
before checking the database.
This changeset delays the warning until after the database has been
checked.
Walter Doekes [Thu, 24 Mar 2016 10:38:16 +0000 (11:38 +0100)]
core/logging: Fix broken syslog levels on older glibc.
The fix to ASTERISK-25407 introduced the usage of LOG_MAKEPRI. However
this macro is broken in older glibc (< 2.17); it would left-shift the
facility a second time, causing the resultant priority to become
invalid.
The syslog manpage mentions nothing about LOG_MAKEPRI and suggests this:
The priority argument is formed by ORing the facility and the level
values [...].
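A short sketch of the ORing approach, next to the double shift described above. It assumes a platform such as glibc where the facility constants (LOG_LOCAL0 and friends) are already shifted into position; syslog_priority() and double_shifted_priority() are hypothetical helper names.

```c
#include <assert.h>
#include <syslog.h>

/* Build a syslog priority by ORing facility and level, as the manpage
 * suggests. The facility constant is used as-is, without shifting. */
int syslog_priority(int facility, int level)
{
    assert(facility >= 0 && level >= 0);
    return facility | level;
}

/* What a macro that shifts the facility again would compute. Shown only to
 * illustrate the bug; the result is not a valid priority. */
int double_shifted_priority(int facility, int level)
{
    assert(facility >= 0 && level >= 0);
    return (facility << 3) | level;
}
```

The value from syslog_priority(LOG_LOCAL0, LOG_WARNING) can be passed directly as the priority argument of syslog().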
chan_sip.c: Space after port causes unnecessary resolution attempt
check_via() already skips leading blanks where the sent-by address (with the
optional port) should be placed.
RFC 3261 allows for blanks between the port and the Via parameters:
> https://tools.ietf.org/html/rfc3261#section-20.42
(actually it allows a lot more blanks ;-)), so I just switched from
ast_skip_blanks() to ast_strip() on the local copy of the string.
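The difference between skipping leading blanks and stripping both ends can be shown with a small helper. strip_blanks() below is a hypothetical stand-in written for this example; it is similar in spirit to ast_strip() but is not Asterisk's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place on a writable string and
 * return a pointer to the first non-blank character. */
char *strip_blanks(char *s)
{
    char *end;

    assert(s != NULL);

    /* Skip leading blanks (this is all a skip-blanks helper would do). */
    while (*s && isspace((unsigned char) *s)) {
        s++;
    }

    /* Also cut trailing blanks so "host:port " becomes "host:port". */
    end = s + strlen(s);
    while (end > s && isspace((unsigned char) end[-1])) {
        end--;
    }
    *end = '\0';
    return s;
}
```

Because the string is modified in place, this has to run on a local copy, as the commit does.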
Gianluca Merlo [Sat, 19 Mar 2016 01:32:51 +0000 (02:32 +0100)]
func_aes: fix misuse of strlen on binary data
The encryption code for AES_ENCRYPT evaluates the length of the data to
be encoded in base64 using strlen. The data is binary, so its length
can be underestimated because strlen stops at the first NUL byte.
Reuse the write pointer offset to evaluate it, instead.
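A minimal illustration of why the write pointer offset is the safer length. write_blocks() is a hypothetical stand-in that just copies bytes; it performs no AES and is not the func_aes code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write len bytes of (stand-in) ciphertext to out and return the number of
 * bytes written, computed from how far the write pointer advanced. */
size_t write_blocks(unsigned char *out, const unsigned char *data, size_t len)
{
    unsigned char *p = out;

    assert(out != NULL && data != NULL);
    memcpy(p, data, len);
    p += len;

    /* Correct for binary data: no dependence on where a zero byte falls. */
    return (size_t) (p - out);
}
```

If the output contains a zero byte in its second position, strlen on the buffer reports 1 while the pointer offset reports the full length, so a base64 encoder fed the strlen value would drop most of the data.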
Kevin Harwell [Wed, 16 Mar 2016 17:37:01 +0000 (12:37 -0500)]
chan_pjsip: transfers with direct media reinvite has wrong address/port
During a transfer involving direct media a race occurs between when the
transferer channel is swapped out, initiating rtp changes/updates, and the
subsequent reinvites.
When Alice, after speaking with Charlie (Bob is on hold), connects Bob and
Charlie, invites are sent to each in order to establish the call between them.
Bob is taken off hold and Charlie is told to have his media flow through
Asterisk. However, if before those invites go out the bridge updates Bob's
and/or Charlie's rtp information with direct media data (i.e. address, port)
then the invite(s) will contain the remote data in the SDP instead of the
Asterisk data.
The race occurs in the native bridge glue code when updating the peer. The
direct_media_address can get set twice before sending out the first invite
during call connection. This can happen because the checking/setting of the
direct_media_address happened in one thread while the sending of the invite(s)
happened in another thread.
This fix removes the race condition by moving the checking/setting of the
direct_media_address to be in the same thread as the sending of the invite(s).
This serializes the checking/setting and sending so they can no longer happen
out of order.
install_prereq: Update repositories before install on Debian systems
When the local package index is older than the repository (for example,
after packages have been replaced by security updates), fetching a package
fails with 404 Not Found.
This patch prevents that by updating the local package index with aptitude
before installing.
install_prereq: Check whether aptitude is installed and install it if not.
On Debian or Debian-based systems that don't have aptitude installed, the
script does nothing. This patch checks whether aptitude is installed and
installs it if it is not.
Also, if the script is run when all packages are already installed, it shows
nothing and exits with status 1 because 'grep' receives nothing through the
pipe from 'awk'.
res_pjsip_refer.c: Fix seg fault in process of Refer-to header.
The "Refer-to" header of an incoming REFER request is parsed by
pjsip_parse_uri(). That function requires the URI parameter to be NUL
terminated. Unfortunately, the previous code added the NUL terminator by
overwriting memory that may not be safe to write. The result of that
overwrite could be benign, memory corruption, or a segmentation fault. Now
the URI is NUL terminated safely by copying it to a new chunk of memory
that is large enough to hold the terminator.
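The copy-then-terminate approach can be sketched in plain C. copy_terminated() is a hypothetical helper that uses malloc for the example; the real fix would allocate through whatever the surrounding pjsip code uses for memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated, NUL-terminated copy of the first len bytes of
 * src, or NULL if allocation fails. The source buffer is never written to,
 * so nothing past the URI is overwritten. The caller frees the result. */
char *copy_terminated(const char *src, size_t len)
{
    char *dst;

    assert(src != NULL);
    dst = malloc(len + 1); /* one extra byte for the terminator */
    if (!dst) {
        return NULL;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

The unsafe alternative is writing src[len] = '\0' directly, which modifies a byte that belongs to whatever follows the URI in the original buffer.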
Richard Mudgett [Fri, 11 Mar 2016 18:22:48 +0000 (12:22 -0600)]
chan_sip.c: Fix mwi resub deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Thu, 10 Mar 2016 23:01:12 +0000 (17:01 -0600)]
chan_sip.c: Fix registration timeout and expire deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Thu, 10 Mar 2016 18:17:09 +0000 (12:17 -0600)]
chan_sip.c: Fix t38id deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
Richard Mudgett [Wed, 9 Mar 2016 22:34:53 +0000 (16:34 -0600)]
chan_sip.c: Fix reinviteid deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
* Fix retrans_pkt() to call check_pendings() with both the owner channel
and the private objects locked as required.
* Refactor dialog retransmission packet list to safely remove packet
nodes. The list nodes are now ao2 objects. The list has a ref and the
scheduled entry has a ref.
Richard Mudgett [Wed, 9 Mar 2016 22:26:26 +0000 (16:26 -0600)]
chan_sip.c: Fix waitid deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
* Made always run check_pendings() under the scheduler thread so scheduler
ids can be checked safely.
Richard Mudgett [Mon, 7 Mar 2016 19:21:44 +0000 (13:21 -0600)]
chan_sip.c: Fix autokillid deadlock potential.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Stopping a scheduled event can result in a deadlock if the scheduled event
is running when you try to stop the event. If you hold a lock needed by
the scheduled event while trying to stop the scheduled event then a
deadlock can happen. The general strategy for resolving the deadlock
potential is to push the actual starting and stopping of the scheduled
events off onto the scheduler/do_monitor() thread by scheduling an
immediate one shot scheduled event. Some restructuring may be needed
because the code may assume that the start/stop of the scheduled events is
immediate.
* Fix clearing autokillid in __sip_autodestruct() even though we could
reschedule.
Richard Mudgett [Fri, 11 Mar 2016 03:54:03 +0000 (21:54 -0600)]
chan_sip.c: Clear scheduled immediate events on unload.
This patch is part of a series to resolve deadlocks in chan_sip.c.
The reordering of chan_sip's shutdown is to handle any immediate events
that get put onto the scheduler so resources aren't leaked. The typical
immediate events at this time are going to be concerned with stopping
other scheduled events.
This patch is part of a series to resolve deadlocks in chan_sip.c.
Delaying destruction of the chan_sip sip_pvt structures caused the
/channels/chan_sip/test_sip_rtpqos unit test to crash. That test
registers a special test ast_rtp_engine with the rtp engine module. When
the unit test completes it cleans up by unregistering the test
ast_rtp_engine and exits. Since the delayed destruction of the sip_pvt
happens after the unit test returns, the destructor tries to call the rtp
engine destroy callback of the test ast_rtp_engine auto variable which no
longer exists on the stack.
* Change the test ast_rtp_engine auto variable to a static variable. Now
the variable can still exist after the unit test exits so the delayed
sip_pvt destruction can complete successfully.
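The lifetime issue can be illustrated with a minimal sketch. The names here (struct engine, run_unit_test(), delayed_destroy()) are hypothetical stand-ins and not the real ast_rtp_engine or chan_sip code. The sketch only shows why static storage keeps the callback table valid after the registering function has returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an engine with a destroy callback. */
struct engine {
	const char *name;
	int (*destroy)(void);
};

int destroyed_count;

int test_destroy(void)
{
	return ++destroyed_count;
}

/* Pointer kept by an object whose destruction is delayed. */
const struct engine *pending_engine;

/* Mimics the unit test: hands out its engine and returns before cleanup runs. */
void run_unit_test(void)
{
	/* static storage: still valid after this function returns.
	 * An auto (stack) variable here would leave pending_engine dangling. */
	static const struct engine test_engine = { "test", test_destroy };
	pending_engine = &test_engine;
}

/* Mimics the delayed destructor running after the test has exited. */
int delayed_destroy(void)
{
	const struct engine *e = pending_engine;
	pending_engine = NULL;
	if (e == NULL) {
		return -1;
	}
	assert(e->destroy != NULL);
	return e->destroy();
}
```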
Joshua Colp [Mon, 14 Mar 2016 13:59:10 +0000 (10:59 -0300)]
build: Add configure check for proto field of PJSIP TLS transport setting.
Older versions of PJSIP do not have the proto field on the TLS transport
setting structure. This change adds a configure check so even if it is
not present we will still be able to build.
George Joseph [Sat, 12 Mar 2016 22:02:20 +0000 (15:02 -0700)]
build_system: Split COMPILE_DOUBLE from DONT_OPTIMIZE
I can't ever recall actually needing the intermediate files or the checking
that a double compile produces. What I CAN remember is every DONT_OPTIMIZE
build needing 3 invocations of gcc instead of 1 just to do the checks and
produce those intermediate files.
Having said that, Richard pointed out that the reason for the double compile
was that there were cases in the past where a submitted patch failed to compile
because the submitter never tried it with the optimizations turned on.
To get the best of both worlds, COMPILE_DOUBLE has been split into its own
option. If DONT_OPTIMIZE is turned on, COMPILE_DOUBLE will also be selected
BUT you can then turn it off if all you need are the debugging symbols. This
way you have to make an informed decision about disabling COMPILE_DOUBLE.
To allow COMPILE_DOUBLE to be both auto-selected and turned off, a new feature
was added to menuselect. The <use> element can now contain an "autoselect"
attribute which will turn the used member on but not create a hard dependency.
The cflags.xml implementation for COMPILE_DOUBLE looks like this...
When DONT_OPTIMIZE is turned on, COMPILE_DOUBLE is turned on because
of the use.
When DONT_OPTIMIZE is turned off, COMPILE_DOUBLE is turned off because
of the depend.
When COMPILE_DOUBLE is turned on, DONT_OPTIMIZE is turned on because
of the depend.
When COMPILE_DOUBLE is turned off, DONT_OPTIMIZE is left as is because
it only uses COMPILE_DOUBLE, it doesn't depend on it.
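The four rules above can be modeled in a few lines of C. This is a hypothetical model of the selection behavior only, written for illustration. struct opts and the two setter functions are assumptions and do not reflect menuselect's actual data structures or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two options; not menuselect's real structures. */
struct opts {
	bool dont_optimize;
	bool compile_double;
};

/* DONT_OPTIMIZE uses COMPILE_DOUBLE with autoselect, and
 * COMPILE_DOUBLE depends on DONT_OPTIMIZE. */
void set_dont_optimize(struct opts *o, bool on)
{
	assert(o != NULL);
	o->dont_optimize = on;
	/* on: the autoselect use turns COMPILE_DOUBLE on.
	 * off: the depend turns COMPILE_DOUBLE off. */
	o->compile_double = on;
}

void set_compile_double(struct opts *o, bool on)
{
	assert(o != NULL);
	o->compile_double = on;
	if (on) {
		o->dont_optimize = true;   /* depend: enabling pulls in DONT_OPTIMIZE */
	}
	/* off: DONT_OPTIMIZE only uses COMPILE_DOUBLE, so it is left as is */
}
```

Turning COMPILE_DOUBLE off after DONT_OPTIMIZE was selected leaves a debug build without the double compile, which is the combination the commit set out to allow.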
I also made a few tweaks to the ncurses implementation to move things
left a bit to allow longer descriptions.
George Joseph [Thu, 10 Mar 2016 19:09:13 +0000 (12:09 -0700)]
pjproject: Pass (dont_)optimize flags to pjproject and fix pjsua
The pjproject Makefile now uses the Asterisk optimization flags which
are determined by the setting of the DONT_OPTIMIZE menuselect flag.
The Makefile was also restructured so a change to the top level
menuselect.makeopts will result in a rebuild of pjproject.
Also, "--disable-resample" was removed from the pjproject configure
options. Without resample, pjsua (which is used by the testsuite)
can't make audio calls. When it can't, it segfaults.
Walter Doekes [Fri, 11 Mar 2016 22:03:08 +0000 (23:03 +0100)]
app_chanspy: Fix occasional deadlock with ChanSpy and Local channels.
Channel masquerading had a conflict with autochannel locking.
When locking autochannel->channel, the channel is fetched from the
autochannel and then locked. During the fetch, the autochannel -- which
has no locks itself -- can be modified by someone who owns the channel
lock. That means that the value of autochan->channel cannot be trusted
until you hold the lock.
In practice, this caused problems with Local channels getting
masqueraded away while the ChanSpy attempted to get info from that
channel. The old channel which was about to get removed got locked, but
the new (replaced) channel got unlocked (no-op). Because the replaced
channel was now locked (and would never get unlocked), it couldn't get
removed from the channel list in a timely manner, and would now cause
deadlocks when iterating over the channel list.
This change checks the autochannel after locking the channel for changes
to the autochannel. If the channel had been changed, the lock is
reobtained on the new channel.
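A minimal sketch of the lock, recheck and retry pattern follows. It assumes hypothetical struct channel and struct autochan types built on pthread mutexes and C11 atomics. It is not the actual Asterisk macro, and it leaves out the reference counting a real implementation would need to keep the fetched channel alive.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the channel and autochannel; not Asterisk's types. */
struct channel {
	pthread_mutex_t lock;
};

struct autochan {
	/* May be swapped by whoever holds the current channel's lock. */
	struct channel *_Atomic chan;
};

/* Lock whatever channel the autochan points to, rechecking once the lock is held. */
struct channel *autochan_lock(struct autochan *ac)
{
	assert(ac != NULL);
	for (;;) {
		struct channel *c = atomic_load(&ac->chan);   /* fetch without a lock */
		pthread_mutex_lock(&c->lock);
		if (atomic_load(&ac->chan) == c) {
			return c;   /* pointer cannot change while we hold this lock */
		}
		/* The channel was swapped between fetch and lock: drop it and retry. */
		pthread_mutex_unlock(&c->lock);
	}
}
```

The function returns the channel it actually locked, so the caller can unlock that same object instead of re-reading the autochan pointer. Unlocking a freshly fetched pointer is what left the old channel locked in the failure described above.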
In theory it seems possible that after this fix, the lock attempt on the
old (wrong) channel can be on an already destroyed lock, maybe causing
a crash. But that hasn't been observed in the wild and is harder to induce
than the current deadlock.
Thanks go to Filip Frank for suggesting a fix similar to this and
especially to IRC user hexanol for pointing out why this deadlock was
possible and testing this fix. And to Richard for catching my rookie
while loop mistake ;)
George Joseph [Sun, 6 Mar 2016 20:38:41 +0000 (13:38 -0700)]
res_pjsip: Strip spaces from items parsed from comma-separated lists
Configurations like "aors = a, b, c" were either ignoring everything after "a"
or trying to look up " b". Same for mailboxes, ciphers, contacts and a few
others.
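The kind of parsing involved can be sketched as follows. strip() and split_list() are hypothetical helpers written for this illustration and are not the res_pjsip functions. The sketch splits a list on commas in place and trims whitespace around each item, so "a, b ,c" yields "a", "b" and "c".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
char *strip(char *s)
{
	while (isspace((unsigned char)*s)) {
		s++;
	}
	char *end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1])) {
		end--;
	}
	*end = '\0';
	return s;
}

/* Split list on commas in place, storing up to max stripped items in out.
 * Returns the number of items stored. */
size_t split_list(char *list, char **out, size_t max)
{
	assert(list != NULL && out != NULL);
	size_t n = 0;
	char *item = list;
	while (item != NULL && n < max) {
		char *comma = strchr(item, ',');
		if (comma != NULL) {
			*comma = '\0';   /* terminate the current item */
		}
		out[n++] = strip(item);
		item = comma != NULL ? comma + 1 : NULL;
	}
	return n;
}
```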