Joshua Colp [Tue, 12 Nov 2019 11:00:44 +0000 (07:00 -0400)]
parking: Use channel snapshot instead of channel.
There exists a scenario where a thread can hold a lock on the
channels container while trying to lock a bridge. At the same
time another thread can hold the lock for said bridge while
attempting to retrieve a channel. This causes a deadlock.
This change fixes this scenario by retrieving a channel snapshot
instead of a channel, as information present in the snapshot
is all that is needed.
Kevin Harwell [Tue, 12 Nov 2019 18:36:37 +0000 (12:36 -0600)]
res_pjsip_session: initialize pending's topology to endpoint's
Found during some testing, there is a race condition between selecting an
appropriate bridge type for a call versus the applying of media on the callee's
session. In some instances a native bridge type would have been chosen, but
due to the callee's media not yet being established at bridge compatibility
check time the simple bridge type is picked instead.
When using chan_pjsip this initiates a topology change event. The topologies
are then compared for the two sessions. However, when the topology was created
for the caller its streams are initialized to "inactive". This topology is then
used as a base when creating the callee's topology, and streams. Soon after
the caller's topology's stream(s) get updated based on the sdp (get set to
sendrecv in the failing scenario).
Now when the topology change event is raised, and the two topologies are
compared, the comparison fails due to a stream state mismatch (sendrecv vs
inactive). And since they differ a reinvite is sent out (to the caller in
this case).
This patch makes it such that when the caller's topology is initially created
it gets created based on its configured endpoint's media topology. When the
endpoint's topology is created its stream's state(s) are initialized to
sendrecv instead of inactive. Subsequently, now when the callee's topology is
created its topology streams are now initialized to sendrecv. Thus when the
topology change event occurs due to the mentioned scenario the stream states
match for the given sessions, and the reinvite is not sent unless due to some
other valid mismatch.
Note, this patch only changes one pending media state's creation point. It's
possible other places *could* be changed, however for now it was deemed best
to only alter what's here.
George Joseph [Wed, 6 Nov 2019 11:47:17 +0000 (04:47 -0700)]
stasis: Don't hold app_registry and session locks unnecessarily
resource_events:stasis_app_message_handler() was locking the session,
then attempting to determine if the app had debug enabled which
locked the app_registry container. res_stasis:__stasis_app_register
was locking the app_registry container then calling app_update
which caused app_handler (which locks the session) to run.
The result was a deadlock.
* Updated resource_events:stasis_app_message_handler() to determine
if debug was set (which locks the app_registry) before obtaining the
session lock.
* Updated res_stasis:__stasis_app_register to release the app_registry
container lock before calling app_update (which locks the sesison).
Joshua Colp [Thu, 24 Oct 2019 10:21:31 +0000 (07:21 -0300)]
res_ari_events: Add module reference when a WebSocket is open.
This change ensures that the module isn't unloaded when a
WebSocket is open. Previously it was possible to unload the
module manually or during shutdown which could cause a crash
when any active WebSockets were terminated.
George Joseph [Fri, 18 Oct 2019 11:36:12 +0000 (05:36 -0600)]
ExternalMedia: Change return object from ExternalMedia to Channel
When we created the External Media addition to ARI we created an
ExternalMedia object to be returned from the channels/externalMedia
REST endpoint. This object contained the channel object that was
created plus local_address and local_port attributes (which are
also in the Channel variables). At the time, we thought that
creating an ExternalMedia object would give us more flexibility
in the future but as we created the sample speech to text
application, we discovered that it doesn't work so well with ARI
client libraries that a) don't have the ExternalMedia object
defined and/or b) can't promote the embedded channel structure
to a first-class Channel object.
This change causes the channels/externalMedia REST endpoint to
return a Channel object (like channels/create and channels/originate)
instead of the ExternalMedia object.
Salah Ahmed [Fri, 18 Oct 2019 09:22:22 +0000 (11:22 +0200)]
Crash during "pjsip show channelstats" execution
During execution "pjsip show channelstats" cli command by an
external module asterisk crashed. It seems this is a separate
thread running to fetch and print rtp stats. The crash happened on
the ao2_lock method, just before it going to read the rtp stats on
a rtp instance. According to gdb backtrace log, it seems the
session media was already cleaned up at that moment.
app_voicemail.c: Support multiple file formats for forwarded messages.
If you specify multiple formats in voicemail.conf, eg. "format = gsm|wav"
and are using realtime ODBC backend, only the first format gets stored
in the database. So when you forward a message later on, there is a bug
generating the email, related to the stored format (GSM) being different
than the desired email format (WAV) specified for the user. Sox can
handle this, but Asterisk needs to tell sox exactly what to do.
cdr_pgsql cel_pgsql res_config_pgsql: compatibility with PostgreSQL 12
PostgreSQL 12 finally removed column adsrc from table pg_catalog.pg_attrdef
(column default values), which has been deprecated since version 8.0.
Since then, the official/correct/supported way to retrieve the column
default value from the catalog is function pg_catalog.pg_get_expr().
This change breaks compatibility with pre-8.0 PostgreSQL servers,
but has reached end-of-support more than a decade ago.
cdr_pgsql and res_config_pgsql still have support for pre-7.3
servers, but cleaning that up is perhaps a topic for a major release,
not this bugfix.
Kevin Harwell [Thu, 10 Oct 2019 20:30:06 +0000 (15:30 -0500)]
res_pjsip_mwi: potential double unref, and potential unwanted double link
When creating an unsolicited MWI aggregate subscription it was possible for
the subscription object to be double unref'ed. This patch removes the explicit
unref as it is not needed since the RAII_VAR will handle it at function end.
Less concerning there was also a bug that could potentially allow the aggregate
subscription object to be added to the unsolicited container twice. This patch
ensures it is added only once.
George Joseph [Wed, 9 Oct 2019 14:32:45 +0000 (08:32 -0600)]
pjproject_bundled: Replace earlier reverts with official fixes.
Issues in pjproject 2.9 caused us to revert some of their changes
as a work around. This introduced another issue where pjproject
wouldn't build with older gcc versions such as that found on
CentOS 6. This commit replaces the reverts with the official
fixes for the original issues and allows pjproject to be built
on CentOS 6 again.
Kevin Harwell [Wed, 9 Oct 2019 20:17:59 +0000 (15:17 -0500)]
pbx: deadlock when outgoing dialed channel hangs up too quickly
Here's the basic scenario that occurred when executing an AMI fast originate
while at the same time something else locks the channels container, and also
wants a lock on the dialed channel:
1. pbx_outgoing_attempt obtains a lock on a dialed channel
2. concurrently another thread obtains a lock on the channels container, and
subsequently requests a lock on the dialed channel. It waits on #1. For
instance, "core show channel <dialed channel"
3. the outgoing call does not fail, but ends before the pbx_outgoing_attempt
function exits
4. pbx_outgoing_attempt function exits, the outgoing structure destructs, and
attempts to hang up the dialed channel
5. hang up tries to obtain the channels container lock, but can't due to #2.
6. Asterisk is deadlocked.
The solution was to allow the pbx_outgoing_exec function to "steal" ownership
of the dialed channel, and handle hanging it up. The channel now is either hung
up prior to it being potentially locked by the initiating thread, or if locked
the hang up takes place in a different thread, thus alleviating the deadlock.
ASTERISK-28561
patches:
iliketrains.diff submitted by Joshua Colp (license 5000)
Reason for revert: Problematic for users who store their voicemail
on network storage devices, or share voicemail storage between
multiple Asterisk instances.
Kevin Harwell [Wed, 2 Oct 2019 17:18:50 +0000 (12:18 -0500)]
res_pjsip_mwi: use an ao2_global object for mwi containers
On shutdown it's possible for the unsolicited mwi container to be freed before
other dependent threads are done using it. This patch ensures this can no
longer happen by wrapping the container in an ao2_global object. The solicited
container was also changed too.
Kevin Harwell [Wed, 2 Oct 2019 17:17:43 +0000 (12:17 -0500)]
serializer: move/add asterisk serializer pool functionality
Serializer pools have previously existed in Asterisk. However, for the most
part the code has been duplicated across modules. This patch abstracts the
code into an 'ast_serializer_pool' object. As well the code is now centralized
in serializer.c/h.
In addition serializer pools can now optionally be monitored by a shutdown
group. This will prevent the pool from being destroyed until all serializers
have completed.
Kevin Harwell [Wed, 2 Oct 2019 17:18:09 +0000 (12:18 -0500)]
res_pjsip/res_pjsip_mwi: use centralized serializer pools
Both res_pjsip and res_pjsip_mwi made use of serializer pools. However, they
both implemented their own serializer pool functionality that was pretty much
identical in each of the source files. This patch removes the duplicated code,
and uses the new 'ast_serializer_pool' object instead.
Additionally res_pjsip_mwi enables a shutdown group on the pool since if the
timing was right the module could be unloaded while taskprocessor threads still
needed to execute, thus causing a crash.
This improves the way which stasis_state reference counting works.
Since manager->states holds onto the proxy object instead of the real
object this allows stasis_state objects to be freed when appropriate
without use of a special state_remove function. Additionally each
distinct eid associated with the state holds a reference to the state to
prevent early release and potentially allow easier debug of leaks.
There are some warning messages which are not informative without endpoint:
"No registered subscribe handler for event presence.winfo"
"No registered publish handler for event presence"
This patch adds an endpoint name to these messages.
Sean Bright [Wed, 25 Sep 2019 16:01:33 +0000 (12:01 -0400)]
pbx: Prevent Realtime switch crash on invalid priority
pbx_extension_helper takes two 'context' arguments. One (con) is a
pointer directly to a 'struct ast_context' and the other (context) is
the name of the context. In all cases, one of these arguments is NULL
and the other is non-NULL.
Functions that are ultimately called by pbx_extension_helper expect that
'context' will be non-NULL, so we set it unconditionally on entry into
this function.
Ben Ford [Tue, 24 Sep 2019 20:44:14 +0000 (15:44 -0500)]
taskprocessor.c: Added "like" support to 'core show taskprocessors'
Added "like" support for 'core show taskprocessors'. Now you
can specify a specific set of taskprocessors (or just one) by
adding the keyword "like" to the above command, followed by
your search criteria.
Sean Bright [Wed, 18 Sep 2019 11:56:05 +0000 (07:56 -0400)]
res_musiconhold: Add new 'playlist' mode
Allow the list of files to be played to be provided explicitly in the
music class's configuration. The primary driver for this change is to
allow URLs to be used for MoH.
Kevin Harwell [Tue, 24 Sep 2019 16:21:12 +0000 (11:21 -0500)]
res_pjsip_pubsub: change warning to debug
The following message:
"Subscription request from endpoint <blah> rejected. Expiration of 0 is invalid"
Would sometimes spam the log with warnings if Asterisk restarted and a bunch
of clients sent unsubscribes. This patch changes it from a warning to a debug
message.
astobj2.c declares DEBUG_THREADS_LOOSE_ABI to avoid overhead of debug
threads tracking information in the internal structures of astobj2.
Unfortunately this means that ao2_global_obj contains the statically
allocated debug threads tracking fields which are used by initialization
and cleanup but main/astobj2.c believed those fields and associated
space did not exist.
Ben Ford [Tue, 24 Sep 2019 14:40:35 +0000 (09:40 -0500)]
taskprocessor.c: Add CLI commands to reset taskprocessor stats.
Added two new CLI commands to reset stats for taskprocessors. You can
reset stats for a single, specific taskprocessor ('core reset
taskprocessor <taskprocessor>'), or you can reset all taskprocessors
('core reset taskprocessors'). These commands will reset the counter for
the number of tasks processed as well as the max queue size.
We've found a connection re-use regression in pjproject 2.9
introduced by commit
"Close #1019: Support for multiple listeners."
https://trac.pjsip.org/repos/changeset/6002
https://trac.pjsip.org/repos/ticket/1019
Normally, multiple SSL requests should reuse the same connection
if one already exists to the remote server. When a transport
error occurs, the next request should establish a new connection
and any following requests should use that same one. With this
patch, when a transport error occurs, every new request creates
a new connection so you can wind up with thousands of open tcp
sockets, possibly exhausting file handles, and increasing memory
usage.
Reverting pjproject commit 6002 (and related 6021) restores the
expected behavior.
We also found a memory leak in SSL processing that was introduced by
commit
"Fixed #2204: Add OpenSSL remote certificate chain info"
https://trac.pjsip.org/repos/changeset/6014
https://trac.pjsip.org/repos/ticket/2204
Apparently the remote certificate chain is continually recreated
causing the leak.
Reverting pjproject commit 6014 (and related 6022) restores the
expected behavior.
Both of these issues have been acknowledged by Teluu.
Previous to this patch passing a NULL tag to ao2_alloc or ao2_ref based
functions would result in the reference not being logged under
REF_DEBUG. This could sometimes cause inaccurate logging if NULL was
accidentally passed to a reference action. Now reference logging is
only disabled by option passed to the allocation method.
Kevin Harwell [Mon, 23 Sep 2019 16:01:36 +0000 (11:01 -0500)]
res_sorcery_memory_cache: stale item update leak
When a stale item was being updated the object was being retrieved, but its
reference was not being decremented after the update. This patch makes it so
the object is now appropriately de-referenced.
George Joseph [Mon, 23 Sep 2019 12:09:29 +0000 (06:09 -0600)]
astmm.c: Display backtrace with memory show allocations
You can currently capture backtraces of memory allocations but they
only get displayed when you stop asterisk and the atexit hooks
are enabled. Now, if memory backtrace is on and you issue a
"memory show allocations" CLI command for a specific file, then
a backtrace will show for each allocation that occurred after
you turned "memory backtrace on". The backtrace display is shown
only when a specific file's allocations are displayed to prevent
a massive CLI dump of every file's allocations.
stasis: refcounter.py can incorrectly report skewed objects.
It is possible for topic->name to be NULL, this causes the allocation
reference to not be logged. Use the name variable instead which has
been verified to be a non-empty.
This change adds support to the JITTERBUFFER dialplan function
for audio and video synchronization. When enabled the RTCP SR
report is used to produce an NTP timestamp for both the audio and
video streams. Using this information the video frames are queued
until their NTP timestamp is equal to or behind the NTP timestamp
of the audio. The audio jitterbuffer acts as the leader deciding
when to shrink/grow the jitterbuffer when adaptive is in use. For
both adaptive and fixed the video buffer follows the size of the
audio jitterbuffer.
chan_pjsip: Relock correct channel during "fax" redirect.
When fax detection occurs on an outbound PJSIP channel the
redirect operation will result in a masquerade occurring and
the underlying channel on the session changing. The code
incorrectly relocked the new channel instead of the old
channel when returning. This resulted in the new channel
being locked indefinitely. The code now always acts on the
expected channel.
On FreeBSD using the clang/llvm compiler build fails to build due
to the switch statement argument being a non integer type expression.
Switch to an if/else if/else construct to sidestep the issue.
Ben Ford [Tue, 3 Sep 2019 17:20:20 +0000 (12:20 -0500)]
res_rtp_asterisk.c: Send RTCP as compound packets.
According to RFC3550, ALL RTCP packets must be sent in a compond packet
of at least two individual packets, including SR/RR and SDES. REMB,
FIR, and NACK were not following this format, and as a result, would
fail the packet check in ast_rtcp_interpret. This was found from writing
unit tests for RTCP. The browser would accept the way we were
constructing these RTCP packets, but when sending directly from one
Asterisk instance to another, the above mentioned problem would occur.
Sean Bright [Wed, 11 Sep 2019 20:58:29 +0000 (16:58 -0400)]
channels: Allow updating variable value
When modifying an already defined variable in some channel drivers they
add a new variable with the same name to the list, but that value is
never used, only the first one found.
Introduce ast_variable_list_replace() and use it where appropriate.
ASTERISK-23756 #close
Patches:
setvar-multiplie.patch submitted by Michael Goryainov
sungtae kim [Tue, 27 Aug 2019 22:44:33 +0000 (00:44 +0200)]
res_musiconhold: Added unregister realtime moh class
This fix allows a realtime moh class to be unregistered from the command
line. This is useful when the contents of a directory referenced by a
realtime moh class have changed.
The realtime moh class is then reloaded on the next request and uses the
new directory contents.
Ben Ford [Wed, 28 Aug 2019 19:25:57 +0000 (14:25 -0500)]
res_rtp: Add unit tests for RTCP stats.
Added unit tests for RTCP video stats. These tests include NACK, REMB,
FIR/FUR/PLI, SR/RR/SDES, and packet loss statistics. The REMB and FIR
tests are currently disabled due to a bug. We expect to receive a
compound packet, but the code sends this out as a single packet, which
the browser accepts, but makes Asterisk upset.
While writing these tests, I noticed an issue with NACK as well. Where
it is handling a received NACK request, it was reading in only the first
8 bits of following packets that were also lost. This has been changed
to the correct value of 16 bits.
Also made a minor fix to the data buffer unit test.
ChanIsAvail() generates a CDR when unanswered=yes in cdr.conf.
ChanIsAvail() creates a temporary channel with ast_request() to test
resource availability. It should not generate a CDR when it hangs up
this temporary channel.
This patch disables CDR generation for the temporary channel with
ast_cdr_set_property().
chan_dahdi: set CHANNEL(hangupsource) when a PRI channel hangs up
When the remote ISDN party ends an ISDN call on a PRI link
(DISCONNECT), CHANNEL(hangupsource) information is not available.
chan_dahdi already contains an ast_set_hangupsource() in
__dahdi_exception() function but it seems that ISDN message processing
does not use this part of code.
Two other channel modules associate ast_queue_hangup() and
ast_set_hangupsource() functions calls:
- chan_pjsip in chan_pjsip_session_end() function,
- chan_sip in sip_queue_hangup_cause() function.
chan_iax2 separates them, in iax2_queue_hangup()/iax2_destroy() and
set_hangup_source_and_cause().
Thus, I propose to add ast_set_hangupsource() beside
ast_queue_hangup() in sig_pri_queue_hangup(), like chan_pjsip and
chan_sip already do.