git.ipfire.org Git - thirdparty/asterisk.git/log

app_queue: Fix crash when unloading module.

When unloading the app_queue module the members in each queue are
destroyed and as part of this they are removed from the pending
members container. Unfortunately a crash would occur as the container
was destroyed before the members were removed.

This change tweaks ordering so the container destruction occurs
after the members are destroyed.

ASTERISK-16115

Change-Id: I48c728668c55aee3d05b751a5d450fb57e87f44b

Merge "app_queue: queue members can receive multiple calls" into certified/13.1

app_queue: queue members can receive multiple calls

It was possible for a queue member that is a member of at least 2 or more
queues to receive mulitiple calls at the same time. This happened because
of a race between when a member was being rung and when the device state
notified the other queue(s) member object of the state change.

This patch makes it so when a queue member is being rung it gets added to
a global pool of queue members. If that same member is tried again, e.g.
from another queue, and it is found to already exist in the pending member
container then it will not ring that member.

ASTERISK-16115 #close

Change-Id: I546dd474776d158c2b6be44205353dee5bac7e48

res_agi: Prevent run_agi from eating frames it shouldn't

The run_agi function is eating control frames when it shouldn't be. This is
causing issues when an AGI is run from CONNECTED_LINE_SEND_SUB in a blond
transfer.

Alice calls Bob. Bob attended transfers to Charlie but hangs up before Charlie
answers.

Alice gets the COLP UPDATE indicating Charlie but Charlie never gets an UPDATE
and is left thinking he's connected to Bob.

In this case, when CONNECTED_LINE_SEND_SUB runs on Alice's channel and it calls
an AGI, the extra eaten frames prevent CONNECTED_LINE_SEND_SUB from running on
Charlie's channel.

The fix was to accumulate deferrable frames in the "forever" loop instead of
dropping them, and re-queue them just before running the actual agi command
or exiting.

ASTERISK-25951 #close

Change-Id: I0f4bbfd72fc1126c2aaba41da3233a33d0433645

Merge "res_stasis: Handle re-enter stasis bridge with swap channel." into certified/13.1

Merge "bridge: Hold off more than one imparting channel at a time." into certified/13.1

Merge "bridge.c: Fixed race condition during attended transfer" into certified/13.1

Merge "bridge.c: Hangup attended transfer target if bridged" into certified/13.1

Merge "bridge.c: Hangup attended transfer target after it has been swapped out" into certified/13.1

Merge "stasis transfer: fix stasis bridge push race part two" into certified/13.1

Merge "stasis transfer: fix a race condition on stasis bridge push" into certified/13.1

Merge "Bridge core: Pass a ref with the swap channel when joining a bridge." into certified/13.1

res_stasis: Handle re-enter stasis bridge with swap channel.

We lose the fact that there is a swap channel if there is one.  We
currently wind up rejoining the stasis bridge as a normal join after the
swap channel has already been kicked from the bridge.

This patch preserves the swap channel so the AMI/ARI events can note that
the channel joining the bridge is swapping with another channel.  Another
benefit to swaqpping in one operation is if there are any channels that
get lonely (MOH, bridge playback, and bridge record channels).  The lonely
channels won't leave before the joining channel has a chance to come back
in under stasis if the swap channel is the only reason the lonely channels
are staying in the bridge.

ASTERISK-25947 #close
Reported by: Richard Mudgett

ASTERISK-24649
Reported by: John Bigelow

ASTERISK-24782
Reported by: John Bigelow

Change-Id: If37ea508831d1fed6dbfac2f191c638fc0a850ee

bridge: Hold off more than one imparting channel at a time.

An earlier patch blocked the ast_bridge_impart() call until the channel
either entered the target bridge or it failed.  Unfortuantely, if the
target bridge is stasis and the imprted channel is not a stasis channel,
stasis bounces the channel out of the bridge to come back into the bridge
as a proper stasis channel.  When the channel is bounced out, that
released the block on ast_bridge_impart() to continue.  If the impart was
a result of a transfer, then it became a race to see if the swap channel
would get hung up before the imparted channel could come back into the
stasis bridge.  If the imparted channel won then everything is fine.  If
the swap channel gets hung up first then the transfer will fail because
the swap channel is leaving the bridge.

* Allow a chain of ast_bridge_impart()'s to happen before any are
unblocked to prevent the race condition described above.  When the channel
finally joins the bridge or completely fails to join the bridge then the
ast_bridge_impart() instances are unblocked.

ASTERISK-25947
Reported by: Richard Mudgett

ASTERISK-24649
Reported by: John Bigelow

ASTERISK-24782
Reported by: John Bigelow

Change-Id: I8fef369171f295f580024ab4971e95c799d0dde1

bridge.c: Fixed race condition during attended transfer

During an attended transfer a thread is started that handles imparting the
bridge channel. From the start of the thread to when the bridge channel is
ready exists a gap that can potentially cause problems (for instance, the
channel being swapped is hung up before the replacement channel enters the
bridge thus stopping the transfer). This patch adds a condition that waits
for the impart thread to get to a point of acceptable readiness before
allowing the initiating thread to continue.

ASTERISK-24782
Reported by: John Bigelow

This patch is a remedial cherry-pick from v13.

Change-Id: I08fe33a2560da924e676df55b181e46fca604577

bridge.c: Hangup attended transfer target if bridged

After completing an attended transfer the transfer target channel was not being
hung up after leaving the bridge. Added an explicit softhangup to hangup said
channel, but only if it was previously bridged.

ASTERISK-24782 #close
Reported by: John Bigelow

This patch is a remedial cherry-pick from v13.

Change-Id: Idde9543d56842369384a5e8c00d72a22bbc39ada

bridge.c: Hangup attended transfer target after it has been swapped out

After completing an attended transfer the transfer target channel (the one that
gets swapped out) was not being hung up after leaving the bridge. This resulted
in a channel possibly being left around. Added an explicit softhangup for the
channel in question after the transfer is successfully completed in order to
make sure the channel is hung up.

ASTERISK-24782 #close
Reported by: John Bigelow
Review: https://reviewboard.asterisk.org/r/4575/

This patch is a remedial cherry-pick from v13.

Change-Id: I26cc0c207acf74ade93e6567febf7b9776452058

stasis transfer: fix stasis bridge push race part two

When swapping a Local channel in place of one already
in a bridge (to complete a bridge attended transfer),
the channel that was swapped out can actually be hung
up before the stasis bridge push callback executes on
the independant transfer thread. This results in the
stasis app loop dropping out and removing the control
that has the the app name which the local replacement
channel needs so it can re-enter stasis.

To avoid this race condition a new push_peek callback
has been added, and called from the ast_bridge_impart
thread before it launches the independant thread that
will complete the transfer. Now the stasis push_peek
callback can copy the stasis app name before the swap
channel can hang up.

ASTERISK-24649
Review: https://reviewboard.asterisk.org/r/4382/

This patch is a remedial cherry-pick from v13.

Change-Id: I307c3b506af5af80ec506f73e8b78a91d79999e0

stasis transfer: fix a race condition on stasis bridge push

After a bridge transfer completes where a local replacement
channel is used, a stasis transfer message with the details
of the transfer is sent. This is processed by stasis which
then sets the stasis app name and replaced channel snapshot
on the replacement channel.

However, since a separate thread was already started to run
stasis on the new replacement channel, a race was on to see
if the message processing would be completed before the app
name was needed, otherwise the channel would be hung up.

This change moves the calls used to set the stasis app name
and the replace snapshot to the bridge_stasis_push function
callback from the bridge transfer logic, allowing the steps
to be completed earlier and more deterministically, and the
race elimianted.

NOTE: the swap channel parameter to bridge_stasis_push (and
thus all bridge push callbacks) must always be present when
performing a swap with another channel.

ASTERISK-24649 #close
Reported by: John Bigelow
Review: https://reviewboard.asterisk.org/r/4341/

This patch is a remedial cherry-pick from v13.

Change-Id: I35c98989786f74cdd7940677002a1a88d34bd2dd

Bridge core: Pass a ref with the swap channel when joining a bridge.

When code imparts a channel into a bridge to swap with another channel, a
ref needs to be held on the swap channel to ensure that it cannot
dissapear before finding it in the bridge.

* The ast_bridge_join() swap channel parameter now always steals a ref for
the swap channel. This is the only change to the bridge framework's
public API semantics.

* bridge_channel_internal_join() now requires the bridge_channel->swap
channel to pass in a ref.

ASTERISK-24649
Reported by: John Bigelow

Review: https://reviewboard.asterisk.org/r/4354/

This patch is a remedial cherry-pick from v13.

Change-Id: I73fdf13a3a1042566281c7d06d6e83e2ef87c120

res_pjsip_callerid: Clear out display name if id->name is not valid

When create_new_id_hdr creates a new RPID or PAI header, it starts by cloning
the From header, then it overwrites the display name and uri from the channel's
connected.id. If the connected.id.name wasn't valid, create_new_id_hdr was
leaving the display name from the From header in the new RPID or PAI header.
On an attended transfer where the originator had a caller id number set but not
a display name, the re-INVITE to the final transferee had the number of the
originator but the display name of the transferer.

Added a check to clear out the display name in the new header if
connected.id.name was invalid.

ASTERISK-25942 #close

Change-Id: I60b4bf7a7ece9b7425eba74151c0b4969cd2738b

ChangeLog: Updated for certified/13.1-cert6

Release summaries: Add summaries for certified/13.1-cert6

Release summaries: Remove previous versions

.version: Update for certified/13.1-cert6

.lastclean: Update for certified/13.1-cert6

realtime: Add database scripts for certified/13.1-cert6

PJSIP: Remove PJSIP parsing functions from uri length validation.

The PJSIP parsing functions provide a nice concise way to check the
length of a hostname in a SIP URI. The problem is that in order to use
those parsing functions, it's required to use them from a thread that
has registered with PJLib.

On startup, when parsing AOR configuration, the permanent URI handler
may not be run from a PJLib-registered thread. Specifically, this could
happen when Asterisk was started in daemon mode rather than
console-mode. If PJProject were compiled with assertions enabled, then
this would cause Asterisk to crash on startup.

The solution presented here is to do our own parsing of the contact URI
in order to ensure that the hostname in the URI is not too long. The
parsing does not attempt to perform a full SIP URI parse/validation,
since the hostname in the URI is what is important.

ASTERISK-25928 #close
Reported by Joshua Colp

Change-Id: Ic3d6c20ff3502507c17244a8b7e2ca761dc7fb60

Merge "res_pjsip_transport_management: Allow unload to occur." into certified/13.1

res_pjsip_registrar: Fix bad memory-ness with user_agent.

Recent changes to the PJSIP registrar resulted in tests failing due to
missing AOR_CONTACT_ADDED test events. The reason for this was that the
user_agent string had junk values in it, resulting in being unable to
generate the event.

I'm going to be honest here, I have no idea why this was happening. Here
are the steps needed for the user_agent variable to get messed up:
* REGISTER is received
* First contact in the REGISTER results in a contact being removed
* Second contact in the REGISTER results in a contact being added
* The contact, AOR, expiration, and user agent all have to be passed as
format parameters to the creation of a string. Any subset of those
parameters would not be enough to cause the problem.

Looking into what was happening, the thing that struck me as odd was
that the user_agent variable was meant to be set to the value of the
User-Agent SIP header in the incoming REGISTER. However, when removing a
contact, the user_agent variable would be set (via ast_strdupa inside a
loop) to the stored contact's user_agent. This means that the
user_agent's value would be incorrect when attempting to process further
contacts in the incoming REGISTER.

The fix here is to use a different variable for the stored user agent
when removing a contact. Correcting the behavior to be correct also
means the memory usage is less weird, and the issue no longer occurs.

ASTERISK-25929 #close
Reported by Joshua Colp

Change-Id: I7cd24c86a38dec69ebcc94150614bc25f46b8c08

res_pjsip_transport_management: Allow unload to occur.

At shutdown it is possible for modules to be unloaded that wouldn't
normally be unloaded. This allows the environment to be cleaned up.

The res_pjsip_transport_management module did not have the unload
logic in it to clean itself up causing the res_pjsip module to not
get unloaded. As a result the res_pjsip monitor thread kept going
processing traffic and timers when it shouldn't.

Change-Id: Ic8cadee131e3b2c436a81d3ae8bb5775999ae00a

ChangeLog: Updated for certified/13.1-cert5

Release summaries: Add summaries for certified/13.1-cert5

Release summaries: Remove previous versions

.version: Update for certified/13.1-cert5

.lastclean: Update for certified/13.1-cert5

realtime: Add database scripts for certified/13.1-cert5

transport management: Register thread with PJProject.

The scheduler thread that kills idle TCP connections was not registering
with PJProject properly and causing assertions if PJProject was built in
debug mode.

This change registers the thread with PJProject the first time that the
scheduler callback executes.

AST-2016-005

Change-Id: I5f7a37e2c80726a99afe9dc2a4a69bdedf661283

Merge "res_pjsip_transport_management: Kill idle TCP connections." into certified/13.1

Merge "Rename res_pjsip_keepalive res_pjsip_transport_management" into certified/13.1

res_pjsip_transport_management: Kill idle TCP connections.

"Idle" here means that someone connects to us and does not send a SIP
request. PJProject will not automatically time out such connections, so
it's up to Asterisk to do it instead.

When we receive an incoming TCP connection, we will start a timer
(equivalent to transaction timer D) waiting to receive an incoming
request. If we do not receive a request in that timeframe, then we will
shut down the TCP connection.

ASTERISK-25796 #close
Reported by George Joseph

AST-2016-005

Change-Id: I7b0d303e5d140d0ccaf2f7af562071e3d1130ac6

Rename res_pjsip_keepalive res_pjsip_transport_management

ASTERISK-25796
Reported by George Joseph

AST-2016-005

Change-Id: Id322a05f927392293570599730050bc677d99433

AST-2016-004: Fix crash on REGISTER with long URI.

Due to some ignored return values, Asterisk could crash if processing an
incoming REGISTER whose contact URI was above a certain length.

ASTERISK-25707 #close
Reported by George Joseph

Patches:
0001-res_pjsip-Validate-that-URIs-don-t-exceed-pjproject-.patch

AST-2016-004

Change-Id: Ic4f5e49f1a83fef4951ffeeef8f443a7f6ac15eb

res_pjsip: Handle deferred SDP hold/unhold properly.

Some SIP devices indicate hold/unhold using deferred SDP reinvites. In
other words, they provide no SDP in the reinvite.

A typical transaction that starts hold might look something like this:

* Device sends reinvite with no SDP
* Asterisk sends 200 OK with SDP indicating sendrecv on streams.
* Device sends ACK with SDP indicating sendonly on streams.

At this point, PJMedia's SDP negotiator saves Asterisk's local state as
being recvonly.

Now, when the device attempts to unhold, it again uses a deferred SDP
reinvite, so we end up doing the following:

* Device sends reinvite with no SDP
* Asterisk sends 200 OK with SDP indicating recvonly on streams
* Device sends ACK with SDP indicating sendonly on streams

The problem here is that Asterisk offered recvonly, and by RFC 3264's
rules, if an offer is recvonly, the answer has to be sendonly. The
result is that the device is not taken off hold.

What is supposed to happen is that Asterisk should indicate sendrecv in
the 200 OK that it sends. This way, the device has the freedom to
indicate sendrecv if it wants the stream taken off hold, or it can
continue to respond with sendonly if the purpose of the reinvite was
something else (like a session timer refresher).

The fix here is to alter the SDP negotiator's state when we receive a
reinvite with no SDP. If the negotiator's state is currently in the
recvonly or inactive state, then we alter our local state to be
sendrecv. This way, we allow the device to indicate the stream state as
desired.

ASTERISK-25854 #close
Reported by Robert McGilvray

Change-Id: I7615737276165eef3a593038413d936247dcc6ed

res_stasis: Fix crash on a hanging up channel.

* Give the struct stasis_app_control ao2 object a ref to the channel held
in the object.  Now the channel will still be around if a thread needs to
post a stasis message instead of crash because the topic was destroyed.

* Moved stopping any lingering silence generator out of the struct
stasis_app_control destructor and made it a part of exiting the Stasis
application.  Who knows which thread the destructor will be called under
so it cannot affect the channel's silence generator.  Not only was the
channel unprotected when the silence generator was stopped, stasis may no
longer even control the channel.

ASTERISK-25882

Change-Id: I21728161b5fe638cef7976fa36a605043a7497e4

res_pjsip_send_to_voicemail.c: Allow either quoted or not send_to_vm reason.

Change-Id: Id6350b3c7d4ec8df7ec89863566645e2b0f441fd

res_pjsip_pubsub: Move where the subscription is stored to after initialized.

A problem arose when testing the AMI subscription listing actions where it
was possible for a subscription that had not been fully initialized to be
listed. This was problematic as the underlying listing code would crash.

This change makes it so the subscription tree is fully set up before it is
added to the list of subscriptions. This ensures that when the listing actions
get the subscription it is valid.

ASTERISK-25738 #close

Change-Id: Iace2b13641c31bbcc0d43a39f99aba1f340c0f48
(cherry picked from commit 1c4f2a920db173412b38aab785ba22c2cc489f89)

ChangeLog: Updated for certified/13.1-cert4

Release summaries: Add summaries for certified/13.1-cert4

Release summaries: Remove previous versions

.version: Update for certified/13.1-cert4

.lastclean: Update for certified/13.1-cert4

realtime: Add database scripts for certified/13.1-cert4

Merge "app_confbridge: Make non-admin users join a muted conference muted." into certified/13.1

Check for OpenSSL defines before trying to use them.

The SSL_OP_NO_TLSv1_1 and SSL_OP_NO_TLSv1_2 defines did not exist prior
to OpenSSL version 1.0.1. A recent commit attempts to, by default, set
these options, which can cause problems on systems with older OpenSSL
installations.

This commit adds a configure script check for those defines and will not
attempt to make use of those if they do not exist. We will print a
warning urging the user to upgrade their OpenSSL installation if those
defines are not present.

Change-Id: I6a2eb9a43fd0738b404d8f6f2cf4b5c22d9d752d

res_stasis_device_state: Fix refcounting error.

Device state subscription lifetimes were governed by when the
subscription was established and unsubscribed from. However, it is
possible that at the time of unsubscription, there could be device state
events still in flight. When those device state events occur, the device
state callback could attempt to dereference a freed pointer. Crash.

This change ensures that the lifetime of the device state subscription
does not end until the underlying stasis subscription has confirmed that
its final message has been sent.

Change-Id: I25a0f1472894c1a562252fb7129671478e25e9b2

ChangeLog: Updated for certified/13.1-cert3

Release summaries: Add summaries for certified/13.1-cert3

.version: Update for certified/13.1-cert3

.lastclean: Update for certified/13.1-cert3

realtime: Add database scripts for certified/13.1-cert3

Merge "AST-2016-003 udptl.c: Fix uninitialized values." into certified/13.1

Merge "AST-2016-002 chan_sip.c: Fix retransmission timeout integer overflow." into certified/13.1

AST-2016-001 http: Provide greater control of TLS and set modern defaults.

This change exposes the configuration of various aspects of the TLS
support and sets the default to the modern standards.

The TLS cipher is now set to the best values according to the
Mozilla OpSec team, different TLS versions can now be disabled, and
the cipher order can be forced to be that of the server instead of
the client.

ASTERISK-24972 #close

Change-Id: I8635470e722ce6d47951a5045ae9ef348271d395

AST-2016-003 udptl.c: Fix uninitialized values.

Sending UDPTL packets to Asterisk with the right amount of missing
sequence numbers and enough redundant 0-length IFP packets, can make
Asterisk crash.

ASTERISK-25603 #close
Reported by: Walter Doekes

ASTERISK-25742 #close
Reported by: Torrey Searle

Change-Id: I97df8375041be986f3f266ac1946a538023a5255

AST-2016-002 chan_sip.c: Fix retransmission timeout integer overflow.

Setting the sip.conf timert1 value to a value higher than 1245 can cause
an integer overflow and result in large retransmit timeout times. These
large timeout times hold system file descriptors hostage and can cause the
system to run out of file descriptors.

NOTE: The default sip.conf timert1 value is 500 which does not expose the
vulnerability.

* The overflow is now detected and the previous timeout time is
calculated.

ASTERISK-25397 #close
Reported by: Alexander Traud

Change-Id: Ia7231f2f415af1cbf90b923e001b9219cff46290

config: Allow options to register when documentation is unavailable.

The config options framework is strict in that configuration options must
be documented unless XML documentation support is not available. In
practice this is useful as it ensures documentation exists however in
off-nominal cases this can cause strange problems.

If it is expected that a config option has a non-zero or non-empty
default value but the config option documentation is unavailable
this reasonable expectation will not be met. This can cause obscure
crashes and weirdness depending on how the code handles it.

This change tweaks the behavior to ensure that the config option
is still allowed to register, apply default values, and be set when
devmode is not enabled. If devmode is enabled then the option can
NOT be set.

This also does not remove the initial documentation error message that
is output on load when registering the configuration option.

ASTERISK-25725 #close

Change-Id: Iec42fca6b35f31326c33fcdc25473f6fd7bc8af8
(cherry picked from commit f22074e5d9ed1882be976299311b8e093d25e1da)

app_confbridge: Make non-admin users join a muted conference muted.

ASTERISK-20987 #close
Reported by: hristo

Change-Id: Ic61a2b524ab3a4cfadf227fc6b3506527bc03f38

res_pjsip_pubsub: Prevent crash from AMI command on freed subscription.

A test recently uncovered that running an ill-timed AMI command to show
inbound subscriptions could cause a crash since Asterisk will try to
operate on a freed subscription.

The fix for this is to remove the subscription tree from the list of
subscriptions at the time that we are sending our final NOTIFY request
out. This way, as the subscription is in the process of dying, it is
inaccessible from AMI.

Change-Id: Ic0239003d8d73e04c47c12dd2a7e23867e5b5b23
(cherry picked from commit b073244c511f9634de57ea401ab9dbebcf2390e8)

res/res_pjsip/presence_xml.c: Add missing 2nd call presence state case.

ASTERISK-25712 #close
Reported by: Richard Mudgett

Change-Id: I70634df24f8c6c3a2c66c45af61d021e4999253f

bridge_basic: don't cache xferfailsound during an attended transfer

The xferfailsound was read from the channel at the beginning of the transfer,
and that value is "cached" for the duration of the transfer. Therefore, changing
the xferfailsound on the channel using the FEATURE() dialplan function does
nothing once the transfer is under way.

This makes it so the transfer code instead gets the xferfailsound configuration
options from the channel when it is actually going to be used.

This patch also fixes a potential memory leak of the props object as well as
making sure the condition variable gets initialized before being destroyed.

ASTERISK-25696 #close

Change-Id: Ic726b0f54ef588bd9c9c67f4b0e4d787934f85e4

test_time: Provide a timeout when waiting.

The test_timezone_watch unit test is written to expect a
condition to be signaled when the inotify daemon thread runs.
There exists a small window where the test_timezone_watch
thread can signal the inotify daemon thread while it is not
reading on the underlying file descriptor. If this occurs
the test_timezone_watch thread will wait indefinitely for a
signal that will never arrive.

This change adds a timeout to the condition so it will return
regardless after a period of time.

Change-Id: Ifed981879df6de3d93acd3ee0a70f92546517390
(cherry picked from commit c8499b8d5adc805efadb91b483d9d987f62891ff)

app: Queue hangup if channel is hung up during sub or macro execution.

This issue was exposed when executing a connected line subroutine.
When connected or redirected subroutines or macros are executed it is
expected that the underlying applications and logic invoked are fast
and do not consume frames. In practice this constraint is not enforced
and if not adhered to will cause channels to continue when they shouldn't.
This is because each caller of the connected or redirected logic does not
check whether the channel has been hung up on return. As a result the
the hung up channel continues.

This change makes it so when the API to execute a subroutine or
macro is invoked the channel is checked to determine if it has hung up.
If it has then a hangup is queued again so the caller will see it
and stop.

ASTERISK-25690 #close

Change-Id: I1f9a8ceb1487df0389f0d346ce0f6dcbcaf476ea

pbx: Deadlock between contexts container and context_merge locks

Recent changes (ASTERISK-25394 commit 2bd27d12223fe33b58c453965ed5c6ed3af7c4f5)
introduced the possibility of a deadlock. Due to the mentioned modifications
ast_change_hints now needs to keep both merge/delete and state callbacks from
occurring while it executes. Unfortunately, sometimes ast_change_hints can be
called with the contexts container locked. When this happens it's possible for
another thread to grab the context_merge_lock before the thread calling into
ast_change_hints does and then try to obtain the contexts container lock. This
of course causes a deadlock between the two threads. The thread calling into
ast_change_hints waits for the other thread to release context_merge_lock and
the other thread is waiting on that one to release the contexts container lock.

Unfortunately, there is not a great way to fix this problem. When hints change,
the subsequent state callbacks cannot run at the same time as a merge/delete,
nor when the usual state callbacks do. This patch alleviates the problem by
having those particular callbacks (the ones run after a hint change) occur in a
serialized task. By moving the context_merge_lock to a task it can now safely be
attempted or held without a deadlock occurring.

ASTERISK-25640 #close
Reported by: Krzysztof Trempala

Change-Id: If2210ea241afd1585dc2594c16faff84579bf302

PJSIP: Prevent deadlock due to dialog/transaction lock inversion.

A deadlock was observed where the monitor thread was stuck, therefore
resulting in no incoming SIP traffic being processed.

The problem occurred when two 200 OK responses arrived in response to a
terminating NOTIFY request sent from Asterisk. The first 200 OK was
dispatched to a threadpool worker, who locked the corresponding
transaction. The second 200 OK arrived, resulting in the monitor thread
locking the dialog. At this point, the two threads are at odds, because
the monitor thread attempts to lock the transaction, and the threadpool
thread loops attempting to try to lock the dialog.

In this case, the fix is to not have the monitor thread attempt to hold
both the dialog and transaction locks at the same time. Instead, we
release the dialog lock before attempting to lock the transaction.

There have also been some debug messages added to the process in an
attempt to make it more clear what is going on in the process.

ASTERISK-25668 #close
Reported by Mark Michelson

Change-Id: I4db0705f1403737b4360e33a8e6276805d086d4a

chan_sip: Add TCP/TLS keepalive to TCP/TLS server

Adds the TCP Keep Alive option to TCP and TLS server sockets. Previously
this option was only being set on session sockets.
http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
According to the link above, the SO_KEEPALIVE option is useful for knowing
when a TCP connected endpoint has severed communication without indicating
it or has become unreachable for some reason. Without this patch, keep
alive is not set on the socket listening for incoming TCP sessions and
in Komatsu's report this resulted in the thread listening for TCP becoming
stuck in a waiting state.

ASTERISK-25364 #close
Reported by: Hiroaki Komatsu

Change-Id: I7ed7bcfa982b367dc64b4b73fbd962da49b9af36

PJSIP FAX: Fix T.38 automatic reject timer NULL channel pointer dereferences.

When a caller calls a FAX number and then hangs up right after the call is
answered then the T.38 re-INVITE automatic reject timer may still be
running after the channel goes away.

* Added session NULL channel checks on the code paths that get executed by
t38_automatic_reject() to prevent a crash when the T.38 re-INVITE
automatic reject timer expires.

ASTERISK-25168
Reported by: Carl Fortin

Change-Id: I07b6cd23815aedce5044f8f32543779e2f7a2403
(cherry picked from commit 8ea214aed782424a884b9a2f67d6dca270854e83)

Merge "sched.c: Make not return a sched id of 0." into certified/13.1

Merge topic 'ASTERISK-25476' into certified/13.1

* changes:
Audit improper usage of scheduler exposed by 5c713fdf18f. (v13 additions)
Audit improper usage of scheduler exposed by 5c713fdf18f.

Unset BRIDGEPEER when leaving a bridge

Currently if a channel is transferred out of a bridge, the BRIDGEPEER
variable (also BRIDGEPVTCALLID) remain set even once the channel is
out of the bridge. This patch removes these variables when leaving
the bridge.

ASTERISK-25600 #close
Reported by: Mark Michelson

Change-Id: I753ead2fffbfc65427ed4e9244c7066610e546da

sched.c: Make not return a sched id of 0.

According to the API doxygen a sched ID of 0 is valid. Unfortunately, 0
was never returned historically and several users incorrectly coded usage
of the returned sched ID assuming that 0 was invalid.

ASTERISK-25476

Change-Id: Ib19c7ebb44ec9fd393ef6646dea806d4f34e3a20

Audit improper usage of scheduler exposed by 5c713fdf18f. (v13 additions)

chan_sip.c:
* Initialize mwi subscription scheduler ids earlier because of ASTOBJ to
ao2 conversion.

* Initialize register scheduler ids earlier because of ASTOBJ to ao2
conversion.

chan_skinny.c:
* Fix more scheduler usage for the valid 0 id value.

ASTERISK-25476

Change-Id: If9f0e5d99638b2f9d102d1ebc9c5a14b2d706e95

Audit improper usage of scheduler exposed by 5c713fdf18f.

channels/chan_iax2.c:
* Initialize struct chan_iax2_pvt scheduler ids earlier because of
iax2_destroy_helper().

channels/chan_sip.c:
channels/sip/config_parser.c:
* Fix initialization of scheduler id struct members. Some off nominal
paths had 0 as a scheduler id to be destroyed when it was never started.

chan_skinny.c:
* Fix some scheduler id comparisons that excluded the valid 0 id.

channel.c:
* Fix channel initialization of the video stream scheduler id.

pbx_dundi.c:
* Fix channel initialization of the packet retransmission scheduler id.

ASTERISK-25476

Change-Id: I07a3449f728f671d326a22fcbd071f150ba2e8c8

res_sorcery_realtime.c: Fix crash from NULL sorcery object type.

If the sorcery object type is not found a NULL is returned.
Unfortunately, sorcery_realtime_filter_objectset() will crash after
complaining about not finding the object type and saying to expect errors.

* Use ao2_cleanup() instead of ao2_ref() to prevent the crash.

ASTERISK-25165
Reported by Corey Farrell

Change-Id: Ic3b64453ea3058cb68d5c26d97d4fe7b8eea2e97

features: Fix crash when transferee hangs up during DTMF attended transfer.

A crash happens with this sequence of steps:
1) Party A is connected to party B.
2) Party B starts a DTMF attended transfer.
3) Party A hangs up while party B is dialing party C.

When party A hangs up the bridge that party A and party B are in is
dissolved and party B is kicked out of the bridge.  When party B finishes
dialing party C he attempts to move to the new bridge with party C.  Since
party B is no longer in a bridge the attempted move dereferences a NULL
bridge_channel pointer and crashes.

* Made the hold(), unhold(), ringing(), and the bridge_move() functions
tolerant of the channel not being in a bridge.  The assertion that party B
is always in a bridge is not true if the bridged peer of party B hangs up
and dissolves the bridge.  Being tolerant of not being in a bridge allows
the peer hangup stimulus to be processed by the FSM.

* Made the bridge_move() function return void since where the return value
for a failed move was checked generated a FSM coding ERROR message for a
normal off-nominal condition.

* Eliminated most uses of RAII_VAR in bridge_basic.c.

ASTERISK-25003 #close
Reported by: Artem Volodin

Change-Id: Ie2c1b14e5e647d4ea6de300bf56d69805d7bcada
(cherry picked from commit be1260a35f88faea4fa029d59343b124d250a8a6)

Merge "Confbridge: Add a user timeout option" into certified/13.1

app_queue: (try_calling): mutex 'qe->chan' freed more times than we've locked!

commit aae45acbd (Mark Michelson 2015-04-15 10:38:02 -0500 6525)
refer ASTERISK-24958

above commit removed ast_channel_lock(qe->chan);
but failed to remove corresponding ast_channel_unlock(qe->chan);

ASTERISK-25561 #close
Reported Alec Davis

Change-Id: Ie05f4e2d08912606178bf1fded57cc022c7a2e1a

Confbridge: Add a user timeout option

This option adds the ability to specify a timeout, in seconds, for a
participant in a ConfBridge. When the user's timeout has been reached,
the user is ejected from the conference with the CONFBRIDGE_RESULT
channel variable set to "TIMEOUT".

The rationale for this change is that there have been times where we
have seen channels get "stuck" in ConfBridge because a network issue
results in a SIP BYE not being received by Asterisk. While these
channels can be hung up manually via CLI/AMI/ARI, adding some sort of
automatic cleanup of the channels is a nice feature to have.

ASTERISK-25549 #close
Reported by Mark Michelson

Change-Id: I2996b6c5e16a3dda27595f8352abad0bda9c2d98

Taskprocessors: Increase high-water mark

In practical tests, we have seen certain taskprocessors, specifically
Stasis subscription taskprocessors, cross the recently-added high-water
mark and emit a warning. This high-water mark warning is only intended
to be emitted when things have tanked on the system and things are
heading south quickly. In the practical tests, the Stasis taskprocessors
sometimes had a max depth of 180 tasks in them, and Asterisk wasn't in
any danger at all.

As such, this ups the high-water mark to 500 tasks instead. It also
redefines the SIP threadpool request denial number to be a multiple of
the taskprocessor high-water mark.

Change-Id: Ic8d3e9497452fecd768ac427bb6f58aa616eebce

res_pjsip distributor: Don't send 503 response to responses.

When the SIP threadpool is backed up with tasks, we send 503 responses
to ensure that we don't try to overload ourselves. The problem is that
we were not insuring that we were not trying to send a 503 to an
incoming SIP response.

This change makes it so that we only send the 503 on incoming requests.

Change-Id: Ie2b418d89c0e453cc6c2b5c7d543651c981e1404

Merge "Further fixes to improper usage of scheduler" into certified/13.1

Further fixes to improper usage of scheduler

When ASTERISK-25449 was closed, a number of scheduler issues mentioned in
the comments were missed. These have since beed raised in ASTERISK-25476
and elsewhere.

This patch attempts to collect all of the scheduler issues discovered so
far and address them sensibly.

ASTERISK-25476 #close

Change-Id: I87a77d581e2e0d91d33b4b2fbff80f64a566d05b
(cherry picked from commit 07583c288828a496cd7730b55112128fea31eaef)

res_pjsip: Deny requests when threadpool queue is backed up.

We have observed situations where the SIP threadpool may become
deadlocked. However, because incoming traffic is still arriving, the SIP
threadpool's queue can continue to grow, eventually running the system
out of memory.

This change makes it so that incoming traffic gets rejected with a 503
response if the queue is backed up too much.

Change-Id: I4e736d48a2ba79fd1f8056c0dcd330e38e6a3816

threadpool: Handle worker thread transitioning to dead when going active.

This change adds handling of dead worker threads when moving them
to be active. When this happens the worker thread is removed from
both the active and idle threads container. If no threads are able
to be moved to active then the pool grows as configured.

A unit test has also been added which thrashes the idle timeout
and thread activation to exploit any race conditions between the
two.

ASTERISK-25546 #close

Change-Id: I6c455f9a40de60d9e86458d447b548fb52ba1143

taskprocessor: Add high water mark warnings

If a taskprocessor's queue grows large, this can indicate that there
may be a problem with tasks not leaving the processor or else that
the number of available task processors for a given type of task is
too low. This patch makes it so that if a taskprocessor's task queue
grows above 100 queued tasks that it will emit a warning message.
Warning messages are emitted only once per task processor.

ASTERISK-25518 #close
Reported by: Jonathan Rose

Change-Id: Ib1607c35d18c1d6a0575b3f0e3ff5d932fd6600c

app_dial: Hold reference to calling channel formats when dialing outbound.

Currently when requesting a channel the native formats of the
calling channel are provided to the core for usage when dialing
the outbound channel. This occurs without holding the channel lock
or keeping a reference to the formats. This is problematic as
the channel driver may end up changing the formats during this time.
In the case of chan_sip this happens when an SDP negotiation
completes.

This change makes it so app_dial keeps a reference to the native
formats of the calling channel which guarantees that they will
remain valid for the period of time needed.

ASTERISK-25172 #close

Change-Id: I2f0a67bd0d5d14c3bdbaae552b4b1613a283f0db
(cherry picked from commit 3b2b004d699b8cc7b808f62536bb2bc4db8b4e0e)

main/dial: Protect access to the format_cap structure of the requesting channel

When a dial attempt is made that involves a requesting channel, we previously
were not:
a) Protecting access to the native format capabilities structure on the
   requesting channel. That is inherently unsafe.
b) Reference bumping the lifetime of the format capabilities structure.

In both cases, something else could sneak in, blow away the format
capabilities, and we'd be holding onto an invalid format_cap structure. When
the newly created channel attempts to construct its format capabilities, things
go poorly.

This patch:
a) Ensures that we get a reference to the native format capabilities while
   the requesting channel is locked
b) Holds a reference to the native format capabilities during the creation
   of the new channel.

ASTERISK-25522 #close

Change-Id: I0bfb7ba8b9711f4158cbeaae96edf9626e88a54f

res_pjsip: Set threadpool max size default to 50.

During a stress test of subscriptions, a huge blast of
subscription-related traffic resulted in the threadpool expanding to a
ridiculous number of threads. The balooning of threads resulted in an
increase of memory, which led to a crash due to being out of memory.

An easy fix for the particular test was to limit the size of the
threadpool, thus reining in the amount of memory that would be used. It
was decided that there really is no downside to having a non-infinite
default value for the maximum size of the threadpool, so this change
introduces 50 threads as the maximum threadpool size for the SIP
threadpool.

ASTERISK-25513 #close
Reported by John Bigelow

Change-Id: If0b9514f1d9b172540ce1a6e2f2ffa1f2b6119be

Merge "res_pjsip_pubsub: Prevent sending NOTIFY on destroyed dialog." into certified/13.1

Merge "res_pjsip_pubsub: Ensure dialog lock balance." into certified/13.1