Joshua Colp [Thu, 25 Aug 2016 12:06:41 +0000 (12:06 +0000)]
app_queue: Ensure member is removed from pending when hanging up.
When dialing channels it is possible that they may not ever
leave the not in use state (Local channels in particular) by
the time we cancel them. If this occurs but we know they were
dialed we explicitly remove them from the pending members
container so that subsequent call attempts occur.
app_queue: Only remove queue member from pending when state changes.
It is possible for a not in use state change to occur multiple
times causing a queue member to be removed from the pending call
container prematurely.
The first not in use state change will remove the queue member
from the container. At this moment the member may be called and
placed in the pending container. After this another not in use
state change can be received which will remove it from the
container. Despite being called at this point the code will
incorrectly see that there are no pending calls to it.
This change only removes it from the pending container if the
state has actually changed.
ASTERISK-26133 #close
patches:
app_queue.diff submitted by Richard Miller (license 5685)
George Joseph [Tue, 15 Nov 2016 18:01:04 +0000 (11:01 -0700)]
file.c/__ast_file_read_dirs: Fix issues on filesystems without d_type
One of the code paths in __ast_file_read_dirs will only get executed if
the OS doesn't support dirent->d_type OR if the filesystem the
particular file is on doesn't support it. So, while standard Linux
systems support the field, some filesystems like XFS do not. In this
case, we need to call stat() to determine whether the directory entry
is a file or directory so we append the filename to the supplied
directory path and call stat. We forgot to truncate path back to just
the directory afterwards though so we were passing a complete file name
to the callback in the dir_name parameter instead of just the directory
name.
The logic has been re-written to only create a full_path if we need to
call stat() or if we need to descend into another directory.
Kevin Harwell [Fri, 28 Oct 2016 20:11:35 +0000 (15:11 -0500)]
stasis_recording/stored: remove calls to deprecated readdir_r function.
The readdir_r function has been deprecated and should no longer be used. This
patch removes the readdir_r dependency (replaced it with readdir) and also moves
the directory search code to a more centralized spot (file.c)
Also removed a strict dependency on the dirent structure's d_type field as it
is not portable. The code now checks to see if the value is available. If so,
it tries to use it, but defaults back to using the stats function if necessary.
Lastly, for most implementations of readdir it *should* be thread-safe to make
concurrent calls to it as long as different directory streams are specified.
glibc falls into this category. However, since it is possible that there exist
some implementations that are not safe, locking has been added for those other
than glibc.
Martin Tomec [Fri, 9 Dec 2016 18:23:37 +0000 (19:23 +0100)]
app_queue: Ensure member is removed from pending when hanging up.
In some cases member is added to pending_members, and the channel
is hung up before any extension state change. So the member would
stay in pending_members forever. So when we call do_hang, we
should also remove member from pending.
HCOLON = *( LOWCTL / SP ) ":" SWS
LOWCTL = %x00-1F ; CTL without DEL
This discrepancy meant that SIP proxies in front of Asterisk with
chan_sip could pass on unknown headers with \x00-\x1F in them, which
would be treated by Asterisk as a different (known) header. For
example, the "To\x01:" header would gladly be forwarded by some proxies
as irrelevant, but chan_sip would treat it as the relevant "To:" header.
Those relying on a SIP proxy to scrub certain headers could mistakenly
get unexpected and unvalidated data fed to Asterisk.
This change fixes so chan_sip only considers SP/HTAB as valid tokens
before the colon, making it agree on the headers with other speakers of
SIP.
pjsip: Fix deadlock with suspend taskprocessor on masquerade
If both channels which should be masqueraded
are in the same serializer:
1st channel will be locked waiting condition 'complete'
2nd channel will be locked waiting condition 'suspended'
On heavy load system a chance that both channels will be in
the same serializer 'pjsip/distibutor' is very high.
To reproduce compile res_pjsip/pjsip_distributor.c with
DISTRIBUTOR_POOL_SIZE=1
Steps to reproduce:
1. Party A calls Party B (bridged call 'AB')
2. Party B places Party A on hold
3. Party B calls Voicemail app (non-bridged call 'BV')
4. Party B attended transfers Party A to voicemail using REFER.
5. When asterisk masquerades calls 'AB' and 'BV',
a deadlock is happened.
This patch adds a suspension indicator to the taskprocessor.
When a session suspends/unsuspends the serializer
it sets the indicator to the appropriate state.
The session checks the suspension indicator before
suspend the serializer.
Mark Michelson [Tue, 8 Nov 2016 16:48:32 +0000 (10:48 -0600)]
res_pjsip_session: Do not call session supplements when it's too late.
res_pjsip_sesssion was hooking into transaction and invite state
changes. One of the reasons for doing so was due to the
PJSIP_EVENT_TX_MSG event. The idea was that we were hooking into the
message sending process, and so we should call session supplements to
alter the outgoing message.
In reality, this event was meant to indicate that the message either
a) had already been sent, or
b) required a DNS lookup and would be sent when the DNS query
completed.
In case (a), this meant we were altering an already-sent
request/response for no reason. In case (b), this potentially meant we
could be trying to alter a request/response at the same time that the
DNS resolution completed. In this case, it meant we might be stomping on
memory being used by the thread actually sending the message. This
caused potential crashes and memory corruption.
This patch removes the calls to session supplements from the case where
the PJSIP_EVENT_TX_MSG event occurs. In all of these cases, trying to
alter the message at this point is too late, and it can cause nothing
but harm to try to do it. Because there were no longer any calls to the
handle_outgoing() function, it has been removed.
Joshua Colp [Wed, 2 Nov 2016 14:15:14 +0000 (14:15 +0000)]
app_dial: Fix incorrect device state when channel is picked up.
Given the scenario where multiple channels are dialed using Dial()
but the caller is picked up using PickupChan() all outgoing channels
except the channel specified to PickupChan() would be marked
as ringing until the call had been hung up.
When using the PickupChan application the channel executing the
application is swapped into place of another channel. As part
of this process the channel is answered. The Dial application
has explicit logic which checks if the channel is answered,
cancels all other outgoing channels, and bridges. This logic is
different than the normal logic that is executed when an outgoing
channel is answered. This different logic failed to publish dial
events stating that the other outgoing channels had been canceled.
As a result references to the outgoing channels were held onto by
the dial masquerade process until the call had been ended and
the channels had gone away. This would result in the channels
appearing in the "core show channels" list despite not being present
anymore and would also result in incorrect device state.
This change makes it so that this logic also publishes
dial events stating that the other outgoing channels have been
canceled.
Richard Mudgett [Wed, 12 Oct 2016 21:24:14 +0000 (16:24 -0500)]
Audit ast_json_pack() calls for needed UTF-8 checks.
Added needed UTF-8 checks before constructing json objects in various
files for strings obtained outside the system. In this case string values
from a channel driver's peer and not from the user setting channel
variables.
* aoc.c: Fixed type mismatch in s_to_json() for time and granularity json
object construction.
res_pjsip_pubsub: fixed a bug when pjsip_tx_data_dec_ref is called twice.
This patch removed call of pjsip_tx_data_dec_ref in send_notify
if send_request failed.
The pjsip_dlg_send_request deletes the message on error by itself.
It seems this patch fixes next issues:
ASTERISK-26199
ASTERISK-26166
ASTERISK-26174
George Joseph [Wed, 5 Oct 2016 19:53:10 +0000 (13:53 -0600)]
pjproject_bundled: Add MALLOC_DEBUG capability
pjproject_bundled will now use the asterisk memory debugging APIs
if MALLOC_DEBUG is turned on in menuselect.
Because this required stubs for the executable programs and the python
bindings, some Makefile reorganization was needed to properly handle
the dependencies. As a result, the makefile now individually makes
each of the pjproject libraries separately instead of making them all
in 1 shot. The only visible change is that there are separate status
lines printed for each library instead oif 1 for all libs. Also, the
making of the pjproject dependency files was eliminated. They're not
needed for building unless you're actively modifying pjproject source
files and it makes the build process faster. Finally, any issues with
parallel builds should be resolved again making the build faster.
NOTE: The certified/13.8 version of this patch also builds libresample
which is needed by pjsua. Later versions do not need libresample.
Richard Mudgett [Mon, 29 Aug 2016 23:08:22 +0000 (18:08 -0500)]
res_pjsip: Add ignore_uri_user_options option.
This implements the chan_sip legacy_useroption_parsing option but with a
better name.
* Made the caller-id number and redirecting number strings obtained from
incoming SIP URI user fields always truncated at the first semicolon.
People don't care about anything after the semicolon showing up on their
displays even though the RFC allows the semicolon.
Joshua Colp [Tue, 23 Aug 2016 11:35:11 +0000 (11:35 +0000)]
chan_sip: Don't allocate new RTP instances on top of old ones.
In some scenarios dialog_initialize_rtp can be called multiple times on
the same dialog. This can cause RTP instances to be leaked along with
multiple file descriptors for each instance.
This change makes it so the existing RTP instances are destroyed and
not overwritten, stopping the memory leak.
ASTERISK-26272 #close
patches:
ASTERISK-26272-13.patch submitted by Corey Farrell (license 5909)
Mark Michelson [Wed, 10 Aug 2016 20:14:09 +0000 (15:14 -0500)]
ConfBridge: Make some announcements asynchronous.
Confbridge announcements tend to block a channel while they are being
played. In some circumstances, this is warranted since you want that
particular channel not to hear the announcement (Example: "John Doe has
entered the conference"). For others it makes less sense.
This change first introduces methods for playing sounds asynchronously
into the conference. This is very similar to how synchronous sounds are
played, except the channel initiating the playback does not wait for the
sound to complete before moving on.
Asynchronous announcements are used for two circumstances:
* Sounds played for a user after they have left the bridge
* Sounds that play first to a single user and then the rest of the
conference (if the channel and conference use the same language)
Mark Michelson [Wed, 10 Aug 2016 20:14:09 +0000 (15:14 -0500)]
ConfBridge: Rework announcer channel methodology
NOTE: This patch was submitted earlier and reverted because of a failing
test. The test has been patched so that it adjusts for the changes here,
so this is being resubmitted for review.
One feature that confbridge has is the ability to play sounds to all
participants in the conference. Prior to this commit, the algorithm for
this was as follows:
* Grab the playback lock
* Push the conference announcer channel into the bridge
* Play back the sound
* Pull the conference announcer channel from the bridge
* Release the playback lock
The issue here is that the act of adding the playback channel to the
bridge and removing it for each announcement is expensive. Amongst the
expenses:
* The announcer channel is imparted into the bridge, meaning a new
thread is spun up for each playback.
* When the announcer is added or removed from the bridge, it results
in the BRIDGEPEER channel variable being set on all channels in the
bridge. This requires keeping the bridge locked and locking each
individual channel in order to set it.
* There's also just the general overhead of adding the channel and
removing it from the bridge. The bridge potentially has to reconfigure
every single time
With this commit, the paradigm for playing back announcements has
shifted.
* The announcer channel is now added to the bridge when the conference
is allocated, and it is hung up when the conference is destroyed.
* A taskprocessor is used to queue playbacks onto the announcer channel.
This keeps the behavior from before where playbacks do not overlap.
* The announcer channel is no longer placed into the bridge as
departable. Since we are not constantly removing the channel from
the bridge, it is safe to add the channel using an independent thread
and simply hang the channel up when it is time for the conference to
be destroyed.
The use of the taskprocessor for playbacks opens up the interesting
possibility of having asynchronous announcements played. In this commit,
however, the behavior is still exactly the same as it previously was.
Mark Michelson [Wed, 10 Aug 2016 20:14:09 +0000 (15:14 -0500)]
ConfBridge: Rework announcer channel methodology
One feature that confbridge has is the ability to play sounds to all
participants in the conference. Prior to this commit, the algorithm for
this was as follows:
* Grab the playback lock
* Push the conference announcer channel into the bridge
* Play back the sound
* Pull the conference announcer channel from the bridge
* Release the playback lock
The issue here is that the act of adding the playback channel to the
bridge and removing it for each announcement is expensive. Amongst the
expenses:
* The announcer channel is imparted into the bridge, meaning a new
thread is spun up for each playback.
* When the announcer is added or removed from the bridge, it results
in the BRIDGEPEER channel variable being set on all channels in the
bridge. This requires keeping the bridge locked and locking each
individual channel in order to set it.
* There's also just the general overhead of adding the channel and
removing it from the bridge. The bridge potentially has to reconfigure
every single time
With this commit, the paradigm for playing back announcements has
shifted.
* The announcer channel is now added to the bridge when the conference
is allocated, and it is hung up when the conference is destroyed.
* A taskprocessor is used to queue playbacks onto the announcer channel.
This keeps the behavior from before where playbacks do not overlap.
* The announcer channel is no longer placed into the bridge as
departable. Since we are not constantly removing the channel from
the bridge, it is safe to add the channel using an independent thread
and simply hang the channel up when it is time for the conference to
be destroyed.
The use of the taskprocessor for playbacks opens up the interesting
possibility of having asynchronous announcements played. In this commit,
however, the behavior is still exactly the same as it previously was.
Kevin Harwell [Tue, 9 Aug 2016 17:07:20 +0000 (12:07 -0500)]
alembic/sqlalchemy: auto increment only allowed on a single column
The extensions table defined two columns (id and priority) as primary key
autoincrement columns. However only one is allowed when defining the primary
key.
This patch removes the autoincrement attribute from the priority column since
it does not need to be as such and really should not have been on there in the
first place.
This patch also removes 'context', 'exten', and 'priority' from the primary key
index and creates a new combined unique contraint index on them.
Mark Michelson [Tue, 9 Aug 2016 21:19:34 +0000 (16:19 -0500)]
res_rtp_asterisk: Cache local RTCP address.
When an RTCP packet is sent or received, res_rtp_asterisk generates a
Stasis event that contains the RTCP report as well as the local and
remote addresses that the report pertains to.
The addresses are determined using ast_find_ourip(). For the local
address, this will typically result in a lookup of the hostname of the
server, and then a DNS lookup of that hostname. If you do not have the
host in /etc/hosts, then this results in a full DNS lookup, which can
potentially block for some time.
This is especially problematic when performing RTCP reads, since those
are done on the same thread responsible for reading and writing media.
This patch addresses the issue by performing a lookup of the local
address when RTCP is allocated. We then use this cached local address
for the Stasis events when necessary.
PJSIP: provide transport type with received messages
The receipt of a SIP MESSAGE may occur over any transport including TCP
and TLS. When the message is received, the original URI is added to the
message in the field PJSIP_RECVADDR, but this is insufficient to ensure
a reply message can reach the originating endpoint. This patch adds the
PJSIP_TRANSPORT field populated with the transport type.
Richard Mudgett [Fri, 22 Jul 2016 03:28:25 +0000 (22:28 -0500)]
dsp.c: Fix erroneous fax tone detection.
The Goertzel calculations get less accurate the lower the signal level
being worked with becomes because there is less resolution remaining.
If it is too low we can erroneously detect a tone where none really
exists. The searched for fax frequencies not only need to be so much
stronger than the background noise they must also be a minimum strength.
* Add needed minimum threshold test to tone_detect().
* Set TONE_THRESHOLD to allow low volume frequency spread detection.
ASTERISK-26237 #close
Reported by: Richard Mudgett
George Joseph [Thu, 21 Jul 2016 14:05:03 +0000 (08:05 -0600)]
chan_sip: Prevent deadlock when issuing "sip show channels"
sip_show_channels locks the dialogs container first then locks each
sip_pvt so it can spit out the details. The rest of sip dialog
processing locks the sip_pvt first then locks the dialogs container
if it needs to. Both lock in the order they need but deadlocks can
result. To fix, sip_show_channels and sip_show_channelstats have
been converted to use an iterator rather than ao2_callback. This way
the container is locked only while getting the next entry and is
unlocked when the callback is called.
Richard Mudgett [Tue, 12 Jul 2016 22:24:54 +0000 (17:24 -0500)]
res_fax: Fix FAXOPT(faxdetect) timeout option.
The fax detection timeout option did not work because basically the wrong
variable was checked in fax_detect_framehook(). As a result, the timer
would timeout immediately and disable fax detection.
* Fixed ignoring negative timeout values. We'd complain and then go right
on using the negative value.
* Fixed destroy_faxdetect() in the off-nominal case of an incomplete
object creation.
* Added more range checking to FAXOPT(gateway) timeout parameter.
ASTERISK-26214 #close
Reported by: Richard Mudgett
Richard Mudgett [Mon, 18 Jul 2016 21:16:56 +0000 (16:16 -0500)]
chan_dahdi: Add faxdetect_timeout option.
The new option allows the channel driver's faxdetect option to timeout on
a call after the specified number of seconds into a call. The new feature
is disabled if the timeout is set to zero. The option is disabled by
default.
* Don't clear dsp_features after passing them to the dsp code in
my_pri_ss7_open_media(). We should still remember them especially for the
new faxdetect_timeout option.
The new endpoint option allows the PJSIP channel driver's fax_detect
endpoint option to timeout on a call after the specified number of
seconds into a call. The new feature is disabled if the timeout is set
to zero. The option is disabled by default.
chan_sip/res_pjsip_t38: Handle a request to negotiate T.38 after it is enabled.
Some T.38 implementations may send another re-invite after the initial
one which adds additional negotiation details (such as the max bitrate).
Currently this will fail when passthrough is being done in chan_sip as we
do nothing if T.38 is already active.
Other handlers of T.38 inside of Asterisk (such as res_fax) handle this
scenario so this change adds support for it to chan_sip and res_pjsip_t38.
If a request to negotiate is received while T.38 is already enabled a
new re-INVITE is sent and negotiation is done again.
George Joseph [Thu, 9 Jun 2016 14:20:33 +0000 (08:20 -0600)]
build: Fix ast_sockaddr initialization to be more portable
A change to glibc 2.22 changed the order of the sockadddr_storage
members which caused the places where we do an initialization of
ast_sockaddr with '{ { 0, 0, } }' to fail compilation. Those
initializers (which we shouldn't have been using anyway) have been
replaced with memsets.
George Joseph [Sun, 12 Jun 2016 16:19:27 +0000 (10:19 -0600)]
res_pjsip_pubsub: Address SEGV when attempting to terminate a subscription
Occasionally under load we'll attempt to send a final NOTIFY on a
subscription that's already been terminated and a SEGV will occur
down in pjproject's evsub_destroy function. This is a result of a
race condition between all the paths that can generate a notify
and/or destroy the underlying pjproject evsub object:
* The client can send a SUBSCRIBE with Expires: 0.
* The client can send a SUBSCRIBE/refresh.
* The subscription timer can expire.
* An extension state can change.
* An MWI event can be generated.
* The pjproject transaction timer (timer_b) can expire.
Normally when our pubsub_on_evsub_state is called with a terminate,
we push a task to the serializer and return at which point the dialog
is unlocked. This is usually not a problem because the task runs
immediately and locks the dialog again. When the system is heavily
loaded though, there may be a delay between the unlock and relock
during which another event may occur such as the subscription timer
or timer_b expiring, an extension state change, etc. These may also
cause a terminate to be processed and if so, we could cause pjproject
to try to destroy the evsub structure twice. There's no way for us to
tell that the evsub was already destroyed and the evsub's group lock
can't tolerate this and SEGVs.
The remedy is twofold.
* A patch has been submitted to Teluu and added to the bundled
pjproject which adds add/decrement operations on evsub's group lock.
* In res_pjsip_pubsub:
* configure.ac and pjproject-bundled's configure.m4 were updated
to check for the new evsub group lock APIs.
* We now add a reference to the evsub group lock when we create
the subscription and remove the reference when we clean up the
subscription. This prevents evsub from being destroyed before
we're done with it.
* A state has been added to the subscription tree structure so
termination progress can be tracked through the asyncronous tasks.
* The pubsub_on_evsub_state callback has been split so it's not doing
double duty. It now only handles the final cleanup of the
subscription tree. pubsub_on_rx_refresh now handles both client
refreshes and client terminates. It was always being called for
both anyway.
* The serialized_on_server_timeout task was removed since
serialized_pubsub_on_rx_refresh was almost identical.
* Missing state checks and ao2_cleanups were added.
* Some debug levels were adjusted to make seeing only off-nominal
things at level 1 and nominal or progress things at level 2+.
ASTERISK-26099 #close Reported-by: Ross Beer.
Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
This patch fixes a race condition processing received REGISTER requests
and their retransmissions caused by REGISTER requests being processed by
two threads. The "sip_transaction Unable to register REGISTER transaction
(key exists)" message is a notable symptom of this issue.
This issue was more likely to happen before the pjsip/distributor
serializers were created. Instead of steps one and two below placing the
REGISTER messages into the same pjsip/distributor they were placed in
random pjsip/default serializers.
1) REGISTER requests come in and get placed on the pjsip/distributor
serializer.
2) Before the first request is processed a retransmission comes in and is
placed on the same pjsip/distributor serializer.
3) The first request goes up the pjsip stack and is then shunted off to
the pjsip/aor/<aor> serializer.
4) Before the first request is completed processing in the pjsip/aor/<aor>
serializer, the second request goes up the pjsip stack and is also shunted
off to the pjsip/aor/<aor> serializer.
5) The first request completes processing and sends out its response.
6) The second request completes processing and tries to send out its
response but pjlib complains that the REGISTER transaction key already
exists.
7) Sadness ensues.
* The race is eliminated by removing the pjsip/aor/<aor> serializer and
continuing the processing in the pjsip/distributor serializer. Now any
retransmissions queued in the pjsip/distributor serializer will be
processed after the first message is completely processed.
ASTERISK-26088 #close
Reported by: Richard Mudgett
Stasis subscriptions and message routers create taskprocessors to process
the event messages. API calls are needed to be able to set the congestion
levels of these taskprocessors for selected subscriptions and message
routers.
* Updated CDR, CEL, and manager's stasis subscription congestion levels
based upon stress testing. Increased the congestion levels to reduce the
potential for bursty call setup/teardown activity from triggering the
taskprocessor overload alert. CDRs in particular need an extra high
congestion level because they can take awhile to process the stasis
messages.
Richard Mudgett [Thu, 2 Jun 2016 23:19:13 +0000 (18:19 -0500)]
sorcery: Add setting object type congestion levels.
Sorcery creates taskprocessors for object types to process object observer
callbacks. An API call is needed to be able to set the congestion levels
of these taskprocessors for selected object types.
* Updated PJSIP's contact and contact_status sorcery object type observer
default congestion levels based upon stress testing. Increased the
congestion levels to reduce the potential for bursty register/unregister
and subscribe/unsubscribe activity from triggering the taskprocessor
overload alert.
Richard Mudgett [Thu, 2 Jun 2016 21:08:19 +0000 (16:08 -0500)]
taskprocessors: Implement high/low water mark alerts.
When taskprocessors get backed up, there is a good chance that we are
being overloaded and need to defer adding new work to the system.
* Implemented a high/low water alert mechanism for modules to check if the
system is being overloaded and take appropriate action. When a
taskprocessor is created it has default congestion levels set. A
taskprocessor can later have those congestion levels altered for specific
needs if stress testing shows that the taskprocessor is a symptom of
overloading or needs to handle bursty activity without triggering an
overload alert.
* Add CLI "core show taskprocessor" low/high water columns.
* Fixed __allocate_taskprocessor() to not use RAII_VAR(). RAII_VAR() was
never a good thing to use when creating a taskprocessor because of the
nature of how its references needed to be cleaned up on a partial
creation.
* Made res_pjsip's distributor check if the taskprocessor overload alert
is active before placing a message representing brand new work onto a
distributor serializer.
Richard Mudgett [Fri, 27 May 2016 22:31:52 +0000 (17:31 -0500)]
res_pjsip_session: Use distributor serializer for incoming calls.
We must continue using the serializer that the original INVITE came in on
for the dialog. There may be retransmissions already enqueued in the
original serializer that can result in reentrancy and message sequencing
problems.
Outgoing call legs create the pjsip/outsess/<endpoint> serializers for
their dialogs.
Richard Mudgett [Fri, 27 May 2016 17:50:14 +0000 (12:50 -0500)]
res_pjsip_pubsub.c: Use distributor serializer for incoming subscriptions.
We must continue using the serializer that the original SUBSCRIBE came in
on for the dialog. There may be retransmissions already enqueued in the
original serializer that can result in reentrancy and message sequencing
problems. The "sip_transaction Unable to register SUBSCRIBE transaction
(key exists)" message is a notable symptom of this issue.
Outgoing subscriptions still create the pjsip/pubsub/<endpoint>
serializers for their dialogs.
Richard Mudgett [Thu, 26 May 2016 22:35:04 +0000 (17:35 -0500)]
pjsip_distributor.c: Consistently pick a serializer for messages.
Incoming messages that are not part of a dialog or a recognized response
to one of our requests need to be sent to a consistent serializer. Under
load we may be queueing retransmissions before we can process the original
message. We don't need to throw these messages onto random serializers
and cause reentrancy and message sequencing problems.
* Created a pool of pjsip/distributor serializers that get picked by
hashing the call-id and remote tag strings of the received messages.
* Made ast_sip_destroy_distributor() destroy items in the reverse order of
creation.
George Joseph [Fri, 1 Apr 2016 18:30:56 +0000 (12:30 -0600)]
res_pjsip contact: Lock expiration/addition of contacts
Contact expiration can occur in several places: res_pjsip_registrar,
res_pjsip_registrar_expire, and automatically when anyone calls
ast_sip_location_retrieve_aor_contact. At the same time, res_pjsip_registrar
may also be attempting to renew or add a contact. Since none of this was locked
it was possible for one thread to be renewing a contact and another thread to
expire it immediately because it was working off of stale data. This was the
casue of intermittent registration/inbound/nominal/multiple_contacts test
failures.
Now, the new named lock functionality is used to lock the aor during contact
expire and add operations and res_pjsip_registrar_expire now checks the
expiration with the lock held before deleting the contact.
George Joseph [Fri, 1 Apr 2016 01:04:29 +0000 (19:04 -0600)]
lock: Add named lock capability
Locking some objects like sorcery objects can be tricky because the underlying
ao2 object may not be the same for all callers. For instance, two threads that
call ast_sorcery_retrieve_by_id on the same aor name might actually get 2
different ao2 objects if the underlying wizard had to rehydrate the aor from a
database. Locking one ao2 object doesn't have any effect on the other even if
those objects had locks in the first place.
Named locks allow access control by keyspace and key strings. Now an "aor"
named "1000" can be locked and any other thread attempting to lock "aor" "1000"
will wait regardless of whether the underlying ao2 object is the same or not.
Mutex and rwlocks are supported.
This capability will initially be used to lock an aor when multiple threads may
be attempting to prune expired contacts from it.
Joshua Colp [Thu, 2 Jun 2016 17:04:45 +0000 (14:04 -0300)]
res_odbc: Implement a connection pool.
Testing has shown that our usage of UnixODBC is problematic
due to bugs within UnixODBC itself as well as the heavy weight
cost of connecting and disconnecting database connections, even
when pooling is enabled.
For users of UnixODBC 2.3.1 and earlier crashes would occur due
to insufficient protection of the disconnect operation. This was
fixed in UnixODBC 2.3.2 and above.
For users of UnixODBC 2.3.3 and higher a slow-down would occur
under heavy database use due to repeated connection establishment.
A regression is present where on each connection the database
configuration is cached again, with the cache growing out of
control.
The connection pool implementation present in this change helps
to mitigate these issues by reducing how much we connect and
disconnect database connections. We also solve the issue of
crashes under UnixODBC 2.3.1 by defaulting the maximum number of
connections to 1, returning us to the previous working behavior.
For users who may have a fixed version the maximum concurrent
connection limit can be increased helping with performance.
The connection pool works by keeping a list of active connections.
If the connection limit has not been reached a new connection is
established. If the connection limit has been reached then the
request waits until a connection becomes available before
continuing.
George Joseph [Mon, 30 May 2016 15:58:35 +0000 (09:58 -0600)]
pjproject_bundled: Move to pjproject 2.5
Although all the patches we had against 2.4.5 were applied by Teluu,
a new bug was introduced preventing re-use of tcp and tls transports
This patch removes all the previous patches against 2.4.5, updates
the version to 2.5, and adds a new patch to correct the transport
re-use problem.
* Local fax starts rtp call to remote fax
* Remote fax starts t38 call back to local fax.
* Local fax sends t38 no-signal to Asterisk before sending an OK.
* udptl processes the frame and increments the expected sequence number.
* chan_sip drops the frame because the call isn't up so nothing goes out
the external interface to open the port for incoming packets.
* Local fax sends OK and Asterisk sends OK to the remote fax.
* Remote fax sends t38 packets which are dropped by the firewall.
* Local fax re-sends t38 no-signal with the same sequence number.
* udptl drops the frame because it thinks it's a dup.
* Still no outgoing packets to open the firewall.
* t38 negotiation fails.
The patch drops frames t38 received before udptl sequence processing
when the call hasn't been answered yet. The second no-signal frame
is then seen as new and is relayed out the external interface which
opens the port and allows negotiation to continue.
George Joseph [Tue, 17 May 2016 16:14:51 +0000 (10:14 -0600)]
chan_sip: Prevent extra Session-Expires headers from being added
When chan_sip does a re-INVITE to refresh a session and authentication
is required, the INVITE with the Authorization header containes a
second Session-Expires header without the ";refersher=" parameter.
This is causing some proxies to return a 400. Also, when Asterisk is
the uas and the refresher, it is including the Session-Expires and
Min-SE headers in OPTIONS messages which is not allowed per RFC4028.
This patch (based on the reporter's) Checks to see if a Session-Expires
header is already in the message before adding another one. It also
checks that the method is INVITE or UPDATE.
George Joseph [Sat, 7 May 2016 19:39:25 +0000 (13:39 -0600)]
config_transport: Tell pjproject to allow all SSL/TLS protocols
The default tls settings for pjproject only allow TLS 1, TLS 1.1 and TLS 1.2.
SSL is not allowed. So, even if you specify "sslv3" for a transport method,
it's silently ignored and one of the TLS protocols is used. This was a new
behavior of pjsip_tls_setting_default() in 2.4 (when tls.proto was added) that
we never caught.
Now we need to set tls.proto = 0 after we call pjsip_tls_setting_default().
This tells pjproject to set the socket protocol to match the method.
Kevin Harwell [Thu, 5 May 2016 16:37:37 +0000 (11:37 -0500)]
res_pjsip_authenticator_digest: Don't use source port in nonce verification
From the issue reporter:
"res_pjsip_outbound_authenticator_digest builds a nonce that is a hash of
the timestamp, the source address, the source port, a server UUID that is
calculated at startup, and the authentication realm.
Rather than caching nonces that we create, we instead attempt to re-calculate
the nonce when receiving an incoming request with authentication. We then
compare the re-calculated nonce to the incoming nonce, and if they don't match,
then authentication has failed early.
The problem is that it is possible, especially when using TCP, to receive two
requests from the same endpoint but have differing source ports for those
requests. Asterisk itself commonly will use different source ports for
outbound TCP requests."
This patch removes the source port dependency when building the nonce.
Joshua Colp [Thu, 5 May 2016 10:07:50 +0000 (07:07 -0300)]
file: Ensure nativeformats remains valid for lifetime of use.
It is possible for the nativeformats of a channel to change
throughout its lifetime. As a result a user of it needs to either
ensure the channel is locked when accessing the formats or keep
a reference to the nativeformats themselves.
This change fixes the file playback support so it keeps a
reference to the nativeformats when accessing things.