George Joseph [Sat, 23 Jan 2016 21:50:57 +0000 (14:50 -0700)]
loader: Retry dlopen when loading fails
Although we use the RTLD_LAZY flag when calling dlopen
the first time on a module, this only defers resolution
for function calls. Pointer references to functions are
determined at link time so dlopen expects them to be there.
Since we don't cross-module link, pointers to functions
in other modules won't be available and dlopen will fail.
Doing a "hardened" build also causes problems because it
typically sets "-z now" on the ld command line which
overrides RTLD_LAZY at run time.
If the failing module isn't a GLOBAL_SYMBOLS module, then
dlopen will be called again after all the GLOBAL_SYMBOLS
modules have been loaded and they'll eventually resolve.
If the calling module IS a GLOBAL_SYMBOLS module itself
and a third module depends on it, then there's an issue
because the second time through the dlopen loop,
GLOBAL_SYMBOLS modules aren't given any special treatment
and since the order in which dlopen is called isn't
deterministic, the dependent may again be tried before the
module it needs is loaded.
Simple solution: Save modules that fail load_resource
because of a dlopen error in a list and retry them
immediately after the first pass. Keep retrying until
the failed list is empty or we reach a #defined max
retries. Error messages are suppressed until the final
pass which also gets rid of those confusing error messages
about module failures that are later corrected.
Kevin Harwell [Tue, 1 Mar 2016 22:18:21 +0000 (16:18 -0600)]
bridge.c: Crash during attended transfer when missing a local channel half
It's possible for the transferer channel to get hung up early during the
attended transfer process. For instance, a phone may send a "bye" immediately
upon receiving a sip notify that contains a sip frag 100 (I'm looking at you
Jitsi). When this occurs a race begins between the transferer being hung up
and completion of the transfer code.
If the channel hangs up too early during a transfer involving stasis bridging
for instance, then when the created local channel goes to look up its swap
channel (and associated datastore) it can't find it (since it is no longer in
the bridge) thus it fails to enter the stasis application. Consequently, the
created local channel(s) hang up as well. If the timing is just right then the
bridging code attempts to add the message link with missing local channel(s).
Hence the crash.
Unfortunately, there is no great way to solve the problem of the unexpected
"bye". While we can't guarantee we won't receive an early hangup, and in this
case still fail to enter the stasis application, we can make it so asterisk
does not crash.
This patch does just that by locking the local channel structure, checking
that the local channel's peer has not been lost, and then continuing. This
keeps the local channel's peer from being ripped out from underneath it by
the local/unreal hangup code while attempting to set the stasis message link.
Joshua Colp [Thu, 3 Mar 2016 14:26:10 +0000 (10:26 -0400)]
res_pjsip_dtmf_info: NULL terminate the message body.
PJSIP does not ensure that when printing the message body the
buffer will be NULL terminated. This is problematic when searching
for the signal and duration values of the DTMF.
This change ensures the buffer is always NULL terminated.
George Joseph [Wed, 2 Mar 2016 02:03:04 +0000 (19:03 -0700)]
alembic: Fix downgrade and tweak for sqlite
Downgrade had a few issues. First there was an errant 'update' statement in
add_auto_dtmf_mode that looks like it was a copy/paste error. Second, we
weren't cleaning up the ENUMs so subsequent upgrades on postgres failed
because the types already existed.
For sqlite... sqlite doesn't support ALTER or DROP COLUMN directly.
Fortunately alembic batch_operations takes care of this for us if we
use it so the alter and drops were converted to use batch operations.
Here's an example downgrade:
with op.batch_alter_table('ps_endpoints') as batch_op:
batch_op.drop_column('tos_audio')
batch_op.drop_column('tos_video')
batch_op.add_column(sa.Column('tos_audio', yesno_values))
batch_op.add_column(sa.Column('tos_video', yesno_values))
batch_op.drop_column('cos_audio')
batch_op.drop_column('cos_video')
batch_op.add_column(sa.Column('cos_audio', yesno_values))
batch_op.add_column(sa.Column('cos_video', yesno_values))
with op.batch_alter_table('ps_transports') as batch_op:
batch_op.drop_column('tos')
batch_op.add_column(sa.Column('tos', yesno_values))
# Can't cast integers to YESNO_VALUES, so dropping and adding is required
batch_op.drop_column('cos')
batch_op.add_column(sa.Column('cos', yesno_values))
Upgrades from base to head and downgrades from head to base were tested
repeatedly for postgresql, mysql/mariadb, and sqlite3.
George Joseph [Wed, 2 Mar 2016 21:55:48 +0000 (14:55 -0700)]
config_transport: Fix objects returned by ast_sip_get_transport_states
ast_sip_get_transport_states was returning a container of internal_state
objects instead of ast_sip_transport_state objects. This was causing
transport lookups to fail, most noticably in res_pjsip_nat, which
couldn't find the correct external addresses. This was causing contacts
to go out with internal ip addresses.
ASTERISK-25830 #close Reported-by: Sean Bright
Change-Id: I1aee6a2fd46c42e8dd0af72498d17de459ac750e
Richard Mudgett [Sat, 27 Feb 2016 00:57:17 +0000 (18:57 -0600)]
SIP diversion: Fix REDIRECTING(reason) value inconsistencies.
Previous chan_sip behavior:
Before this patch chan_sip would always strip any quotes from an incoming
reason and pass that value up as the REDIRECTING(reason). For an outgoing
reason value, chan_sip would check the value against known values and
quote any it didn't recognize. Incoming 480 response message reason text
was just assigned to the REDIRECTING(reason).
Previous chan_pjsip behavior:
Before this patch chan_pjsip would always pass the incoming reason value
up as the REDIRECTING(reason). For an outgoing reason value, chan_pjsip
would send the reason value as passed down.
With this patch:
Both channel drivers match incoming reason values with values documented
by REDIRECTING(reason) and values documented by RFC5806 regardless of
whether they are quoted or not. RFC5806 values are mapped to the
equivalent REDIRECTING(reason) documented value and is set in
REDIRECTING(reason). e.g., an incoming RFC5806 'unconditional' value or a
quoted string version ('"unconditional"') is converted to
REDIRECTING(reason)'s 'cfu' value. The user's dialplan only needs to deal
with 'cfu' instead of any of the aliases.
The incoming 480 response reason text supported by chan_sip checks for
known reason values and if not matched then puts quotes around the reason
string and assigns that to REDIRECTING(reason).
Both channel drivers send outgoing known REDIRECTING(reason) values as the
unquoted RFC5806 equivalent. User custom values are either sent as is or
with added quotes if SIP doesn't allow a character within the value as
part of a RFC3261 Section 25.1 token. Note that there are still
limitations on what characters can be put in a custom user value. e.g.,
embedding quotes in the middle of the reason string is silly and just
going to cause you grief.
* Setting a REDIRECTING(reason) value now recognizes RFC5806 aliases.
e.g., Setting REDIRECTING(reason) to 'unconditional' is converted to the
'cfu' value.
* Added missing malloc() NULL return check in res_pjsip_diversion.c
set_redirecting_reason().
* Fixed potential read from a stale pointer in res_pjsip_diversion.c
add_diversion_header(). The reason string needed to be copied into the
tdata memory pool to ensure that the string would always be available.
Otherwise, if the reason string returned by reason_code_to_str() was a
user's reason string then the string could be freed later by another
thread.
* Fix double unref of other_party channel in off nominal path.
* This is unlikely to be a real problem. However, for safety,
in handle_incoming_request() keep the datastore ref with the
other_party channel ref until we are finished with the other_party
channel.
From CHANGES:
* To help insure that Asterisk is compiled and run with the same known
version of pjproject, a new option (--with-pjproject-bundled) has been
added to ./configure. When specified, the version of pjproject specified
in third-party/versions.mak will be downloaded and configured. When you
make Asterisk, the build process will also automatically build pjproject
and Asterisk will be statically linked to it. Once a particular version
of pjproject is configured and built, it won't be configured or built
again unless you run a 'make distclean'.
To facilitate testing, when 'make install' is run, the pjsua and pjsystest
utilities and the pjproject python bindings will be installed in
ASTDATADIR/third-party/pjproject.
The default behavior remains building with the shared pjproject
installation, if any.
Building:
All you have to do is include the --with-pjproject-bundled option on
the ./configure command line (and remove any existing --with-pjproject
option if specified). Everything else is automatic.
Behind the scenes:
The top-level Makefile was modified to include 'third-party' in the
list of MOD_SUBDIRS.
The third-party directory was created to contain any third party
packages that may be needed in the future. Its Makefile automatically
iterates over any subdirectories passing on targets.
The third-party/pjproject directory was created to house the pjproject
source distribution. Its Makefile contains targets to download, patch
configure, generate dependencies, compile libs, apps and python bindings,
sanitized build.mak and generate a symbols list.
When bootstrap.sh is run, it automatically includes the configure.m4
file in third-party/pjproject. This file has a macro to download and
conifgure pjproject and get and set PJPROJECT_INCLUDE, PJPROJECT_DIR
and PJPROJECT_BUNDLED. It also tests for the capabilities like
PJ_TRANSACTION_GRP_LOCK by parsing preprocessor output as opposed to
trying to compile. Of course, bootstrap.sh is only run once and the
configure file is incldued in the patch.
When configure is run with the new options, the macro in configure.m4
triggers the download, patch, conifgure and tests. No compilation is
performed at this time. The downloaded tarball is cached in /tmp so
it doesn't get downloaded again on a distclean.
When make is run in the top-level Asterisk source directory, it will
automatically descend all the subdirectories in third_party just as it
does for addons, apps, etc. The top-level Makefile makes sure that
the 'third-party' is built before 'main' so that dependencies from the
other directories are built first.
When main does build, a new shared library (libasteriskpj) is created that
links statically to the pjproject .a files and exports all their symbols.
The asterisk binary links to that, just as it does with libasteriskssl.
When Asterisk is installed, the pjsua and pjsystest apps, and the pjproject
python bindings are installed in ASTDATADIR/third-party/pjproject. This
will facilitate testing, including running the testsuite which will be
updated to check that directory for the pjsua module ahead of the system
python library.
Modules should continue to depend on pjproject if they use pjproject APIs
directly. They should not care about the implementation. No changes to any
res_pjsip modules were made.
Richard Mudgett [Mon, 22 Feb 2016 22:59:40 +0000 (16:59 -0600)]
chan_sip.c: Fix T.38 issues caused by leaving a bridge.
chan_sip could not handle AST_T38_TERMINATED frames being sent to it when
the channel left the bridge. The action resulted in overlapping outgoing
reINVITEs. The testsuite tests/fax/sip/directmedia_reinvite_t38 was not
happy.
* Force T.38 to be remembered as locally bridged. Now when the channel
leaves the native RTP bridge after T.38, the channel remembers that it has
already reINVITEed the media back to Asterisk. It just needs to terminate
T.38 when the AST_T38_TERMINATED arrives.
* Prevent redundant AST_T38_TERMINATED from causing problems. Redundant
AST_T38_TERMINATED frames could cause overlapping outgoing reINVITEs if
they happen before the T.38 state changes to disabled. Now the T.38 state
is set to disabled before the reINVITE is sent.
Richard Mudgett [Fri, 19 Feb 2016 00:27:02 +0000 (18:27 -0600)]
res_pjsip_t38.c: Back out part of an earlier fix attempt.
This backs out item 4 of the 4875e5ac32f5ccad51add6a4216947bfb385245d
commit. Item 4 added the t38_bye_supplement. Unfortunately, the frame
that it puts into the bridge may or may not be processed by the time the
bridged peer is kicked out of the bridge. If it is processed then all is
well. However, if it is not processed then that channel is stuck in fax
mode until it hangs up or maybe if it joins another bridge for T.38
faxing.
Richard Mudgett [Sat, 20 Feb 2016 01:06:14 +0000 (19:06 -0600)]
bridge_channel: Don't settle owed events on an optimization.
Local channel optimization could cause DTMF digits to be duplicated.
Pending DTMF end events would be posted to a bridge when the local channel
optimizes out and is replaced by the channel further down the chain. When
the real digit ends, the channel would get another DTMF end posted to the
bridge.
A -- LocalA;1/n -- LocalA;2/n -- LocalB;1 -- LocalB;2 -- B
1) LocalA has the /n flag to prevent optimization.
2) B is sending DTMF to A through the local channel chain.
3) When LocalB optimizes out it can move B to the position of LocalB;1
4) Without this patch, when B swaps with LocalB;1 then LocalB;1 would
settle an owed DTMF end to the bridge toward LocalA;2.
5) When B finally ends its DTMF it sends the DTMF end down the chain.
6) Without this patch, A would hear the DTMF digit end when LocalB
optimizes out and when B ends the original digit.
Richard Mudgett [Mon, 22 Feb 2016 18:15:34 +0000 (12:15 -0600)]
channel.c: Route all control frames to a channel through the same code.
Frame hooks can conceivably return a control frame in exchange for an
audio frame inside ast_write(). Those returned control frames were not
handled quite the same as if they were sent to ast_indicate(). Now it
doesn't matter if you use ast_write() to send an AST_FRAME_CONTROL to a
channel or ast_indicate().
George Joseph [Thu, 25 Feb 2016 21:13:19 +0000 (14:13 -0700)]
sorcery: Refactor create, update and delete to better deal with caches
The ast_sorcery_create, update and delete function have been refactored
to better deal with caches and errors.
The action is now called on all non-caching wizards first. If ANY succeed,
the action is called on all caching wizards and the observers are notified.
This way we don't put something in the cache (or update or delete) before
knowing the action was performed in at least 1 backend and we only call the
observers once even if there were multiple writable backends.
ast_sorcery_create was never adding to caches in the first place which
was preventing contacts from getting added to a memory_cache when they
were created. In turn this was causing memory_cache to emit errors if
the contact was deleted before being retrieved (which would have
populated the cache).
ASTERISK-25811 #close Reported-by: Ross Beer
Change-Id: Id5596ce691685a79886e57b0865888458d6e7b46
George Joseph [Thu, 25 Feb 2016 21:39:54 +0000 (14:39 -0700)]
res_pjsip_mwi: Turn some NOTICEs and WARNINGs into debug 1s.
There are a few cases where we're emitting notices or warnings
for things that really need neither, like a client retrying to subscribe
to mwi when they're not conifgured for it. They get a 404 so there's no
need for non-debug messages.
The 'reload' mechanism actually involves closing the underlying
socket and calling the appropriate udp, tcp or tls start functions
again. Only outbound_registration, pubsub and session needed work
to reset the transport before sending requests to insure that the
pjsip transport didn't get pulled out from under them.
In my testing, no calls were dropped when a transport was changed
for any of the 3 transport types even if ip addresses or ports were
changed. To be on the safe side however, a new transport option was
added (allow_reload) which defaults to 'no'. Unless it's explicitly
set to 'yes' for a transport, changes to that transport will be ignored
on a reload of res_pjsip. This should preserve the current behavior.
George Joseph [Sun, 7 Feb 2016 23:34:20 +0000 (16:34 -0700)]
res_pjproject: Add ability to map pjproject log levels to Asterisk log levels
Warnings and errors in the pjproject libraries are generally handled by
Asterisk. In many cases, Asterisk wouldn't even consider them to be warnings
or errors so the messages emitted by pjproject directly are either superfluous
or misleading. A good exampe of this are the level-0 errors pjproject emits
when it can't open a TCP/TLS socket to a client to send an OPTIONS. We don't
consider a failure to qualify a UDP client an "ERROR", why should a TCP/TLS
client be treated any differently?
A config file for res_pjproject has bene added (pjproject.conf) and a new
log_mappings object allows mapping pjproject levels to Asterisk levels
(or nothing). The defaults if no pjproject.conf file is found are the same
as those that were hard-coded into res_pjproject initially: 0,1 = LOG_ERROR,
2 = LOG_WARNING, 3,4,5 = LOG_DEBUG<level>
When Asterisk receives a 412 (Conditional Request Failed) response
it has to recreate publish session.
There is bug in res_pjsip_outbound_publish.c
The function sip_outbound_publish_client_alloc is called with wrong object
while processing 412 (Conditional Request Failed) response.
This patch fixes it.
Mark Michelson [Thu, 18 Feb 2016 17:15:22 +0000 (11:15 -0600)]
Fix failing threadpool_auto_increment test.
The threadpool_auto_increment test fails infrequently for a couple of
reasons
* The threadpool listener was notified of fewer tasks being pushed than
were actually pushed
* The "was_empty" flag was set to an unexpected value.
The problem is that the test pushes three tasks into the threadpool.
Test expects the threadpool to essentially gather those three tasks, and
then distribute those to the threadpool threads. It also expects that as
the tasks are pushed in, the threadpool listener is alerted immediately
that the tasks have been pushed. In reality, a task can be distributed
to the threadpool threads quicker than expected, meaning that the
threadpool has already emptied by the time each subsequent task is
pushed. In addition, the internal threadpool queue can be delayed so
that the threadpool listener is not alerted that a task has been pushed
even after the task has been executed.
From the test's point of view, there's no way to be able to predict
exactly the order that task execution/listener notifications will occur,
and there is no way to know which listener notifications will indicate
that the threadpool was previously empty.
For this reason, the test has been updated to only check the things it
can check. It ensures that all tasks get executed, that the threads go
idle after the tasks are executed, and that the listener is told the
proper number of tasks that were pushed.
Richard Mudgett [Wed, 17 Feb 2016 19:30:06 +0000 (13:30 -0600)]
cel.c: Fix mismatch in ast_cel_track_event() return type.
The return type of ast_cel_track_event() is not large enough to return all
64 potential bits of the event enable mask. Fortunately, the defined CEL
events do not really need all 64 bits and the return value is only used to
determine if the requested CEL event is enabled.
* Made the ast_cel_track_event() return 0 or 1 only so the return value
can fit inside an int type instead of zero or a truncated 64 bit non-zero
value.
George Joseph [Tue, 16 Feb 2016 03:31:38 +0000 (20:31 -0700)]
res_pjsip_config_wizard: Add command to export primitive objects
A new command (pjsip export config_wizard primitives) has been added that
will export all the pjsip objects it created to the console or a file
suitable for reuse in a pjsip.conf file.
ASTERISK-24919 #close Reported-by: Ray Crumrine
Change-Id: Ica2a5f494244b4f8345b0437b16d06aa0484452b
George Joseph [Mon, 15 Feb 2016 21:37:30 +0000 (14:37 -0700)]
res_pjsip_caller_id: Fix segfault when replacing rpid or pai header
If the PJSIP_HEADER dialplan function adds a PAI or RPID header and send_rpid
or send_pai is set, res_pjsip_caller_id attemps to retrieve, parse and modify
the header added by the dialplan function. Since the header added by the
dialplan function is generic string, there are no virtual functions to parse
the uri and we get a segfault when we try. Since the modify, was really only
an overwrite, we now just delete the old header if it was type PJSIP_H_OTHER
and recreate it.
This raises a question for another time though: What should happen with
duplicate headers? Right now res_pjsip_header_funcs doesn't check for dups
so if it's session supplement is loaded after res_pjsip_caller_id's (or any
other module that adds headers), there'll be dups in the message.
Mark Michelson [Mon, 15 Feb 2016 19:08:22 +0000 (13:08 -0600)]
Fix creation race of contact_status structures.
It is possible when processing a SIP REGISTER request to have two
threads end up creating contact_status structures in sorcery.
contact_status is created using a "find or create" function. If two
threads call into this at the same time, each thread will fail to find
an existing contact_status, and so both will end up creating a new
contact status.
During testing, we would see sporadic failures because the
PJSIP_CONTACT() dialplan function would operate on a different
contact_status than what had been updated by res_pjsip/pjsip_options.
The fix here is two-fold:
1) The "find or create" function for contact_status now has a lock
around the entire operation. This way, if two threads attempt the
operation simultaneously, the first to get there will create the object,
and the second will find the object created by the first thread.
2) res_sorcery_memory has had its create callback updated so that it
will not allow for objects with duplicate IDs to be created.
Joshua Colp [Mon, 15 Feb 2016 18:52:22 +0000 (14:52 -0400)]
res_pjsip_pubsub: Move where the subscription is stored to after initialized.
A problem arose when testing the AMI subscription listing actions where it
was possible for a subscription that had not been fully initialized to be
listed. This was problematic as the underlying listing code would crash.
This change makes it so the subscription tree is fully set up before it is
added to the list of subscriptions. This ensures that when the listing actions
get the subscription it is valid.
George Joseph [Tue, 9 Feb 2016 23:34:05 +0000 (16:34 -0700)]
res_pjsip: Refactor load_module/unload_module
load_module was just too hairy with every step having to clean up all
previous steps on failure.
Some of the pjproject init calls have now been moved to a separate
load_pjsip function and the unload_pjsip function was enhanced to clean
up everything if an error happened at any stage of the load process.
In the process, a bunch of missing pj_shutdowns, serializer_pool_shutdowns
and ast_threadpool_shutdowns were also corrected.
Pjproject has deprecated pjsip_dlg_create_uas in 2.5 and replaced it with
pjsip_dlg_create_uas_and_inc_lock which, as the name implies, automatically
increments the lock on the returned dialog. To account for this, configure.ac
now detects the presence of pjsip_dlg_create_uas_and_inc_lock and res_pjsip.c
has an #ifdef HAVE_PJSIP_DLG_CREATE_UAS_AND_INC_LOCK to decide whether to use
the original call or the new one. If the new one was used, the ref count is
decremented before returning.
Corey Farrell [Tue, 9 Feb 2016 20:21:05 +0000 (15:21 -0500)]
Simplify and fix conditional in FD_SET.
FD_SET contains a conditional statement to protect against buffer
overruns. The statement was overly complicated and prevented use
of the last array element of ast_fdset. We now just verify the fd
is less than ast_FDMAX.
When terminating the threads thrashing a sorcery memory cache each
would be told to stop and then we would wait on them. During at
least one thrashing test this was problematic due to the specific
usage pattern in use. It would take some time for termination of the
thread to occur.
This would occur due to contention between the threads retrieving
and the threads updating the cache. As the retrieving threads are
given priority it may be some time before the updating threads
are able to proceed.
This change makes it so all threads are told to stop and then each
are joined to ensure they stop. This way all the threads should
stop at around the same time instead of waiting for one to stop,
the next to stop, then the next, and so on. As a result of this
the execution time for each thrash test is much closer to their
expected value than previously seen as well.
George Joseph [Fri, 29 Jan 2016 23:56:42 +0000 (16:56 -0700)]
res_pjsip: Fix infinite recursion when loading transports from realtime
Attempting to load a transport from realtime was forcing asterisk into an
infinite recursion loop. The first thing transport_apply did was to do a
sorcery retrieve by id for an existing transport of the same name. For files,
this just returns the previous object from res_sorcery_config's internal
container, if any. For realtime, the res_sourcery_realtime driver looks in the
database and finds the existing row but now it has to rehydrate it into a
sorcery object which means calling... transport_apply. And so it goes.
The main issue with loading from realtime (apart from the loop) was that
transport stores structures and pointers directly in the ast_sip_transport
structure instead of the separate ast_transport_state structure. This patch
separates those items into the ast_sip_transport_state structure. The pattern
is roughly the same as res_pjsip_outbound_registration.
Although all current usages of ast_sip_transport and ast_sip_transport_state
were modified to use the new ast_sip_get_transport_state API, the original
items are left in ast_sip_transport and kept updated to maintain ABI
compatability for third-party modules. They are marked as deprecated and
noted that they're now in ast_sip_transport_state.
ASTERISK-25606 #close Reported-by: Martin Moučka
Change-Id: Ic7a836ea8e786e8def51fe3f8cce855ea54f5f19
Joshua Colp [Fri, 5 Feb 2016 17:49:14 +0000 (11:49 -0600)]
Merge topic 'ASTERISK-20987'
* changes:
app_confbridge: Add ability to get the muted conference state.
app_confbridge.c: Update CONFBRIDGE and CONFBRIDGE_INFO documentation.
app_confbridge: Make non-admin users join a muted conference muted.
Mark Michelson [Thu, 4 Feb 2016 22:17:55 +0000 (16:17 -0600)]
Check for OpenSSL defines before trying to use them.
The SSL_OP_NO_TLSv1_1 and SSL_OP_NO_TLSv1_2 defines did not exist prior
to OpenSSL version 1.0.1. A recent commit attempts to, by default, set
these options, which can cause problems on systems with older OpenSSL
installations.
This commit adds a configure script check for those defines and will not
attempt to make use of those if they do not exist. We will print a
warning urging the user to upgrade their OpenSSL installation if those
defines are not present.