George Joseph [Thu, 13 Apr 2017 16:14:48 +0000 (10:14 -0600)]
AST-2017-004: chan_skinny: Add EOF check in skinny_session
The while(1) loop in skinny_session wasn't checking for EOF so
a packet that was longer than a header but still truncated
would spin the while loop infinitely. Not only does this
permanently tie up a thread and drive a core to 100% utilization,
the call of ast_log() in such a tight loop eats all available
process memory.
Richard Mudgett [Sat, 13 May 2017 02:04:59 +0000 (21:04 -0500)]
res_pjsip_session.c: Process initial INVITE sooner. (key exists)
Retransmissions of an initial INVITE could be queued in the serializer
before we have processed the first INVITE message. If the first INVITE
message doesn't get completely processed before the retransmissions are
seen then we could try to setup the same call from the retransmissions. A
symptom of this is seeing a (key exists) message associated with an
INVITE. An earlier change attempted to address this kind of problem by
calculating a distributor serializer to use for unassociated messages.
Part of that change also made incoming calls keep using that distributor
serializer. (ASTERISK-26088) However, some leftover code was still
deferring the INVITE processing to the session's serializer even though we
were already in that serializer. This not only is unnecessary but would
cause the same call resetup problem.
* Removed the code to defer processing the initial INVITE to the session's
serializer because we are already running in that serializer.
Joshua Colp [Tue, 9 May 2017 15:34:49 +0000 (15:34 +0000)]
tcptls: Improve error messages for TLS connections.
This change uses the functions provided by OpenSSL to query
and better construct error messages for situations where
the connection encounters a problem.
George Joseph [Fri, 5 May 2017 16:33:34 +0000 (10:33 -0600)]
cel_odbc: Fix timestamp processing for microseconds
When a column is of type timestamp, the fraction part of the event
field's seconds was frequently parsed incorrectly especially if
there were leading zeros. For instance "2017-05-23 23:55:03.023"
would be parsed into an int as "23" then when the timestamp was
formatted again to be inserted into the database column it'd be
"2017-05-23 23:55:03.23" which is now 230 milliseconds instead of
23 milliseconds. "03.000001" would be transformed to "03.1", etc.
* If the event field is 'eventtime' and the db column is timestamp,
then existing processing has already correctly formatted the
timestamp so now we simply use it rather than parsing it and
re-printing it. This is the most common use case anyway.
* If the event field is other than 'eventtime' and the db column
is timestamp, we now parse the seconds, including the fractional
part into a double rather than 2 ints. This preserves the
magnitude and precision of the fractional part. When we print
it, we now print it as a "%09.6lf" which correctly represents the
input.
To be honest, why we parse the string timestamp into components,
test the components, then print the components back into a string
timestamp is beyond me. We should use parse it, test it, then if
it passes, use the original string representation in the database
call. Maybe someone thought that some implementations wouldn't
take a partial timestamp string like "2017-05-06" and decided to
always produce a full timestamp string even if an abbreviated one
was supplied. Anyway, I'm leaving it as it is.
Joshua Colp [Tue, 9 May 2017 10:25:29 +0000 (10:25 +0000)]
res_hep_rtcp: Provide chan_sip Call-ID for RTCP messages.
This change adds the required logic to allow the SIP
Call-ID to be placed into the HEP RTCP traffic if the
chan_sip module is used. In cases where the option is
enabled but the channel is not either SIP or PJSIP then
the code will fallback to the channel name as done
previously.
Based on the change on Nir's branch at:
team/nirs/hep-chan-sip-support
George Joseph [Mon, 8 May 2017 21:11:19 +0000 (15:11 -0600)]
logger: Added logger_queue_limit to the configuration options.
All log messages go to a queue serviced by a single thread
which does all the IO. This setting controls how big that
queue can get (and therefore how much memory is allocated)
before new messages are discarded. The default is 1000.
Should something go bezerk and log tons of messages in a tight
loop, this will prevent memory escalation.
When the limit is reached, a WARNING is logged to that effect
and messages are discarded until the queue is empty again. At
that time another WARNING will be logged with the count of
discarded messages. There's no "low water mark" for this queue
because the logger thread empties the entire queue and processes it
in 1 batch before going back and waiting on the queue again.
Implementing a low water mark would mean additional locking as
the thread processes each message and it's not worth it.
A "test" was added to test_logger.c but since the outcome is
non-deterministic, it's really just a cli command, not a unit
test.
Joshua Colp [Fri, 5 May 2017 13:48:34 +0000 (13:48 +0000)]
func_cdr: Allow empty value for CDR dialplan function.
A regression was introduced in 12 where passing an empty value
to the CDR dialplan function was not longer allowed. This
change returns to the behavior of 11 where it is permitted.
George Joseph [Thu, 4 May 2017 21:04:46 +0000 (15:04 -0600)]
app_confbridge: Fix reference to cfg in menu_template_handler
menu_template_handler wasn't properly accounting for the fact that
it might be called both during a load/reload (which isn't really
valid but not prevented) and by a dialplan function. In both cases
it was attempting to use the "pending" config which wasn't valid in
the latter case. aco_process_config is also partly to blame because
it wasn't properly cleaning "pending" up when a reload was done and
no changes were made. Both of these contributed to a crash if
CONFBRIDGE(menu,template) was called in a dialplan after a reload.
* aco_process_config now sets info->internal->pending to NULL
after it unrefs it although this isn't strictly necessary in the
context of this fix.
* menu_template_handler now uses the "current" config and silently
ignores any attempt to be called as a result of someone uses the
"template" parameter in the conf file.
Luckily there's no other place in the codebase where
aco_pending_config is used outside of aco_process_config.
ASTERISK-25506 #close Reported-by: Frederic LE FOLL
Change-Id: Ib349a17d3d088f092480b19addd7122fcaac21a7
bridge: Fix returning to dialplan when executing Bridge() from AMI.
When using the Bridge AMI action on the same channel multiple times
it was possible for the channel to return to the wrong location in
the dialplan if the other party hung up. This happened because the
priority of the channel was not preserved across each action
invocation and it would fail to move on to the next priority in
other cases.
This change makes it so that the priority of a channel is preserved
when taking control of it from another thread and it is incremented
as appropriate such that the priority reflects where the channel
should next be executed in the dialplan, not where it may or may not
currently be.
The Bridge AMI action was also changed to ensure that it too
starts the channels at the next location in the dialplan.
Kevin Harwell [Mon, 1 May 2017 18:04:16 +0000 (13:04 -0500)]
res_rtp_asterisk: Clearing the remote RTCP address causes RTCP failures
When a call gets put on hold RTP is temporarily stopped and Asterisk was
setting the remote RTCP address to NULL. Then when RTCP data was received
from the remote endpoint, Asterisk would be missing this information when
publishing the rtcp_message stasis event. Consequently, message subscribers
(in this case res_hep_rtcp) trying to parse the "from" field output the
following error:
"ast_sockaddr_split_hostport: Port missing in (null)"
This patch makes it so the remote RTCP address is no longer set to NULL when
stopping RTP. There was only one place that appeared to check if the remote
RTCP address was NULL as a way to tell if RTCP was running. This patch added
an additional check on the RTCP schedid for that case to make sure RTCP was
truly not running.
There is a theoretical potential to deadlock in
ast_rtp_codecs_payloads_copy() because it locks two different
ast_rtp_codecs locks. It is theoretical because the callers of the
function are either copying between a local ast_rtp_codecs struct and a
RTP instance of the ast_rtp_codecs struct. Or they are copying between
the caller and callee channel RTP instances before initiating the call to
the callee. Neither of these situations could actually result in a
deadlock because there cannot be another thread involved at the time.
* Add deadlock avoidance code to ast_rtp_codecs_payloads_copy() since it
locks two ast_rtp_codecs locks to perform a copy.
This only affects v13 since this deadlock avoidance code is already in
newer branches.
Richard Mudgett [Sat, 29 Apr 2017 21:11:21 +0000 (16:11 -0500)]
res_pjsip_t38.c: Fix deadlock in T.38 framehook.
A deadlock can happen between a channel lock and a pjsip session media
container lock. One thread is processing a reINVITE's SDP and walking
through the session's media container when it waits for the channel lock
to put the determined format capabilities onto the channel. The other
thread is writing a frame to the channel and processing the T.38 frame
hook. The T.38 frame hook then waits for the pjsip session's media
container lock. The two threads are now deadlocked.
* Made the T.38 frame hook release the channel lock before searching the
session's media container. This fix has been done to several other
frame hooks to fix deadlocks.
George Joseph [Thu, 27 Apr 2017 13:02:12 +0000 (07:02 -0600)]
res_pjsip_session: Add cleanup to ast_sip_session_terminate
If you use ast_request to create a PJSIP channel but then hang it
up without causing a transaction to be sent, the session will
never be destroyed. This is due ot the fact that it's pjproject
that triggers the session cleanup when the transaction ends.
app_chanisavail was doing this to get more granular channel state
and it's also possible for this to happen via ARI.
* ast_sip_session_terminate was modified to explicitly call the
cleanup tasks and unreference session if the invite state is NULL
AND invite_tsx is NULL (meaning we never sent a transaction).
* chan_pjsip/hangup was modified to bump session before it calls
ast_sip_session_terminate to insure that session stays valid
while it does its own cleanup.
* Added test events to session_destructor for a future testsuite
test.
ASTERISK-26908 #close Reported-by: Richard Mudgett
Change-Id: I52daf6f757184e5544c261f64f6fe9602c4680a9
Kevin Harwell [Wed, 26 Apr 2017 19:20:00 +0000 (14:20 -0500)]
res_pjsip/res_pjsip_callerid: NULL check on caller id name string
It's possible for a name in a party id structure to be marked as valid, but the
name string itself be NULL (for instance this is possible to do by using the
dialplan CALLERID function). There were a couple of places where the name was
validated, but the string itself was not checked before passing it to functions
like 'strlen'. This of course caused a crashed.
This patch adds in a NULL check before attempting to pass it into a function
that is not NULL tolerant.
Kevin Harwell [Tue, 25 Apr 2017 16:43:26 +0000 (11:43 -0500)]
vector: defaults and indexes
Added an pre-defined integer vector declaration. This makes integer vectors
easier to declare and pass around. Also, added the ability to default a vector
up to a given size with a default value. Lastly, added functionality that
returns the "nth" index of a matching value.
Jean Aunis [Thu, 20 Apr 2017 07:13:13 +0000 (09:13 +0200)]
chan_sip: Trigger reinvite if the SDP answer is included in the SIP ACK
Some equipments may send a re-INVITE containing an SDP in the final ACK
request. If this happens in the context of direct media, the remote end
should be updated with a re-INVITE.
This patch queues an "update RTP peer" frame to trigger the re-INVITE,
instead of the "source change" frame wich was used previously.
Interpolated frames are frames which contain a number of
samples but have no actual data. Audiohooks did not
handle this case when translating an incoming frame into
signed linear. It assumed that a frame would always contain
media when it may not. If this occurs audiohooks will now
immediately return and not act on the frame.
As well for users of ast_trans_frameout the function has
been changed to be a bit more sane and ensure that the data
pointer on a frame is set to NULL if no data is actually
on the frame. This allows the various spots in Asterisk that
check for an interpolated frame based on the presence of a
data pointer to work as expected.
Sean Bright [Fri, 21 Apr 2017 17:04:44 +0000 (13:04 -0400)]
cleanup: Fix fread() and fwrite() error handling
Cleaned up some of the incorrect uses of fread() and fwrite(), mostly in
the format modules. Neither of these functions will ever return a value
less than 0, which we were checking for in some cases.
I've introduced a fair amount of duplication in the format modules, but
I plan to change how format modules work internally in a subsequent
patch set, so this is simply a stop-gap.
Sean Bright [Tue, 18 Apr 2017 00:06:10 +0000 (20:06 -0400)]
core: Use eventfd for alert pipes on Linux when possible
The primary win of switching to eventfd when possible is that it only
uses a single file descriptor while pipe() will use two. This means for
each bridge channel we're reducing the number of required file
descriptors by 1, and - if you're using timerfd - we also now have 1
less file descriptor per Asterisk channel.
The API is not ideal (passing int arrays), but this is the cleanest
approach I could come up with to maintain API/ABI.
I've also removed what I believe to be an erroneous code block that
checked the non-blocking flag on the pipe ends for each read. If the
file descriptor is 'losing' its non-blocking mode, it is because of a
bug somewhere else in our code.
In my testing I haven't seen any measurable difference in performance.
Richard Mudgett [Fri, 21 Apr 2017 17:33:34 +0000 (12:33 -0500)]
res_pjsip_session.c: Send 100 Trying out earlier to prevent retransmissions.
If ICE is enabled and a STUN server does not respond then we will block
until we give up on the STUN response. This will take nine seconds. In
the mean time the peer that sent the INVITE will send retransmissions.
* Restructure res_pjsip_session.c:new_invite() to send a 100 Trying out
earlier to prevent these retransmissions.
* Restructure ast_sip_session_alloc() to need less cleanup on off nominal
error paths.
* Made ast_sip_session_alloc() and ast_sip_session_create_outgoing() avoid
unnecessary ref manipulation to return a session. This is faster than
calling a function. That function may do logging of the ref changes with
REF_DEBUG enabled.
Sean Bright [Wed, 19 Apr 2017 20:08:39 +0000 (16:08 -0400)]
pbx: Use same thread if AST_OUTGOING_WAIT_COMPLETE specified
Both ast_pbx_outgoing_app() and ast_pbx_outgoing_exten() cause the core
to spawn a new thread to perform the dial. When AST_OUTGOING_WAIT_COMPLETE
is passed to these functions, the calling thread will be blocked until
the newly created channel has been hung up.
After this patch, we run the dial on the current thread rather than
spawning a new one. The only in-tree code that passes
AST_OUTGOING_WAIT_COMPLETE is pbx_spool, so you should see reduced
thread usage if you are using .call files.
Richard Mudgett [Wed, 19 Apr 2017 18:23:55 +0000 (13:23 -0500)]
res_rtp_asterisk.c: Fix crash in RTCP DTLS operation.
Occasionally a crash happens when processing the RTCP DTLS timeout
handler. The RTCP DTLS timeout timer could be left running if we have not
completed the DTLS handshake before we place the call on hold or we
attempt direct media.
* Made ast_rtp_prop_set() stop the RTCP DTLS timer when disabling RTCP.
* Made some sanity tweaks to ast_rtp_prop_set() when switching from
standard RTCP mode to RTCP multiplexed mode.
The struct ast_rtp_instance has historically been indirectly protected
from reentrancy issues by the channel lock because early channel drivers
held the lock for really long times. Holding the channel lock for such a
long time has caused many deadlock problems in the past. Along comes
chan_pjsip/res_pjsip which doesn't necessarily hold the channel lock
because sometimes there may not be an associated channel created yet or
the channel pointer isn't available.
In the case of ASTERISK-26835 a pjsip serializer thread was processing a
message's SDP body while another thread was reading a RTP packet from the
socket. Both threads wound up changing the rtp->rtcp->local_addr_str
string and interfering with each other. The classic reentrancy problem
resulted in a crash.
In the case of ASTERISK-26853 a pjsip serializer thread was processing a
message's SDP body while another thread was reading a RTP packet from the
socket. Both threads wound up processing ICE candidates in PJPROJECT and
interfering with each other. The classic reentrancy problem resulted in a
crash.
* rtp_engine.c: Make the ast_rtp_instance_xxx() calls lock the RTP
instance struct.
* rtp_engine.c: Make ICE and DTLS wrapper functions to lock the RTP
instance struct for the API call.
* res_rtp_asterisk.c: Lock the RTP instance to prevent a reentrancy
problem with rtp->rtcp->local_addr_str in the scheduler thread running
ast_rtcp_write().
* res_rtp_asterisk.c: Avoid deadlock when local RTP bridging in
bridge_p2p_rtp_write() because there are two RTP instance structs
involved.
* res_rtp_asterisk.c: Avoid deadlock when trying to stop scheduler
callbacks. We cannot hold the instance lock when trying to stop a
scheduler callback.
* res_rtp_asterisk.c: Remove the lock in struct dtls_details and use the
struct ast_rtp_instance ao2 object lock instead. The lock was used to
synchronize two threads to prevent a race condition between starting and
stopping a timeout timer. The race condition is no longer present between
dtls_perform_handshake() and __rtp_recvfrom() because the instance lock
prevents these functions from overlapping each other with regards to the
timeout timer.
* res_rtp_asterisk.c: Remove the lock in struct ast_rtp and use the struct
ast_rtp_instance ao2 object lock instead. The lock was used to
synchronize two threads using a condition signal to know when TURN
negotiations complete.
* res_rtp_asterisk.c: Avoid deadlock when trying to stop the TURN
ioqueue_worker_thread(). We cannot hold the instance lock when trying to
create or shut down the worker thread without a risk of deadlock.
This patch exposed a race condition between a PJSIP serializer thread
setting up an ICE session in ice_create() and another thread reading RTP
packets.
* res_rtp_asterisk.c:ice_create(): Set the new rtp->ice pointer after we
have re-locked the RTP instance to prevent the other thread from trying to
process ICE packets on an incomplete ICE session setup.
A similar race condition is between a PJSIP serializer thread resetting up
an ICE session in ice_create() and the timer_worker_thread() processing
the completion of the previous ICE session.
* res_rtp_asterisk.c:ast_rtp_on_ice_complete(): Protect against an
uninitialized/null remote_address after calling
update_address_with_ice_candidate().
* res_rtp_asterisk.c: Eliminate the chance of ice_reset_session()
destroying and setting the rtp->ice pointer to NULL while other threads
are using it by adding an ao2 wrapper around the PJPROJECT ice pointer.
Now when we have to unlock the RTP instance object to call a PJPROJECT ICE
function we will hold a ref to the wrapper. Also added some rtp->ice NULL
checks after we relock the RTP instance and have to do something with the
ICE structure.
George Joseph [Mon, 17 Apr 2017 00:54:31 +0000 (18:54 -0600)]
make ari-stubs so doc periodic jobs can run
The periodic doc job does a make ari-stubs and checks that
there are no changes before generating the docs. Since I changed
the mustache template (and the generated code directly) recently
and forgot to regenerate the stubs, the doc job thinks they're out
of date.
Sean Bright [Fri, 14 Apr 2017 21:50:56 +0000 (17:50 -0400)]
res_stun_monitor: Don't fail to load if DNS resolution fails
res_stun_monitor will fail to load if DNS resolution of the STUN server
fails. Instead, we continue without the STUN server being resolved and
we will re-attempt the resolution on the STUN refresh interval.
Sean Bright [Fri, 14 Apr 2017 19:36:57 +0000 (15:36 -0400)]
format_pcm: Track actual header size of .au files
Sun's Au file format has a minimum data offset 24 bytes, but this
offset is encoded in each .au file. Instead of assuming the minimum,
read the actual value and store it for later use.
ASTERISK-20984 #close
Reported by: Roman S.
Patches:
asterisk-1.8.20.0-au-clicks-2.diff (license #6474) patch
uploaded by Roman S.
George Joseph [Tue, 11 Apr 2017 16:07:39 +0000 (10:07 -0600)]
modules: change module LOAD_FAILUREs to LOAD_DECLINES
In all non-pbx modules, AST_MODULE_LOAD_FAILURE has been changed
to AST_MODULE_LOAD_DECLINE. This prevents asterisk from exiting
if a module can't be loaded. If the user wishes to retain the
FAILURE behavior for a specific module, they can use the "require"
or "preload-require" keyword in modules.conf.
A new API was added to logger: ast_is_logger_initialized(). This
allows asterisk.c/check_init() to print to the error log once the
logger subsystem is ready instead of just to stdout. If something
does fail before the logger is initialized, we now print to stderr
instead of stdout.
Merge changes from topics 'ASTERISK-26890', 'ASTERISK-26851' into 13
* changes:
stun.c: Fix ast_stun_request() erratic timeout.
sorcery.c: Speed up ast_sorcery_retrieve_by_id()
res_pjsip: Fix pointer use after unref.
res_pjsip_sdp_rtp.c: Don't use deprecated transport struct member.
Richard Mudgett [Fri, 7 Apr 2017 21:14:16 +0000 (16:14 -0500)]
res_rtp_asterisk.c: Add stun_blacklist option
Added the stun_blacklist option to rtp.conf. Some multihomed servers have
IP interfaces that cannot reach the STUN server specified by stunaddr.
Blacklist those interface subnets from trying to send a STUN packet to
find the external IP address. Attempting to send the STUN packet
needlessly delays processing incoming and outgoing SIP INVITEs because we
will wait for a response that can never come until we give up on the
response. Multiple subnets may be listed.
Richard Mudgett [Thu, 6 Apr 2017 22:31:14 +0000 (17:31 -0500)]
stun.c: Fix ast_stun_request() erratic timeout.
If ast_stun_request() receives packets other than a STUN response then we
could conceivably never exit if we continue to receive packets with less
than three seconds between them.
* Fix poll timeout to keep track of the time when we sent the STUN
request. We will now send a STUN request every three seconds regardless
of how many other packets we receive while waiting for a response until we
have completed three STUN request transmission cycles.
Richard Mudgett [Thu, 6 Apr 2017 23:18:16 +0000 (18:18 -0500)]
res_pjsip_sdp_rtp.c: Don't use deprecated transport struct member.
* create_rtp(): Eliminate use of deprecated transport struct member. That
member and several others in the transport structure were deprecated
because of an infinite loop created when using realtime configuration.
See 2451d4e4550336197ee2e482750cc53f30afa352
Richard Mudgett [Mon, 10 Apr 2017 22:45:35 +0000 (17:45 -0500)]
tcptls.c: Cleanup TCP/TLS listener thread on abnormal exit.
Temporarily running out of file descriptors should not terminate the
listener thread. Otherwise, when there becomes more file descriptors
available, nothing is listening.
* Added EMFILE exception to abnormal thread exit.
* Added an abnormal TCP/TLS listener exit error message.
* Closed the TCP/TLS listener socket on abnormal exit so Asterisk does not
appear dead if something tries to connect to the socket.
strings.h: Avoid overflows in the string hash functions
On 2's compliment machines abs(INT_MIN) behavior is undefined and
results in a negative value still being returnd. This results in
negative hash codes that can result in crashes.
This change adds database tables for the PUBLISH support so it
can be configured using realtime. A minor fix to the
res_pjsip_publish_asterisk module was done so that it read the
sorcery configuration from the correct section. Finally the
sample configuration files have been updated.
Alexander Traud [Fri, 7 Apr 2017 13:06:11 +0000 (15:06 +0200)]
pjproject_bundled: Crash on pj_ssl_get_info() while ioqueue_on_read_complete().
When the Asterisk channel driver res_pjsip offers SIP-over-TLS, sometimes, not
reproducible, Asterisk crashed in pj_ssl_sock_get_info() because a NULL pointer
was read. This change avoids this crash.