George Joseph [Tue, 30 May 2017 14:43:49 +0000 (08:43 -0600)]
test_json: Fix test names with reserved words
Some of the test names were actually reserved words (true, false,
int, null, string, bool). When the jenkins test results analyzer
does its thing it tries to create a map using the test names as
keys and fails because they're reserved words.
George Joseph [Wed, 24 May 2017 20:50:56 +0000 (14:50 -0600)]
unittests: Add a unit test that causes a SEGV and...
...that can only be run by explicitly calling it with
'test execute category /DO_NOT_RUN/ name RAISE_SEGV'
This allows us to more easily test CI and debugging tools that
should do certain things when asterisk coredumps.
To allow this a new member was added to the ast_test_info
structure named 'explicit_only'. If set by a test, the test
will be skipped during a 'test execute all' or
'test execute category ...'.
Sean Bright [Tue, 23 May 2017 20:42:04 +0000 (16:42 -0400)]
res_agi: Allow configuration of audio format of EAGI pipe
This change allows the format of the EAGI audio pipe to be changed by
setting the dialplan variable 'EAGI_AUDIO_FORMAT' to the name of one of
the loaded formats.
Sean Bright [Tue, 23 May 2017 18:06:22 +0000 (14:06 -0400)]
res_agi: Prevent crash when SET VARIABLE called without arguments
Explicitly check that the appropriate number of arguments were passed to
SET VARIABLE before attempting to reference them. Also initialize the
arguments array to zeroes before populating it.
Joshua Colp [Mon, 15 May 2017 20:03:36 +0000 (20:03 +0000)]
app_queue: Fix members showing as being in call when not.
A change was done which added an 'in_call' flag to queue
members that was set to true while talking to an agent.
Unfortunately in practice this does not accurately reflect
whether they are talking to an agent or not. If a Local
channel is involved and a transfer is performed then the
app_queue application would incorrectly think the agent
was still in a call with the caller. This was done to
fix a race condition between an agent becoming available
by device state and the checking of the last call information
for the wrapup time. There was a small window where the
last call information would be the previous value instead
of the new one.
This change goes about fixing the original issue in a
different way by considering the call completed if device
state is received which would make the agent available
and if they are currently in a call. If this occurs the
last call information is updated before the agent becomes
available ensuring that old information is not present
when checking if the member should be called. This also
improves the transfer situation by actually updating
and enforcing the wrapup time.
Robert Mordec [Tue, 23 May 2017 10:45:29 +0000 (12:45 +0200)]
app_confbridge: Race between removing and playing name recording while leaving
When user leaves a conference, its channel calls async_play_sound_file()
in order to play the name announcement and then unlinks the sound file.
The async_play_sound_file() function adds a task to conference playback queue,
which then runs playback_common() function in a different thread.
It leads to a race condition when, in some cases, channel thread may unlink
the sound file before playback_common() had a chance to open it.
This patch creates a file deletion task, that is queued after playback.
Kevin Harwell [Mon, 22 May 2017 18:51:40 +0000 (13:51 -0500)]
res_rtp_asterisk: rtcp mux using the wrong srtp unprotecting algorithm
When using rtcp mux if an rtcp payload came in it would still use the srtp
unprotect algorithm instead of the srtp unprotect rtcp method. Since rtcp
data was being passed to the rtp unprotect method this would result in an
error.
This patch ensures that the correct unprotect method is chosen by making
sure the passed in rtcp flag is appropriately set when rtcp mux is enabled
and an rtcp payload is received.
Sean Bright [Fri, 19 May 2017 15:05:36 +0000 (15:05 +0000)]
chan_sip: Better ICE handling for RTCP-MUX
If we are offered or are offering RTCP-MUX, don't consider RTCP ICE
candidates. This confuses certain browsers (current Firefox for
example) and causes intial audio setup delays.
George Joseph [Thu, 13 Apr 2017 16:14:48 +0000 (10:14 -0600)]
AST-2017-004: chan_skinny: Add EOF check in skinny_session
The while(1) loop in skinny_session wasn't checking for EOF so
a packet that was longer than a header but still truncated
would spin the while loop infinitely. Not only does this
permanently tie up a thread and drive a core to 100% utilization,
the call of ast_log() in such a tight loop eats all available
process memory.
Ivan Poddubny [Thu, 11 May 2017 05:25:44 +0000 (07:25 +0200)]
app_queue: Fix duplicate queue_log entries for EXITEMPTY and ABANDON
There are 2 places in app_queue.c that log EXITEMPTY event: one in
wait_our_turn, and another one in queue_exec in the loop trying to
call an agent after wait_our_turn.
In most cases it leads to logging EXITEMPTY twice.
ABANDON is also logged on two places, and in the rare case when an agent
and caller hang up simultaneously it's also possible to get duplicates
in queue_log.
This commit changes wait_our_turn to return -1 ("the caller should exit
the queue") instead of 0 ("the caller's turn has arrived") in case of
leaving when empty, so queue_exec skips the agent calling loop.
Also, leave_queue is now executed only once in this case, because 2nd
time is just a noop when the queue entry has already been removed.
Also, it sets qe->handled to -1 to indicate that the call was not
answered by an agent, but the necessary handling has already been done
in order to avoid logging an extra ABANDON entry.
Joshua Colp [Sat, 13 May 2017 16:40:00 +0000 (16:40 +0000)]
asterisk: Audit locking of channel when manipulating flags.
When manipulating flags on a channel the channel has to be
locked to guarantee that nothing else is also manipulating
the flags. This change introduces locking where necessary to
guarantee this. It also adds helper functions that manipulate
channel flags and lock to reduce repeated code.
Richard Mudgett [Sat, 13 May 2017 02:04:59 +0000 (21:04 -0500)]
res_pjsip_session.c: Process initial INVITE sooner. (key exists)
Retransmissions of an initial INVITE could be queued in the serializer
before we have processed the first INVITE message. If the first INVITE
message doesn't get completely processed before the retransmissions are
seen then we could try to setup the same call from the retransmissions. A
symptom of this is seeing a (key exists) message associated with an
INVITE. An earlier change attempted to address this kind of problem by
calculating a distributor serializer to use for unassociated messages.
Part of that change also made incoming calls keep using that distributor
serializer. (ASTERISK-26088) However, some leftover code was still
deferring the INVITE processing to the session's serializer even though we
were already in that serializer. This not only is unnecessary but would
cause the same call resetup problem.
* Removed the code to defer processing the initial INVITE to the session's
serializer because we are already running in that serializer.
res_pjsip: New endpoint option "refer_blind_progress"
This option was added to turn off notifying the progress details
on Blind Transfer. If this option is not set then the chan_pjsip
will send NOTIFY "200 OK" immediately after "202 Accepted".
Some SIP phones like Mitel/Aastra or Snom keep the line busy until
receive "200 OK".
Joshua Colp [Tue, 9 May 2017 15:34:49 +0000 (15:34 +0000)]
tcptls: Improve error messages for TLS connections.
This change uses the functions provided by OpenSSL to query
and better construct error messages for situations where
the connection encounters a problem.
George Joseph [Fri, 5 May 2017 16:33:34 +0000 (10:33 -0600)]
cel_odbc: Fix timestamp processing for microseconds
When a column is of type timestamp, the fraction part of the event
field's seconds was frequently parsed incorrectly especially if
there were leading zeros. For instance "2017-05-23 23:55:03.023"
would be parsed into an int as "23" then when the timestamp was
formatted again to be inserted into the database column it'd be
"2017-05-23 23:55:03.23" which is now 230 milliseconds instead of
23 milliseconds. "03.000001" would be transformed to "03.1", etc.
* If the event field is 'eventtime' and the db column is timestamp,
then existing processing has already correctly formatted the
timestamp so now we simply use it rather than parsing it and
re-printing it. This is the most common use case anyway.
* If the event field is other than 'eventtime' and the db column
is timestamp, we now parse the seconds, including the fractional
part into a double rather than 2 ints. This preserves the
magnitude and precision of the fractional part. When we print
it, we now print it as a "%09.6lf" which correctly represents the
input.
To be honest, why we parse the string timestamp into components,
test the components, then print the components back into a string
timestamp is beyond me. We should use parse it, test it, then if
it passes, use the original string representation in the database
call. Maybe someone thought that some implementations wouldn't
take a partial timestamp string like "2017-05-06" and decided to
always produce a full timestamp string even if an abbreviated one
was supplied. Anyway, I'm leaving it as it is.
Joshua Colp [Tue, 9 May 2017 10:25:29 +0000 (10:25 +0000)]
res_hep_rtcp: Provide chan_sip Call-ID for RTCP messages.
This change adds the required logic to allow the SIP
Call-ID to be placed into the HEP RTCP traffic if the
chan_sip module is used. In cases where the option is
enabled but the channel is not either SIP or PJSIP then
the code will fallback to the channel name as done
previously.
Based on the change on Nir's branch at:
team/nirs/hep-chan-sip-support
George Joseph [Mon, 8 May 2017 21:11:19 +0000 (15:11 -0600)]
logger: Added logger_queue_limit to the configuration options.
All log messages go to a queue serviced by a single thread
which does all the IO. This setting controls how big that
queue can get (and therefore how much memory is allocated)
before new messages are discarded. The default is 1000.
Should something go bezerk and log tons of messages in a tight
loop, this will prevent memory escalation.
When the limit is reached, a WARNING is logged to that effect
and messages are discarded until the queue is empty again. At
that time another WARNING will be logged with the count of
discarded messages. There's no "low water mark" for this queue
because the logger thread empties the entire queue and processes it
in 1 batch before going back and waiting on the queue again.
Implementing a low water mark would mean additional locking as
the thread processes each message and it's not worth it.
A "test" was added to test_logger.c but since the outcome is
non-deterministic, it's really just a cli command, not a unit
test.
Joshua Colp [Fri, 5 May 2017 13:48:34 +0000 (13:48 +0000)]
func_cdr: Allow empty value for CDR dialplan function.
A regression was introduced in 12 where passing an empty value
to the CDR dialplan function was not longer allowed. This
change returns to the behavior of 11 where it is permitted.
George Joseph [Thu, 4 May 2017 21:04:46 +0000 (15:04 -0600)]
app_confbridge: Fix reference to cfg in menu_template_handler
menu_template_handler wasn't properly accounting for the fact that
it might be called both during a load/reload (which isn't really
valid but not prevented) and by a dialplan function. In both cases
it was attempting to use the "pending" config which wasn't valid in
the latter case. aco_process_config is also partly to blame because
it wasn't properly cleaning "pending" up when a reload was done and
no changes were made. Both of these contributed to a crash if
CONFBRIDGE(menu,template) was called in a dialplan after a reload.
* aco_process_config now sets info->internal->pending to NULL
after it unrefs it although this isn't strictly necessary in the
context of this fix.
* menu_template_handler now uses the "current" config and silently
ignores any attempt to be called as a result of someone uses the
"template" parameter in the conf file.
Luckily there's no other place in the codebase where
aco_pending_config is used outside of aco_process_config.
ASTERISK-25506 #close Reported-by: Frederic LE FOLL
Change-Id: Ib349a17d3d088f092480b19addd7122fcaac21a7
bridge: Fix returning to dialplan when executing Bridge() from AMI.
When using the Bridge AMI action on the same channel multiple times
it was possible for the channel to return to the wrong location in
the dialplan if the other party hung up. This happened because the
priority of the channel was not preserved across each action
invocation and it would fail to move on to the next priority in
other cases.
This change makes it so that the priority of a channel is preserved
when taking control of it from another thread and it is incremented
as appropriate such that the priority reflects where the channel
should next be executed in the dialplan, not where it may or may not
currently be.
The Bridge AMI action was also changed to ensure that it too
starts the channels at the next location in the dialplan.
Kevin Harwell [Mon, 1 May 2017 18:04:16 +0000 (13:04 -0500)]
res_rtp_asterisk: Clearing the remote RTCP address causes RTCP failures
When a call gets put on hold RTP is temporarily stopped and Asterisk was
setting the remote RTCP address to NULL. Then when RTCP data was received
from the remote endpoint, Asterisk would be missing this information when
publishing the rtcp_message stasis event. Consequently, message subscribers
(in this case res_hep_rtcp) trying to parse the "from" field output the
following error:
"ast_sockaddr_split_hostport: Port missing in (null)"
This patch makes it so the remote RTCP address is no longer set to NULL when
stopping RTP. There was only one place that appeared to check if the remote
RTCP address was NULL as a way to tell if RTCP was running. This patch added
an additional check on the RTCP schedid for that case to make sure RTCP was
truly not running.
Richard Mudgett [Sat, 29 Apr 2017 21:11:21 +0000 (16:11 -0500)]
res_pjsip_t38.c: Fix deadlock in T.38 framehook.
A deadlock can happen between a channel lock and a pjsip session media
container lock. One thread is processing a reINVITE's SDP and walking
through the session's media container when it waits for the channel lock
to put the determined format capabilities onto the channel. The other
thread is writing a frame to the channel and processing the T.38 frame
hook. The T.38 frame hook then waits for the pjsip session's media
container lock. The two threads are now deadlocked.
* Made the T.38 frame hook release the channel lock before searching the
session's media container. This fix has been done to several other
frame hooks to fix deadlocks.
George Joseph [Thu, 27 Apr 2017 13:02:12 +0000 (07:02 -0600)]
res_pjsip_session: Add cleanup to ast_sip_session_terminate
If you use ast_request to create a PJSIP channel but then hang it
up without causing a transaction to be sent, the session will
never be destroyed. This is due ot the fact that it's pjproject
that triggers the session cleanup when the transaction ends.
app_chanisavail was doing this to get more granular channel state
and it's also possible for this to happen via ARI.
* ast_sip_session_terminate was modified to explicitly call the
cleanup tasks and unreference session if the invite state is NULL
AND invite_tsx is NULL (meaning we never sent a transaction).
* chan_pjsip/hangup was modified to bump session before it calls
ast_sip_session_terminate to insure that session stays valid
while it does its own cleanup.
* Added test events to session_destructor for a future testsuite
test.
ASTERISK-26908 #close Reported-by: Richard Mudgett
Change-Id: I52daf6f757184e5544c261f64f6fe9602c4680a9
Kevin Harwell [Wed, 26 Apr 2017 19:20:00 +0000 (14:20 -0500)]
res_pjsip/res_pjsip_callerid: NULL check on caller id name string
It's possible for a name in a party id structure to be marked as valid, but the
name string itself be NULL (for instance this is possible to do by using the
dialplan CALLERID function). There were a couple of places where the name was
validated, but the string itself was not checked before passing it to functions
like 'strlen'. This of course caused a crashed.
This patch adds in a NULL check before attempting to pass it into a function
that is not NULL tolerant.
Kevin Harwell [Tue, 25 Apr 2017 16:43:26 +0000 (11:43 -0500)]
vector: defaults and indexes
Added an pre-defined integer vector declaration. This makes integer vectors
easier to declare and pass around. Also, added the ability to default a vector
up to a given size with a default value. Lastly, added functionality that
returns the "nth" index of a matching value.
Interpolated frames are frames which contain a number of
samples but have no actual data. Audiohooks did not
handle this case when translating an incoming frame into
signed linear. It assumed that a frame would always contain
media when it may not. If this occurs audiohooks will now
immediately return and not act on the frame.
As well for users of ast_trans_frameout the function has
been changed to be a bit more sane and ensure that the data
pointer on a frame is set to NULL if no data is actually
on the frame. This allows the various spots in Asterisk that
check for an interpolated frame based on the presence of a
data pointer to work as expected.
Jean Aunis [Thu, 20 Apr 2017 07:13:13 +0000 (09:13 +0200)]
chan_sip: Trigger reinvite if the SDP answer is included in the SIP ACK
Some equipments may send a re-INVITE containing an SDP in the final ACK
request. If this happens in the context of direct media, the remote end
should be updated with a re-INVITE.
This patch queues an "update RTP peer" frame to trigger the re-INVITE,
instead of the "source change" frame wich was used previously.
Sean Bright [Fri, 21 Apr 2017 17:04:44 +0000 (13:04 -0400)]
cleanup: Fix fread() and fwrite() error handling
Cleaned up some of the incorrect uses of fread() and fwrite(), mostly in
the format modules. Neither of these functions will ever return a value
less than 0, which we were checking for in some cases.
I've introduced a fair amount of duplication in the format modules, but
I plan to change how format modules work internally in a subsequent
patch set, so this is simply a stop-gap.
Sean Bright [Tue, 18 Apr 2017 00:06:10 +0000 (20:06 -0400)]
core: Use eventfd for alert pipes on Linux when possible
The primary win of switching to eventfd when possible is that it only
uses a single file descriptor while pipe() will use two. This means for
each bridge channel we're reducing the number of required file
descriptors by 1, and - if you're using timerfd - we also now have 1
less file descriptor per Asterisk channel.
The API is not ideal (passing int arrays), but this is the cleanest
approach I could come up with to maintain API/ABI.
I've also removed what I believe to be an erroneous code block that
checked the non-blocking flag on the pipe ends for each read. If the
file descriptor is 'losing' its non-blocking mode, it is because of a
bug somewhere else in our code.
In my testing I haven't seen any measurable difference in performance.
Richard Mudgett [Fri, 21 Apr 2017 17:33:34 +0000 (12:33 -0500)]
res_pjsip_session.c: Send 100 Trying out earlier to prevent retransmissions.
If ICE is enabled and a STUN server does not respond then we will block
until we give up on the STUN response. This will take nine seconds. In
the mean time the peer that sent the INVITE will send retransmissions.
* Restructure res_pjsip_session.c:new_invite() to send a 100 Trying out
earlier to prevent these retransmissions.
* Restructure ast_sip_session_alloc() to need less cleanup on off nominal
error paths.
* Made ast_sip_session_alloc() and ast_sip_session_create_outgoing() avoid
unnecessary ref manipulation to return a session. This is faster than
calling a function. That function may do logging of the ref changes with
REF_DEBUG enabled.
Sean Bright [Wed, 19 Apr 2017 20:08:39 +0000 (16:08 -0400)]
pbx: Use same thread if AST_OUTGOING_WAIT_COMPLETE specified
Both ast_pbx_outgoing_app() and ast_pbx_outgoing_exten() cause the core
to spawn a new thread to perform the dial. When AST_OUTGOING_WAIT_COMPLETE
is passed to these functions, the calling thread will be blocked until
the newly created channel has been hung up.
After this patch, we run the dial on the current thread rather than
spawning a new one. The only in-tree code that passes
AST_OUTGOING_WAIT_COMPLETE is pbx_spool, so you should see reduced
thread usage if you are using .call files.
Richard Mudgett [Wed, 19 Apr 2017 18:23:55 +0000 (13:23 -0500)]
res_rtp_asterisk.c: Fix crash in RTCP DTLS operation.
Occasionally a crash happens when processing the RTCP DTLS timeout
handler. The RTCP DTLS timeout timer could be left running if we have not
completed the DTLS handshake before we place the call on hold or we
attempt direct media.
* Made ast_rtp_prop_set() stop the RTCP DTLS timer when disabling RTCP.
* Made some sanity tweaks to ast_rtp_prop_set() when switching from
standard RTCP mode to RTCP multiplexed mode.
The struct ast_rtp_instance has historically been indirectly protected
from reentrancy issues by the channel lock because early channel drivers
held the lock for really long times. Holding the channel lock for such a
long time has caused many deadlock problems in the past. Along comes
chan_pjsip/res_pjsip which doesn't necessarily hold the channel lock
because sometimes there may not be an associated channel created yet or
the channel pointer isn't available.
In the case of ASTERISK-26835 a pjsip serializer thread was processing a
message's SDP body while another thread was reading a RTP packet from the
socket. Both threads wound up changing the rtp->rtcp->local_addr_str
string and interfering with each other. The classic reentrancy problem
resulted in a crash.
In the case of ASTERISK-26853 a pjsip serializer thread was processing a
message's SDP body while another thread was reading a RTP packet from the
socket. Both threads wound up processing ICE candidates in PJPROJECT and
interfering with each other. The classic reentrancy problem resulted in a
crash.
* rtp_engine.c: Make the ast_rtp_instance_xxx() calls lock the RTP
instance struct.
* rtp_engine.c: Make ICE and DTLS wrapper functions to lock the RTP
instance struct for the API call.
* res_rtp_asterisk.c: Lock the RTP instance to prevent a reentrancy
problem with rtp->rtcp->local_addr_str in the scheduler thread running
ast_rtcp_write().
* res_rtp_asterisk.c: Avoid deadlock when local RTP bridging in
bridge_p2p_rtp_write() because there are two RTP instance structs
involved.
* res_rtp_asterisk.c: Avoid deadlock when trying to stop scheduler
callbacks. We cannot hold the instance lock when trying to stop a
scheduler callback.
* res_rtp_asterisk.c: Remove the lock in struct dtls_details and use the
struct ast_rtp_instance ao2 object lock instead. The lock was used to
synchronize two threads to prevent a race condition between starting and
stopping a timeout timer. The race condition is no longer present between
dtls_perform_handshake() and __rtp_recvfrom() because the instance lock
prevents these functions from overlapping each other with regards to the
timeout timer.
* res_rtp_asterisk.c: Remove the lock in struct ast_rtp and use the struct
ast_rtp_instance ao2 object lock instead. The lock was used to
synchronize two threads using a condition signal to know when TURN
negotiations complete.
* res_rtp_asterisk.c: Avoid deadlock when trying to stop the TURN
ioqueue_worker_thread(). We cannot hold the instance lock when trying to
create or shut down the worker thread without a risk of deadlock.
This patch exposed a race condition between a PJSIP serializer thread
setting up an ICE session in ice_create() and another thread reading RTP
packets.
* res_rtp_asterisk.c:ice_create(): Set the new rtp->ice pointer after we
have re-locked the RTP instance to prevent the other thread from trying to
process ICE packets on an incomplete ICE session setup.
A similar race condition is between a PJSIP serializer thread resetting up
an ICE session in ice_create() and the timer_worker_thread() processing
the completion of the previous ICE session.
* res_rtp_asterisk.c:ast_rtp_on_ice_complete(): Protect against an
uninitialized/null remote_address after calling
update_address_with_ice_candidate().
* res_rtp_asterisk.c: Eliminate the chance of ice_reset_session()
destroying and setting the rtp->ice pointer to NULL while other threads
are using it by adding an ao2 wrapper around the PJPROJECT ice pointer.
Now when we have to unlock the RTP instance object to call a PJPROJECT ICE
function we will hold a ref to the wrapper. Also added some rtp->ice NULL
checks after we relock the RTP instance and have to do something with the
ICE structure.