Victor Julien [Wed, 20 Oct 2021 11:20:32 +0000 (13:20 +0200)]
flow/bypass: use_cnt desync'd on bypassed flows
Locally bypassed flows had unsafe updates to `Flow::use_cnt` leading to a race
issue. For a packet it would do the flow lookup, attach the flow to the packet,
and increment the `use_cnt`. Then it would detect that the flow is in the bypass
state and unlock it while holding a reference (so also not decrementing the
`use_cnt`). When the packet was then returned to the packet pool, the flow would
be disconnected from the packet, which would decrement `use_cnt` without holding
the flow lock.
This patch addresses this issue by disconnecting the flow from the packet
immediately when the bypassed state is detected. This moves the `use_cnt`
decrement to within the lock.
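A minimal sketch of the locking order after the fix, using hypothetical struct
and function names rather than the actual Suricata code:
    /* Sketch only: the flow is detached from the packet while the
     * flow mutex is still held, so use_cnt is never touched unlocked. */
    #include <pthread.h>
    #include <stdint.h>

    struct Flow { pthread_mutex_t m; uint32_t use_cnt; int bypassed; };
    struct Packet { struct Flow *flow; };

    static void HandleBypassedFlow(struct Packet *p)
    {
        struct Flow *f = p->flow;
        pthread_mutex_lock(&f->m);
        if (f->bypassed) {
            f->use_cnt--;        /* decrement under the lock */
            p->flow = NULL;      /* disconnect immediately */
        }
        pthread_mutex_unlock(&f->m);
    }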
Victor Julien [Fri, 5 Nov 2021 19:05:43 +0000 (20:05 +0100)]
packetpool: reset PacketRelease on return to pool
Reset the PacketRelease callback to make sure it's not left set to a
capture-specific callback.
As an example:
0x000055e00af09d35 in AFPReleaseDataFromRing (p=0x7f1d884cb830) at source-af-packet.c:653
0x000055e00af09dd0 in AFPReleasePacket (p=0x7f1d884cb830) at source-af-packet.c:678
0x000055e00ab53d7e in TmqhOutputPacketpool (t=0x55e00fb79250, p=0x7f1d884cb830) at tmqh-packetpool.c:465
0x000055e00af08dec in TmThreadsSlotProcessPkt (tv=0x55e00fb79250, s=0x55e012134790, p=0x7f1d884cb830) at tm-threads.h:201
0x000055e00af08e70 in TmThreadsCaptureInjectPacket (tv=0x55e00fb79250, p=0x7f1d884cb830) at tm-threads.h:221
0x000055e00af08f2e in TmThreadsCaptureHandleTimeout (tv=0x55e00fb79250, p=0x0) at tm-threads.h:245
0x000055e00af0ba76 in ReceiveAFPLoop (tv=0x55e00fb79250, data=0x7f1d884ccb60, slot=0x55e01198e4b0) at source-af-packet.c:1321
0x000055e00ab55257 in TmThreadsSlotPktAcqLoop (td=0x55e00fb79250) at tm-threads.c:312
0x00007f1dca9d5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
0x00007f1dca7c6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Here the packet was a pseudo packet to handle a timeout condition. But
the ReleasePacket callback was still set to AFPReleasePacket from a
previous use of the Packet.
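A hedged sketch of the idea; the field and function names below are
illustrative, not the exact Suricata API:
    /* Sketch: reset the release callback when a packet returns to the
     * pool, so a stale capture-specific callback (e.g. an AF_PACKET
     * ring release) cannot be invoked on a later pseudo packet. */
    typedef struct Packet_ Packet;
    struct Packet_ {
        void (*ReleasePacket)(Packet *p);
        /* ... */
    };

    static void PacketPoolDefaultRelease(Packet *p) { (void)p; /* generic cleanup */ }

    static void PacketPoolReturnPacket(Packet *p)
    {
        p->ReleasePacket = PacketPoolDefaultRelease;
        /* ... push p back onto the pool ... */
    }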
Victor Julien [Sun, 31 Oct 2021 21:13:19 +0000 (22:13 +0100)]
af-packet: fix soft lockup issues
The Suricata AF_PACKET code opens a socket per thread, then after some minor
setup enters a loop where the socket is poll()'d with a timeout. When the
poll() call returns a positive value, the AF_PACKET ring will be
processed.
The ring buffer processing logic keeps a pointer to where in the ring we last
checked. From this position we inspect each frame until we
find a frame with tp_status == TP_STATUS_KERNEL (so essentially 0). This
means the frame is currently owned by the kernel.
There is special case handling for starting the ring processing but
finding a TP_STATUS_KERNEL frame immediately. This logic then skips to the
next frame, reruns the check, and so on, until it either finds an initialized
frame or reaches the last frame of the ring buffer.
The problem was, however, that the initial uninitialized frame was possibly
(likely?) still being initialized by the kernel. A data race between the
notification through the socket (the poll()) and the updating of the
`tp_status` field in the frame could lead to a valid frame getting skipped.
Of note is that for example libpcap does not do frame scanning. Instead it
simply exits it ring processing loop. Also interesting is that libpcap uses
atomic loads and stores on the tp_status field.
This skipping of frames had 2 bad side effects:
1. in most cases, the buffer would be full enough that the frame would
be processed in the next pass of the ring, but now the frame would
be out of order. This might have led to packets belonging to the same
flow getting processed in the wrong order.
2. more severe is the soft lockup case. The skipped frame sits at ring
buffer index 0. The rest of the ring has been cleared, after the
initial frame was skipped. As our pass of the ring stops at the end
of the ring (ptv->frame_offset + 1 == ptv->req.v2.tp_frame_nr) the code
exits the ring processing loop and goes back to poll(). However, poll()
will not indicate that there is more data, as the stale frame in the
ring blocks the kernel from populating more frames beyond it. This
is now a deadlock, as the kernel waits for Suricata and Suricata
never touches the ring until it hears from the kernel.
The scan logic will scan the whole ring at most once, so it won't
reconsider the stale frame either.
This patch addresses the issues in several ways:
1. the startup "discard" logic was fixed to not skip over kernel
frames. Doing so would get us in a bad state at startup.
2. Instead of scanning the ring, we now enter a busy wait loop
when encountering a kernel frame where we didn't expect one. This
means that if we got a > 0 poll() result, we'll busy wait until
we get at least one frame.
3. Error handling is unified and cleaned up. Any frame error now
returns the frame to the kernel and progresses the frame pointer.
4. If we find a frame that is owned by us (TP_STATUS_USER_BUSY) we
yield to poll() immediately, as the next expected status of that
frame is TP_STATUS_KERNEL.
5. the ring is no longer processed until the "end" of the ring (so
highest index), but instead we process at most one full ring size
per run.
6. Work with a copy of `tp_status` instead of accessing the original, which
is also touched by the kernel (see the sketch below).
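A minimal sketch of points 2 and 6, busy-waiting on a kernel-owned frame and
working with a local copy of `tp_status`; this is illustrative, not the actual
source-af-packet.c code:
    #include <linux/if_packet.h>
    #include <sched.h>
    #include <stdint.h>

    /* Wait for the kernel to hand us the frame, reading tp_status with
     * an atomic acquire load into a local copy each iteration. */
    static uint32_t WaitForFrame(struct tpacket2_hdr *h)
    {
        uint32_t status = __atomic_load_n(&h->tp_status, __ATOMIC_ACQUIRE);
        while (status == TP_STATUS_KERNEL) {
            sched_yield();      /* poll() reported data, so it will come */
            status = __atomic_load_n(&h->tp_status, __ATOMIC_ACQUIRE);
        }
        return status;          /* callers work with this copy */
    }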
Victor Julien [Sun, 7 Nov 2021 05:25:31 +0000 (06:25 +0100)]
flow/manager: fix flows not evicted & freed in time
Flows have been shown to linger for a long time w/o giving up their
resources. This would lead to higher memory use and memcaps getting
reached.
Three main causes have been identified:
Slow hash passes. By default the flow manager will scan the
flow hash slowly. The pace is based on the flow timeout settings, and with
the default config it will take 4 minutes for a full scan to be
complete. This leaves a window for flows that are timed out to linger
for minutes longer than expected.
Flow Manager yields under pressure. The per-row trylock causes work
to be delayed further. The Flow Manager will use a trylock on a hash row
and will yield immediately if the row is busy. This means that it will
take a full pass before the row is revisited. If the row holds
busy flows, this could happen many times in a row.
Flow Manager favors evicted flows over active flows. The Flow Manager
will only process the evicted flows if they are present. These flows
have been evicted by workers. The active flows on that hash row will
have to wait until the next hash pass. Of course by then there could
be more evicted flows.
Combined, these factors could lead to flows not being considered for
freeing and logging for a very long time, potentially even indefinitely.
The patch addresses the latter two flow manager issues by no longer
using TryLock. It will now simply wait for the lock to be released and
then do its work on the row. Additionally, for each row both the evicted list
and the active flow list will be processed.
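A sketch of the change in the per-row handling, with hypothetical names (the
real code lives in flow-manager.c):
    #include <pthread.h>

    /* Sketch only: the point is the blocking lock and that both lists
     * on the row are handled in the same pass. */
    struct FlowRow { pthread_mutex_t m; /* evicted list, active list */ };

    static void FlowManagerProcessRow(struct FlowRow *row)
    {
        /* before: pthread_mutex_trylock() and skip the row when busy */
        pthread_mutex_lock(&row->m);       /* now: wait for the row */
        /* ... process the evicted list ... */
        /* ... then the active flow list ... */
        pthread_mutex_unlock(&row->m);
    }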
Jason Ish [Tue, 5 Oct 2021 16:44:03 +0000 (10:44 -0600)]
github-ci: pin macos build to 10.15
There is currently a build failure with macos-latest (recently updated
to 11) in the libhtp test suite code. Not sure if there are other
build issues in libhtp or Suricata at this time.
Victor Julien [Wed, 21 Apr 2021 13:20:49 +0000 (15:20 +0200)]
app-layer/pd: only consider actual available data
For size limit checks, consider only the data available at the stream start
and before any GAPs.
The old check would consider too much data if there were temporary gaps,
like when a data packet was in-window but (far) ahead of the expected
segment.
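A small sketch of the intent (illustrative helper, not the actual app-layer
code): only the contiguous data before the first gap counts toward the limit.
    #include <stdint.h>

    /* Return how much data is actually available for protocol detection:
     * from the stream start up to the first gap (or the data end). */
    static uint32_t AvailableDataForPD(uint32_t stream_start,
                                       uint32_t first_gap_offset,
                                       uint32_t data_end)
    {
        uint32_t usable_end = first_gap_offset < data_end ? first_gap_offset
                                                          : data_end;
        return usable_end > stream_start ? usable_end - stream_start : 0;
    }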
Victor Julien [Mon, 4 Oct 2021 14:01:47 +0000 (16:01 +0200)]
flow: free spare pool more aggressively
The flows exceeding the spare pool's config setting would be freed
at a rate of at most 100 flows per second. After a high speed test this
would lead to excessive memory use for a long time.
This patch updates the logic to free 10% of the excess flows per
run, freeing multiple blocks of flows as needed.
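A sketch of the pruning arithmetic (names are illustrative, not the actual
spare pool code):
    #include <stdint.h>

    /* Free 10% of the flows above the configured spare pool size per
     * run, instead of a fixed 100 flows per second. */
    static uint32_t SparePoolPruneCount(uint32_t in_pool, uint32_t configured)
    {
        if (in_pool <= configured)
            return 0;
        uint32_t excess = in_pool - configured;
        uint32_t to_free = excess / 10;
        return to_free > 0 ? to_free : excess;  /* small tail: free it all */
    }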
Victor Julien [Fri, 1 Oct 2021 11:20:02 +0000 (13:20 +0200)]
flow: process evicted flows on low/no traffic
In a scenario where there was suddenly no more traffic flowing, flows
in a thread's `flow_queue` would not be processed. The easiest way to
see this would be in a traffic replay scenario. After the replay is done
no more packets come in, and these evicted flows get stuck.
In workers mode, the capture part handles timeouts; this was updated to
take the `ThreadVars::flow_queue` into account.
In autofp mode, the logic that puts a flow into a thread's `flow_queue`
would already wake a thread up, but the `flow_queue` was then ignored.
This has been updated to take the `flow_queue` into account.
In both cases a "capture timeout" packet is pushed through the pipeline
to "flush" the queues.
Victor Julien [Tue, 14 Sep 2021 08:35:18 +0000 (10:35 +0200)]
detect: track prefilter by progress, not engine
Fix FNs in case of too many prefilter engines. A transaction was tracking
which engines have run using a u64 bit array. The engine's 'local_id' was
used to set and check this bit. However, the bit checking code didn't
handle the int types correctly: the left shift was done on a u32 before
being widened, giving an incorrect result in the u64 bit value.
This commit addresses that by fixing the int handling, but also by
changing how the engines are tracked.
To avoid wasting prefilter engine tracking bit space, track what
ran by the progress value they are registered at, instead of the individual
engine ids. While we can have many engines, the protocols use far
fewer unique progress values. So instead of tracking dozens of
prefilter ids, we track the handful of progress values.
To allow for this the engine array is sorted by tx_min_progress, then
app_proto and finally local_id. A new field is added to "know" when
the last relevant engine for a progress value is reached, so that we
can set the prefilter bit then.
A consequence is that the progress values now have a ceiling that
needs to fit in a 64 bit bitarray. The values used by parsers currently
do not exceed 5, so that seems to be ok.
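A sketch of the integer-width part of the fix (hypothetical helpers, not the
actual detect code):
    #include <stdint.h>

    /* Buggy form: `1 << id` is a 32-bit shift, so for id >= 32 the
     * result is wrong before it is ever widened to 64 bit. */
    static inline void PrefilterMarkRun(uint64_t *bits, uint8_t id)
    {
        *bits |= (UINT64_C(1) << id);      /* fixed: shift a 64-bit one */
    }

    static inline int PrefilterHasRun(uint64_t bits, uint8_t id)
    {
        return (bits & (UINT64_C(1) << id)) != 0;
    }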
Victor Julien [Fri, 3 Sep 2021 15:04:02 +0000 (17:04 +0200)]
detect: unify alert handling; fix bugs
Unify handling of signature matches between various rule types and
between noalert and regular rules.
"noalert" sigs are added to the alert queue initially, but removed
from it after handling their actions. This way all actions are applied
from a single place.
Make sure flow drop and pass are mutually exclusive.
The above addresses issues with pass and drop not getting applied
correctly in various cases.
If an HTTP2 file was within only one DATA frame, the filetracker
would open it and close it in the same call, preventing the
further call to incr_files_opened.
Victor Julien [Thu, 18 Mar 2021 10:05:35 +0000 (11:05 +0100)]
nfs: don't reuse file transactions
After a file has been closed (CLOSE, COMMIT command or EOF/SYNC part of
READ/WRITE data block) mark it as such so that new file commands on that
file do not reuse the transaction.
When a file transfer is completed it will be flagged as such and not be
found anymore by the NFSState::get_file_tx_by_handle() method. This forces
a new transaction to be created.
Victor Julien [Thu, 18 Mar 2021 13:38:33 +0000 (14:38 +0100)]
filestore: store chunks in packet direction
Storing too early can lead to files being considered TRUNCATED if the
TCP state is not yet CLOSED when logging is triggered. This has been
observed with FTP-DATA and might also be an issue with simple HTTP.
Philippe Antoine [Sun, 28 Mar 2021 15:53:50 +0000 (17:53 +0200)]
rust: bump bitflags dependency version
So that lexical-core, which is needed by nom and uses bitflags, is
used at version 0.7.5 instead of version 0.7.0. The newer version
fixes the fact that BITS is now a reserved keyword in nightly Rust.
Victor Julien [Wed, 18 Aug 2021 18:14:48 +0000 (20:14 +0200)]
threading: don't pass locked flow between threads
Previously the flow manager would share evicted flows with the workers
while keeping the flows mutex locked. This reduced the number of unlock/
lock cycles while there was guaranteed to be no contention.
This turns out to be undefined behavior. A lock is supposed to be locked
and unlocked from the same thread. It appears that FreeBSD is stricter on
this than Linux.
This patch addresses the issue by unlocking before handing a flow off
to another thread, and locking again from the new thread.
Issue was reported and largely analyzed by Bill Meeks.
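A minimal sketch of the handoff rule (hypothetical names, not the actual flow
manager code): the thread that locked the flow unlocks it before the handoff,
and the receiving thread locks it again.
    #include <pthread.h>

    struct Flow { pthread_mutex_t m; /* ... */ };

    static void FlowManagerHandOff(struct Flow *f)
    {
        /* unlock in the thread that locked it ... */
        pthread_mutex_unlock(&f->m);
        /* ... then enqueue f for the worker thread */
    }

    static void WorkerTakeFlow(struct Flow *f)
    {
        /* ... and lock again from the receiving thread */
        pthread_mutex_lock(&f->m);
        /* process the flow, then unlock as usual */
        pthread_mutex_unlock(&f->m);
    }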
Eric Leblond [Sun, 15 Aug 2021 10:17:23 +0000 (12:17 +0200)]
flow: fix a debug assert
As the FlowBypassedTimeout function interacts with the capture
method, it is possible that its return value changes between the call
that triggered the timeout and the actual state (i.e. if packets arrive
in between the two calls). So we should not use the call to
FlowBypassedTimeout in the assert.
Eric Leblond [Sat, 14 Aug 2021 21:05:03 +0000 (23:05 +0200)]
flow: more accurate flow counters
Don't increment the flow timeout counter for flows that are not
really timed out (as use_cnt is non-zero). Also don't take bypassed
flows into account in the counter for flow timeout in use.
Victor Julien [Mon, 30 Aug 2021 08:53:49 +0000 (10:53 +0200)]
flow/worker: handle timeout edge case
In the flow worker timeout path it is assumed that we can hand off
flows to the recycler after processing, implying that `Flow::use_cnt` is 0.
However, there was a case where this assumption was incorrect.
When, during flow timeout handling, the last processed data triggered a
protocol upgrade, two additional pseudo packets would be created that were
then pushed all the way through the engine packet paths. This would mean
they both took a flow reference and would hold that until after the flow
was handed off to the recycler. The thread safety implementation made
sure this didn't lead to crashes.
This patch handles this case by returning these packets to the pool from
the timeout handling.
Jason Ish [Wed, 30 Jun 2021 16:03:35 +0000 (10:03 -0600)]
rust/ike: suppress some compile warnings when not debug
Due to ef5755338fa6404b60e7f90bfbaca039b2bfda1e, the variables
that are only used for debug output now emit unused variable
warnings when Suricata is not built with debug. Prefix these
variables with _ to suppress these warnings.
Victor Julien [Thu, 13 May 2021 06:06:11 +0000 (08:06 +0200)]
detect: set event if max inspect buffers exceeded
If a parser exceeds 1024 buffers we stop processing them and
set a detect event instead. This is to avoid parser bugs as well as
crafted bad traffic, leading to resource starvation due to excessive
loops.
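A sketch of the guard (illustrative names, not the real detect engine code):
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_INSPECT_BUFFERS 1024

    /* Stop walking a transaction's buffers past the limit and flag a
     * detect event instead of looping on hostile or buggy input. */
    static bool GetNextInspectBuffer(uint32_t index, bool *too_many_event)
    {
        if (index >= MAX_INSPECT_BUFFERS) {
            *too_many_event = true;   /* raise the detect event */
            return false;             /* caller stops iterating */
        }
        /* ... look up buffer `index` in the transaction ... */
        return true;
    }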
Sascha Steinbiss [Mon, 10 May 2021 12:54:47 +0000 (14:54 +0200)]
detect/mqtt: add topic inspection limit
We add a new 'mqtt.(un)subscribe-topic-match-limit' option
to allow a user to specify the maximum number of topics in
an MQTT SUBSCRIBE or UNSUBSCRIBE message to be evaluated
in detection.
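A sketch of how such a limit caps the per-message work (illustrative C, while
the actual MQTT detection code is in Rust):
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Evaluate at most `limit` topics of a SUBSCRIBE/UNSUBSCRIBE message
     * (0 means no limit), so crafted messages cannot inflate match cost. */
    static bool MatchTopics(const char *topics[], uint32_t count,
                            uint32_t limit, const char *needle)
    {
        uint32_t n = (limit != 0 && count > limit) ? limit : count;
        for (uint32_t i = 0; i < n; i++) {
            if (strstr(topics[i], needle) != NULL)
                return true;
        }
        return false;
    }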