Victor Julien [Thu, 4 Nov 2021 06:52:59 +0000 (07:52 +0100)]
af-packet: PacketSetData can't fail; remove check
PacketSetData() can't fail unless the input pointer is NULL, which is
impossible from the af-packet paths calling it. Remove error check to
avoid possible branching.
Victor Julien [Wed, 3 Nov 2021 20:32:46 +0000 (21:32 +0100)]
af-packet: fix if/down issues with tpacket-v2/autofp
The AFPSwitchState function would close the socket and free the
other resources when the interface went down _and_ the ref cnt was
0. However in autofp mode it was common to get to this point while
packets were still processed in the autofp worker threads, meaning
the ref cnt would not be 0. On the interface coming back up the
initialization code would overwrite the socket and rings, leading
to resource leaks.
Socket ref cnt is decremented from the v2 release callback. If the
callback would get to ref cnt 0, the packet would not be released
in the kernel, but it would (possibly) close the socket if the
iface was down, but not free other resources.
This patch changes the logic to first release the packet to the
kernel and then decrement the ref cnt and it makes the main receive
loop the only one responsible for opening and closing sockets. Wait
with closing the socket and rings until the ref count is 0, which can
happen after AFPSwitchState is called due to packets still being
processed by autofp worker threads.
Victor Julien [Wed, 3 Nov 2021 20:09:46 +0000 (21:09 +0100)]
af-packet: simplify socket handling in tpacket v3
Tpacket v3 only supports workers mode, which means the packet that would
reference a socket won't leave the thread. Therefore keeping a ref count
on the socket is not needed.
This patch removes the per packet reference count increment. The decrement
was missing, so this fixes the ref cnt handling so that after a iface up/
down capture can recover.
It should also lead to a minor performance increase as we avoid a round
of atomic operations per packet.
Victor Julien [Sun, 31 Oct 2021 21:13:19 +0000 (22:13 +0100)]
af-packet: fix soft lockup issues
The Suricata AF_PACKET code opens a socket per thread, then after some minor
setup enters a loop where the socket is poll()'d with a timeout. When the
poll() call returns a non zero positive value, the AF_PACKET ring will be
processed.
The ringbuffer processing logic has a pointer into the ring where we last
checked the ring. From this position we will inspect each frame until we
find a frame with tp_status == TP_STATUS_KERNEL (so essentially 0). This
means the frame is currently owned by the kernel.
There is a special case handling for starting the ring processing but
finding a TP_STATUS_KERNEL immediately. This logic then skip to the next
frame, rerun the check, etc until it either finds an initialized frame or
the last frame of the ringbuffer.
The problem was, however, that the initial uninitialized frame was possibly
(likely?) still being initialized by the kernel. A data race between the
notification through the socket (the poll()) and the updating of the
`tp_status` field in the frame could lead to a valid frame getting skipped.
Of note is that for example libpcap does not do frame scanning. Instead it
simply exits it ring processing loop. Also interesting is that libpcap uses
atomic loads and stores on the tp_status field.
This skipping of frames had 2 bad side effects:
1. in most cases, the buffer would be full enough that the frame would
be processed in the next pass of the ring, but now the frame would
out of order. This might have lead to packets belong to the same
flow getting processed in the wrong order.
2. more severe is the soft lockup case. The skipped frame sits at ring
buffer index 0. The rest of the ring has been cleared, after the
initial frame was skipped. As our pass of the ring stops at the end
of the ring (ptv->frame_offset + 1 == ptv->req.v2.tp_frame_nr) the code
exits the ring processing loop at goes back to poll(). However, poll()
will not indicate that there is more data, as the stale frame in the
ring blocks the kernel from populating more frames beyond it. This
is now a dead lock, as the kernel waits for Suricata and Suricata
never touches the ring until it hears from the kernel.
The scan logic will scan the whole ring at most once, so it won't
reconsider the stale frame either.
This patch addresses the issues in several ways:
1. the startup "discard" logic was fixed to not skip over kernel
frames. Doing so would get us in a bad state at start up.
2. Instead of scanning the ring, we now enter a busy wait loop
when encountering a kernel frame where we didn't expect one. This
means that if we got a > 0 poll() result, we'll busy wait until
we get at least one frame.
3. Error handling is unified and cleaned up. Any frame error now
returns the frame to the kernel and progresses the frame pointer.
4. If we find a frame that is owned by us (TP_STATUS_USER_BUSY) we
yield to poll() immediately, as the next expected status of that
frame is TP_STATUS_KERNEL.
5. the ring is no longer processed until the "end" of the ring (so
highest index), but instead we process at most one full ring size
per run.
6. Work with a copy of `tp_status` instead of accessing original touched
also by the kernel.
Victor Julien [Fri, 5 Nov 2021 19:05:43 +0000 (20:05 +0100)]
packetpool: reset PacketRelease on return to pool
Reset PacketRelease callback to make sure its not set to a capture
specific callback.
As an example:
0x000055e00af09d35 in AFPReleaseDataFromRing (p=0x7f1d884cb830) at source-af-packet.c:653
0x000055e00af09dd0 in AFPReleasePacket (p=0x7f1d884cb830) at source-af-packet.c:678
0x000055e00ab53d7e in TmqhOutputPacketpool (t=0x55e00fb79250, p=0x7f1d884cb830) at tmqh-packetpool.c:465
0x000055e00af08dec in TmThreadsSlotProcessPkt (tv=0x55e00fb79250, s=0x55e012134790, p=0x7f1d884cb830) at tm-threads.h:201
0x000055e00af08e70 in TmThreadsCaptureInjectPacket (tv=0x55e00fb79250, p=0x7f1d884cb830) at tm-threads.h:221
0x000055e00af08f2e in TmThreadsCaptureHandleTimeout (tv=0x55e00fb79250, p=0x0) at tm-threads.h:245
0x000055e00af0ba76 in ReceiveAFPLoop (tv=0x55e00fb79250, data=0x7f1d884ccb60, slot=0x55e01198e4b0) at source-af-packet.c:1321
0x000055e00ab55257 in TmThreadsSlotPktAcqLoop (td=0x55e00fb79250) at tm-threads.c:312
0x00007f1dca9d5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
0x00007f1dca7c6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Here the packet was a pseudo packet to handle a timeout condition. But
the ReleasePacket callback was still set to AFPReleasePacket from a
previous use of the Packet.
The logic flow we want to achieve with unittests, where first we have
all FAIL statements and then just one PASS statement could become more
convoluted with the existence of the PASS_IF macro. Besides, what could
be written as a FAIL_IF might in some cases be written in not so clear
ways with the PASS_IF option available.
Also: fix inverted check values in documentation, update copyright year
Also: change them to comply with the deletion of PASS_IF macro &
condense checks for invalid dsizes in one test, have all checks on same
valid dsize happen in a single test.
Part of the task to offer better guidance on how and when to write
unit tests or suricata-verify tests
Also updated linking and index files, as well as testing page to refer
to the unit tests pages
This page offers guidance about when to use unittests or s-v tests,
and how to create input for those. Also lists other common ways to test
Suri, such as fuzzing and the CI checks.
Victor Julien [Mon, 25 Oct 2021 17:14:49 +0000 (19:14 +0200)]
flow/worker: run housekeeping for bypassed packets
Run flow eviction and flow inject queues for bypassed packets as well,
to avoid a scenario where these won't get run at all if too much of the
traffic is bypassed.
Victor Julien [Wed, 20 Oct 2021 11:20:32 +0000 (13:20 +0200)]
flow/bypass: use_cnt desync'd on bypassed flows
Locally bypassed flows had unsafe updates to `Flow::use_cnt` leading to a race
issue. For a packet it would do the flow lookup, attach the flow to the packet,
increment the `use_cnt`. Then it would detect that the flow is in the bypass
state, and unlock it while holding a reference (so alos not decrementing the
`use_cnt`). When the packet was then returned to the packet pool, the flow would
be disconnected from the packet, which would decrement `use_cnt` without holding
the flow lock.
This patch addresses this issue by disconnecting the flow from the packet
immediately when the bypassed state is detected. This moves the `use_cnt`
decrement to within the lock.
Philippe Antoine [Thu, 14 Oct 2021 19:31:13 +0000 (21:31 +0200)]
range: move back files ownership in one case
In the case, we receive a range request with expected
overlap then new bytes, but the response does not get to the
new bytes, we are still skipping, but the HttpRangeContainerBlock
had the ownership of the files, and need to give it back
When there are a lot of open transactions, as is possible with
modbus, the default tx_iterator will loop for the whole
transacations vector to find each transaction, that means
quadratic complexity.
Reusing the tx_iterator from the template, and keeping as a state
the last index where to start looking avoids this quadratic
complexity.
app-layer: disable by default if not in configuration
DNP3, ENIP, HTTP2 and Modbus are supposed to be disabled
by default. That means the default configuration does it,
but that also means that, if they are not in suricata.yaml,
the protocol should stay disabled.
Jason Ish [Wed, 6 Oct 2021 16:53:46 +0000 (10:53 -0600)]
queue.h: wrap the system sys/queue.h
Instead of using local implementations for the queue.h macro,
wrap the system provided queue.h and then adding missing
features as needed.
The idea is that Suricata when integrated with another library
that includes sys/queue.h can look at the same source of truth
for these macros.
But not all operating systems include a queue.h with the same
features, and some don't include it at all, like Windows. So
on Windows this will be a full implementation of all the queue.h
features Suricata needs.
ThresholdHandlePacketRule may take ownership of an allocated
DetectThresholdEntry, and places it in a position of the
array th_entry. But it never got released
Jason Ish [Tue, 5 Oct 2021 16:44:03 +0000 (10:44 -0600)]
github-ci: pin macos build to 10.15
There is currently a build failure with macos-latest (recently updated)
to 11 in the libhtp test suite code. Not sure if there are other
build issues in libhtp or Suricata at this time.
Victor Julien [Mon, 4 Oct 2021 14:01:47 +0000 (16:01 +0200)]
flow: free spare pool more aggressively
The flows exceeding the spare pools config setting would be freed
per at max 100 flows a second. After a high speed test this would
lead to excessive memory use for a long time.
This patch updates the logic to free 10% of the excess flows per
run, freeing multiple blocks of flows as needed.
Victor Julien [Fri, 1 Oct 2021 11:20:02 +0000 (13:20 +0200)]
flow: process evicted flows on low/no traffic
In a scenario where there was suddenly no more traffic flowing, flows
in a threads `flow_queue` would not be processed. The easiest way to
see this would be in a traffic replay scenario. After the replay is done
no more packets come in and these evicted flows got stuck.
In workers mode, the capture part handles timeout this was updated to
take the `ThreadVars::flow_queue` into account.
The autofp mode the logic that puts a flow into a threads `flow_queue`
would already wake a thread up, but the `flow_queue` was then ignored.
This has been updated to take the `flow_queue` into account.
In both cases a "capture timeout" packet is pushed through the pipeline
to "flush" the queues.
Jason Ish [Fri, 16 Oct 2020 15:43:29 +0000 (09:43 -0600)]
af-packet: use configured cluster-id when checking for fanout
When testing for fanout support a cluster-id of 1 was always being
used instead of the configured cluster-id. This limited fanout
support to only one Suricata instance.
Instead of hardcoding an ID of 1, use the configured cluster-id.
Also make cluster_id a uint16_t instead of an int in AFPThreadVars.
Semantically speaking it makes more sense, because it stores `msc`
files for dynamic image generation.
Updated files that refered to `img` accordingly, too.
- DNS sequence diagram was incorrect (transactions should be
unidirectional). After changing it, it made sense to rename the file.
Adjusted spacing, too. Updated transactions.rst accordingly.
- TLS sequence diagram was refined to illustrate how Suricata actually
implements the protocol.
A guide on what is a transaction for Suricata engine, focusing on
developers.
- What's the purpose of a transaction;
- transaction states and API callbacks;
- Examples and sequence diagrams.
- doc/devguide: add transactions.rst
- doc/devguide/extending/app-layer/index.rst: add transactions.rst