Victor Julien [Sun, 20 Feb 2022 09:00:48 +0000 (10:00 +0100)]
nfs/rpc: update full record parsers to be more exact
Instead of 'take'ing all data for the RPC prog_data and then
letting the higher level parsers figure out which part to use
take the exact amount.
Comments from Shivani Bhardwaj: This is was more than a cherry pick and
some of the modifications were not directly portable to the nom macros.
So, parsers were changed to make sure the functionality remained same while
making the transition to nom5 while keeping the diff minimal.
Victor Julien [Fri, 18 Feb 2022 09:19:04 +0000 (10:19 +0100)]
output: fix timestamp missing usecs
On ARM 32bit with Musl `tv_usecs` is defined as `int64_t` which lead to
CreateIsoTimeString() printing all zeros on the usecs. Work around this
by first assigning to a `int64_t` and then updating the expected format
string to accept `int64_t`.
Victor Julien [Thu, 17 Feb 2022 12:32:17 +0000 (13:32 +0100)]
radix: improve address range handling
Handle non-exact address ranges from string. This can come directly
from user input, so here it is accepted but the address is converted
to the address range start. A warning will be issued.
Debug validation checks are added to catch this.
This issue could lead to bad input from iprep (with cidr), defrag config
and htp server personalities to produce a bad radix tree.
Victor Julien [Tue, 15 Feb 2022 19:43:27 +0000 (20:43 +0100)]
detect/iponly: fix netmask handling
If the ipaddress was not the address range start, it was not masked to turn
it into that. So 1.2.3.4/24 was not stored as address 1.2.3.0 with netmask 24,
but as 1.2.3.4 with netmask 24. This was then propagated into the radix tree,
where it was used as an exact key in exact lookups, giving unexpected results.
This patch implements the netmask handling for IPv4 and IPv6, and adds a set
of tests for it.
Victor Julien [Fri, 11 Feb 2022 14:50:01 +0000 (15:50 +0100)]
radix: fix FP/FN issue in IP-only
A bug was reported about the IP-only rules not correctly matching. This was
traced to the rules in question not getting recorded into the IP-only radix
tree correctly.
Sequence:
- 100.117.241.0/25 inserted into the tree
- 100.117.241.0/26 inserted into the tree
Both are part of the same radix node, but recorded by their different netmasks
in the user data portion.
The IP-only engine first does a search to get to the user data it may need to
include. It does so for with `SCRadixFindKeyIPV4ExactMatch` for single IPs, or
using `SCRadixFindKeyIPV4Netblock` in case of a netblock. Any "match" from
either of these is considered an "exact match" by the IP-only setup code.
This exact match expectation turned out to be wrong and
`SCRadixFindKeyIPV4Netblock` behaved more like "best match" instead, which is
a non-exact match, but its the next best match if no exact match is found.
The way the look up for 100.117.241.64/26 went wrong, is that it returned
the user data for 100.117.241.0/26. This happens as follows:
- first it would do an exact find, which didn't give a result
- then it removed bits from the keystream until it found a matching node
and explore if any of the netmasks it contained matched. Here the first
step of the bug started:
it considered the netmask (with user data) a match that matched the
number of bits of the matching key, but not of the actual range netmask cidr
value.
So in this case the number of shared bits between `100.117.241.0/25` and
`100.117.241.64/26` was 25, so it assumed that the user data for the
netmask 25 was the match.
To summarize this step, there are 2 problems with this:
1. it returns a match on something that isn't an exact match
2. it considered the wrong netmask value
- the radix code then took the returned node, and did the netmask check
again. This time it did use its own netmask value, so this time
it did find the netmask 26 (+ user data). However because of the node that
was returned, this netmask (+user data) belongs to `100.117.241.0`, not to
`100.117.241.64`.
- the IP-only detection code was satisfied with what it assumed to be
"exact match" and just updated the user data to include the user data that
should have been associated with `100.117.241.64/26` to `100.117.241.0/26`.
This patch addresses the issue as follows:
It makes `SCRadixFindKeyIPV4Netblock` also return an exact match by propagating
the netmask in the search and in the evaluation of the stored netmasks.
It does away with the secondary netmask (+user data) evaluation.
`SCRadixFindKeyIPV4Netblock` is expected to handle this correctly.
The IP-only engine will fall back to the "not found" path, which does an explicit
"best match" lookup and then insert a new entry into the radix tree based on
the user data of the "best match".
dcerpc: avoid delete the rpc state interface context
The bug:
The dcerpc dce_iface keyword just match the packet following the bind. Only the
next request after the rpc is sent will match. However the expected behaviour it
that all the rpc requests/responses sent under the context of the given
interface would match.
In the Open Group c706 the following is indicated:
In 2.2.1 Binding-related Operations, indicates that one category of binding
operations are those that "operations that establish internal call routing
information for the server." (The other are to establish the protocol which is
not relevant here). And the following statement can be found:
Operations in the second category establish a set of mappings that the server
can use to route calls internally to the appropriate manager routine. This
routing is based on the interface and version, operation and any object
requested by the call.
It indicates that server routes (to call methods) are based on the operation,
interface and object.
- Operation: To indicate the method to call, and operation number is
specified as indicated in the second step of 2.3.3.2 (Client
Binding Steps).
- Interface: An interface is a set of remotely callable operations offered by a
server and invokable by clients. (2.1.1.1)
- Object: Is the manager that implements the interface, as stated in section
Interface and Manager Selection of 2.3.3.3. It is not mandatory, can
be nil.
To call a method, a client must send a request message as defined in 2.6.4.9,
that contains these identifiers:
- opnum: The opnum field identifies the operation being invoked within the
interface.
- p_cont_id (Context ID in Wireshark): The p_cont_id field holds a presentation
context identifier that identifies the
data representation and interface, as
defined in 12.6.3.4 (Context Identifiers).
- object: The object field is contained if the PFC_OBJECT_UUID is set. (Could be
interesting to create a keyword dce_object for matching this UUID)
Therefore, to get the correct method to invoke, the server must map the context
to the correct interface. This is negotiated by the bind request
Interfaces are first negotiated using the bind message (12.6.4.3), contained in
the p_context_elem array. Then they are accepted or rejected using the bind_ack
message (12.6.4.4).
Once these contexts are established, both client and server can use the context
id, which is the index of the p_context_elem array, to refer the interface they
are using.
Moreover, in the middle of the connection, the context can be changed with the
alter_context message.
This is way suricata shouldn't delete the bindack attribute, that contains
the contexts, used by match_backuuid. This is the only way to know the interface
a request message is referring to.
smb/dce_iface: avoid deleting current ifaces from state
The smb dce_iface keyword must match for all those dcerpc requests
and responses sent in the context of the given interface. They are
not matching as the current bind interfaces are deleted by any
non bind message.
The smb dce_iface keyword must match for all those dcerpc requests and
responses sent in the context of the given interface. They are not
matching because in rs_smb_tx_get_dce_iface, x.req_cmd is erroneously
compared with 1. Fix this by comparing with DCERPC_TYPE_REQUEST instead.
The smb dce_opnum matches all the opnums that are higher that the
indicated opnum. This is due the range comparison if was put in the
exact comparison context, and in case the opnum doesn't match exactly,
then the range comparison is triggered (the upper limit is always true).
Move the erroneus if to the outer context, as else option of the block
checks if comparison should be exact or range.
The smb dce_opnum keyword doesn't match the dcerpc requests/responses.
This occurs because in the rs_smb_tx_match_dce_opnum function, the
x.req_cmd is matched against the erroneous code 1. Fix this by using
DCERPC_TYPE_REQUEST for the comparison instead.
Angelo Mirabella [Thu, 20 Jan 2022 14:52:33 +0000 (14:52 +0000)]
stream-tcp-reassemble: fix reassembly direction for FIN packets
Suricata invokes the stream reassembly logic only for the current packet
direction if the packet contains a FIN flag. However, this does not
handle the case in which the packet ACKs data from the opposing direction.
This patch forces the invocation of the stream reassembly logic
on both direction when Suricata sees a FIN packet.
Victor Julien [Thu, 13 Jan 2022 11:13:43 +0000 (12:13 +0100)]
stream: fix stream pruning being too aggressive
Pruning of StreamBufferBlocks could remove blocks that fell entirely
after the target offset due to a logic error. This could lead to data
being evicted that was still meant to be processed in theapp-layer
parsers.
Jason Ish [Tue, 21 Dec 2021 22:34:05 +0000 (16:34 -0600)]
dns: create transaction even if z-bit was set
It appears that DNS servers will still process a DNS request even if the
z-bit is set, our parser will fail the transaction. So create the
transaction, but still set the event.
Victor Julien [Mon, 25 Oct 2021 17:14:49 +0000 (19:14 +0200)]
flow/worker: run housekeeping for bypassed packets
Run flow eviction and flow inject queues for bypassed packets as well,
to avoid a scenario where these won't get run at all if too much of the
traffic is bypassed.
Jason Ish [Fri, 16 Oct 2020 15:43:29 +0000 (09:43 -0600)]
af-packet: use configured cluster-id when checking for fanout
When testing for fanout support a cluster-id of 1 was always being
used instead of the configured cluster-id. This limited fanout
support to only one Suricata instance.
Instead of hardcoding an ID of 1, use the configured cluster-id.
Also make cluster_id a uint16_t instead of an int in AFPThreadVars.
Eric Leblond [Wed, 17 Nov 2021 15:43:23 +0000 (16:43 +0100)]
profiling: fix profiling with sample rate
Rules profiling was returning invalid results when used with sample
rate. The problem was that the sample condition was run twice in the
packet flow. As a result, the second pass was not initializing the
variable storing the initial CPU ticks and the resulting performance
counters were reporting invalid values.
Jason Ish [Wed, 10 Nov 2021 22:38:35 +0000 (16:38 -0600)]
smtp: log transaction even if no email present
The SMTP transaction logger was not writing the log if the email
portion of the logger failed, such as in the case of STARTTLS
where this is no email decoded.
ThresholdHandlePacketRule may take ownership of an allocated
DetectThresholdEntry, and places it in a position of the
array th_entry. But it never got released
A transaction to client is always considered
complete in the direction to server and vice versa.
Otherwise, transactions are never complete for
AppLayerParserTransactionsCleanup
Jason Ish [Wed, 17 Nov 2021 01:34:11 +0000 (19:34 -0600)]
dhcp: fix url in comment
rustdoc was complaining about the format of the URL in a comment
while trying to generate documentation. Convert the comment to a
non-rustdoc comment for now to satisfy rustdoc.
Philippe Antoine [Thu, 16 Sep 2021 14:54:37 +0000 (16:54 +0200)]
tcp: rejects FIN+SYN packets as invalid
Ticket: #4569
If a FIN+SYN packet is sent, the destination may keep the
connection alive instead of starting to close it.
In this case, a later SYN packet will be ignored by the
destination.
Previously, Suricata considered this a session reuse, and thus
used the sequence number of the last SYN packet, instead of
using the one of the live connection, leading to evasion.
This commit errors on FIN+SYN so that they do not get
processed as regular FIN packets.
Victor Julien [Tue, 5 Oct 2021 12:48:27 +0000 (14:48 +0200)]
stream/tcp: handle RST with MD5 or AO header
Special handling for RST packets if they have an TCP MD5 or AO header option.
The options hash can't be validated. The end host might be able to validate
it, as it can have a key/password that was communicated out of band.
The sender could use this to move the TCP state to 'CLOSED', leading to
a desync of the TCP session.
This patch builds on top of 843d0b7a10bb ("stream: support RST getting lost/ignored")
It flags the receiver as having received an RST and moves the TCP state
into the CLOSED state. It then reverts this if the sender continues to
send traffic. In this case it sets the following event:
Philippe Antoine [Mon, 13 Sep 2021 10:18:34 +0000 (12:18 +0200)]
detect: fixes InspectionBuffer id with transforms
When InspectionBufferGet gets called with base_id
Later InspectionBufferSetup must also be called with base_id
In case there were transforms, we had base_id != list_id
Not calling InspectionBufferSetup with the right id
resulted in leaving a dangling pointer,
because it was not added to det_ctx->inspect.to_clear_queue
Victor Julien [Wed, 20 Oct 2021 11:20:32 +0000 (13:20 +0200)]
flow/bypass: use_cnt desync'd on bypassed flows
Locally bypassed flows had unsafe updates to `Flow::use_cnt` leading to a race
issue. For a packet it would do the flow lookup, attach the flow to the packet,
increment the `use_cnt`. Then it would detect that the flow is in the bypass
state, and unlock it while holding a reference (so alos not decrementing the
`use_cnt`). When the packet was then returned to the packet pool, the flow would
be disconnected from the packet, which would decrement `use_cnt` without holding
the flow lock.
This patch addresses this issue by disconnecting the flow from the packet
immediately when the bypassed state is detected. This moves the `use_cnt`
decrement to within the lock.
Victor Julien [Fri, 5 Nov 2021 19:05:43 +0000 (20:05 +0100)]
packetpool: reset PacketRelease on return to pool
Reset PacketRelease callback to make sure its not set to a capture
specific callback.
As an example:
0x000055e00af09d35 in AFPReleaseDataFromRing (p=0x7f1d884cb830) at source-af-packet.c:653
0x000055e00af09dd0 in AFPReleasePacket (p=0x7f1d884cb830) at source-af-packet.c:678
0x000055e00ab53d7e in TmqhOutputPacketpool (t=0x55e00fb79250, p=0x7f1d884cb830) at tmqh-packetpool.c:465
0x000055e00af08dec in TmThreadsSlotProcessPkt (tv=0x55e00fb79250, s=0x55e012134790, p=0x7f1d884cb830) at tm-threads.h:201
0x000055e00af08e70 in TmThreadsCaptureInjectPacket (tv=0x55e00fb79250, p=0x7f1d884cb830) at tm-threads.h:221
0x000055e00af08f2e in TmThreadsCaptureHandleTimeout (tv=0x55e00fb79250, p=0x0) at tm-threads.h:245
0x000055e00af0ba76 in ReceiveAFPLoop (tv=0x55e00fb79250, data=0x7f1d884ccb60, slot=0x55e01198e4b0) at source-af-packet.c:1321
0x000055e00ab55257 in TmThreadsSlotPktAcqLoop (td=0x55e00fb79250) at tm-threads.c:312
0x00007f1dca9d5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
0x00007f1dca7c6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Here the packet was a pseudo packet to handle a timeout condition. But
the ReleasePacket callback was still set to AFPReleasePacket from a
previous use of the Packet.
Victor Julien [Sun, 31 Oct 2021 21:13:19 +0000 (22:13 +0100)]
af-packet: fix soft lockup issues
The Suricata AF_PACKET code opens a socket per thread, then after some minor
setup enters a loop where the socket is poll()'d with a timeout. When the
poll() call returns a non zero positive value, the AF_PACKET ring will be
processed.
The ringbuffer processing logic has a pointer into the ring where we last
checked the ring. From this position we will inspect each frame until we
find a frame with tp_status == TP_STATUS_KERNEL (so essentially 0). This
means the frame is currently owned by the kernel.
There is a special case handling for starting the ring processing but
finding a TP_STATUS_KERNEL immediately. This logic then skip to the next
frame, rerun the check, etc until it either finds an initialized frame or
the last frame of the ringbuffer.
The problem was, however, that the initial uninitialized frame was possibly
(likely?) still being initialized by the kernel. A data race between the
notification through the socket (the poll()) and the updating of the
`tp_status` field in the frame could lead to a valid frame getting skipped.
Of note is that for example libpcap does not do frame scanning. Instead it
simply exits it ring processing loop. Also interesting is that libpcap uses
atomic loads and stores on the tp_status field.
This skipping of frames had 2 bad side effects:
1. in most cases, the buffer would be full enough that the frame would
be processed in the next pass of the ring, but now the frame would
out of order. This might have lead to packets belong to the same
flow getting processed in the wrong order.
2. more severe is the soft lockup case. The skipped frame sits at ring
buffer index 0. The rest of the ring has been cleared, after the
initial frame was skipped. As our pass of the ring stops at the end
of the ring (ptv->frame_offset + 1 == ptv->req.v2.tp_frame_nr) the code
exits the ring processing loop at goes back to poll(). However, poll()
will not indicate that there is more data, as the stale frame in the
ring blocks the kernel from populating more frames beyond it. This
is now a dead lock, as the kernel waits for Suricata and Suricata
never touches the ring until it hears from the kernel.
The scan logic will scan the whole ring at most once, so it won't
reconsider the stale frame either.
This patch addresses the issues in several ways:
1. the startup "discard" logic was fixed to not skip over kernel
frames. Doing so would get us in a bad state at start up.
2. Instead of scanning the ring, we now enter a busy wait loop
when encountering a kernel frame where we didn't expect one. This
means that if we got a > 0 poll() result, we'll busy wait until
we get at least one frame.
3. Error handling is unified and cleaned up. Any frame error now
returns the frame to the kernel and progresses the frame pointer.
4. If we find a frame that is owned by us (TP_STATUS_USER_BUSY) we
yield to poll() immediately, as the next expected status of that
frame is TP_STATUS_KERNEL.
5. the ring is no longer processed until the "end" of the ring (so
highest index), but instead we process at most one full ring size
per run.
6. Work with a copy of `tp_status` instead of accessing original touched
also by the kernel.
Victor Julien [Sun, 7 Nov 2021 05:25:31 +0000 (06:25 +0100)]
flow/manager: fix flows not evicted & freed in time
Flows have been shown to linger for a long time w/o giving up their
resources. This would lead to higher memory use and memcaps getting
reached.
Three main causes have been identified:
Slow passes hash passes. By default the flow manager will scan the
flow hash slowly. It is based on the flow timeout settings, and with
the default config it will take 4 minutes for a full scan to be
complete. This leaves a window for flows that are timed out to linger
for minutes longer than expected.
Flow Manager yields under pressure. The per row TryLock causes work
to be delayed more. The Flow manager will use trylock on a hash row
and will yield immediately if the row is busy. This means that it will
take a full pass before the row is revisited again. If the row holds
busy flows, this could happen many times in a row.
Flow Manager favors evicted flows over active flows. The Flow Manager
will only process the evicted flows if they are present. These flows
have been evicted by workers. The active flows on that hash row will
have to wait until the next hash pass. Of course by then there could
be more evicted flows.
Combined these factors could lead to flows not being considered for
freeing and logging for a very long time, potentially even indefinitly.
The patch addresses the latter two flow manager issues by no longer
using TryLock. It will now simply wait for the lock to be released and
then do its work on it. Additionally for each row both the evicted list
and the active flow list will be processed.
Jason Ish [Tue, 5 Oct 2021 16:44:03 +0000 (10:44 -0600)]
github-ci: pin macos build to 10.15
There is currently a build failure with macos-latest (recently updated)
to 11 in the libhtp test suite code. Not sure if there are other
build issues in libhtp or Suricata at this time.
Victor Julien [Wed, 21 Apr 2021 13:20:49 +0000 (15:20 +0200)]
app-layer/pd: only consider actual available data
For size limit checks consider only available data at the stream start
and before any GAPS.
The old check would consider too much data if there were temporary gaps,
like when a data packet was in-window but (far) ahead of the expected
segment.