Florian Westphal [Wed, 28 Jan 2026 23:07:18 +0000 (00:07 +0100)]
tests: shell: add test case for interval set with timeout and aborted transaction
Add a regression test for rbtree+bsearch getting out-of-sync in
nf-next kernel.
This covers the syzkaller reproducer from
https://syzkaller.appspot.com/bug?extid=d417922a3e7935517ef6
which triggers an abort with earlier gc at insert time, plus an additional
corner case where the transaction passes without recording a relevant change
in the set (i.e. no call to either abort or commit).
This test passes even on buggy kernels unless KASAN is enabled.
Jeremy Sowden [Wed, 28 Jan 2026 18:31:07 +0000 (18:31 +0000)]
build: support `SOURCE_DATE_EPOCH` for build time-stamp
In order to support reproducible builds, set the build time-stamp to the value
of the `SOURCE_DATE_EPOCH` environment variable, if set, and otherwise fall
back to calling `date`.
Link: https://reproducible-builds.org/docs/source-date-epoch/ Fixes: 64c07e38f049 ("table: Embed creating nft version into userdata") Reported-by: Arnout Engelen <arnout@bzzt.net> Closes: https://github.com/NixOS/nixpkgs/issues/478048 Suggested-by: Philipp Bartsch <phil@grmr.de> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Phil Sutter <phil@nwl.cc>
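The fallback described above can be sketched in POSIX shell (the function name is illustrative; the actual change lives in the configure machinery):

```shell
# Reproducible builds: honour an externally supplied SOURCE_DATE_EPOCH,
# otherwise fall back to the current time as before.
get_build_timestamp() {
        if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
                printf '%s\n' "$SOURCE_DATE_EPOCH"
        else
                date +%s
        fi
}
```

Distributions set SOURCE_DATE_EPOCH from package metadata so that repeated builds embed the same time-stamp.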
Jeremy Sowden [Wed, 28 Jan 2026 18:31:06 +0000 (18:31 +0000)]
build: generate build time-stamp once at configure
The existing implementation tries to generate a time-stamp once when make is
run. However, it doesn't work and generates one for every compilation. Getting
this right portably in automake is not straightforward. Instead, do it when
configure is run.
Rename the time-stamp variable since it is no longer generated by make.
Fixes: 64c07e38f049 ("table: Embed creating nft version into userdata") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Thu, 16 Oct 2025 12:31:46 +0000 (14:31 +0200)]
tests: py: Update payload records
This is the bulk change of py test suite payload records with improved
data reg printing in libnftnl using data (component) size and byteorder
collected in nftables.
Aside from printing values in the right byte order and padded with
zeroes to match their actual size, this patch also exposes the improved
set element dump format:
* No '[end]' marker, 'element' clearly separates elements
* No semi-colon for non-map elements
* 'flags' value printed only if non-zero and prefixed by 'flags' to
distinguish from element data
This array holds each concat component's actual length in bytes. It is
crucial because component data is padded to match register lengths, and if
libnftnl has to print data "in reverse" (to print Little Endian
values byte-by-byte), it will print extra leading zeroes with odd data
lengths, so that the number of printed bytes no longer correctly
reflects the actual data length.
Bits in this field indicate data is in host byte order and thus may need
conversion when being printed "byte-by-byte" in libnftnl.
With regular immediate values, this field's value has boolean properties
(if non-zero, data is in host byte order). Concatenations may contain
components in different byte order, so with them each bit (at index N)
indicates whether a component (at the same index) is in host byte order.
Communicate a possible byte order conversion in
__netlink_gen_concat_key() back to caller since this has to be respected
when setting 'byteorder' field in struct nft_data_linearize.
String-based values are special: While defined as being in host byte
order in nftables, libnftnl shall print them without prior conversion
like Big Endian values.
Phil Sutter [Thu, 13 Nov 2025 15:36:01 +0000 (16:36 +0100)]
intervals: Convert byte order implicitly
When converting ranges to intervals, the latter's high and low values
must be in network byte order. Instead of creating the low/high constant
expressions with host byte order and converting the value, create them
with Big Endian and keep the value as is. Upon export, Little Endian MPZ
values will be byte-swapped by mpz_export_data() if BYTEORDER_BIG_ENDIAN
is passed.
The benefit of this is that the value's byteorder may be communicated to
libnftnl later by looking at the struct expr::byteorder field. By the time
this information is required during netlink serialization, there is no
other indicator for data byte order available.
Phil Sutter [Wed, 12 Nov 2025 23:14:43 +0000 (00:14 +0100)]
mergesort: Align concatenation sort order with Big Endian
Exporting all concat components in a way independent of host byteorder and
importing that blob of data in the same way aligns sort order between hosts
of different endianness.
Fixes: 741a06ac15d2b ("mergesort: find base value expression type via recursion") Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Wed, 12 Nov 2025 23:03:37 +0000 (00:03 +0100)]
mergesort: Fix sorting of string values
Sorting order was obviously wrong, e.g. "ppp0" ordered before "eth1".
Moreover, this happened on Little Endian only, so sorting order actually
depended on the host's byteorder. By reimporting string values as Big
Endian, both issues are fixed: on one hand, GMP-internal byteorder no
longer depends on the host's byteorder; on the other, comparing strings
really starts with the first character, not the last.
Fixes: 14ee0a979b622 ("src: sort set elements in netlink_get_setelems()") Signed-off-by: Phil Sutter <phil@nwl.cc>
The faulty code tries to export the prefix from the lower boundary (r1)
into a buffer, append "*" and allocate a constant expression from the
resulting string. This does not work on Big Endian though:
mpz_export_data() seems to zero-pad data upon export and not necessarily
respect the passed length value. Moreover, this padding appears in the
first bytes of the buffer. The amount of padding seems illogical, too:
While a 6B prefix causes 2B padding and 8B prefix no padding, 10B prefix
causes 4B padding and 12B prefix even 8B padding.
Work around the odd behaviour by exporting the full data into a larger
buffer.
A similar issue is caused by increasing the constant expression's length
to match the upper boundary data length: Data export when printing puts
the padding upfront, so the resulting string starts with NUL-chars.
Since this length adjustment seems not to have any effect in practice,
just drop it.
Fixes: 88b2345a215ef ("segtree: add pretty-print support for wildcard strings in concatenated sets") Signed-off-by: Phil Sutter <phil@nwl.cc>
Florian Westphal [Wed, 21 Jan 2026 13:33:21 +0000 (14:33 +0100)]
monitor: fix memleak in setelem cb
Since 4521732ebbf3 ("monitor: missing cache and set handle initialization")
these fields are set via handle_merge(), so don't clobber those
fields in the JSON output case:
==31877==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 16 byte(s) in 2 object(s) allocated from:
#0 0x7f0cb9f29d4b in strdup asan/asan_interceptors.cpp:593
#1 0x7f0cb9b584fd in xstrdup src/utils.c:80
#2 0x7f0cb9b355b3 in handle_merge src/rule.c:127
#3 0x7f0cb9ae12b8 in netlink_events_setelem_cb src/monitor.c:457
Seen when running tests/monitor with asan enabled.
Fixes: 4521732ebbf3 ("monitor: missing cache and set handle initialization") Signed-off-by: Florian Westphal <fw@strlen.de>
doc: clarify JSON rule positioning with handle field
The existing documentation briefly mentioned that the handle field can be
used for positioning, but the behavior was ambiguous. This commit clarifies:
- ADD with handle: inserts rule AFTER the specified handle
- INSERT with handle: inserts rule BEFORE the specified handle
- Multiple rules added at the same handle are positioned relative to the
original rule, not to previously inserted rules
- Explicit commands (with command wrapper) use handle for positioning
- Implicit commands (without command wrapper, used in export/import)
ignore handle for portability
This clarification helps users understand the correct behavior and avoid
confusion when using the JSON API for rule management.
Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Signed-off-by: Phil Sutter <phil@nwl.cc>
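The positioning semantics can be sketched with `nft -j -f -` invocations; the table, chain and handle values below are purely illustrative and must match an existing ruleset:

```shell
# ADD with handle: the new rule is placed AFTER the rule with handle 4.
nft -j -f - <<'EOF'
{"nftables": [{"add": {"rule": {"family": "inet", "table": "filter",
  "chain": "input", "handle": 4, "expr": [{"accept": null}]}}}]}
EOF

# INSERT with handle: the new rule is placed BEFORE the rule with handle 4.
nft -j -f - <<'EOF'
{"nftables": [{"insert": {"rule": {"family": "inet", "table": "filter",
  "chain": "input", "handle": 4, "expr": [{"accept": null}]}}}]}
EOF
```

In the implicit format (a bare {"rule": ...} object without the add/insert wrapper, as produced by export), the handle field is ignored for portability.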
Phil Sutter [Tue, 20 Jan 2026 21:47:02 +0000 (22:47 +0100)]
tests: json_echo: Drop rule handle before multi-add
Now that JSON parser respects rule handles in explicit add commands, the
still present rule handle causes an error since the old rule does not
exist anymore.
Fixes: 50b5b71ebeee3 ("parser_json: Rewrite echo support") Signed-off-by: Phil Sutter <phil@nwl.cc>
Alexandre Knecht [Tue, 20 Jan 2026 19:53:03 +0000 (20:53 +0100)]
tests: shell: add JSON test for handle-based rule positioning
Add comprehensive test for JSON handle-based rule positioning to verify
the handle field correctly positions rules with explicit add/insert
commands while being ignored in implicit format.
Test coverage:
1. ADD with handle positions AFTER the specified handle
2. INSERT with handle positions BEFORE the specified handle
3. INSERT without handle positions at beginning
4. Multiple commands in single transaction (batch behavior)
5. Implicit format ignores handle field for portability
The test uses sed for handle extraction and nft -f format for setup
as suggested in code review. Final state is a table with two rules
from the implicit format test.
Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Signed-off-by: Phil Sutter <phil@nwl.cc>
Alexandre Knecht [Tue, 20 Jan 2026 19:53:02 +0000 (20:53 +0100)]
tests: shell: add JSON test for all object types
Add comprehensive test for JSON add/insert/delete/replace/create
operations on all object types to ensure the handle field changes
don't break non-rule objects.
The test verifies that all object types work correctly with JSON
commands and validates intermediate states. Final state is an empty
table from the CREATE test.
Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Signed-off-by: Phil Sutter <phil@nwl.cc>
Alexandre Knecht [Tue, 20 Jan 2026 19:53:01 +0000 (20:53 +0100)]
parser_json: support handle for rule positioning in explicit JSON format
This patch enables handle-based rule positioning for JSON add/insert
commands by using a context flag to distinguish between explicit and
implicit command formats.
When processing JSON:
- Explicit commands like {"add": {"rule": ...}} set no flag, allowing
handle fields to be converted to position for rule placement
- Implicit format (bare objects like {"rule": ...}, used in export/import)
sets CTX_F_IMPLICIT flag, causing handles to be ignored for portability
This approach ensures that:
- Explicit rule adds with handles work for positioning
- Non-rule objects (tables, chains, sets, etc.) are unaffected
- Export/import remains compatible (handles ignored)
The semantics for explicit rule commands are:
ADD with handle: inserts rule AFTER the specified handle
INSERT with handle: inserts rule BEFORE the specified handle
Implementation details:
- CTX_F_IMPLICIT flag (bit 10) marks implicit add commands
- CTX_F_EXPR_MASK uses inverse mask for future-proof expression flag filtering
- Handle-to-position conversion in json_parse_cmd_add_rule()
- Variables declared at function start per project style
Florian Westphal [Tue, 20 Jan 2026 13:02:28 +0000 (14:02 +0100)]
parser: move qualified meta expression parsing to flex/bison
The meta keyword currently accepts 'STRING' arguments.
This was originally done to avoid polluting the global token namespace.
However, nowadays we do have flex scopes to avoid this.
Add the tokens currently handled implicitly via STRING within the
META flex scope.
SECPATH is a compatibility alias, map this to the IPSEC token.
IBRPORT/OBRPORT are also compatibility aliases, remove those tokens
and handle them directly in scanner.l.
This also keeps nft from printing tokens in help texts that are only
there for compatibility with old rulesets.
meta_key_parse() is retained for json input parser.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc>
Jan Kończak [Thu, 4 Dec 2025 21:54:48 +0000 (22:54 +0100)]
parser_bison: on syntax errors, output expected tokens
Now, on syntax errors, e.g., 'nft create fable filter', the user sees:
Error: syntax error, unexpected string
create fable filter
^^^^^
The patch builds an error message that lists what the parser expects
to see, in that case it would print:
Error: syntax error, unexpected string
expected any of: synproxy, table, chain, set, element, map,
flowtable, ct, counter, limit, quota, secmark
create fable filter
^^^^^
The obvious purpose of this is to help people who learn nft syntax.
The messages are still not as explanatory as one wishes, for it may
list parser token names such as 'string', but it's still better than
no hints at all.
Note that the list of possible items on the parser's side is not
always consistent with expectations.
For instance, lexer/parser recognizes 'l4proto' in this command:
nft add rule ip F I meta l4proto tcp
as a generic '%token <string> STRING', while 'iifname' in
nft add rule ip F I meta iifname eth0
is recognized as a '%token IIFNAME'
In such a case the parser is only able to say that right after 'meta'
it expects 'iifname' or 'string', rather than 'iifname' and 'l4proto'.
This 'meta STRING' is a historic wart and can be resolved in
a followup patch.
[ fw@strlen.de: minor coding style changes and rewordings ]
Signed-off-by: Jan Kończak <jan.konczak@cs.put.poznan.pl> Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Tue, 5 Aug 2025 20:25:25 +0000 (22:25 +0200)]
scanner: Introduce SCANSTATE_RATE
This is a first exclusive start condition, i.e. one which rejects
unscoped tokens. When tokenizing, flex all too easily falls back into
treating something as STRING when it could be split into tokens instead.
Via an exclusive start condition, the string-fallback can be disabled as
needed.
With rates in their typical format <NUM><bytes-unit>/<time-unit>, the
tokenizer result depended on whitespace placement. SCANSTATE_RATE forces
flex to split the string into tokens and fall back to JUNK upon failure.
For this to work, tokens which shall still be recognized must be enabled
in SCANSTATE_RATE (or in all scopes, denoted by '*'). This includes any
tokens possibly following SCANSTATE_RATE to please the parser's
lookahead behaviour.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Tue, 13 Aug 2024 18:54:07 +0000 (20:54 +0200)]
parser_bison: Introduce bytes_unit
Introduce scoped tokens for "kbytes" and "mbytes", completing already
existing "bytes" one. Then generalize the unit for byte values and
replace both quota_unit and limit_bytes by a combination of NUM and
bytes_unit.
With this in place, data_unit_parse() is not called outside of
datatype.c, so make it static.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Fri, 21 Nov 2025 16:49:02 +0000 (17:49 +0100)]
parser_bison: Introduce tokens for log levels
Since log statement is scoped already, it's just a matter of declaring
the tokens in that scope and using them. This eliminates the redundant
copy of log level string parsing in parser_bison.y - the remaining one,
namely log_level_parse() in statement.c is used by JSON parser.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Fri, 21 Nov 2025 15:53:35 +0000 (16:53 +0100)]
parser_bison: Introduce tokens for chain types
Use the already existing SCANSTATE_TYPE for keyword scoping.
This is a bit of back-n-forth from string to token and back to string
but it eliminates the helper function and also takes care of error
handling.
Note that JSON parser does not validate the type string at all but
relies upon the kernel to reject wrong ones.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Fri, 21 Nov 2025 13:02:36 +0000 (14:02 +0100)]
parser_bison: Introduce tokens for monitor events
There already is a start condition for "monitor" keyword and also a
DESTROY token. So just add the missing one and get rid of the
intermediate string buffer.
Keep checking the struct monitor::event value in eval phase just to be
on the safe side.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Tue, 13 Jan 2026 22:26:47 +0000 (23:26 +0100)]
tests: shell: add small packetpath test for hash and rbtree types
Add tests to exercise packet path for rbtree and hash set types.
We check both positive (added address is matched) and negative
matches (set doesn't indicate match for deleted address).
For ranges, also validate that addresses preceding or trailing
a range do not match.
Pipapo gets no test to avoid duplicating what is already in the
kernel kselftests (nft_concat_range.sh).
mnl: restore create element command with large batches
The rework to reduce memory consumption has introduced a bug that results
in spurious EEXIST errors with large batches.
The code that tracks the start and end elements of the interval can add
the same element twice to the batch. This works with the add element
command, since it ignores the EEXIST error, but it breaks the create
element command.
Update this codepath to ensure both sides of the interval fit into the
netlink message; otherwise, trim the netlink message to remove them, so
that the next netlink message includes the elements that represent the
interval that could not fit.
Fixes: 91dc281a82ea ("src: rework singleton interval transformation to reduce memory consumption") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: shell: cover for large interval sets with create command
commit 91dc281a82ea ("src: rework singleton interval transformation to
reduce memory consumption") duplicates singleton interval elements when
the netlink message gets full, which results in spurious EEXIST errors
when creating many elements in a set.
This patch extends the existing test to cover this bug.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 16 Dec 2025 14:36:23 +0000 (15:36 +0100)]
tests: monitor: Fix for out-of-path call
When called from another directory without specifying test cases, an
incorrect regexp was used to glob all tests and no test was run at all:
| # ./tests/monitor/run-tests.sh
| echo: running tests from file *.t
| ./tests/monitor/run-tests.sh: line 201: testcases/*.t: No such file or directory
| monitor: running tests from file *.t
| ./tests/monitor/run-tests.sh: line 201: testcases/*.t: No such file or directory
| json-echo: running tests from file *.t
| ./tests/monitor/run-tests.sh: line 201: testcases/*.t: No such file or directory
| json-monitor: running tests from file *.t
| ./tests/monitor/run-tests.sh: line 201: testcases/*.t: No such file or directory
Fixes: 83eaf50c36fe8 ("tests: monitor: Become $PWD agnostic") Signed-off-by: Phil Sutter <phil@nwl.cc>
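The class of fix can be illustrated with a minimal POSIX-shell guard (function and path names are hypothetical, not the actual run-tests.sh code):

```shell
# Skip the literal, unexpanded pattern when a glob matches no files,
# instead of passing "testcases/*.t" through verbatim.
run_testcases() {
        for f in "$1"/*.t; do
                [ -e "$f" ] || continue   # glob did not expand: nothing to run
                echo "running $f"
        done
}
```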
Jan Palus [Fri, 5 Dec 2025 23:43:58 +0000 (00:43 +0100)]
build: fix ./configure with non-bash shell
CONFIG_SHELL=/bin/dash ./configure
breaks with:
./config.status: 2044: Syntax error: Bad for loop variable
Fixes: 64c07e38f049 ("table: Embed creating nft version into userdata") Signed-off-by: Jan Palus <jpalus@fastmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
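For illustration only (not the actual config.status hunk), the failing construct and a POSIX-portable equivalent:

```shell
# Bash-only: for ((i = 0; i < 3; i++)); do ...; done
# dash:      "Syntax error: Bad for loop variable"
# POSIX-portable rewrite using while and $((...)) arithmetic:
i=0
while [ "$i" -lt 3 ]; do
        echo "iteration $i"
        i=$((i + 1))
done
```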
Florian Westphal [Tue, 25 Nov 2025 13:03:33 +0000 (14:03 +0100)]
tests: shell: bad_rule_graphs: add chain linked from different hooks
On a kernel with a broken (never upstreamed) patch this fails with:
Accepted bad ruleset with jump from filter type to masquerade (3)
and
Accepted bad ruleset with jump from prerouting to masquerade
... because the bogus optimisation suppresses re-validation of 'n2', even
though it becomes reachable from an invalid base chain (filter, but n2
has a nat-only masquerade expression).
Another broken corner-case is validation of the different hook types:
When it becomes reachable from nat:prerouting in addition to the allowed
nat:postrouting the validation step must fail.
Improve test coverage to ensure future optimisations catch this.
tests: shell: Refactored nat_ftp, added rulesets and testcase functions
Refactor the setup of nft rulesets so that it is now possible to set up an
SNAT- or DNAT-only ruleset for future tests.
Introduce a testcase function to test passive or active modes.
rule: skip CMD_OBJ_SETELEMS with no elements after set flush
Set declaration + set flush results in a crash because CMD_OBJ_SETELEMS
does not expect an empty element list. This internal command only shows up
if the set contains elements; however, evaluation flushes the set content
after the set expansion. Skip the CMD_OBJ_SETELEMS command if the set is
empty.
Tunnel object listing support was missing. Now it is possible to list
tunnels. Example:
sudo nft list tunnel netdev x y
table netdev x {
tunnel y {
id 10
ip saddr 192.168.2.10
ip daddr 192.168.2.11
sport 10
dport 20
ttl 10
erspan {
version 1
index 2
}
}
}
Fixes: a937a5dc02db ("src: add tunnel statement and expression support") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Fri, 17 Oct 2025 11:51:41 +0000 (13:51 +0200)]
support for afl++ (american fuzzy lop++) fuzzer
afl comes with a compiler frontend that can add instrumentation suitable
for running nftables via the "afl-fuzz" fuzzer.
This change adds a "--with-fuzzer" option to configure script and enables
specific handling in nftables and libnftables to speed up the fuzzing process.
It also adds the "--fuzzer" command line option.
afl-fuzz initialisation gets delayed until after the netlink context is set up
and symbol tables (e.g. route marks) have been parsed.
When afl-fuzz restarts the process with a new input round, it will
resume *after* this point (see __AFL_INIT macro in main.c).
With --fuzzer <stage>, nft will perform multiple fuzzing rounds per
invocation: this increases processing rate by an order of magnitude.
The argument to '--fuzzer' specifies the last stage to run:
1: 'parser':
Only run / exercise the flex/bison parser.
2: 'eval': stop after the evaluation phase.
This attempts to build a complete ruleset in memory, does
symbol resolution, adds needed shift/masks to payload instructions
etc.
3: 'netlink-ro':
'netlink-ro' builds the netlink buffer to send to the kernel,
without actually doing so.
4: 'netlink-rw':
The generated command/ruleset will be passed to the kernel.
You can combine it with the '--check' option to send data to the kernel
but without actually committing any changes.
This could still end up triggering a kernel crash if there are bugs
in the validation / transaction / abort phases.
Use 'netlink-ro' if you want to prevent nft from ever submitting any
changes to the kernel or if you are only interested in fuzzing nftables
and its libraries.
In case a kernel splat is detected, the fuzzing process stops and all further
fuzzer attempts are blocked until reboot.
Jeremy Sowden [Wed, 17 Sep 2025 20:34:54 +0000 (21:34 +0100)]
build: don't install ancillary files without systemd service file
If the systemd service file is not installed, currently the related man-page
and example nft file are still installed. Instead only install them when the
service file is installed.
Fixes: 107580cfa85c ("build: disable --with-unitdir by default") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
- Clarify that a terminating statement also prevents the execution of later
statements in the same rule and give an example of that.
- Correct that `accept` won’t terminate the evaluation of the ruleset (which is
generally used for the whole set of all chains, rules, etc.) but only that of
the current base chain (and any regular chains called from that).
Indicate that `accept` only accepts the packet from the current base chain’s
point of view.
Clarify that not only chains of a later hook could still drop the packet, but
also ones from the same hook if they have a higher priority.
- Various other minor improvements/clarifications to wording.
_get() functions must not be used when refcnt is 0, as expr_free()
releases expressions on 1 -> 0 transition.
Also, check that a refcount would not overflow from UINT_MAX to 0.
Use INT_MAX to also catch refcount leaks sooner; we don't expect
2**31 get()s on the same object.
This helps catch use-after-free refcounting bugs even when nft
is built without ASAN support.
v3: use a macro + BUG to get more info without a coredump.
Florian Westphal [Sun, 26 Oct 2025 08:54:36 +0000 (09:54 +0100)]
doc: remove queue from verdict list
While it's correct that the queue statement is internally implemented
via the queue verdict, this is an implementation detail.
We don't list "stolen" as a verdict either.
'nft ... queue' will always use the nft_queue statement, so move the
reinject detail from the verdict list to the queue statement and remove
the entry.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
doc: fix/improve documentation of jump/goto/return
Overhaul the description of `jump`/`goto`/`return`.
`jump` only explains what the statement causes from the point of view of the
new chain (that is: not, how the returning works), which includes that an
implicit `return` is issued at the end of the chain.
`goto` is explained in reference to `jump`.
`return` describes abstractly how the return position is determined and what
happens if there’s no position to return to (but not for example where an
implicit `return` is issued).
List and explain verdict-like statements like `reject` which internally imply
`accept` or `drop`.
Further explain that with respect to evaluation these behave like their
respectively implied verdicts.
Florian Westphal [Fri, 17 Oct 2025 11:38:34 +0000 (13:38 +0200)]
evaluate: follow prefix expression recursively if needed
Included bogons assert:
Assertion `!expr_is_constant(*expr) || expr_is_singleton(*expr)' failed
This is because the "foo*" + prefix combination causes expr_evaluate
to replace the binop + string expression with another prefix that
gets allocated while handling "foo*" (wildcard).
This causes expr_evaluate_prefix to build
a prefix -> prefix -> binop chain.
After this, we get:
Error: Right hand side of relational expression ((null)) must be constant
a b ct helper "2.2.2.2.3*1"/80
~~~~~~~~~~^^^^^^^^^^^^^^^^
Error: Binary operation (&) is undefined for prefix expressions
a b ct helper "2.2.2.****02"/80
^^^^^^^^^^^^^^^^^
for those inputs rather than hitting assert() in byteorder_conversion()
later on.
Phil Sutter [Wed, 15 Oct 2025 22:00:05 +0000 (00:00 +0200)]
tests: py: Do not rely upon '[end]' marker
Set element lines reliably start with whitespace followed by the word "element"
and are separated by the same pattern. Use it instead of '[end]' (or anything
enclosed in brackets).
While at it, recognize payload lines as starting with ' [ ' and avoid
searching for the closing bracket.
Phil Sutter [Wed, 10 Sep 2025 13:14:23 +0000 (15:14 +0200)]
tests: py: Implement payload_record()
This is a helper function to store payload records (and JSON
equivalents) in .got files. The code it replaces failed to insert a
newline before the new entry and also did not check for existing records
in all spots.
Phil Sutter [Wed, 22 Oct 2025 12:03:37 +0000 (14:03 +0200)]
optimize: Fix verdict expression comparison
In verdict expression, 'chain' points at a constant expression of
verdict_type, not a symbol expression. Therefore 'chain->identifier'
points eight bytes (on 64bit systems) into the mpz_t 'value' holding the
chain name. This matches the '_mp_d' data pointer, so works by accident.
Fix this by copying what verdict_jump_chain_print() does and export
chain names before comparing.
Phil Sutter [Wed, 8 Oct 2025 21:19:08 +0000 (23:19 +0200)]
datatype: Fix boolean type on Big Endian
Pass a reference to a variable with correct size when creating the
expression, otherwise mpz_import_data() will read only the always zero
upper byte on Big Endian hosts.
Fixes: afb6a8e66a111 ("datatype: clamp boolean value to 0 and 1") Signed-off-by: Phil Sutter <phil@nwl.cc>
Florian Westphal [Thu, 23 Oct 2025 12:17:00 +0000 (14:17 +0200)]
src: parser_json: fix format string bugs
After adding fmt attribute annotation:
warning: format not a string literal and no format arguments [-Wformat-security]
131 | erec_queue(error(&loc, err->text), ctx->msgs);
In function 'json_events_cb':
warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type '__u32' {aka 'unsigned int'} [-Wformat=]
Florian Westphal [Fri, 17 Oct 2025 08:38:25 +0000 (10:38 +0200)]
src: fix fmt string warnings
For some reason several functions had a __gmp_fmtstring annotation,
but that was an empty macro.
After fixing it up, we get several new warnings:
In file included from src/datatype.c:28:
src/datatype.c:174:24: note: in expansion of macro 'error'
174 | return error(&sym->location,
| ^~~~~
src/datatype.c:405:24: note: in expansion of macro 'error'
405 | return error(&sym->location, "Could not parse %s; did you mean `%s'?",
| ^~~~~
The fmt string says '%s' but the argument is an unqualified void *; add a
'const char *' cast, it is safe in both cases.
In file included from src/evaluate.c:29:
src/evaluate.c: In function 'byteorder_conversion':
src/evaluate.c:232:35: warning: format '%s' expects a matching 'char *' argument [-Wformat=]
232 | "Byteorder mismatch: %s expected %s, %s got %s",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Actual bug: the fmt string has one '%s' too many, remove it.
All other warnings were due to '%u' instead of '%lu' / '%zu'.
doc: minor improvements with respect to the term “ruleset”
Statements are elements of rules. Non-terminal statements are in particular
passive with respect to their rules (and thus automatically with respect to the
whole ruleset).
In “Continue ruleset evaluation”, it’s not necessary to mention the ruleset as
it’s obvious that the evaluation of the current chain will be continued.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Thu, 16 Oct 2025 14:59:34 +0000 (16:59 +0200)]
src: tunnel src/dst must be a symbolic expression
Included bogons crash with segfault and assertion. After fix:
tunnel_with_garbage_dst:3:12-14: Error: syntax error, unexpected tcp, expecting string or quoted string or string with a trailing asterisk or '$'
ip saddr tcp dport { }
^^^
The parser change restricts the grammar to no longer allow this,
we would crash here because we enter payload evaluation path that
tries to insert a dependency into the rule, but we don't have one
(ctx->rule and ctx->stmt are NULL as expected here).
The eval stage change makes sure we will reject non-value symbols:
tunnel_with_anon_set_assert:1:12-31: Error: must be a value, not set
define s = { 1.2.3.4, 5.6.7.8 }
^^^^^^^^^^^^^^^^^^^^
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Florian Westphal [Wed, 20 Aug 2025 13:20:18 +0000 (15:20 +0200)]
tests: shell: add packetpath test for reject statement
Test case for: 91a79b792204 ("netfilter: nf_reject: don't leak dst refcount for loopback packets")
and db99b2f2b3e2 ("netfilter: nf_reject: don't reply to icmp error messages")
In particular:
- Mention that grouping of chains in tables is irrelevant to the evaluation
order.
- Clarify that priorities only define the ordering of chains per hook.
- Improved potentially ambiguous wording “lower priority values have precedence
over higher ones”, which could be mistaken as that rules from lower priority
chains might “win” over such from higher ones (which is however only the case
if they drop/reject packets).
The new wording merely describes which chains are evaluated first, implicitly
referring the question which verdict “wins” to the section where verdicts are
described, and also should work when lower priority chains mangle packets (in
which case they might actually be considered as having “precedence”).
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Tue, 7 Oct 2025 15:51:32 +0000 (17:51 +0200)]
mnl: Drop asterisk from end of NFTA_DEVICE_PREFIX strings
The asterisk left in place becomes part of the prefix by accident and is thus
both included when matching interface names and dumped back to user
space.
Fixes: c31e887504a90 ("mnl: Support simple wildcards in netdev hooks") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: shell: add regression tests for set flush+add bugs
Create a helper file to:
1. create client <-> router <-> server topology
2. floodping from client to server
3. add a chain + set that contains both client and server
addresses
4. a control counter that should never match
5. then, flush the set (not the ruleset) and re-add the
addresses in one transaction
Report failure when counter had a match.
The test cases for the set types are done in separate files to take
advantage of run-tests.sh parallelization.
The expected behavior is that every ping packet is matched by the set.
The packet path should either match the old state, right before flush,
or the new state, after re-add.
As the flushed addresses are re-added in the same transaction we must
not observe in-limbo state where existing elements are deactivated but
new elements are not found.
synproxy must never be used in output rules; doing so results in a kernel
crash due to infinite recursive calls back to nf_hook_slow() for the
emitted reply packet.
Up until recently the kernel lacked this validation, and now that the kernel
rejects this the test fails. Use input to make this pass again.
A new test to ensure we reject synproxy in output should be added
in the near future.
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 15 Jul 2025 13:26:33 +0000 (15:26 +0200)]
tests: shell: Test ifname-based hooks
Assert that:
- Non-matching interface specs are accepted
- Existing interfaces are hooked into upon flowtable/chain creation
- A new device matching the spec is hooked into immediately
- No stale hooks remain in 'nft list hooks' output
- Wildcard hooks basically work
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 13 Aug 2024 18:26:08 +0000 (20:26 +0200)]
mnl: Support simple wildcards in netdev hooks
When building NFTA_{FLOWTABLE_,}HOOK_DEVS attributes, detect trailing
asterisks in interface names and transmit the leading part in a
NFTA_DEVICE_PREFIX attribute.
Deserialization (i.e., appending the asterisk to interface prefixes returned
in NFTA_DEVICE_PREFIX attributes) happens in libnftnl.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
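A hedged usage sketch (table, chain and interface names are illustrative): a trailing asterisk in a netdev hook device spec is transmitted as a prefix:

```shell
# "eth*" travels as NFTA_DEVICE_PREFIX "eth"; current and future
# interfaces whose names match the prefix are hooked.
nft -f - <<'EOF'
table netdev t {
        chain c {
                type filter hook ingress devices = { "eth*" } priority 0;
        }
}
EOF
```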
libnftables: do not re-add default include directory in include search path
Otherwise globbing might duplicate included files because
include_path_glob() is called twice.
Fixes: 7eb950a8e8fa ("libnftables: include canonical path to avoid duplicates") Tested-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>