Michal Koutný [Mon, 30 Jun 2025 14:15:26 +0000 (16:15 +0200)]
doc: Clarify cgroup meta variable
The documentation mentions control group id where the meaning is a class
id associated to the cgroup of a socket. This used to be fine until
there came cgroup v2 that use similar terminolgy (cgroup id) for very
different thing -- a numeric identifier of a particular (v2) cgroup.
This contemporary cgroup id isn't exposed by netfilter (v2 matching is
based on paths externally). Fix the docs and decrease confusion by more
precise description of the metavariable.
[ Added comment in description to refer to socket cgroupv2 --pablo ]
Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
f686a17eafa0 ("fib: Support existence check") adds EXPR_F_BOOLEAN as a
workaround to infer from the rhs of the relational expression if the fib
lookup wants to check for a specific output interface or, instead,
simply check for existence. This, however, does not work with maps.
The NFT_FIB_F_PRESENT flag can be used both with NFT_FIB_RESULT_OIF and
NFT_FIB_RESULT_OFINAME, my understanding is that they serve the same
purpose which is to check if a route exists, so they are redundant.
Add a 'check' fib result to check for routes while still keeping the
inference workaround for backward compatibility, but prefer the new
syntax in the listing.
Update man nft(8) and tests/py.
Fixes: f686a17eafa0 ("fib: Support existence check") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The test was technically incorrect: Instead of detecting whether
interface hooks are name-based or not, it actually tested whether
netdev-family chains are removed along with their last hook.
Since the latter behaviour is established in kernel commit fc0133428e7a
("netfilter: nf_tables: Tolerate chains with no remaining hooks") and
thus independent from the name-based hooks change, treating both as the
same kernel feature is not acceptable.
Fix this by detecting whether a netdev-family chain may be added despite
specifying a non-existent interface to hook into. Keep the old check
around with a better name, although unused for now.
Reported-by: Florian Westphal <fw@strlen.de> Fixes: f27e5abd81f29 ("tests: shell: Adjust to ifname-based hooks") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Thu, 26 Jun 2025 00:52:48 +0000 (02:52 +0200)]
evaluate: prevent merge of sets with incompatible keys
Its not enough to check for interval flag, this would assert in interval
code due to concat being passed to the interval code:
BUG: unhandled key type 13
After fix:
same_set_name_but_different_keys_assert:8:6-7: Error: set already exists with
different datatype (concatenation of (IPv4 address, network interface index) vs
network interface index)
set s4 {
^^
This also improves error verbosity when mixing datamap and objref maps:
invalid_transcation_merge_map_and_objref_map:9:13-13:
Error: map already exists with different datatype (IPv4 address vs string)
.. instead of 'Cannot merge map with incompatible existing map of same name'.
The 'Cannot merge map with incompatible existing map of same name' check
is kept in place to catch when ruleset contains a set and map with same name
and same key definition.
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jun 2025 19:37:31 +0000 (21:37 +0200)]
evaluate: check that set type is identical before merging
Reject maps and sets of the same name:
BUG: invalid range expression type catch-all set element
nft: src/expression.c:1704: range_expr_value_low: Assertion `0' failed.
After:
Error: Cannot merge set with existing datamap of same name
set z {
^
v2:
Pablo points out that we shouldn't merge datamaps (plain value) and objref
maps either, catch this too and add another test:
nft --check -f invalid_transcation_merge_map_and_objref_map
invalid_transcation_merge_map_and_objref_map:9:13-13: Error: Cannot merge map with incompatible existing map of same name
We should also make sure that both data (for map case) and
set keys are identical, this is added in a followup patch.
Florian Westphal [Tue, 24 Jun 2025 21:20:58 +0000 (23:20 +0200)]
evaluate: avoid double-free on error handling of bogus objref maps
commit 98c51aaac42b ("evaluate: bail out if anonymous concat set defines a non concat expression")
clears set->init to avoid a double-free.
Extend this to also handle object maps.
The included bogon triggers a double-free of set->init expression:
Error: unqualified type invalid specified in map definition. Try "typeof expression" instead of "type datatype".
ct helper set ct saddr map { 1c3:: : "p", dead::beef : "myftp" }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This might not crash, depending on libc/malloc, but ASAN reports this:
==17728==ERROR: AddressSanitizer: heap-use-after-free on address 0x50b0000005e8 at ..
READ of size 4 at 0x50b0000005e8 thread T0
#0 0x7f1be3cb7526 in expr_free src/expression.c:87
#1 0x7f1be3cbdf29 in map_expr_destroy src/expression.c:1488
#2 0x7f1be3cb74d5 in expr_destroy src/expression.c:80
#3 0x7f1be3cb75c6 in expr_free src/expression.c:96
#4 0x7f1be3d5925e in objref_stmt_destroy src/statement.c:331
#5 0x7f1be3d5831f in stmt_free src/statement.c:56
#6 0x7f1be3d583c2 in stmt_list_free src/statement.c:66
#7 0x7f1be3d42805 in rule_free src/rule.c:495
#8 0x7f1be3d48329 in cmd_free src/rule.c:1417
#9 0x7f1be3cd2c7c in __nft_run_cmd_from_filename src/libnftables.c:759
#10 0x7f1be3cd340c in nft_run_cmd_from_filename src/libnftables.c:847
#11 0x55dcde0440be in main src/main.c:535
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
There is no guarantee that chain[] is null terminated, so snprintf
can read past chain[] array. ASAN report is:
AddressSanitizer: stack-buffer-overflow on address 0x7ffff5f00520 at ..
READ of size 257 at 0x7ffff5f00520 thread T0
#0 0x00000032ffb6 in printf_common(void*, char const*, __va_list_tag*) (src/nft+0x32ffb6)
#1 0x00000033055d in vsnprintf (src/nft+0x33055d)
#2 0x000000332071 in snprintf (src/nft+0x332071)
#3 0x0000004eef03 in netlink_gen_chain src/netlink.c:454:2
#4 0x0000004eef03 in netlink_gen_verdict src/netlink.c:467:4
Reject chain jumps that exceed 255 characters, which matches the netlink
policy on the kernel side.
The included reproducer fails without asan too because the kernel will
reject the too-long chain name. But that happens after the asan detected
bogus read.
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 24 Jun 2025 19:39:46 +0000 (21:39 +0200)]
tests/py: clean up set backend support fallout
Pablo reports failing py tests woth recent kernel and userland:
any/objects.t: OK
WARNING: line 3: 'add rule ip6 test-ip6 input ..
mismatches 'family 2 __set0 test-ip4 3 backend nft_set_bitmap_type [nf_tables] count 7'
When nf_tables is built as a module, the set backend name coming
from kernel contains the module name ([nf_tables]), this makes the
test script treat it as part of the pseudo instructions.
Skip this line explicitly to avoid these warnings.
Fixes: 7cec20e45a75 ("tests/py: prepare for set debug change") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
src: use constant range expression for interval+concatenation sets
Expand 347039f64509 ("src: add symbol range expression to further
compact intervals") to use constant range expression for elements with
concatenation of intervals.
Ruleset with 100k elements of this type:
table inet x {
set y {
typeof ip saddr . tcp dport
flags interval
elements = {
0.1.2.0-0.1.2.240 . 0-1,
...
}
}
}
tests: shell: add feature check for count output change
New kernels with latest nft release will print the number
of set elements allocated on the kernel side.
This causes shell test dump validation to fail in several
places. We can't just update the affected dump files
because the test cases are also supposed to pass on current
-stable releases.
Add a feature check for this. Dump failure can then use
sed to postprocess the stored dump file and can then call
Next patch will make initial set dump from kernel emit set debug
information, so the obtained netlink debug file won't match what is
recorded in tests/py.
Furthermore, as the python add rules for each of the family the test is
for, subsequent dump will include debug information of the other/previous
families.
Change the script to skip all unrelated information to only compare the
relevant set element information and the generated expressions.
This change still finds changes in [ expr ... ] and set elem debug output.
evaluate: restrict allowed subtypes of concatenations
We need to restrict this, included bogon asserts with:
BUG: unknown expression type prefix
nft: src/netlink_linearize.c:940: netlink_gen_expr: Assertion `0' failed.
Prefix expressions are only allowed if the concatenation is used within
a set element, not when specifying the lookup key.
For the former, anything that represents a value is allowed.
For the latter, only what will generate data (fill a register) is
permitted.
At this time we do not have an annotation that tells if the expression
is on the left hand side (lookup key) or right hand side (set element).
Add a new list recursion counter for this. If its 0 then we're building
the lookup key, if its the latter the concatenation is the RHS part
of a relational expression and prefix, ranges and so on are allowed.
IOW, we don't really need a recursion counter, another type of annotation
that would tell if the expression is placed on the left or right hand side
of another expression would work too.
v2: explicitly list all 'illegal' expression types instead of
using a default label for them.
This will raise a compiler warning to remind us to adjust the case
labels in case a new expression type gets added in the future.
evaluate: rename recursion counter to recursion.binop
The existing recursion counter is used by the binop expression to detect
if we've completely followed all the binops.
We can only chain up to NFT_MAX_EXPR_RECURSION binops, but the evaluation
step can perform constant-folding, so we must recurse until we found the
rightmost (last) binop in the chain.
Then we can check the post-eval chain to see if it is something that can
be serialized later (i.e., if we are within the NFT_MAX_EXPR_RECURSION
after constant folding) or not.
Thus we can't reuse the existing ctx->recursion counter for other
expressions; entering the initial expr_evaluate_binop with
ctx->recursion > 0 would break things.
Therefore rename this to an embedded structure.
This allows us to add a new recursion counter in a followup patch.
Yi Chen [Sun, 22 Jun 2025 12:55:51 +0000 (20:55 +0800)]
test: shell: Don't use system nft binary
Use the defined $NFT variable instead of calling the system nft binary directly.
Add a nat_ftp.nodump file to avoid the following check-tree.sh error:
ERR: "tests/shell/testcases/packetpath/nat_ftp" has no "tests/shell/testcases/packetpath/dumps/nat_ftp.{nft,nodump}" file.
Signed-off-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Thu, 12 Jun 2025 10:59:29 +0000 (12:59 +0200)]
tests: py: Properly fix JSON equivalents for netdev/reject.t
Revert commit d1a7b9e19fe65 ("tests: py: update netdev reject test
file"), the stored JSON equivalents were correct in that they matched
the standard syntax input.
In fact, we missed a .json.output file recording the expected deviation
in JSON output.
Fixes: d1a7b9e19fe65 ("tests: py: update netdev reject test file") Fixes: 7ca3368cd7575 ("reject: Unify inet, netdev and bridge delinearization") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 6 May 2025 22:06:20 +0000 (00:06 +0200)]
tests: shell: Adjust to ifname-based hooks
Interface specs won't disappear anymore upon device removal. Drop them
manually if kernel has ifname-based hooks.
Skip transactions/0050rule_1 if kernel has name-based hooks: The test
relies upon the ruleset being rejected for non-existent interfaces,
which obviously won't happen then.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 11 Jun 2025 15:15:22 +0000 (17:15 +0200)]
tests: monitor: Fix for single flag array avoidance
Missed to update the JSON monitor expected output.
Fixes: 6bedb12af1658 ("json: Print single set flag as non-array") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 11 Jun 2025 14:45:48 +0000 (16:45 +0200)]
json: Dump flowtable hook spec only if present
If there is no priority.expr set, assume hook.num is bogus, too.
While this is fixing JSON output, it's hard to tell what commit this is
actually fixing: Before commit 627c451b23513 ("src: allow variables in
the chain priority specification"), there was no way to detect
flowtables missing hook specs (e.g. when printing flowtable delete
monitor event).
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 11 Jun 2025 12:15:38 +0000 (14:15 +0200)]
netlink: Do not allocate a bogus flowtable priority expr
Code accidentally treats missing NFTNL_FLOWTABLE_PRIO attribute as zero
prio value which may not be correct.
Fixes: db0697ce7f602 ("src: support for flowtable listing") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 11 Jun 2025 11:12:56 +0000 (13:12 +0200)]
netlink: Fix for potential crash parsing a flowtable
Kernel's flowtable message might not contain the
NFTA_FLOWTABLE_HOOK_DEVS attribute. In that case, nftnl_flowtable_get()
will return NULL for the respective nftnl attribute.
Fixes: db0697ce7f602 ("src: support for flowtable listing") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
We can't recover from errors here, but we can abort with a more
precise reason than 'segmentation fault', or stack corruptions
that get caught way later, or not at all.
expr->value is going to be read, we can't cope with other expression
types here.
We will copy to stack buffer of IFNAMSIZ size, abort if we would
overflow.
Check there is a NUL byte present too.
This is a preemptive patch, I've seen one crash in this area but
no reproducer yet.
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
json: prevent null deref if chain->policy is not set
The two commits mentioned below resolved null dererence crashes when the
policy resp. priority keyword was missing in the chain/flowtable
specification.
Same issue exists in the json output path, so apply similar fix there
and extend the existing test cases.
Fixes: 5b37479b42b3 ("nftables: don't crash in 'list ruleset' if policy is not set") Fixes: b40bebbcee36 ("rule: do not crash if to-be-printed flowtable lacks priority") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc>
tests: py: fix json single-flag output for fib & synproxy
Blamed commits change output format but did not adjust existing tests:
inet/fib.t: WARNING: line 16: '{"nftables": ..
Fixes: 38f99ee84fe6 ("json: Print single synproxy flags as non-array") Fixes: dbe5c44f2b89 ("json: Print single fib flag as non-array") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 16 May 2025 17:17:00 +0000 (19:17 +0200)]
netlink: Pass netlink_ctx to netlink_delinearize_setelem()
Prepare for calling netlink_io_error() which needs the context pointer.
Trade this in for the cache pointer since no caller uses a special one.
No functional change intended.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 16 May 2025 11:28:19 +0000 (13:28 +0200)]
netlink_delinearize: Replace some BUG()s by error messages
Netlink parser tries to keep going despite errors. Faced with an
incompatible ruleset, this is much more user-friendly than exiting the
program upon the first obstacle. This patch fixes three more spots to
support this.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 16 May 2025 17:41:19 +0000 (19:41 +0200)]
netlink: Catch unknown types when deserializing objects
Print an error message and discard the object instead of returning it to
the caller. At least when trying to print it, we would hit an assert()
in obj_type_name() anyway.
Fixes: 4756d92e517ae ("src: listing of stateful objects") Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Thu, 8 May 2025 15:28:02 +0000 (17:28 +0200)]
json: Introduce json_add_array_new()
Propagate nat_stmt_add_array() to a generic helper for use in all spots
adding an array property which may reduce to a single item or even not
exist at all.
Phil Sutter [Thu, 8 May 2025 14:40:41 +0000 (16:40 +0200)]
json: Print single fib flag as non-array
Check array size and reduce the array if possible.
The zero array length check is dead code here due to the surrounding 'if
(flags)' block, but it's a common idiom one could replace by a shared
routine later.
Phil Sutter [Thu, 10 Apr 2025 14:42:42 +0000 (16:42 +0200)]
parser_json: Introduce parse_flags_array()
Various objects support a 'flags' property with value usually being an
array of strings. There is a special case, when merely a single flag is
set: The value may be a string representing this flag.
Introduce a function assisting in parsing this polymorphic value. Have
callers pass a parser callback translating a single flag name into a
corresponding value. Luckily, these single flag parsers are very common
already.
As a side-effect, enable the single flag spec for set flags as well and
update the documentation accordingly.
src: netlink: fix crash when ops doesn't support udata
Whenever a new version adds udata support to an expression, then old
versions of nft will crash when trying to list such a ruleset generated
by a more recent version of nftables.
Phil Sutter [Thu, 8 May 2025 10:08:39 +0000 (12:08 +0200)]
tests/shell: Skip netdev_chain_dev_addremove on tainted kernels
The test checks taint state to indicate success or failure. Since this
won't work if the kernel is already tainted at start, skip the test
instead of failing it.
Fixes: 02dbf86f39410 ("tests: shell: add a test case for netdev ruleset flush + parallel link down") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yi Chen [Wed, 16 Apr 2025 15:53:20 +0000 (23:53 +0800)]
tests: shell: Update packetpath/flowtables
1. The socat receiver should not use the pipfile as output where the sender
reads data from, this could create an infinite data loop.
2. Sending a packet right after establishing the connection helped uncover
a new bug (see kernel commit d2d31ea8cd80, "netfilter: conntrack: fix erronous removal of offload bit").
3. Optimize test log output
Signed-off-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
netlink: bogus concatenated set ranges with netlink message overrun
When building each component of the set element key, a late byteorder
switch is performed to ensure that all components in the interval are
represented in big endian, as required by the pipapo backend.
In case that the set element does not fit into the netlink message, the
byteorder switch happens twice, leading to inserting an element with a
bogus component with large sets, so instead:
Note that 16777216 is 0x1000000, which should instead be 0x00000001 to
represent "lo" as u32.
Fix this by switching the value in a temporary variable and use it to
set the set element key attribute in the netlink message.
Later, revisit this to perform this byteorder switch from evaluation
step.
Add tests/shell unit to cover for this bug.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1792 Fixes: 8ac2f3b2fca3 ("src: Add support for concatenated set ranges") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_bison: add selector_expr rule to restrict typeof_expr
typeof_expr allows for symbol, constant and bitwise expressions,
restrict it to selector expressions.
After this patch, input generated by fuzzer is rejected upfront:
# nft -f test.nft
test.nft:3:53-53: Error: syntax error, unexpected number
typeof numgen inc mod 2 : ip daddr . 0
^
test.nft:2:12-13: Error: set definition does not specify key
map t2 {
^^
test.nft:8:65-67: Error: No such file or directory
meta l4proto tcp dnat ip to numgen inc mod 2 map @t2
^^^
test.nft:8:65-67: Error: No such file or directory
meta l4proto tcp dnat ip to numgen inc mod 2 map @t2
^^^
Revisit 4ab1e5e60779 ("src: allow use of 'verdict' in typeof
definitions") to handle verdict as string, later a token can be added
to the scanner and enable it via flex start conditions.
Fixes: 14357cff40ed ("parser: add typeof keyword for declarations") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
optimize: invalidate merge in case of duplicated key in set/map
-o/--optimize results in EEXIST error when merging two rules that lead
to ambiguous set/map, for instance:
table ip x {
chain v4icmp {}
chain v4icmpc {}
chain y {
ip protocol icmp jump v4icmp
ip protocol icmp goto v4icmpc
}
}
which is not possible because duplicated keys are not possible in
set/map. This is how it shows when running a test:
Merging:
testcases/sets/dumps/sets_with_ifnames.nft:56:3-30: ip protocol icmp jump v4icmp
testcases/sets/dumps/sets_with_ifnames.nft:57:3-31: ip protocol icmp goto v4icmpc
into:
ip protocol vmap { icmp : jump v4icmp, icmp : goto v4icmpc }
internal:0:0-0: Error: Could not process rule: File exists
Add a new step to compare rules that are candidate to be merged to
detect colissions in set/map keys in order to skip them in the next
final merging step.
evaluate: bail out if ct saddr/daddr dependency cannot be inserted
If we have an incomplete rule like "ct original saddr" in inet
family, this function generates an error because it can't determine the required protocol
dependency, hinting at missing ip/ip6 keyword.
We should not go on in this case to avoid a redundant followup error:
nft add rule inet f c ct original saddr 1.2.3.4
Error: cannot determine ip protocol version, use "ip saddr" or "ip6 saddr" instead
add rule inet f c ct original saddr 1.2.3.4
^^^^^^^^^^^^^^^^^
Error: Could not parse symbolic invalid expression
add rule inet f c ct original saddr 1.2.3.4
After this change only the first error is shown.
Fixes: 2b29ea5f3c3e ("src: ct: add eval part to inject dependencies for ct saddr/daddr") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_json: only allow concatenations with 2 or more expressions
The bison parser enforces this implicitly by grammar rules.
Because subkeys have to be conatenated via ".", notation, e.g.
"mark . ip saddr", all concatenation expressions always consist of at
least two elements.
But this doesn't apply to the json frontend which just uses an array:
it can be empty or only contain one element.
The included reproducer makes the eval stage set the "concatenation" flag
on the interval set. This prevents the needed conversion code to turn the
element values into ranges from getting run.
When parsing a verdict map json where element jumps to chain represented
as empty string.
internal:0:0-0: Error: Parsing list expression item at index 0 failed.
internal:0:0-0: Error: Invalid set elem at index 0.
internal:0:0-0: Error: Invalid set elem expression.
internal:0:0-0: Error: Parsing command array at index 2 failed.
expression: initialize list of expression to silence gcc compile warning
The helper function to translate flagcmp expression to binop expression
results in the following compile warning.
src/expression.c: In function 'list_expr_to_binop':
src/expression.c:1286:16: warning: 'last' may be used uninitialized [-Wmaybe-uninitialized]
1286 | return last;
While at it, add assert() to validate the premises where this function
can be called.
Fixes: 4d5990c92c83 ("src: transform flag match expression to binop expression from parser") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>