evaluate: simplify set to list normalisation for device expressions
When evaluating the list of devices, two expressions are possible:
- EXPR_LIST, which is the expected expression type to store the list of
chain/flowtable devices.
- EXPR_SET, in case that a variable is used to express the device list.
This is because it is not possible to know if the variable defines
set elements or devices. Since sets are more common, EXPR_SET is used.
In the latter case, this list expressed as EXPR_SET gets translated to
EXPR_LIST. Before such translation, the EXPR_VARIABLE is evaluated,
therefore all variables are gone and only EXPR_SET_ELEM are possible in
expr_set_to_list().
Remove the EXPR_VALUE and EXPR_VARIABLE cases in expr_set_to_list()
since those are never seen. Add BUG() in case any expression other than
EXPR_SET_ELEM is seen.
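The normalisation described above can be sketched roughly as follows. This is a simplified Python model, not the C code in evaluate.c: the Expr class, its fields and the string type tags are hypothetical stand-ins for nft's expression structs.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for nft's struct expr; only what the sketch needs.
@dataclass
class Expr:
    etype: str                      # e.g. "EXPR_SET", "EXPR_SET_ELEM"
    key: str = ""                   # device name carried by a set element
    children: List["Expr"] = field(default_factory=list)


def set_to_list(set_expr: Expr) -> Expr:
    """Translate an EXPR_SET holding device names into an EXPR_LIST."""
    devices = []
    for elem in set_expr.children:
        if elem.etype != "EXPR_SET_ELEM":
            # Plays the role of BUG(): variables were already evaluated,
            # so any other expression type here is a logic error.
            raise AssertionError(f"BUG: unexpected expression type {elem.etype}")
        devices.append(Expr("EXPR_VALUE", key=elem.key))
    return Expr("EXPR_LIST", children=devices)
```

Since only one element type is accepted, any other input is reported loudly instead of being silently converted.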
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch extends the tunnel metadata object to define geneve tunnel
specific configurations:
table netdev x {
tunnel y {
id 10
ip saddr 192.168.2.10
ip daddr 192.168.2.11
sport 10
dport 20
ttl 10
geneve {
class 0x1010 opt-type 0x1 data "0x12345678"
class 0x1020 opt-type 0x2 data "0x87654321"
class 0x2020 opt-type 0x3 data "0x87654321abcdeffe"
}
}
}
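For illustration only, here is how one such class/opt-type/data triple maps onto a Geneve option on the wire, following the option layout in RFC 8926 (16-bit class, 8-bit type, 5-bit length in 4-byte words, then data). The helper name geneve_option is hypothetical, and this is not how nft encodes the option towards the kernel.

```python
import struct


def geneve_option(opt_class: int, opt_type: int, data_hex: str) -> bytes:
    """Pack one Geneve option TLV: class, type, length, data."""
    digits = data_hex[2:] if data_hex.startswith("0x") else data_hex
    data = bytes.fromhex(digits)
    if len(data) % 4 or len(data) > 124:
        raise ValueError("option data must be a multiple of 4 bytes, at most 124")
    # The low 5 bits of the fourth byte hold the data length in 4-byte
    # words; the upper 3 bits are reserved and stay zero.
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


# First option from the example ruleset above.
print(geneve_option(0x1010, 0x1, "0x12345678").hex())
```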
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows you to attach tunnel metadata through the tunnel
statement.
The following example shows how to redirect traffic to the erspan0
tunnel device which will take the tunnel configuration that is
specified by the ruleset.
table netdev x {
tunnel y {
id 10
ip saddr 192.168.2.10
ip daddr 192.168.2.11
sport 10
dport 20
ttl 10
erspan {
version 1
index 2
}
}
chain x {
type filter hook ingress device veth0 priority 0;
ip daddr 10.141.10.123 tunnel name y fwd to erspan0
}
}
This patch also allows matching on tunnel metadata via the tunnel expression.
Joint work with Fernando.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 26 Aug 2025 17:05:17 +0000 (19:05 +0200)]
Makefile: Fix for 'make distcheck'
Make sure the files in tools/ are added to the tarball and that the
created nftables.service file is removed upon 'make clean'.
Fixes: c4b17cf830510 ("tools: add a systemd unit for static rulesets")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
mnl: continue on ENOBUFS errors when processing batch
A user reports that:
nft -f ruleset.nft
fails with:
netlink: Error: Could not process rule: No buffer space available
This was triggered by:
table ip6 fule {
set domestic_ip6 {
type ipv6_addr
flags dynamic,interval
elements = $domestic_ip6
}
chain prerouting {
type filter hook prerouting priority 0;
ip6 daddr @domestic_ip6 counter
}
}
where $domestic_ip6 contains a large number of IPv6 addresses.
This set declaration is currently not supported because dynamic sets
with intervals are not supported, so every IPv6 address that is added
triggers an error, overrunning the userspace socket buffer with lots of
NLMSG_ERROR messages (or an NLMSG_ERROR message too big to fit into the
socket buffer).
In the particular context of batch processing, ENOBUFS is just an
indication that too many errors have occurred. The kernel cannot store
any more NLMSG_ERROR messages into the userspace socket buffer.
However, there are still NLMSG_ERROR messages in the socket buffer to be
processed that can provide a hint on what is going on.
Instead of breaking on ENOBUFS in batches, continue error processing.
After this patch, the ruleset above displays:
ruleset.nft:2367:7-18: Error: Could not process rule: Operation not supported
set domestic_ip6 {
^^^^^^^^^^^^
ruleset.nft:2367:7-18: Error: Could not process rule: No such file or directory
set domestic_ip6 {
^^^^^^^^^^^^
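The change in behaviour can be modelled as below. This is a sketch under assumptions, not the code in src/mnl.c: recv is a hypothetical callable standing in for the netlink socket read, returning one error message per call, None when the buffer is empty, and raising OSError(ENOBUFS) when the kernel had to drop messages.

```python
import errno


def drain_batch_errors(recv):
    """Collect error messages from recv() until it reports end of data."""
    errors = []
    overrun = False
    while True:
        try:
            msg = recv()
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                # Too many errors for the socket buffer. Remember that,
                # but keep reading what did make it into the buffer.
                overrun = True
                continue
            raise
        if msg is None:
            break
        errors.append(msg)
    return errors, overrun
```

Breaking out of the loop on ENOBUFS would discard the queued messages, which are the only hint about the actual cause.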
Florian Westphal [Wed, 20 Aug 2025 12:44:43 +0000 (14:44 +0200)]
mnl: silence compiler warning
gcc 14.3.0 reports this:
src/mnl.c: In function 'mnl_nft_chain_add':
src/mnl.c:916:25: warning: 'nest' may be used uninitialized [-Wmaybe-uninitialized]
916 | mnl_attr_nest_end(nlh, nest);
I guess it's because the compiler can't know that the conditions cannot change
in-between and assumes nest_end() can be called without nest_start().
Fixes: 01277922fede ("src: ensure chain policy evaluation when specified")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
fib: restore JSON output for relational expressions
JSON output for the fib expression changed:
- "result": "check"
+ "result": "oif"
This breaks third-party JSON parsers. Revert this change for relational
expressions only, via a workaround, until there are clear rules on how to
proceed with JSON schema updates.
As for set and map statements, keep the new "check" result type since
it is not possible to peek at the rhs in such cases to guess whether the
NFT_FIB_F_PRESENT flag needs to be set.
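A third-party parser that wants to cope with both spellings could use something like the sketch below. Only the "result" values "check" and "oif" come from this change. The surrounding layout of the match object (the "left", "right" and "fib" keys, a boolean rhs for existence checks, and "oifname" as a further result value) is an assumption about the JSON schema, and the function name is hypothetical.

```python
def is_fib_existence_check(match: dict) -> bool:
    """Return True if a JSON "match" object is a fib route existence check."""
    left = match.get("left", {})
    fib = left.get("fib") if isinstance(left, dict) else None
    if not isinstance(fib, dict):
        return False
    result = fib.get("result")
    if result == "check":
        return True
    # Restored output: "oif"/"oifname" compared against a boolean rhs
    # (assumed representation of exists/missing).
    return result in ("oif", "oifname") and isinstance(match.get("right"), bool)
```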
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1806
Fixes: f4b646032acf ("fib: allow to check if route exists in maps")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jan Engelhardt [Thu, 17 Apr 2025 14:48:33 +0000 (16:48 +0200)]
tools: add a systemd unit for static rulesets
There is a customer request (bug report) to trivially load a ruleset
from a well-known location on boot, forwarded to me by M. Gerstner. A systemd
service unit is hereby added to provide that functionality. This is based on
various distributions attempting to do the same, for example,
https://src.fedoraproject.org/rpms/nftables/tree/rawhide
https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/main/nftables/nftables.initd
https://gitlab.archlinux.org/archlinux/packaging/packages/nftables
Acked-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
chain_stmt_destroy is called from bison destructor, but it turns out
this function won't free the associated chain.
There is no memory leak when bison can parse the input because the chain
statement evaluation step queues the embedded anon chain via cmd_alloc.
Then, a later cmd_free() releases the chain and the embedded statements.
In case of a parser error, the evaluation step is never reached and the
chain object leaks, e.g. in
foo bar jump { return }
Bison calls the right destructor but the anon chain and all
statements/expressions in it are not released:
HEAP SUMMARY:
in use at exit: 1,136 bytes in 4 blocks
total heap usage: 98 allocs, 94 frees, 840,255 bytes allocated
1,136 (568 direct, 568 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 4
at: calloc (vg_replace_malloc.c:1675)
by: xzalloc (in libnftables.so.1.1.0)
by: chain_alloc (in libnftables.so.1.1.0)
by: nft_parse (in libnftables.so.1.1.0)
by: __nft_run_cmd_from_filename (in libnftables.so.1.1.0)
by: nft_run_cmd_from_filename (in libnftables.so.1.1.0)
To resolve this, make chain_stmt_destroy also release the embedded
chain. This in turn requires a chain refcount increase whenever a chain
is associated with a chain statement, else we get a double-free of the
chain.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 12 Aug 2025 15:31:47 +0000 (17:31 +0200)]
json: Do not reduce single-item arrays on output
This is a partial revert of commit a740f2036ad0d ("json: Introduce
json_add_array_new()"), keeping the function but eliminating its primary
task which is to replace arrays of size 1 by their only item. While
support for this on input is convenient for users, it means extra casing
in JSON output parsers to cover for it. The minor reduction in output
size does not justify that.
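The "extra casing" mentioned above is the kind of helper every output parser had to carry. A minimal sketch, where the function name as_list and the sample "flags" key are illustrative only:

```python
import json


def as_list(value):
    """Return value unchanged if it is a list, else wrap it in one."""
    return value if isinstance(value, list) else [value]


# Both spellings of the same property normalise to the same list.
reduced = json.loads('{"flags": "interval"}')
full = json.loads('{"flags": ["interval"]}')
print(as_list(reduced["flags"]) == as_list(full["flags"]))
```

With arrays no longer reduced on output, consumers can iterate over the value directly.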
Fixes: a740f2036ad0d ("json: Introduce json_add_array_new()")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1806
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 14:14:08 +0000 (16:14 +0200)]
tests: py: Fix tests added for 'icmpv6 taddr' support
There was a duplicate test. Also, stored JSON equivalents should match
the input as much as possible. The expected deviation in output (just like
with standard syntax) is stored in the .json.output file instead.
Fixes: 2e86f45d0260a ("icmpv6: Allow matching target address in NS/NA, redirect and MLD")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 14:06:46 +0000 (16:06 +0200)]
tests: py: Drop stale entry from ip/snat.t.payload
This payload actually belongs to ip/dnat.t.payload; the fixed commit added
it to the wrong file.
Fixes: 8f3048954d40d ("evaluate: postpone transport protocol match check after nat expression evaluation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 13:50:54 +0000 (15:50 +0200)]
tests: py: Drop stale entries from ip6/{ct,meta}.t.json
Looks like these were added by accident; the fixed commit did not add these
test cases.
Fixes: 8221d86e616bd ("tests: py: add test-cases for ct and packet mark payload expressions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 12:51:39 +0000 (14:51 +0200)]
tests: py: Drop redundant payloads for ip/ip.t
Each was present multiple times, introduced probably by copying from a
respective .got file.
Fixes: 77def2d43466e ("netlink_delinearize: support for bitfield payload statement with binary operation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 12:32:11 +0000 (14:32 +0200)]
tests: py: Drop stale entry from inet/tcp.t.json
The test was changed but the JSON equivalents were not updated. Commit
c0b685951fabb ("json: fix parse of flagcmp expression") then added an
equivalent matching the changed test, so just drop the old one.
Fixes: c3d57114f119b ("parser_bison: add shortcut syntax for matching flags without binary operations")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 12:17:46 +0000 (14:17 +0200)]
tests: py: Drop stale payload from any/rawpayload.t.payload
There never was a test corresponding to this payload.
Fixes: 857904bdfaf7a ("tests: py: extend raw payload match tests")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 13 Aug 2025 12:12:06 +0000 (14:12 +0200)]
tests: py: Drop duplicate test in any/meta.t
The expected invalid meta hour argument of 24:00 is tested already.
Fixes: a6717ae094db2 ("evaluate: Fix for 'meta hour' ranges spanning date boundaries")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
While this is likely a bug in socat, working around it is simple so
let's tackle it on this side, too.
Note: The second chunk is sufficient to resolve the issue, probably
because the initial ruleset's rate limiter does not trigger during TCP
handshake. Adjust it anyway to keep things consistent.
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 9352fa7fb0a31 ("test: shell: Add rate_limit test case for 'limit statement'.")
Cc: Yi Chen <yiche@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Thu, 31 Jul 2025 10:40:11 +0000 (12:40 +0200)]
doc: nft.8: Minor NAT STATEMENTS section review
Synopsis insinuates an IP address argument is mandatory in snat/dnat
statements although specifying ports alone is perfectly fine. Adjust it
accordingly and add a paragraph briefly describing the behaviour.
While at it, update the redirect statement description with more
relevant examples, the current one is wrong: To *only* alter the
destination port, dnat statement must be used, not redirect.
Fixes: 6908a677ba04c ("nft.8: Enhance NAT documentation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 25 Jul 2025 15:28:29 +0000 (17:28 +0200)]
evaluate: Fix for 'meta hour' ranges spanning date boundaries
Introduction of EXPR_RANGE_SYMBOL type inadvertently disabled sanitizing
of meta hour ranges where the lower boundary has a higher value than the
upper boundary. This may happen outside of user control due to the fact
that given ranges are converted to UTC which is the kernel's native
timezone.
Perform the conditional match and op inversion with the new RHS
expression type as well after expanding it so values are comparable.
Since this replaces the whole range expression, make it replace the
relational's RHS entirely.
While at it extend testsuites to cover these corner-cases.
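To illustrate why a user-supplied range can end up with its lower boundary above the upper one, and what the inversion does, here is a simplified model. The function names are hypothetical, utc_offset is the local zone's offset from UTC in seconds, and exact boundary handling is glossed over; it is a sketch of the idea rather than the evaluate.c logic.

```python
DAY = 24 * 60 * 60


def hhmm(text: str) -> int:
    """Convert "HH:MM" into seconds since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def normalize_hour_range(op: str, low: str, high: str, utc_offset: int):
    """Convert a local-time range to UTC; swap and invert if it wraps."""
    lo = (hhmm(low) - utc_offset) % DAY
    hi = (hhmm(high) - utc_offset) % DAY
    if lo <= hi:
        return op, lo, hi
    # The range crosses midnight in UTC: match the complement instead.
    inverted = "!=" if op == "==" else "=="
    return inverted, hi, lo


# 01:00-05:00 at UTC+2 is 23:00-03:00 in UTC, which wraps around.
print(normalize_hour_range("==", "01:00", "05:00", 2 * 3600))
```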
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1805
Fixes: 347039f64509e ("src: add symbol range expression to further compact intervals")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 29 Jul 2025 15:55:17 +0000 (17:55 +0200)]
parser_json: Parse into symbol range expression if possible
Apply the bison parser changes in commit 347039f64509e ("src: add symbol
range expression to further compact intervals") to JSON parser as well.
Fixes: 347039f64509e ("src: add symbol range expression to further compact intervals")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Annotate and combine the 'etype' and 'symtype' checks done in bison
parser for readability and because JSON parser will start doing the same
in a follow-up patch.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_bison: fix memory leak when parsing flowtable hook declaration
When the hook location is invalid we error out but we do leak both
the priority expression and the flowtable name. Example:
valgrind --leak-check=full nft -f flowtable-parser-err-memleak
[..] Error: unknown chain hook
hook enoent priority filter + 10
^^^^^^
[..]
2 bytes in 1 blocks are definitely lost in loss record 1 of 3
at: malloc (vg_replace_malloc.c:446)
by: strdup (in libc.so.6)
by: xstrdup (in libnftables.so.1.1.0)
by: nft_lex (in libnftables.so.1.1.0)
by: nft_parse (in libnftables.so.1.1.0)
by: __nft_run_cmd_from_filename (in libnftables.so.1.1.0)
by: nft_run_cmd_from_filename (in libnftables.so.1.1.0)
First two reports are due to the priority expression: this needs to call
expr_free(). Third report is due to the flowtable name, the destructor
was missing so add one.
After fix:
All heap blocks were freed -- no leaks are possible
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: maps: check element data mapping matches set data definition
This change is similar to 7f4d7fef31bd ("evaluate: check element key vs. set definition")
but this time for data mappings.
The included bogon asserts with:
BUG: invalid data expression type catch-all set element
nft: src/netlink.c:596: __netlink_gen_data: Assertion `0' failed.
after:
internal:0:0-0: Error: Element mapping mismatches map definition, expected packet mark, not 'invalid'
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
json: BASECHAIN flag no longer implies presence of priority expression
This is a followup to 44ea19364637 ("src: BASECHAIN flag no longer implies presence of priority expression"):
feeding the same bogon file into nft -j we get a very similar crash.
When the JSON is parsed without returning an error, the test
fails. It's supposed to log the name of the failed input, which
it does for -f but not for -j -f.
Resolve this by validating that the set element key matches the set key
definition.
After this, loading the bogon file gives:
Error: Element mismatches set definition, expected concatenation of (IPv4 address, integer), not 'ICMP type'
elements = {redirect }
^^^^^^^^
Update test to enclose flowtable device names in quotes, otherwise,
it reports a spurious issue:
@@ -1,2 +1,3 @@
add table ip t
-add flowtable ip t ft { hook ingress priority 0; devices = { lo }; }
+add flowtable ip t ft { hook ingress priority 0; devices = { "lo" }; }
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This should help catch subtle bugs due to type confusion.
assert() could later be enabled only in debugging builds to run tests;
keep it for now.
compound_expr_*() still works and it needs the same initial layout for
all of these expressions:
struct list_head expressions;
unsigned int size;
This is implicitly reducing the size of one of the largest structs
in the union area of struct expr, still EXPR_SET_ELEM remains the
largest so no gain is achieved in this iteration.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: add conntrack information to trace monitor mode
An upcoming kernel change provides the packet's conntrack state in the
trace message data.
This allows seeing whether a packet is seen as original or reply, the conntrack
state (new, established, related) and the status bits which show if e.g.
NAT was applied. Also include the conntrack ID so users can use the conntrack
tool to query the kernel for more information via ctnetlink.
This improves debugging when e.g. packets do not pick up the expected
NAT mapping, which could e.g. also happen because of expectations
following the NAT binding of the owning conntrack entry.
Example output ("conntrack: " lines are new):
trace id 32 t PRE_RAW packet: iif "enp0s3" ether saddr [..]
trace id 32 t PRE_RAW rule tcp flags syn meta nftrace set 1 (verdict continue)
trace id 32 t PRE_RAW policy accept
trace id 32 t PRE_MANGLE conntrack: ct direction original ct state new ct id 2641368242
trace id 32 t PRE_MANGLE packet: iif "enp0s3" ether saddr [..]
trace id 32 t ct_new_pre rule jump rpfilter (verdict jump rpfilter)
trace id 32 t PRE_MANGLE policy accept
trace id 32 t INPUT conntrack: ct direction original ct state new ct status dnat-done ct id 2641368242
trace id 32 t INPUT packet: iif "enp0s3" [..]
trace id 32 t public_in rule tcp dport 443 accept (verdict accept)
v3: remove clash bit again, kernel won't expose it anymore.
v2: add more status bits: helper, clash, offload, hw-offload.
add flag explanation to documentation.
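For scripts that post-process the monitor output, the new lines can be picked apart with a small helper like the one below. It is derived from the example output above only; the function name is hypothetical, and it assumes each field value is a single whitespace-free token.

```python
import re

# One "ct <field> <value>" pair, e.g. "ct state new".
CT_FIELD = re.compile(r"\bct (\w+) (\S+)")


def parse_conntrack_line(line: str):
    """Return the ct fields of a trace "conntrack:" line, or None."""
    marker = "conntrack: "
    pos = line.find(marker)
    if pos < 0:
        return None
    return dict(CT_FIELD.findall(line[pos + len(marker):]))


line = ("trace id 32 t INPUT conntrack: ct direction original "
        "ct state new ct status dnat-done ct id 2641368242")
print(parse_conntrack_line(line))
```

The returned "id" value can then be handed to the conntrack tool to look up the entry.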
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: py: re-enables nft-test.py to load the local nftables.py
This is a needed follow-up of commit ce443afc21455 ("py: move
package source into src directory") from 2023. Since that change,
nft-test.py started using the host's nftables.py instead of the local
one. But since nft-test.py passes the local src/.libs/libnftables.so.1
as parameter when instantiating the Nftables class, we did nevertheless
use the local libnftables.
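The general technique for making a test script pick up an in-tree module instead of the installed one is to put the source directory first on the module search path. A minimal sketch follows; the helper name and the directory in the usage comment are illustrative and not necessarily what nft-test.py does.

```python
import os
import sys


def prefer_local_module_dir(directory: str) -> None:
    """Put directory first on sys.path so its modules shadow installed ones."""
    directory = os.path.abspath(directory)
    if directory in sys.path:
        sys.path.remove(directory)
    sys.path.insert(0, directory)


# Hypothetical usage from a test script, before importing the module:
#   prefer_local_module_dir(os.path.join(os.path.dirname(__file__), "../../py/src"))
```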
Fixes: ce443afc21455 ("py: move package source into src directory")
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Zhongqiu Duan <dzq.aishenghu0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Michal Koutný [Mon, 30 Jun 2025 14:15:26 +0000 (16:15 +0200)]
doc: Clarify cgroup meta variable
The documentation mentions a control group id where what is meant is the
class id associated with the cgroup of a socket. This used to be fine until
cgroup v2 arrived, which uses similar terminology (cgroup id) for a very
different thing -- a numeric identifier of a particular (v2) cgroup.
This contemporary cgroup id isn't exposed by netfilter (v2 matching is
based on paths externally). Fix the docs and reduce confusion with a more
precise description of the metavariable.
[ Added comment in description to refer to socket cgroupv2 --pablo ]
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
f686a17eafa0 ("fib: Support existence check") adds EXPR_F_BOOLEAN as a
workaround to infer from the rhs of the relational expression if the fib
lookup wants to check for a specific output interface or, instead,
simply check for existence. This, however, does not work with maps.
The NFT_FIB_F_PRESENT flag can be used both with NFT_FIB_RESULT_OIF and
NFT_FIB_RESULT_OIFNAME; my understanding is that they serve the same
purpose, which is to check if a route exists, so they are redundant.
Add a 'check' fib result to check for routes while still keeping the
inference workaround for backward compatibility, but prefer the new
syntax in the listing.
Update man nft(8) and tests/py.
Fixes: f686a17eafa0 ("fib: Support existence check")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The test was technically incorrect: Instead of detecting whether
interface hooks are name-based or not, it actually tested whether
netdev-family chains are removed along with their last hook.
Since the latter behaviour is established in kernel commit fc0133428e7a
("netfilter: nf_tables: Tolerate chains with no remaining hooks") and
thus independent from the name-based hooks change, treating both as the
same kernel feature is not acceptable.
Fix this by detecting whether a netdev-family chain may be added despite
specifying a non-existent interface to hook into. Keep the old check
around with a better name, although unused for now.
Reported-by: Florian Westphal <fw@strlen.de>
Fixes: f27e5abd81f29 ("tests: shell: Adjust to ifname-based hooks")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Thu, 26 Jun 2025 00:52:48 +0000 (02:52 +0200)]
evaluate: prevent merge of sets with incompatible keys
It's not enough to check for the interval flag; this would assert in the
interval code due to a concat being passed to it:
BUG: unhandled key type 13
After fix:
same_set_name_but_different_keys_assert:8:6-7: Error: set already exists with
different datatype (concatenation of (IPv4 address, network interface index) vs
network interface index)
set s4 {
^^
This also improves error verbosity when mixing datamap and objref maps:
invalid_transcation_merge_map_and_objref_map:9:13-13:
Error: map already exists with different datatype (IPv4 address vs string)
.. instead of 'Cannot merge map with incompatible existing map of same name'.
The 'Cannot merge map with incompatible existing map of same name' check
is kept in place to catch the case where a ruleset contains a set and a map
with the same name and the same key definition.
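The order of the checks can be sketched as follows. SetDecl and merge_conflict are hypothetical names, datatypes are plain strings here, and the returned tags are simplified placeholders rather than nft's actual error messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SetDecl:
    name: str
    key_type: str
    data_type: Optional[str] = None   # None for a plain set, set for a map


def merge_conflict(existing: SetDecl, new: SetDecl) -> Optional[str]:
    """Return a short reason why two same-named declarations cannot merge."""
    # Key definitions are compared first, which yields the more verbose
    # "different datatype" style of error.
    if existing.key_type != new.key_type:
        return "different key datatype"
    # Same key, but one is a set and the other a map.
    if (existing.data_type is None) != (new.data_type is None):
        return "incompatible kind"
    if existing.data_type != new.data_type:
        return "different data datatype"
    return None
```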
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jun 2025 19:37:31 +0000 (21:37 +0200)]
evaluate: check that set type is identical before merging
Reject maps and sets of the same name:
BUG: invalid range expression type catch-all set element
nft: src/expression.c:1704: range_expr_value_low: Assertion `0' failed.
After:
Error: Cannot merge set with existing datamap of same name
set z {
^
v2:
Pablo points out that we shouldn't merge datamaps (plain value) and objref
maps either, catch this too and add another test:
nft --check -f invalid_transcation_merge_map_and_objref_map
invalid_transcation_merge_map_and_objref_map:9:13-13: Error: Cannot merge map with incompatible existing map of same name
We should also make sure that both data (for map case) and
set keys are identical, this is added in a followup patch.