git.ipfire.org Git - thirdparty/nftables.git/log

build: Bump version to 1.0.9

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: validate maximum log statement prefix length

Otherwise, an overly long string overruns the log prefix buffer.

Fixes: e76bb3794018 ("src: allow for variables in the log prefix string")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1714
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
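
The kind of length check described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name log_prefix_set() and the constant LOG_PREFIX_MAX (set to 128 here) are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative buffer size; an assumption, not taken from the commit. */
#define LOG_PREFIX_MAX 128

/*
 * Copy prefix into buf only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if the prefix is too long.
 * (Hypothetical helper, named for this sketch only.)
 */
static int log_prefix_set(char buf[LOG_PREFIX_MAX], const char *prefix)
{
	size_t len = strlen(prefix);

	if (len >= LOG_PREFIX_MAX)
		return -1;	/* reject instead of overrunning buf */

	memcpy(buf, prefix, len + 1);
	return 0;
}
```

Rejecting the string at validation time, rather than truncating it silently, lets the caller report an error to the user.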

tests/shell: use bash instead of /bin/sh for tests

All tests under "tests/shell" are shell scripts with shebang /bin/bash
or /bin/sh. This may seem expected, since these tests are under
"tests/shell" directory, but any executable file would work.

Anyway. The vast majority of the tests have "#!/bin/bash" as shebang.
A few tests had "#!/bin/sh" or "#!/bin/sh -e". Unify this and always use bash.
Since we anyway require bash, this is not a limitation.

Also, if we know that this is a bash script (by parsing the shebang), we
can let the test wrapper pass "-x" to the script. The next commit will
do that, and it is nicer if the shebangs are all uniform.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add missing "vlan_8021ad_tag.nodump" file

This is an inconsistency. The test should have either a .nft or a
.nodump file. "./tools/check-tree.sh" enforces that and will in the
future be run by `make check`.

Fixes: 74cf3d16d8e9 ('tests: shell: add vlan match test case')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: suggest != in negation error message

When I run sudo nft insert rule filter FORWARD iifname "ens2f1" ip saddr not @ip_macs counter drop comment \" BLOCK ALL NON REGISTERED IP/MACS \"
I get: Error: negation can only be used with singleton bitmask values

And even I did not spot the problem immediately.
I don't think "not" should have been added, it's easily confused with
"not equal"/"neq"/!= and hides that this is allegedly a binop.

At least *mention* that the commandline is asking for a binary
operation here and suggest "!=".

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add "-S|--setup-host" option to set sysctl for rootless tests

Most tests can run just fine without root. A few of them will fail if
/proc/sys/net/core/{wmem_max,rmem_max} is too small (as it is by default
on the host).

The easy workaround is to bump those limits once. This has to be
repeated after each reboot.

Doing that manually (every time) is cumbersome. Add a "--setup-host"
option for that.

Usage:

  $ sudo ./tests/shell/run-tests.sh -S
  Setting up host for running as rootless (requires root).
      echo 4096000 > /proc/sys/net/core/rmem_max (previous value 100000)
      echo 4096000 > /proc/sys/net/core/wmem_max (previous value 100000)

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: preserve result directory with NFT_TEST_FAIL_ON_SKIP

On a successful run, the result directory will be deleted (unless run
with "-k|--keep-logs" option or NFT_TEST_KEEP_LOGS=y).

With NFT_TEST_FAIL_ON_SKIP=y, when there are no failures but skipped
tests, also preserve the result.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: mount all of "/var/run" in "test-wrapper.sh"

After reboot, "/var/run/netns" does not exist before we run the first
`ip netns add` command. Previously, "test-wrapper.sh" would mount a
tmpfs on that directory, but that fails, if the directory doesn't exist.
You will notice this, by deleting /var/run/netns (which only root can
delete or create, and which is wiped on reboot).

Instead, mount all of "/var/run". Then we can also create /var/run/netns
directory.

This means, any other content from /var/run is hidden too. That's
probably desirable, because it means we don't depend on stuff that
happens to be there. If we would require other content in /var/run, then
the test runner needs to be aware of the requirement and ensure it's
present. But best is just to not require anything. It's only iproute2
which insists on /var/run/netns.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: remove references to timeout in reset command

After Linux kernel's patch ("netfilter: nf_tables: do not refresh
timeout when resetting element") timers are not reset anymore, update
documentation to keep this in sync.

Fixes: 83e0f4402fb7 ("Implement 'reset {set,map,element}' commands")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: add vlan match test case

Check that we can match on the 8021ad header and vlan tag, see
af84f9e447a6 ("netfilter: nft_payload: rebuild vlan header on h_proto access").

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add test for dormant on/off/on bug

Disallow enabling/disabling a table in a single transaction.
Make sure we still allow one update, either from active to dormant, or
from dormant to active.

Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg>
Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Cc: info@starlabs.sg
Signed-off-by: Florian Westphal <fw@strlen.de>

icmpv6: Allow matching target address in NS/NA, redirect and MLD

It was previously not possible to match the target address of a neighbor
solicitation or neighbor advertisement against a dynamic set, unlike in
IPv4.

Since there are many ICMPv6 messages with an address at the same offset,
allow filtering on the target address for all icmp types that have one.

While at it, also allow matching the destination address of an ICMPv6
redirect.

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
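
The "same offset" observation can be sketched as below. The type values and the offset of 8 bytes follow the message layouts in RFC 2710 and RFC 4861; the enum names and the helper icmp6_taddr_offset() are hypothetical and not taken from nftables.

```c
#include <stdint.h>

/* ICMPv6 message types (values from RFC 2710 and RFC 4861). */
enum {
	DEMO_ICMP6_MLD_QUERY	= 130,
	DEMO_ICMP6_MLD_REPORT	= 131,
	DEMO_ICMP6_MLD_DONE	= 132,
	DEMO_ICMP6_NEIGH_SOLICIT = 135,
	DEMO_ICMP6_NEIGH_ADVERT	= 136,
	DEMO_ICMP6_REDIRECT	= 137,
};

/* The address follows the 4-byte ICMPv6 header plus 4 more bytes. */
#define DEMO_ICMP6_TADDR_OFFSET 8

/*
 * Return the byte offset of the target (or multicast) address within
 * the ICMPv6 message, or -1 if this type carries no such address.
 */
static int icmp6_taddr_offset(uint8_t type)
{
	switch (type) {
	case DEMO_ICMP6_MLD_QUERY:
	case DEMO_ICMP6_MLD_REPORT:
	case DEMO_ICMP6_MLD_DONE:
	case DEMO_ICMP6_NEIGH_SOLICIT:
	case DEMO_ICMP6_NEIGH_ADVERT:
	case DEMO_ICMP6_REDIRECT:
		return DEMO_ICMP6_TADDR_OFFSET;
	default:
		return -1;
	}
}
```

Because one offset serves all of these types, a single match on the address only needs a dependency on the ICMPv6 type.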

tests: never merge across non-expression statements redux 2

Turns out I also love to forget about nft-test.py -j.

Fixes: 99ab1b8feb16 ("rule: never merge across non-expression statements")
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: sets/reset_command_0: Fix drop_seconds()

The function print_times() skips any time elements which are zero, so
output may lack the ms part. Adjust the sed call dropping anything but
the minutes value to not fail in that case.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 255ec36a11525 ("tests: shell: Stabilize sets/reset_command_0 test")
Signed-off-by: Phil Sutter <phil@nwl.cc>

scanner: restrict include directive to regular files

Similar to previous change, also check all

include "foo"

and reject those if they refer to named fifos, block devices etc.

Directories are still skipped, I don't think we can change this
anymore.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1664
Signed-off-by: Florian Westphal <fw@strlen.de>

libnftables: refuse to open input files other than named pipes or regular files

Don't start e.g. parsing a block device.
nftables is typically run as privileged user, exit early if we
get unexpected input.

Only exception: Allow character device if input is /dev/stdin.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1664
Signed-off-by: Florian Westphal <fw@strlen.de>
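
A file type check of this kind can be sketched with stat(2) and the POSIX S_IS*() macros. This is only an illustration under assumptions: the helper name input_file_allowed() is hypothetical, and the actual commit may inspect an already opened descriptor rather than the path.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Return true if path may be parsed as input: regular files and
 * named pipes are accepted, everything else is refused.
 * (Hypothetical helper, not the libnftables implementation.)
 */
static bool input_file_allowed(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;	/* cannot inspect it, refuse */

	if (S_ISREG(st.st_mode) || S_ISFIFO(st.st_mode))
		return true;

	/* Only exception: a character device is fine for /dev/stdin. */
	if (S_ISCHR(st.st_mode) && strcmp(path, "/dev/stdin") == 0)
		return true;

	return false;	/* block devices, directories, sockets, ... */
}
```

With this check, input_file_allowed("/dev/null") is false while a regular ruleset file is accepted.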

tests: never merge across non-expression statements redux

Forgot to 'git add' inet/bridge/netdev payload records.

Fixes: 99ab1b8feb16 ("rule: never merge across non-expression statements")
Signed-off-by: Florian Westphal <fw@strlen.de>

rule: never merge across non-expression statements

The existing logic can merge across non-expression statements,
if there is only one payload expression.

Example:
ether saddr 00:11:22:33:44:55 counter ether type 8021q

is turned into
counter ether saddr 00:11:22:33:44:55 ether type 8021q

which isn't the same thing.

Fix this up and add test cases for adjacent vlan and ip header
fields. 'Counter' serves as a non-merge fence.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: Fix for failing nft-f/sample-ruleset

For whatever reason, my system lacks an entry for 'sip' in
/etc/services. Assuming the service name is not relevant to the test,
just replace it by the respective port number.

Fixes: 68728014435d9 ("tests: shell: add sample ruleset reproducer")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>

datatype: use xmalloc() for allocating datatype in datatype_clone()

The returned memory will be initialized. No need to zero it first. Use
xmalloc() instead of xzalloc().

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: add missing map statement stub

Add map statement stub to restore compilation without json support.

Fixes: 27a2da23d508 ("netlink_linearize: skip set element expression in map statement key")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: include <string.h> in <nft.h>

<string.h> provides strcmp(), as such it's very basic and used
everywhere.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: fix spurious errors in sets/0036add_set_element_expiration_0

A number of changes to fix spurious errors:

- Add seconds as expiration, otherwise 14m59 reports 14m in minute
granularity, this ensures sufficient time in a very slow environment with
debugging instrumentation.

- Provide expected output.

- Update sed regular expression to make 'ms' optional and use -E mode.

Fixes: adf38fd84257 ("tests: shell: use minutes granularity in sets/0036add_set_element_expiration_0")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mergesort: avoid cloning value in expr_msort_cmp()

If we have a plain EXPR_VALUE value, there is no need to copy
it via mpz_set().

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_linearize: skip set element expression in map statement key

This fix is similar to 22d201010919 ("netlink_linearize: skip set element
expression in set statement key") to fix map statement.

netlink_gen_map_stmt() relies on the map key, that is expressed as a set
element. Use the set element key instead to skip the set element wrap,
otherwise get_register() aborts execution:

nft: netlink_linearize.c:650: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.

This includes JSON support to make this feature complete and it updates
tests/shell to cover for this support.

Reported-by: Luci Stanescu <luci@cnix.ro>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: expose dynamic flag

The dynamic flag is not exported via JSON, this triggers spurious
ENOTSUPP errors when restoring rulesets in JSON with dynamic flags
set on.

Fixes: 6e45b102650a2 ("nft: set: print dynamic flag when set")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: add map support

Add basic map support to this infrastructure, eg.

!map1 ipv4_addr : mark;ok

Adding elements to map is still not supported.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: features: Fix table owner flag check

The keyword is "flags", not "flag". Resulted in a false-negative:

features/table_flag_owner.nft:4:2-5: Error: syntax error, unexpected string
flag owner;
^^^^

Fixes: 10373f0936cd3 ("tests: shell: skip flowtable-uaf if we lack table owner support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

expression: cleanup expr_ops_by_type() and handle u32 input

Make fewer assumptions about the underlying integer type of the enum.
Instead, be clear about where we have an untrusted uint32_t from netlink
and an enum. Rename expr_ops_by_type() to expr_ops_by_type_u32() to make
this clearer. Later we might make the enum packed, when this starts
to matter more.

Also, only the code path expr_ops() wants strict validation and assert
against valid enum values. Move the assertion out of
__expr_ops_by_type(). Then expr_ops_by_type_u32() does not need to
duplicate the handling of EXPR_INVALID. We still need to duplicate the
check against EXPR_MAX, to ensure that the uint32_t value can be cast to
an enum value.

[ Remove cast on EXPR_MAX. --pablo ]

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
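
The split between a trusted enum lookup and an untrusted uint32_t lookup can be sketched as follows. All demo_* names are hypothetical stand-ins; only the idea of checking against the maximum before casting to the enum comes from the message above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum standing in for the expression types. */
enum demo_expr_type {
	DEMO_EXPR_INVALID = 0,
	DEMO_EXPR_VALUE,
	DEMO_EXPR_PAYLOAD,
	__DEMO_EXPR_MAX
};
#define DEMO_EXPR_MAX (__DEMO_EXPR_MAX - 1)

struct demo_expr_ops {
	const char *name;
};

static const struct demo_expr_ops demo_ops[] = {
	[DEMO_EXPR_VALUE]	= { "value" },
	[DEMO_EXPR_PAYLOAD]	= { "payload" },
};

/* Lookup for a value that is already known to be a valid enum. */
static const struct demo_expr_ops *demo_ops_by_type(enum demo_expr_type t)
{
	if (t == DEMO_EXPR_INVALID)
		return NULL;
	return &demo_ops[t];
}

/* Lookup for an untrusted integer, e.g. one received from netlink. */
static const struct demo_expr_ops *demo_ops_by_type_u32(uint32_t value)
{
	if (value > DEMO_EXPR_MAX)
		return NULL;	/* not representable as the enum */
	return demo_ops_by_type((enum demo_expr_type)value);
}
```

Only the u32 variant needs the range check, so the enum variant stays free of assumptions about the enum's underlying integer type.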

tests: shell: skip flowtable-uaf if we lack table owner support

Signed-off-by: Florian Westphal <fw@strlen.de>

parser_json: Default meter size to zero

JSON parser was missed when performing the same change in standard
syntax parser.

Fixes: c2cad53ffc22a ("meters: do not set a defaut meter size from userspace")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Catch nonsense ops in match statement

Since expr_op_symbols array includes binary operators and more, simply
checking the given string matches any of the elements is not sufficient.

Fixes: 586ad210368b7 ("libnftables: Implement JSON parser")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Wrong check in json_parse_ct_timeout_policy()

The conditional around json_unpack() was meant to accept a missing
policy attribute. But the accidentally inverted check made the function
either ignore a given policy or access uninitialized memory.

Fixes: c82a26ebf7e9f ("json: Add ct timeout support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix synproxy object mss/wscale parsing

The fields are 16 and 8 bits in size, introduce temporary variables to
parse into.

Fixes: f44ab88b1088e ("src: add synproxy stateful object support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
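
The temporary-variable pattern can be sketched in plain C. The actual fix concerns the JSON parser's unpack calls; this example swaps in a trivial stand-in, demo_parse_int(), for a parser that can only write through an int pointer. The struct, the function names and the added range check are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical object with narrow fields, as described above. */
struct demo_synproxy {
	uint16_t mss;
	uint8_t wscale;
};

/* Stand-in for a parser that stores its result through an int pointer. */
static void demo_parse_int(int value, int *out)
{
	*out = value;
}

/*
 * Parse into int temporaries first, then narrow. Passing &obj->mss
 * or &obj->wscale directly to the parser would write sizeof(int)
 * bytes into a 2- or 1-byte field.
 */
static int demo_synproxy_set(struct demo_synproxy *obj, int mss_in, int wscale_in)
{
	int mss, wscale;

	demo_parse_int(mss_in, &mss);
	demo_parse_int(wscale_in, &wscale);

	/* Range check is an addition of this sketch. */
	if (mss < 0 || mss > UINT16_MAX || wscale < 0 || wscale > UINT8_MAX)
		return -1;

	obj->mss = (uint16_t)mss;
	obj->wscale = (uint8_t)wscale;
	return 0;
}
```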

parser_json: Fix limit object burst value parsing

The field is of type uint32_t, use lower case 'i' format specifier.

Fixes: c36288dbe2ba3 ("JSON: Fix parsing and printing of limit objects")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix flowtable prio value parsing

Using format specifier 'I' requires a 64bit variable to write into. The
temporary variable 'prio' is of type int, though.

Fixes: 586ad210368b7 ("libnftables: Implement JSON parser")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Proper ct expectation attribute parsing

Parts of the code were unsafe (parsing 'I' format into uint32_t), the
rest just plain wrong (parsing 'o' format into char *tmp). Introduce a
temporary int variable to parse into.

Fixes: 1dd08fcfa07a4 ("src: add ct expectations support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix typo in json_parse_cmd_add_object()

A case of bad c'n'p in the fixed commit broke ct timeout objects
parsing.

Fixes: c7a5401943df8 ("parser_json: Fix for ineffective family value checks")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Catch wrong "reset" payload

The statement happily accepted any valid expression as payload and
assumed it to be a tcpopt expression (actually, a special case of
exthdr). Add a check to make sure this is the case.

Standard syntax does not provide this flexibility, so no need to have
the check there as well.

Fixes: 5d837d270d5a8 ("src: add tcp option reset support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: add feature probe for sctp chunk matching

Skip the relevant parts of the test if nft_exthdr lacks sctp support.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: add feature probe for sets with more than one element

Kernels < 5.11 can handle only one expression per element, e.g.
it's possible to attach a counter per key, or a rate limiter,
or a quota, but not two at the same time.

Add a probe file and skip the relevant tests if the feature is absent.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: skip adding catchall elements if unsupported

The test fails on kernels without catchall support, so elide this
small part.

No need to skip the test in this case, the dump file validates that
the added elements are no longer there after the timeout.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: honor NFT_TEST_FAIL_ON_SKIP variable to fail on any skipped tests

The test suite should pass with various kernels and build
configurations. Of course, that means, that some tests will be
gracefully skipped, and we don't treat that as an overall failure.

However, it should be possible to run a specific kernel (net-next?) and
build configuration, where we expect that all tests pass.

Add an option to fail the run, if any tests were skipped. This is to
ensure that we don't have broken tests that never pass.

This will make more sense when automated CI is running, to enable on a
test system and ensure that at least on that system, all tests pass.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

datatype: return const pointer from datatype_get()

"struct datatype" is for the most part immutable, and most callers deal
with const pointers. That's why datatype_get() accepts a const pointer
to increase the reference count (mutating the refcnt field).

It should also return a const pointer. In fact, all callers are fine
with that already.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
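
A reference-counting getter that both takes and returns a const pointer can be sketched like this. The struct and function names are hypothetical; the sketch only illustrates the pattern described above and is not the nftables code.

```c
#include <stddef.h>

/* Hypothetical, mostly immutable object with a reference count. */
struct demo_datatype {
	const char *name;
	unsigned int refcnt;
};

/*
 * Take a reference. The const is cast away only to bump refcnt;
 * callers keep seeing the object as read-only. This is valid as long
 * as the underlying object was not itself defined const.
 */
static const struct demo_datatype *
demo_datatype_get(const struct demo_datatype *ptr)
{
	struct demo_datatype *dtype = (struct demo_datatype *)ptr;

	if (!dtype)
		return NULL;

	dtype->refcnt++;
	return dtype;
}
```

Returning a const pointer keeps the cast confined to the one function that legitimately mutates the object.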

payload: use enum icmp_hdr_field_type in payload_may_dependency_kill_icmp()

Don't mix icmp_dep (enum icmp_hdr_field_type) and the uint8_t icmp_type.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: use "enum byteorder" instead of int in set_datatype_alloc()

Use the enum types as we have them.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: handle invalid etype in set_make_key()

It's not clear to me, what ensures that the etype is always valid.
Handle a NULL.

Fixes: 6e48df5329ea ('src: add "typeof" build/parse/print support')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: fix missing definitions in <cache.h>/<headers.h>

The headers should be self-contained so they can be included in any
order. With exception of <nft.h>, which any internal header can rely on.

Some fixes for <cache.h>/<headers.h>.

In case of <cache.h>, forward declare some of the structs instead of
including the headers. <headers.h> uses struct in6_addr.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: add missing proto_definitions for PROTO_DESC_GENEVE

While at it, make proto_definitions const. For global variables, this
allows the linker to mark the memory as read only. It's just good to do
by default.

Fixes: 156d22654003 ("src: add geneve matching support")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: fix indentation/whitespace

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: initialize TYPE_CT_EVENTBIT slot in datatype array

Matching on ct event makes no sense since this is mostly used as
statement to globally filter out ctnetlink events, but do not crash
if it is used from concatenations.

Add the missing slot in the datatype array so this does not crash.

Fixes: 2595b9ad6840 ("ct: add conntrack event mask support")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: initialize TYPE_CT_LABEL slot in datatype array

Otherwise, ct label with concatenations such as:

table ip x {
        chain y {
                ct label . ct mark  { 0x1 . 0x1 }
        }
}

crashes:

../include/datatype.h:196:11: runtime error: member access within null pointer of type 'const struct datatype'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==640948==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fc970d3199b bp 0x7fffd1f20560 sp 0x7fffd1f20540 T0)
==640948==The signal is caused by a READ memory access.
==640948==Hint: address points to the zero page.
sudo     #0 0x7fc970d3199b in datatype_equal ../include/datatype.h:196

Fixes: 2fcce8b0677b ("ct: connlabel matching support")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
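
How a missing array slot turns into a NULL pointer can be sketched with designated initializers. The demo_* names are hypothetical and the table is far smaller than the real datatype array.

```c
#include <stddef.h>

/* Hypothetical type IDs standing in for the real datatype enum. */
enum demo_type {
	DEMO_TYPE_INVALID,
	DEMO_TYPE_MARK,
	DEMO_TYPE_CT_LABEL,
	__DEMO_TYPE_MAX
};

struct demo_dtype {
	const char *name;
};

static const struct demo_dtype demo_mark = { "mark" };
static const struct demo_dtype demo_ct_label = { "ct_label" };

/*
 * Slots that are not listed are implicitly NULL. Forgetting the
 * CT_LABEL line would make every lookup of that type return NULL,
 * and a later member access would crash as in the trace above.
 */
static const struct demo_dtype *const demo_datatypes[__DEMO_TYPE_MAX] = {
	[DEMO_TYPE_MARK]	= &demo_mark,
	[DEMO_TYPE_CT_LABEL]	= &demo_ct_label,
};

static const struct demo_dtype *demo_datatype_lookup(enum demo_type type)
{
	if ((unsigned int)type >= __DEMO_TYPE_MAX)
		return NULL;
	return demo_datatypes[type];
}
```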

limit: display default burst when listing ruleset

Default burst for limit is 5 for historical reasons but it is not
displayed when listing the ruleset.

Update listing to display the default burst to disambiguate.

man nft(8) has been recently updated to document this, no action on this
front is therefore required.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: run `nft --check` on persisted dump files

"nft --check" will trigger a rollback in kernel. The existing dump files
might hit new code paths. Take the opportunity to call the command on
the existing files.

An alternative would be to write a separate test that iterates over
all files. However, then we can only run all the commands sequentially
(unless we do something smart). That might be slower than the
opportunity to run the checks in parallel. More importantly, it would be
nice if the check for the dump file is clearly tied to the file's test.
So run it right after the test, from the test wrapper.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

libnftables: move init-once guard inside xt_init()

A library should not restrict being used by multiple threads or make
assumptions about how it's being used. Hence an "init_once" pattern
without locking is racy, a code smell and should be avoided.

Note that libxtables is full of global variables and when linking against
it, libnftables cannot be used from multiple threads either. That is not
easy to fix.

Move the ugliness of "init_once" away from nft_ctx_new(), so that the
problem is concentrated closer to libxtables.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: drop gmp_init() and mp_set_memory_functions()

Setting global handles for libgmp via mp_set_memory_functions() is very
ugly. When we don't use mini-gmp, then potentially there are other users
of the library in the same process, and every process fighting about the
allocation functions is not gonna work.

It also means, we must not reset the allocation functions after somebody
already allocated GMP data with them. Which we cannot ensure, as we
don't know what other parts of the process are doing.

It's also unnecessary. The default allocation functions for gmp and
mini-gmp already abort the process on allocation failure ([1], [2]),
just like our xmalloc().

Just don't do this.

[1] https://gmplib.org/repo/gmp/file/8225bdfc499f/memory.c#l37
[2] https://git.netfilter.org/nftables/tree/src/mini-gmp.c?id=6d19a902c1d77cb51b940b1ce65f31b1cad38b74#n286

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: perform mark datatype compatibility check from maps

Wrap datatype compatibility check into a helper function and use it for
map evaluation, otherwise the following bogus error message is
displayed:

Error: datatype mismatch, map expects packet mark, mapping expression has type integer

Add unit tests to improve coverage for this usecase.

Fixes: 5d8e33ddb112 ("evaluate: relax type-checking for integer arguments in mark statements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: expand sets and maps before evaluation

3975430b12d9 ("src: expand table command before evaluation") moved
ruleset expansion before evaluation, except for sets and maps. For
sets and maps there is still a post_expand() phase.

This patch moves sets and map expansion to allocate an independent
CMD_OBJ_SETELEMS command to add elements to named set and maps which is
evaluated, this consolidates the ruleset expansion to happen always
before the evaluation step for all objects, except for anonymous sets
and maps.

This approach avoids an interference with the set interval code which
detects overlaps and merges of adjacent ranges. This set interval
routine uses set->init to maintain a cache of existing elements. Then,
the post_expand() phase incorrectly expands set->init cache and it
triggers bogus ENOENT errors due to incorrect bytecode (placing
element addition before set creation) in combination with user declared
sets using the flat syntax notation.

Since the evaluation step (coming after the expansion) creates
implicit/anonymous sets and maps, those are not expanded anymore. These
anonymous sets still need to be evaluated from set_evaluate() path and
the netlink bytecode generation path, ie. do_add_set(), needs to deal
with anonymous sets.

Note that, for named sets, do_add_set() does not use set->init. Such
content is part of the existing cache, and the CMD_OBJ_SETELEMS command
is responsible for adding elements to named sets.

Fixes: 3975430b12d9 ("src: expand table command before evaluation")
Reported-by: Jann Haber <jannh@selfnet.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: fix memleak in prefix evaluation with wildcard interface name

The following ruleset:

  table ip x {
        chain y {
                meta iifname { abcde*, xyz }
        }
  }

triggers the following memleak:

==6871== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==6871==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==6871==    by 0x48AD898: xmalloc (utils.c:37)
==6871==    by 0x4BC8B22: __gmpz_init2 (in /usr/lib/x86_64-linux-gnu/libgmp.so.10.4.1)
==6871==    by 0x4887E67: constant_expr_alloc (expression.c:424)
==6871==    by 0x488EF1F: expr_evaluate_prefix (evaluate.c:1138)
==6871==    by 0x488EF1F: expr_evaluate (evaluate.c:2725)
==6871==    by 0x488E76D: expr_evaluate_set_elem (evaluate.c:1662)
==6871==    by 0x488E76D: expr_evaluate (evaluate.c:2739)
==6871==    by 0x4891033: list_member_evaluate (evaluate.c:1454)
==6871==    by 0x488E2B6: expr_evaluate_set (evaluate.c:1757)
==6871==    by 0x488E2B6: expr_evaluate (evaluate.c:2737)
==6871==    by 0x48910D0: elems_evaluate (evaluate.c:4605)
==6871==    by 0x4891432: set_evaluate (evaluate.c:4711)
==6871==    by 0x48915BC: implicit_set_declaration (evaluate.c:122)
==6871==    by 0x488F18A: expr_evaluate_relational (evaluate.c:2503)
==6871==    by 0x488F18A: expr_evaluate (evaluate.c:2745)

expr_evaluate_prefix() calls constant_expr_alloc(), which has already
called mpz_init2(), the second call to mpz_init2() overlaps the existing
mpz_t data memory area.

Remove extra mpz_init2() call to fix this memleak.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: fix leaking typeof_expr_data/typeof_expr_key in netlink_delinearize_set()

There are various code paths that return without freeing typeof_expr_data
and typeof_expr_key. It's not at all obvious, that there isn't a leak
that way. Quite possibly there is a leak. Fix it, or at least make the
code more obviously correct.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: simplify collecting error result in "test-wrapper.sh"

The previous pattern was unnecessarily confusing.

The "$rc_{dump,valgrind,tainted}" variable should only remember whether
that particular check failed, not the overall exit code of the test
wrapper.

Otherwise, if you want to know in which case the wrapper exits with code
122, you have to oddly follow the rc_valgrind variable.

This change will make more sense, when we add another such variable, but
which will be assigned the non-zero value at multiple places. Assigning
there the exit code of the wrapper, duplicates the places where the
condition maps to the exit code.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: colorize NFT_TEST_HAS_SOCKET_LIMITS

NFT_TEST_HAS_SOCKET_LIMITS= is similar to NFT_TEST_HAVE_* variables and
indicates a feature (or lack thereof), except that it's inverted. Maybe
this should be consolidated, however, NFT_TEST_HAS_SOCKET_LIMITS= is
detected in the root namespace, unlike the shell scripts from features.
So it's unclear how to consolidate them best.

Anyway. Still highlight a lack of the capability, as it can cause tests
to be skipped and we should see that easily.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: don't show the exit status for failed tests

Previously, for failed tests we would print the exit code

W: [FAILED] 2/2 tests/shell/testcases/listing/0013objects_0: got 1

This doesn't seem very useful. For one, we have special exit codes like
0 (OK), 77 (SKIPPED), 124 (DUMP FAIL), 123 (TAINTED), 122 (VALGRIND).
Any other exit code is just an arbitrary failure. We don't define any
special codes, and printing them is not useful.

Note that further exit codes (118 - 121) are reserved, and could be
special purposed, when there is a use.

You can find the real exit code from the test in the result data in the
"rc-failed" file.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
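
The exit code convention listed above can be summarized as a small lookup. The test suite itself is written in shell; this C sketch, with the hypothetical helper demo_test_result(), only restates the mapping from the message.

```c
/*
 * Map a test wrapper exit code to the result label named in the
 * commit message. Any other code is just an arbitrary failure.
 */
static const char *demo_test_result(int rc)
{
	switch (rc) {
	case 0:
		return "OK";
	case 77:
		return "SKIPPED";
	case 122:
		return "VALGRIND";
	case 123:
		return "TAINTED";
	case 124:
		return "DUMP FAIL";
	default:
		return "FAILED";
	}
}
```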

tests/shell: set C locale in "run-tests.sh"

The tests should run always the same, regardless of the user's language
settings. Set LANG=C and LC_ALL=C and unset LANGUAGE. If some part wants
to test a different language, it would set it explicitly. They anyway
wouldn't want to depend on something from the user's environment.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: fix preserving ruleset diff after test

We want to delete the file in the case when there was no diff (and we
expect the file to be empty). The condition was wrong.

Fixes: 55fe071cd193 ('tests/shell: cleanup result handling in "test-wrapper.sh"')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: check diff in "maps/typeof_maps_0" and "sets/typeof_sets_0" test

These tests run different variants based on NFT_TEST_HAVE_osf support.
Consequently, we cannot check the pre-generated diff.

Instead, construct what we expect dynamically in the script, and compare
the ruleset against that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: implement NFT_TEST_HAVE_json feature detection as script

No more need to special case the "run a script" approach for detecting
the json feature. Use the new mechanism instead.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: skip reset tests if kernel lacks support

reset is implemented via flush + extra attribute, so older kernels
perform a flush. This means .nft doesn't work, we need to check
if the individual set contents/sets are still in place post-reset.

Make this generic and permit use of feat.sh in addition to the simpler
foo.nft feature files.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip test cases if ct expectation and/or timeout lacks support

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip test cases involving osf match if kernel lacks support

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip catchall tests if kernel lacks support

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip destroy tests if kernel lacks support

Destroy support was added for table/flowtable/chain etc. in a single
commit, so no need to add capability tests for each destroy subtype.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip inet ingress tests if kernel lacks support

Split the bridge autoremove test to a new file.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip some tests if kernel lacks netdev egress support

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip bitshift tests if kernel lacks support

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip inner matching tests if unsupported

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip map query if kernel lacks support

On recent kernels one can perform a lookup in a map without a destination
register (i.e., treat the map like a set -- pure existence check).

Add a feature probe and work around the missing feature in
typeof_maps_add_delete: do the test with a simplified ruleset.

Report the test as skipped even though a reduced test was run (earlier
errors still cause a failure), so that dump validation does not raise an error.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: skip netdev_chain_0 if kernel requires netdev device

This test case only works on kernel 6.4+.

Add feature probe for this and tag the test accordingly using
the scheme added by Thomas Haller in

"tests/shell: skip tests if nft does not support JSON mode"

so that run-test.sh skips it if kernel requires a device.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: add and use chain binding feature probe

Alter 30s-stress to suppress anon chains when they are unsupported.

Note that 30s-stress can optionally be run standalone, so also update
the test script.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: cleanup creating dummy interfaces in tests

In "tests/shell/testcases/chains/netdev_chain_0", calling "trap ...
EXIT" multiple times does not work. Fix it, by calling one cleanup
function.
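
The problem can be illustrated with a small sketch. Here "echo" stands in for the real interface removal command, and the interface names d0 and d1 are hypothetical; this is not the code from the test case.

```shell
# Broken: the second "trap ... EXIT" replaces the first handler,
# so only d1 would be cleaned up.
broken() (
    trap 'echo "delete d0"' EXIT
    trap 'echo "delete d1"' EXIT
)

# Fixed: a single cleanup function handles every interface.
fixed() (
    cleanup() {
        echo "delete d0"
        echo "delete d1"
    }
    trap cleanup EXIT
)
```

Both functions run in a subshell, so the EXIT handler fires when the function body finishes and the difference is visible directly in the output.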

Note that we run in separate namespaces, so the cleanup is usually not
necessary. Still do it, we might want to run without unshare (via
NFT_TEST_UNSHARE_CMD=""). Without unshare, it's important that the
cleanup always works. In practice it might not, for example, "trap ...
EXIT" does not run for SIGTERM. A leaked interface might break the
follow up test and tests interfere with each other.

Try to work around that by first trying to delete the interface.

Also failures to create the interfaces are not considered fatal. I don't
understand under what circumstances this might fail, note that there are
other tests that create dummy interface and don't "exit 77" on failure.
We want to know when something odd is going on.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: suggest 4Mb /proc/sys/net/core/{wmem_max,rmem_max} for rootless

2Mb was not enough to pass "tests/shell/testcases/sets/0030add_many_elements_interval_0"
in an unprivileged/rootless namespace.

Instead, bump the suggestion to 4Mb, which lets the test pass.

Note that the 4Mb are only the recommended value when running the test
as rootless, and is used to autodetect NFT_TEST_HAS_SOCKET_LIMITS=y.
You can set whatever values are suitable for your environment, and
explicitly indicate whether the limits are appropriate or not via
NFT_TEST_HAS_SOCKET_LIMITS=n|y.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: add sample ruleset reproducer

Changes on kernel side no longer permit transactions that reference
a chain after it is bound.

This test case breaks when run with nftables 1.0.6 and earlier.
Keep this as a test case in tree to catch any future problems in
this area.

Link: https://lore.kernel.org/netfilter-devel/20230911213750.5B4B663206F5@dd20004.kasserver.com/
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: colorize NFT_TEST_SKIP_/NFT_TEST_HAVE_ in test output

Having a "SKIP" option as "y" or a "HAVE" option as "n" is noteworthy
because tests may be skipped based on that.

Colorize, to make it easier to see in the test output.

Signed-off-by: Thomas Haller <thaller@redhat.com>

tests/shell: add feature probing via "features/*.nft" files

Running selftests on older kernels makes some of them fail very early
because some tests use features that are not available on older kernels,
e.g. -stable releases.

Known examples:
- inner header matching
- anonymous chains
- elem delete from packet path

Also, some test cases might fail because a feature isn't compiled in,
such as netdev chains.

This adds a feature-probing mechanism to shell tests.

Simply drop a 'nft -f' compatible file with a .nft suffix into
"tests/shell/features". "run-tests.sh" will load it via `nft --check`
and will export

NFT_TEST_HAVE_${feature}=y|n

Here ${feature} is the basename of the .nft file without file extension.
It must be all lower-case.
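
A minimal sketch of such a probing loop, assuming the behavior described above. The function name probe_features and the NFT override variable are hypothetical and may differ from what "run-tests.sh" actually does.

```shell
# Probe every "*.nft" file in the given directory and export
# NFT_TEST_HAVE_<feature>=y|n based on whether "nft --check" accepts it.
# NFT is a hypothetical override for the nft binary; it defaults to "nft".
probe_features() {
    local dir="$1" file feature
    for file in "$dir"/*.nft; do
        [ -f "$file" ] || continue
        # Feature name: basename of the file without the ".nft" suffix.
        feature="$(basename "$file" .nft)"
        if "${NFT:-nft}" --check -f "$file" >/dev/null 2>&1; then
            export "NFT_TEST_HAVE_$feature=y"
        else
            export "NFT_TEST_HAVE_$feature=n"
        fi
    done
}
```

Called as `probe_features tests/shell/features`, this would leave one exported variable per feature file for the test scripts to inspect.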

This extends the existing NFT_TEST_HAVE_json= feature detection.
Similarly, NFT_TEST_REQUIRES(NFT_TEST_HAVE_*) tags work to easily skip a
test.

The test script that cannot fully work without the feature should either
skip the test entirely (NFT_TEST_REQUIRES(NFT_TEST_HAVE_*)), or run a
reduced/modified test. If a modified test was run and passes, it is
still a good idea to mark the overall result as skipped (exit 77)
instead of claiming success for the modified test. We want to know when
the full test did not run, while still testing as much as we can.
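
The reduced-test convention could look roughly like the following sketch. The feature name "example" and the run_full_test/run_reduced_test helpers are hypothetical; only the exit code 77 and the NFT_TEST_HAVE_ prefix come from the text above.

```shell
# Placeholders for the real test bodies.
run_full_test()    { echo "full test"; }
run_reduced_test() { echo "reduced test"; }

# Returns 0 on full success, 1 on failure, and 77 ("skipped") when only
# the reduced variant could be run.
run_test() {
    if [ "${NFT_TEST_HAVE_example:-n}" = y ]; then
        run_full_test || return 1
        return 0
    fi
    # A failure in the reduced variant is still a failure.
    run_reduced_test || return 1
    # The reduced variant passed, but report "skipped" rather than success.
    return 77
}
```

A test script would end with `run_test; exit $?` so that the wrapper sees the skip status.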

This patch is based on Florian's feature probing patch.

Originally-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: fix dump validation message

This test output depends on CONFIG_HZ:
- update @y { ip saddr timeout 1d2h3m4s8ms }
+ update @y { ip saddr timeout 1d2h3m4s10ms }

The dump was recorded with HZ=1000; with HZ=250 the test fails.

Remove the dump file for now.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/build: capture more output from "tests/build/run-tests.sh" script

Dropping stdout for various build tests makes it hard to understand what
happens, when a build fails. Redirect both stdout and stderr to the log
files for easier debugging.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: honor CLICOLOR_FORCE to force coloring in run-tests.sh

We honor NO_COLOR= to disable coloring, let's also honor CLICOLOR_FORCE=
to enable it.

The purpose will be for `make` calling the script and redirecting to a
file, while enabling colors.
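
One way to combine the two variables is sketched below. The function name use_color is hypothetical, and the precedence (NO_COLOR wins over CLICOLOR_FORCE) is an assumption of this sketch, not something the commit message states.

```shell
# Return 0 if colored output should be used, non-zero otherwise.
use_color() {
    # A non-empty NO_COLOR disables coloring.
    if [ -n "${NO_COLOR:-}" ]; then
        return 1
    fi
    # A CLICOLOR_FORCE that is set and not "0" forces coloring,
    # even when stdout is redirected to a file.
    if [ -n "${CLICOLOR_FORCE:-}" ] && [ "$CLICOLOR_FORCE" != 0 ]; then
        return 0
    fi
    # Otherwise color only when stdout is a terminal.
    [ -t 1 ]
}
```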

See-also: https://bixense.com/clicolors/

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: accept $NFT_TEST_TMPDIR_TAG for the result directory

We allow the user to set "$TMPDIR" to affect where the "nft-test.*"
directory is created. However, we don't allow the user to specify the
exact location, so the user doesn't really know which directory was
created.

One remedy is that the test will also create the symlink
"$TMPDIR/nft-test.latest.$USER" to point to the last test result.
However, if you run multiple tests in parallel, that is not reliable to
find the test results.

Accept $NFT_TEST_TMPDIR_TAG and use it as part of the generated
filename. That way, the caller can set it to a unique tag, and find the
directory later based on that. For example

  export TMPDIR=/tmp
  export NFT_TEST_TMPDIR_TAG=".$(uuidgen)"
  ./tests/shell/run-tests.sh
  ls -lad "$TMPDIR/nft-test."*"$NFT_TEST_TMPDIR_TAG"*/

will work reliably -- as long as the tag is chosen uniquely.

The reason to not allow the user to specify the directory name directly,
is that we want test results to follow the well-known pattern
"/tmp/nft-test*".

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: exit 77 from "run-tests.sh" if all tests were skipped

If there are multiple tests and some of them pass and some are skipped,
the overall result should be success (zero). Because likely the user
just selected a bunch of tests (or all of them). So skipping some tests
does not mean that the entire run is not a success.

However, if all tests are skipped, then mark the overall result as
skipped too. The more common case is running only a single test,
where we want to know that the test didn't run.
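
The rule can be written down as a small sketch. The function name overall_result and its three counters are hypothetical, and returning 1 when any test failed is an assumption of this sketch; the commit message only describes the pass and skip cases.

```shell
# Print the overall exit code for a run.
# $1: number of passed tests, $2: number of skipped, $3: number of failed.
overall_result() {
    local ok="$1" skipped="$2" failed="$3"
    if [ "$failed" -gt 0 ]; then
        echo 1      # assumption: any failure fails the whole run
    elif [ "$ok" -eq 0 ] && [ "$skipped" -gt 0 ]; then
        echo 77     # every test was skipped
    else
        echo 0      # at least one test passed, skips are tolerated
    fi
}
```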

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tools: add "tools/check-tree.sh" script to check consistency of nft dumps

The script performs some checks on the source tree, and fails if
any problems are found.

Currently it only checks the dump files, but it shall be extended
to perform various consistency checks of the source tree.

This script was already successful at finding issues with the dumps.
Running it helps to make sure we don't make mistakes.

Later it should also integrate with `make check` and/or be called
from CI.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: in find_tests() use C locale for sorting tests names

It makes more sense, that the sort order does not depend on the user's
locale.
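
A sketch of locale-independent sorting, assuming the test names arrive one per line on stdin. The function name sort_tests is hypothetical.

```shell
# Sort lines by byte value, regardless of the caller's locale settings.
sort_tests() {
    LC_ALL=C sort
}
```

With LC_ALL=C, upper-case ASCII letters sort before lower-case ones, so the order is the same for every user.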

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: special handle base path starting with "./"

When we auto detect the tests with `tests/shell/run-tests.sh -L`, then
commonly the NFT_TEST_BASEDIR starts with a redundant "./". That's a bit
ugly.

Instead, special handle that case and remove the prefix. The effect is
that `tests/shell/run-tests.sh -L` shows

tests/shell/testcases/bitwise/0040mark_binop_0

instead of

./tests/shell/testcases/bitwise/0040mark_binop_0
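
Removing such a prefix can be done with a shell parameter expansion, as in this sketch. The function name strip_dot_slash is hypothetical and not taken from the script.

```shell
# Print the path without a leading "./"; other paths are left unchanged.
strip_dot_slash() {
    printf '%s\n' "${1#./}"
}
```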

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add missing nft/nodump files for tests

Three tests didn't have a nft/nodump file, because previously I only
generated files on Fedora kernel, where those tests are failing.

Generate them on CentOS-Stream-9 with kernel 5.14.0-354.el9.x86_64.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: drop unstable dump for "transactions/0051map_0" test

The file "tests/shell/testcases/transactions/dumps/0051map_0.nft" gets
generated differently on Fedora 38 (6.4.14-200.fc38.x86_64) and
CentOS-Stream-9 (5.14.0-354.el9.x86_64). It's not stable.

    diff --git c/tests/shell/testcases/transactions/dumps/0051map_0.nft w/tests/shell/testcases/transactions/dumps/0051map_0.nft
    index 59d69df70e61..fa7df9f93757 100644
    --- c/tests/shell/testcases/transactions/dumps/0051map_0.nft
    +++ w/tests/shell/testcases/transactions/dumps/0051map_0.nft
    @@ -1,7 +1,11 @@
     table ip x {
    +    chain w {
    +    }
    +
         chain m {
         }

         chain y {
    +         ip saddr vmap { 1.1.1.1 : jump w, 2.2.2.2 : accept, 3.3.3.3 : goto m }
         }
     }

Drop it.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: remove spurious .nft dump files

These are left-over dumps ([1]), or dumps generated with the wrong name
([2]). Remove the files.

[1] commit eb14363d44ce ('tests: shell: move chain priority and policy to chain folder')
[2] commit b4775dec9f80 ('src: ingress inet support')

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add option to shuffle execution order of tests

The user can set NFT_TEST_SHUFFLE_TESTS=y|n to have the tests shuffled
randomly. The purpose of shuffling is to find tests that depend on each
other, or would break when run in unexpected order.

If unspecified, by default tests are shuffled if no tests are selected
on the command line.
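
The default described above could be decided as in the following sketch. The function name should_shuffle is hypothetical; it receives the tests selected on the command line as arguments.

```shell
# Return 0 if tests should be shuffled, non-zero otherwise.
should_shuffle() {
    case "${NFT_TEST_SHUFFLE_TESTS:-}" in
        y) return 0 ;;
        n) return 1 ;;
    esac
    # Unspecified: shuffle only if no tests were selected explicitly.
    [ "$#" -eq 0 ]
}
```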

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add "random-source.sh" helper for random-source for sort/shuf

Commands `sort` and `shuf` have a "--random-source" argument. That's
useful for generating stable, reproducible "random" output.

However, we want to do this based on a fixed seed, while the
"--random-source" expects a stream of randomness. Add a helper script
for that.
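
As an illustration, a seed can be stretched into a reproducible byte stream with the openssl recipe from the coreutils manual. This is a sketch under that assumption, not the contents of "random-source.sh"; the function names seeded_stream and stable_shuffle are hypothetical, and openssl must be installed.

```shell
# Emit an endless, reproducible byte stream derived from the seed in $1.
seeded_stream() {
    openssl enc -aes-256-ctr -pass "pass:$1" -nosalt </dev/zero 2>/dev/null
}

# Shuffle the numbers 1..10 reproducibly for the seed in $1.
stable_shuffle() {
    local src
    src="$(mktemp)"
    # shuf reads its random bytes from a file, so capture part of the stream.
    seeded_stream "$1" | head -c 4096 >"$src"
    shuf -i 1-10 --random-source="$src"
    rm -f "$src"
}
```

Running `stable_shuffle 42` twice should print the same permutation both times, while a different seed should usually give a different one.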

Also, use the stable randomness for shuf in the test
"tests/shell/testcases/sets/automerge_0".

See-also: https://www.gnu.org/software/coreutils/manual/html_node/Random-sources.html#Random-sources

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: export NFT_TEST_RANDOM_SEED variable for tests

Let "run-tests.sh" export a NFT_TEST_RANDOM_SEED variable, set to
a decimal, random integer (in the range of 0 to 0x7FFFFFFF).

The purpose is to provide a seed to tests for randomization.
Randomizing tests is very useful to increase the coverage while not
testing all combinations (which might not be practical).

The point of NFT_TEST_RANDOM_SEED is that the user can set the
environment variable so that the same series of random events is used.
That is useful for reproducing an issue, that is known to happen with a
certain seed.

- by default, if the user leaves NFT_TEST_RANDOM_SEED unset or empty,
  the script generates a number using $SRANDOM.
- if the user sets NFT_TEST_RANDOM_SEED to an integer it is taken
  as is (modulo 0x80000000).
- otherwise, calculate a number by hashing the value of
  $NFT_TEST_RANDOM_SEED.
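
The three cases can be sketched as follows. The function name normalize_seed is hypothetical, and the choice of sha256sum as hash, the /dev/urandom fallback and the stripping of leading zeros are assumptions of this sketch rather than the script's actual behavior.

```shell
# Print a seed in the range 0..0x7FFFFFFF derived from $1.
normalize_seed() {
    local seed="$1" hex
    case "$seed" in
        '')
            # Unset or empty: draw a random number. $SRANDOM needs bash 5.1+;
            # the /dev/urandom fallback is an addition of this sketch.
            seed="${SRANDOM:-$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')}"
            ;;
        *[!0-9]*)
            # Not a plain integer: hash the value, keep 32 bits of the digest.
            hex="$(printf '%s' "$seed" | sha256sum | cut -c1-8)"
            seed="$((0x$hex))"
            ;;
        *)
            # Plain integer: strip leading zeros so it is not read as octal.
            seed="$(printf '%s' "$seed" | sed 's/^0*//')"
            seed="${seed:-0}"
            ;;
    esac
    echo "$((seed % 0x80000000))"
}
```

A caller could then write `export NFT_TEST_RANDOM_SEED="$(normalize_seed "$NFT_TEST_RANDOM_SEED")"` before starting the tests.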

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

datatype: fix leak and cleanup reference counting for struct datatype

Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port`
fails:

==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==118==    at 0x484682C: calloc (vg_replace_malloc.c:1554)
==118==    by 0x48A39DD: xmalloc (utils.c:37)
==118==    by 0x48A39DD: xzalloc (utils.c:76)
==118==    by 0x487BDFD: datatype_alloc (datatype.c:1205)
==118==    by 0x487BDFD: concat_type_alloc (datatype.c:1288)
==118==    by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786)
==118==    by 0x488229D: stmt_evaluate_nat (evaluate.c:3892)
==118==    by 0x488229D: stmt_evaluate (evaluate.c:4450)
==118==    by 0x488328E: rule_evaluate (evaluate.c:4956)
==118==    by 0x48ADC71: nft_evaluate (libnftables.c:552)
==118==    by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595)
==118==    by 0x402983: main (main.c:534)

I think the reference handling for datatype is wrong. It was introduced
by commit 01a13882bb59 ('src: add reference counter for dynamic
datatypes').

We don't notice it most of the time, because instances are statically
allocated, where datatype_get()/datatype_free() is a NOP.

Fix and rework.

- Commit 01a13882bb59 comments "The reference counter of any newly
  allocated datatype is set to zero". That seems not workable.
  Previously, functions like datatype_clone() would have returned the
  refcnt set to zero. Some callers would then set the refcnt to one, but
  some wouldn't (set_datatype_alloc()). Calling datatype_free() with a
  refcnt of zero will wrap around to UINT_MAX and leak:

       if (--dtype->refcnt > 0)
          return;

  While there could be schemes with such asymmetric counting that juggle the
  appropriate number of datatype_get() and datatype_free() calls, this is
  confusing and error prone. The common pattern is that every
  alloc/clone/get/ref is paired with exactly one unref/free.

  Let datatype_clone() return references with refcnt set to 1 and in
  general always be clear about where we transfer ownership (take a
  reference) and where we need to release it.

- set_datatype_alloc() needs to consistently return ownership to the
  reference. Previously, some code paths would and others wouldn't.

- Replace

    datatype_set(key, set_datatype_alloc(dtype, key->byteorder))

  with a __datatype_set() which takes ownership.

Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: ensure vgdb-pipe files are deleted from "nft-valgrind-wrapper.sh"

When the valgrind process gets killed, those files can be left over.
They are located in the original $TMPDIR (usually /tmp). They should be
cleaned up.

I tried to clean up the files from within "nft-valgrind-wrapper.sh"
itself via a `trap`, but it doesn't work. Instead, let "run-tests.sh"
delete all files with a matching pattern.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>