git.ipfire.org Git - thirdparty/nftables.git/log

tests: shell: skip sets/sets_with_ifnames if no pipapo backend is available

Skip this by now for older kernels until someone detaches the tests that
require the pipapo set backend.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: restore pipapo and chain binding coverage in standalone 30s-stress

Do not disable pipapo and chain binding coverage for standalone runs by
default. Instead, turn them on by default and allow users to disable them
through:

# export NFT_TEST_HAVE_chain_binding=n; bash tests/shell/testcases/transactions/30s-stress 3600
...
running standalone with:
NFT_TEST_HAVE_chain_binding=n
NFT_TEST_HAVE_pipapo=y

given feature detection is not available in this case, thus, user has to
provide an explicit hint on what this kernel supports.

Fixes: c5b5b1044fdd ("tests/shell: add feature probing via "features/*.nft" files")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip pipapo set backend in transactions/30s-stress

Skip tests with concatenations and intervals if kernel does not support it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip if kernel does not support flowtable with no devices

Originally, flowtables required devices in place to work, this was later
relaxed to allow flowtable with no initial devices, see 05abe4456fa3
("netfilter: nf_tables: allow to register flowtable with no devices").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip if kernel does not support flowtable counter

Check if kernel provides flowtable counter supports which is available
since 53c2b2899af7 ("netfilter: flowtable: add counter support").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tools: check for consistency of .json-nft dumps in "check-tree.sh"

Add checks for the newly introduced .json-nft dump files.

Optimally, every test that has a .nft dump should also have a .json-nft
dump, and vice versa.

However, currently some JSON tests fail to validate, and are missing.
Only flag those missing files as warning, without failing the script.
The reason to warn about this, is that we really should fix those tests,
and having a annoying warning increases the pressure and makes it
discoverable.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tools: check more strictly for bash shebang in "check-tree.sh"

There is no problem in principle to allow any executable/shebang. However,
it also not clear why we would want to use anything except bash. Unless
we have a good use case, check and reject anything else.

Also not that `./tests/shell/run-tests.sh -x` only works if the shebang
is either exactly "#!/bin/bash" or "#!/bin/bash -e". While it probably
could be made work with other shebangs, the simpler thing is to just use
bash consistently.

Just check that they are all bash scripts. If there ever is a use-case,
we can always adjust this check.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tools: simplify error handling in "check-tree.sh" by adding msg_err()/msg_warn()

msg_err() also sets EXIT_CODE=, so we don't have to duplicate this.

Also add msg_warn() to print non-fatal warnings. Will be used in the
future. As "check-tree.sh" tests the consistency of the source tree, a
warning only makes sense to point something out that really should be
fixed, but is not yet.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add JSON dump files

Generate and add ".json-nft" files. These files contain the output of
`nft -j list ruleset` after the test. Also, "test-wrapper.sh" will
compare the current ruleset against the ".json-nft" files and test them
with "nft -j --check -f $FILE`. These are useful extra tests, that we
almost get for free.

Note that for some JSON dumps, `nft -f --check` fails (or prints
something). For those tests no *.json-nft file is added. The bugs needs
to be fixed first.

An example of such an issue is:

    $ DUMPGEN=all ./tests/shell/run-tests.sh tests/shell/testcases/maps/nat_addr_port

which gives a file "rc-failed-chkdump" with

    Command `./tests/shell/../../src/nft -j --check -f "tests/shell/testcases/maps/dumps/nat_addr_port.json-nft"` failed
    >>>>
    internal:0:0-0: Error: Invalid map type 'ipv4_addr . inet_service'.

    internal:0:0-0: Error: Parsing command array at index 3 failed.

    internal:0:0-0: Error: unqualified type integer specified in map definition. Try "typeof expression" instead of "type datatype".

    <<<<

Tests like "tests/shell/testcases/nft-f/0012different_defines_0" and
"tests/shell/testcases/nft-f/0024priority_0" also don't get a .json-nft
dump yet, because their output is not stable. That needs fixing too.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: check and generate JSON dump files

The rules after a successful test are good opportunity to test
`nft -j list ruleset` and `nft -j --check`. This quite possibly touches
code paths that are not hit by other tests yet.

The only downside is the increase of the test runtime (which seems
negligible, given the benefits of increasing test coverage).

Future commits will generate and commit those ".json-nft" dump files.

"DUMPGEN=y" will, like before, regenerate only the existing
"*.{nodump,nft,json-nft}" files (unless a test has none of the 3 files,
in which case they are all generated and the user is suggested to commit
the correct ones). Now also "DUMPGEN=all" is honored, that will generate
all 3 files, regardless of whether they already existed. That is useful
if you start out with a test that only has a .nft file, and then you
want to generate a .json-nft file too.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

json: fix use after free in table_flags_json()

Add `$NFT -j list ruleset` to the end of "tests/shell/testcases/transactions/table_onoff".
Then valgrind will find this issue:

  $ make -j && ./tests/shell/run-tests.sh tests/shell/testcases/transactions/table_onoff -V

Gives:

  ==286== Invalid read of size 4
  ==286==    at 0x49B0261: do_dump (dump.c:211)
  ==286==    by 0x49B08B8: do_dump (dump.c:378)
  ==286==    by 0x49B08B8: do_dump (dump.c:378)
  ==286==    by 0x49B04F7: do_dump (dump.c:273)
  ==286==    by 0x49B08B8: do_dump (dump.c:378)
  ==286==    by 0x49B0E84: json_dump_callback (dump.c:465)
  ==286==    by 0x48AF22A: do_command_list_json (json.c:2016)
  ==286==    by 0x48732F1: do_command_list (rule.c:2335)
  ==286==    by 0x48737F5: do_command (rule.c:2605)
  ==286==    by 0x48A867D: nft_netlink (libnftables.c:42)
  ==286==    by 0x48A92B1: nft_run_cmd_from_buffer (libnftables.c:597)
  ==286==    by 0x402CBA: main (main.c:533)

Fixes: e70354f53e9f ("libnftables: Implement JSON output support")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: expand create commands

create commands also need to be expanded, otherwise elements are never
evaluated:

# cat ruleset.nft
define ip-block-4 = { 1.1.1.1 }
create set netdev filter ip-block-4-test {
type ipv4_addr
flags interval
auto-merge
elements = $ip-block-4
}
# nft -f ruleset.nft
BUG: unhandled expression type 0
nft: src/intervals.c:211: interval_expr_key: Assertion `0' failed.
Aborted

Same applies to chains in the form of:

create chain x y {
counter
}

which is also accepted by the parser.

Update tests/shell to improve coverage for these use cases.

Fixes: 56c90a2dd2eb ("evaluate: expand sets and maps before evaluation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser: use size_t type for strlen() results

strlen() has a "size_t" as result. Use the correct type.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: split merge nat optimization in two tests

One without pipapo support and another with not to harm existing
coverage.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: split single element in anonymous set

Split this to move set stateful expression support into a separated test
not to harm existing coverage.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: split map test

Split interval + concatenation into a separated file, so older kernels
with no pipapo can still run what it is supported.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: split set NAT interval test

Split test in two, one for interval sets and another with concatenation
+ intervals, so at least intervals are tested in older kernels with no
pipapo support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip if kernel does not support bitshift

A few tests are missing bitshift checks that has been added to
885845468408 ("tests/shell: skip bitshift tests if kernel lacks
support").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip multidevice chain tests if kernel lacks support

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip comment tests if kernel lacks support

Skip tests that require comment support

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip NAT netmap tests if kernel lacks support

Skip tests that require NAT netmap support

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip stateful expression in sets tests if kernel lacks support

Skip tests that require stateful expressions in sets.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip prerouting reject tests if kernel lacks support

Skip tests that require reject at prerouting hook.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip pipapo tests if kernel lacks support

Skip tests that require net/netfilter/nft_set_pipapo support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: fix buffer size for user data in netlink_delinearize_chain()

The correct define is NFTNL_UDATA_CHAIN_MAX and not NFTNL_UDATA_OBJ_MAX.
In current libnftnl, they both are defined as 1, so (with current libnftnl)
there is no difference.

Fixes: 702ac2b72c0e ("src: add comment support for chains")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove xfree() and use plain free()

xmalloc() (and similar x-functions) are used for allocation. They wrap
malloc()/realloc() but will abort the program on ENOMEM.

The meaning of xmalloc() is that it wraps malloc() but aborts on
failure. I don't think x-functions should have the notion, that this
were potentially a different memory allocator that must be paired
with a particular xfree().

Even if the original intent was that the allocator is abstracted (and
possibly not backed by standard malloc()/free()), then that doesn't seem
a good idea. Nowadays libc allocators are pretty good, and we would need
a very special use cases to switch to something else. In other words,
it will never happen that xmalloc() is not backed by malloc().

Also there were a few places, where a xmalloc() was already "wrongly"
paired with free() (for example, iface_cache_release(), exit_cookie(),
nft_run_cmd_from_buffer()).

Or note how pid2name() returns an allocated string from fscanf(), which
needs to be freed with free() (and not xfree()). This requirement
bubbles up the callers portid2name() and name_by_portid(). This case was
actually handled correctly and the buffer was freed with free(). But it
shows that mixing different allocators is cumbersome to get right. Of
course, we don't actually have different allocators and whether to use
free() or xfree() makes no different. The point is that xfree() serves
no actual purpose except raising irrelevant questions about whether
x-functions are correctly paired with xfree().

Note that xfree() also used to accept const pointers. It is bad to
unconditionally for all deallocations. Instead prefer to use plain
free(). To free a const pointer use free_const() which obviously wraps
free, as indicated by the name.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add free_const() and use it instead of xfree()

Almost everywhere xmalloc() and friends is used instead of malloc().
This is almost everywhere paired with xfree().

xfree() has two problems. First, it brings the wrong notion that
xmalloc() should be paired with xfree(), as if xmalloc() would not use
the plain malloc() allocator. In practices, xfree() just wraps free(),
and it wouldn't make sense any other way. xfree() should go away. This
will be addressed in the next commit.

The problem addressed by this commit is that xfree() accepts a const
pointer. Paired with the practice of almost always using xfree() instead
of free(), all our calls to xfree() cast away constness of the pointer,
regardless whether that is necessary. Declaring a pointer as const
should help us to catch wrong uses. If the xfree() function always casts
aways const, the compiler doesn't help.

There are many places that rightly cast away const during free. But not
all of them. Add a free_const() macro, which is like free(), but accepts
const pointers. We should always make an intentional choice whether to
use free() or free_const(). Having a free_const() macro makes this very
common choice clearer, instead of adding a (void*) cast at many places.

Note that we now pair xmalloc() allocations with a free() call (instead
of xfree(). That inconsistency will be resolved in the next commit.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

gmputil: add nft_gmp_free() to free strings from mpz_get_str()

mpz_get_str() (with NULL as first argument) will allocate a buffer using
the allocator functions (mp_set_memory_functions()). We should free
those buffers with the corresponding free function.

Add nft_gmp_free() for that and use it.

The name nft_gmp_free() is chosen because "mini-gmp.c" already has an
internal define called gmp_free(). There wouldn't be a direct conflict,
but using the same name is confusing. And maybe our own defines should
have a clear nft prefix.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: don't return a const string from cgroupv2_get_path()

The caller is supposed to free the allocated string. Return a non-const
string to make that clearer.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: place byteorder conversion before rshift in payload expressions

Use the key from the evaluation context to perform the byteorder
conversion in case that this expression is used for lookups and updates
on explicit sets.

# nft --debug=netlink add rule ip6 t output ip6 dscp @mapv6
ip6 t output
  [ payload load 2b @ network header + 0 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
  [ byteorder reg 1 = ntoh(reg 1, 2, 2) ]   <-------------- this was missing!
  [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
  [ lookup reg 1 set mapv6 ]

Also with set statements (updates from packet path):

# nft --debug=netlink add rule ip6 t output update @mapv6 { ip6 dscp }
ip6 t output
  [ payload load 2b @ network header + 0 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
  [ byteorder reg 1 = ntoh(reg 1, 2, 2) ]   <------------- also here!
  [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
  [ dynset update reg_key 1 set mapv6 ]

Simple matches on values and implicit sets rely on the binary transfer
mechanism to propagate the shift to the constant, no explicit byteorder
is required in such case.

Fixes: 668c18f67203 ("evaluate: place byteorder conversion before rshift in payload statement")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: reset statement length context only for set mappings

map expression (which is used a key to look up for the mapping) needs to
consider the statement length context, otherwise incorrect bytecode is
generated when {ct,meta} statement is generated.

# nft -f - <<EOF
add table ip6 t
add chain ip6 t c
add map ip6 t mapv6 { typeof ip6 dscp : meta mark; }
EOF

# nft -d netlink add rule ip6 t c meta mark set ip6 dscp map @mapv6
ip6 t c
   [ payload load 2b @ network header + 0 => reg 1 ]
   [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
   ... missing byteorder conversion here before shift ...
   [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
   [ lookup reg 1 set mapv6 dreg 1 ]
   [ meta set mark with reg 1 ]

Reset statement length context only for the mapping side for the
elements in the set.

Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
Reported-by: Brian Davidson <davidson.brian@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

meta: fix hour decoding when timezone offset is negative

Brian Davidson says:

meta hour rules don't display properly after being created when the
hour is on or after 00:00 UTC. The netlink debug looks correct for
seconds past midnight UTC, but displaying the rules looks like an
overflow or a byte order problem. I am in UTC-0400, so today, 20:00
and later exhibits the problem, while 19:00 and earlier hours are
fine.

meta.c only ever worked when the delta to UTC is positive.
We need to add in case the second counter turns negative after
offset adjustment.

Also add a test case for this.

Fixes: f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'")
Reported-by: Brian Davidson <davidson.brian@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tproxy: Drop artificial port printing restriction

It does not make much sense to omit printing the port expression if it's
not a value expression: On one hand, input allows for more advanced
uses. On the other, if it is in-kernel, best nft can do is to try and
print it no matter what. Just ignoring ruleset elements can't be
correct.

Fixes: 2be1d52644cf7 ("src: Add tproxy support")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1721
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: no recursive make for "doc/Makefile.am"

Merge the Makefile.am under "doc/" into the toplevel Makefile.am. This
is a step in the effort of dropping recursive make.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

build: no recursive make for "examples/Makefile.am"

Merge the Makefile.am under "examples/" into the toplevel Makefile.am.
This is a step in the effort of dropping recursive make.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

build: no recursive make for "src/Makefile.am"

Merge the Makefile.am under "src/" into the toplevel Makefile.am. This
is a step in the effort of dropping recursive make.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

build: no recursive make for "files/**/Makefile.am"

Merge the Makefile.am under "files/" into the toplevel Makefile.am. This
is a step in the effort of dropping recursive make.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

build: no recursive make for "py/Makefile.am"

Merge the Makefile.am under "py/" into the toplevel Makefile.am. This is
a step in the effort of dropping recursive make.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

build: no recursive-make for "include/**/Makefile.am"

Switch from recursive-make to a single top-level Makefile. This is the
first step, the following patches will continue this.

Unlike meson's subdir() or C's #include, automake's SUBDIRS= does not
include a Makefile. Instead, it calls `make -C $dir`.

  https://www.gnu.org/software/make/manual/html_node/Recursion.html
  https://www.gnu.org/software/automake/manual/html_node/Subdirectories.html

See also, "Recursive Make Considered Harmful".

  https://accu.org/journals/overload/14/71/miller_2004/

This has several problems, which we an avoid with a single Makefile:

- recursive-make is harder to maintain and understand as a whole.
  Recursive-make makes sense, when there are truly independent
  sub-projects. Which is not the case here. The project needs to be
  considered as a whole and not one directory at a time. When
  we add unit tests (which we should), those would reside in separate
  directories but have dependencies between directories. With a single
  Makefile, we see all at once. The build setup has an inherent complexity,
  and that complexity is not necessarily reduced by splitting it into more files.
  On the contrary it helps to have it all in once place, provided that it's
  sensibly structured, named and organized.

- typing `make` prints irrelevant "Entering directory" messages. So much
  so, that at the end of the build, the terminal is filled with such
  messages and we have to scroll to see what even happened.

- with recursive-make, during build we see:

    make[3]: Entering directory '.../nftables/src'
      CC       meta.lo
    meta.c:13:2: error: #warning hello test [-Werror=cpp]
       13 | #warning hello test
          |  ^~~~~~~

  With a single Makefile we get

      CC       src/meta.lo
    src/meta.c:13:2: error: #warning hello test [-Werror=cpp]
       13 | #warning hello test
          |  ^~~~~~~

  This shows the full filename -- assuming that the developer works from
  the top level directory. The full name is useful, for example to
  copy+paste into the terminal.

- single Makefile is also faster:

    $ make && perf stat -r 200 -B make -j

  I measure 35msec vs. 80msec.

- recursive-make limits parallel make. You have to craft the SUBDIRS= in
  the correct order. The dependencies between directories are limited,
  as make only sees "LDADD = $(top_builddir)/src/libnftables.la" and
  not the deeper dependencies for the library.

- I presume, some people like recursive-make because of `make -C $subdir`
  to only rebuild one directory. Rebuilding the entire tree is already very
  fast, so this feature seems not relevant. Also, as dependency handling
  is limited, we might wrongly not rebuild a target. For example,

        make check
        touch src/meta.c
        make -C examples check

  does not rebuild "examples/nft-json-file".
  What we now can do with single Makefile (and better than before), is
  `make examples/nft-json-file`, which works as desired and rebuilds all
  dependencies.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

gitignore: ignore ".dirstamp" files

Those will be generated by automake, once the recursive Makefiles
are gone.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: fix mount command in "test-wrapper.sh"

With Fedora 39 (util-linux-core-2.39.2-1.fc39), the mount command starts
to fail. It was still working with Fedora 38 (util-linux-core-2.38.1-4.fc38).

$ unshare -f -p -m --mount-proc -U --map-root-user -n bash -c 'mount -t tmpfs --make-private /var/run && mount'
mount: /run: mount failed: Invalid argument.

Not sure why this starts to fail. But arguably the command line
arguments were wrong. Fix it, we need a pseudo name for the device.

Fixes: df6f1a3e0803 ("tests/shell: bind mount private /var/run/netns in test container")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: reject set in concatenation

Consider the following ruleset.

define ext_if = { "eth0", "eth1" }
table ip filter {
    chain c {
        iifname . tcp dport { $ext_if . 22 } accept
    }
}

Attempting to load this ruleset results in:

BUG: invalid expression type 'set' in setnft: netlink.c:304: __netlink_gen_concat_key: Assertion `0' failed.
Aborted (core dumped)

After this patch:

# nft -f ruleset.nft
ruleset.nft:1:17-40: Error: cannot use set in concatenation
define ext_if = { "eth0", "eth1" }
                 ^^^^^^^^^^^^^^^^^^

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1715
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

check-tree.sh: check and flag /bin/sh usage

Almost all shell tests use /bin/bash already.

In some cases we've had shell tests use /bin/sh, but still having
a bashism. This causes failures on systems where sh is dash or another,
strict bourne shell.

Flag those via check-tree.sh.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: use /bin/bash in sets/elem_opts_compat_0

So running this test with /bin/sh != /bin/bash works.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: fix length check for ifname in ifname_expr_alloc()

IFNAMSIZ is 16, and the allowed byte length of the name is one less than
that. Fix the length check and adjust a test for covering the longest
allowed interface name.

This is obviously a change in behavior, because previously interface
names with length 16 were accepted and were silently truncated along the
way. Now they are rejected as invalid.

Fixes: fa52bc225806 ("parser: reject zero-length interface names")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: cover long interface name in "0042chain_variable_0" test

IFNAMSIZ is 16. Adjust "0042chain_variable_0" to use an interface name
with the maximum allowed bytes length.

Instead of adding an entirely different test, adjust an existing one to
use another interface name. The aspect for testing for a long interface
name is not special enough, to warrant a separate test. We can cover it
by extending an existing test.

Note that the length check in "parser_bison.y" is wrong. The test checks
still for the wrong behavior and that "d23456789012345x" is accepted.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add "bogons/nft-f/zero_length_devicename2_assert"

This is copied from "bogons/nft-f/zero_length_devicename_assert" and
adjusted.

- `device""lo"` looks odd. In this file use `device ""` to only check
against empty strings, without oddity.

- "ip" type has no hook ingress in filter. If the device name would be
valid, the file would still be rejected. Use "netdev".

The purpose is to add a test for a file that would otherwise pass,
except having an empty device name. Without oddities.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tools: reject unexpected files in "tests/shell/testcases/" with "check-tree.sh"

"check-tree.sh" does consistency checks on the source tree. Extend
the check to flag more unexpected files. We don't want to accidentally
have left over files.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: inline input data in "single_anon_set" test

The file "optimizations/dumps/single_anon_set.nft.input" was laying
around, and it was unclear how it was used.

Let's extend "check-patch.sh" to flag all unused files. But the script
cannot understand how "single_anon_set.nft.input" is used (aside allow
listing it).

Instead, inline the script to keep it inside the test (script).

We still write the data to a separate file and don't use `nft -f -`
(because reading stdin uses a different code path we want to cover).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: test for maximum length of "comment" in "comments_objects_0"

The comment length is limited to NFTNL_UDATA_COMMENT_MAXLEN. Test for
that.

Adjust an existing test for that.

Also rename $EXPECTED to $RULESET. We don't compare the value of
$EXPECTED against the actually configured rules. It also wouldn't work,
because the input is not normalized and wouldn't match. It also isn't
necessary, because there is a .nft dump file.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add missing "elem_opts_compat_0.nodump" file

This is an inconsistency. The test should have either a .nft or a
.nodump file. "./tools/check-tree.sh" enforces that and will in the
future run by `make check`.

Fixes: 22fab8681a50 ("parser_bison: Fix for broken compatibility with older dumps")
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: honor NFT_TEST_VERBOSE_TEST variable to debug tests via `bash -x`

It can be cumbersome to debug why a test fails. Our tests are just shell
scripts, which for the most part don't print much. That is good, but for
debugging, it can be useful to run the test via `bash -x`. Previously,
we would just patch the source file while debugging.

Add an option "-x" and NFT_TEST_VERBOSE_TEST=y environment variable. If set,
"test-wrapper.sh" will check whether the shebang is "#!/bin/bash" and add
"-x" to the command line.

While at it, let test-wrapper.sh also log a line like

Command: $CMD

With this, we see in the log the command that was run, and how
NFT_TEST_VERBOSE_TEST may have affected it. This is anyway useful,
because many tests don't print anything at all, and we end up with an
empty "testout.log". Empty files are cumbersome, e.g. I like to use
`grep -R ^` to show the content of all files, which does not show empty
files. Ensuring that something is always written is desirable.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

parser_bison: Fix for broken compatibility with older dumps

Commit e6d1d0d611958 ("src: add set element multi-statement
support") changed the order of expressions and other state attached to set
elements are expected in input. This broke parsing of ruleset dumps
created by nft commands prior to that commit.

Restore compatibility by also accepting the old ordering.

Fixes: e6d1d0d611958 ("src: add set element multi-statement support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

build: Bump version to 1.0.9

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: validate maximum log statement prefix length

Otherwise too long string overruns the log prefix buffer.

Fixes: e76bb3794018 ("src: allow for variables in the log prefix string")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1714
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: use bash instead of /bin/sh for tests

All tests under "tests/shell" are shell scripts with shebang /bin/bash
or /bin/sh. This may seem expected, since these tests are under
"tests/shell" directory, but any executable file would work.

Anyway. The vast majority of the tests has "#!/bin/bash" as shebang.
A few tests had "#!/bin/sh" or "#!/bin/sh -e". Unify this and always use bash.
Since we anyway require bash, this is not a limitation.

Also, if we know that this is a bash script (by parsing the shebang), we
can let the test wrapper pass "-x" to the script. The next commit will
do that, and it is nicer if the shebangs are all uniform.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add missing "vlan_8021ad_tag.nodump" file

This is an inconsistency. The test should have either a .nft or a
.nodump file. "./tools/check-tree.sh" enforces that and will in the
future run by `make check`.

Fixes: 74cf3d16d8e9 ('tests: shell: add vlan match test case')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: suggest != in negation error message

when I run sudo nft insert rule filter FORWARD iifname "ens2f1" ip saddr not @ip_macs counter drop comment \" BLOCK ALL NON REGISTERED IP/MACS \"
I get: Error: negation can only be used with singleton bitmask values

And even I did not spot the problem immediately.
I don't think "not" should have been added, its easily confused with
"not equal"/"neq"/!= and hides that this is allegedly a binop.

At least *mention* that the commandline is asking for a binary
operation here and suggest "!=".

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add "-S|--setup-host" option to set sysctl for rootless tests

Most tests can run just fine without root. A few of them will fail if
/proc/sys/net/core/{wmem_max,rmem_max} is too small (as it is by default
on the host).

The easy workaround is to bump those limits once. This has to be
repeated after each reboot.

Doing that manually (every time) is cumbersome. Add a "--setup-host"
option for that.

Usage:

  $ sudo ./tests/shell/run-tests.sh -S
  Setting up host for running as rootless (requires root).
      echo 4096000 > /proc/sys/net/core/rmem_max (previous value 100000)
      echo 4096000 > /proc/sys/net/core/wmem_max (previous value 100000)

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: preserve result directory with NFT_TEST_FAIL_ON_SKIP

On a successful run, the result directory will be deleted (unless run
with "-k|--keep-logs" option or NFT_TEST_KEEP_LOGS=y).

With NFT_TEST_FAIL_ON_SKIP=y, when there are no failures but skipped
tests, also preserve the result.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: mount all of "/var/run" in "test-wrapper.sh"

After reboot, "/var/run/netns" does not exist before we run the first
`ip netns add` command. Previously, "test-wrapper.sh" would mount a
tmpfs on that directory, but that fails, if the directory doesn't exist.
You will notice this, by deleting /var/run/netns (which only root can
delete or create, and which is wiped on reboot).

Instead, mount all of "/var/run". Then we can also create /var/run/netns
directory.

This means, any other content from /var/run is hidden too. That's
probably desirable, because it means we don't depend on stuff that
happens to be there. If we would require other content in /var/run, then
the test runner needs to be aware of the requirement and ensure it's
present. But best is just to not require anything. It's only iproute2
which insists on /var/run/netns.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: remove references to timeout in reset command

After Linux kernel's patch ("netfilter: nf_tables: do not refresh
timeout when resetting element") timers are not reset anymore, update
documentation to keep this in sync.

Fixes: 83e0f4402fb7 ("Implement 'reset {set,map,element}' commands")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: add vlan match test case

Check that we can match on the 8021ad header and vlan tag, see
af84f9e447a6 ("netfilter: nft_payload: rebuild vlan header on h_proto access").

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add test for dormant on/off/on bug

Disallow enabling/disabling a table in a single transaction.
Make sure we still allow one update, either to dormant, or
from active to dormant.

Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg>
Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Cc: info@starlabs.sg
Signed-off-by: Florian Westphal <fw@strlen.de>

icmpv6: Allow matching target address in NS/NA, redirect and MLD

It was currently not possible to match the target address of a neighbor
solicitation or neighbor advertisement against a dynamic set, unlike in
IPv4.

Since they are many ICMPv6 messages with an address at the same offset,
allow filtering on the target address for all icmp types that have one.

While at it, also allow matching the destination address of an ICMPv6
redirect.

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: never merge across non-expression statements redux 2

Turns out I also love to forget about nft-test.py -j.

Fixes: 99ab1b8feb16 ("rule: never merge across non-expression statements")
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: sets/reset_command_0: Fix drop_seconds()

The function print_times() skips any time elements which are zero, so
output may lack the ms part. Adjust the sed call dropping anything but
the minutes value to not fail in that case.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 255ec36a11525 ("tests: shell: Stabilize sets/reset_command_0 test")
Signed-off-by: Phil Sutter <phil@nwl.cc>

scanner: restrict include directive to regular files

Similar to previous change, also check all

include "foo"

and reject those if they refer to named fifos, block devices etc.

Directories are still skipped, I don't think we can change this
anymore.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1664
Signed-off-by: Florian Westphal <fw@strlen.de>

libnftables: refuse to open onput files other than named pipes or regular files

Don't start e.g. parsing a block device.
nftables is typically run as privileged user, exit early if we
get unexpected input.

Only exception: Allow character device if input is /dev/stdin.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1664
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: never merge across non-expression statements redux

Forgot to 'git add' inet/bridge/netdev payload records.

Fixes: 99ab1b8feb16 ("rule: never merge across non-expression statements")
Signed-off-by: Florian Westphal <fw@strlen.de>

rule: never merge across non-expression statements

The existing logic can merge across non-expression statements,
if there is only one payload expression.

Example:
ether saddr 00:11:22:33:44:55 counter ether type 8021q

is turned into
counter ether saddr 00:11:22:33:44:55 ether type 8021q

which isn't the same thing.

Fix this up and add test cases for adjacent vlan and ip header
fields. 'Counter' serves as a non-merge fence.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: Fix for failing nft-f/sample-ruleset

For whatever reason, my system lacks an entry for 'sip' in
/etc/services. Assuming the service name is not relevant to the test,
just replace it by the respective port number.

Fixes: 68728014435d9 ("tests: shell: add sample ruleset reproducer")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>

datatype: use xmalloc() for allocating datatype in datatype_clone()

The returned memory will be initialized. No need to zero it first. Use
xmalloc() instead of xzalloc().

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: add missing map statement stub

Add map statement stub to restore compilation without json support.

Fixes: 27a2da23d508 ("netlink_linearize: skip set element expression in map statement key")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: include <string.h> in <nft.h>

<string.h> provides strcmp(), as such it's very basic and used
everywhere.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: fix spurious errors in sets/0036add_set_element_expiration_0

A number of changes to fix spurious errors:

- Add seconds as expiration, otherwise 14m59 reports 14m in minute
granularity, this ensures suficient time in a very slow environment with
debugging instrumentation.

- Provide expected output.

- Update sed regular expression to make 'ms' optional and use -E mode.

Fixes: adf38fd84257 ("tests: shell: use minutes granularity in sets/0036add_set_element_expiration_0")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mergesort: avoid cloning value in expr_msort_cmp()

If we have a plain EXPR_VALUE value, there is no need to copy
it via mpz_set().

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_linearize: skip set element expression in map statement key

This fix is similar to 22d201010919 ("netlink_linearize: skip set element
expression in set statement key") to fix map statement.

netlink_gen_map_stmt() relies on the map key, that is expressed as a set
element. Use the set element key instead to skip the set element wrap,
otherwise get_register() abort execution:

nft: netlink_linearize.c:650: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.

This includes JSON support to make this feature complete and it updates
tests/shell to cover for this support.

Reported-by: Luci Stanescu <luci@cnix.ro>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: expose dynamic flag

The dynamic flag is not exported via JSON, this triggers spurious
ENOTSUPP errors when restoring rulesets in JSON with dynamic flags
set on.

Fixes: 6e45b102650a2 ("nft: set: print dynamic flag when set")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: add map support

Add basic map support to this infrastructure, eg.

!map1 ipv4_addr : mark;ok

Adding elements to map is still not supported.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: features: Fix table owner flag check

The keyword is "flags", not "flag". Resulted in a false-negative:

features/table_flag_owner.nft:4:2-5: Error: syntax error, unexpected string
flag owner;
^^^^

Fixes: 10373f0936cd3 ("tests: shell: skip flowtable-uaf if we lack table owner support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

expression: cleanup expr_ops_by_type() and handle u32 input

Make fewer assumptions about the underlying integer type of the enum.
Instead, be clear about where we have an untrusted uint32_t from netlink
and an enum. Rename expr_ops_by_type() to expr_ops_by_type_u32() to make
this clearer. Later we might make the enum as packed, when this starts
to matter more.

Also, only the code path expr_ops() wants strict validation and assert
against valid enum values. Move the assertion out of
__expr_ops_by_type(). Then expr_ops_by_type_u32() does not need to
duplicate the handling of EXPR_INVALID. We still need to duplicate the
check against EXPR_MAX, to ensure that the uint32_t value can be cast to
an enum value.

[ Remove cast on EXPR_MAX. --pablo ]

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip flowtable-uaf if we lack table owner support

Signed-off-by: Florian Westphal <fw@strlen.de>

parser_json: Default meter size to zero

JSON parser was missed when performing the same change in standard
syntax parser.

Fixes: c2cad53ffc22a ("meters: do not set a defaut meter size from userspace")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Catch nonsense ops in match statement

Since expr_op_symbols array includes binary operators and more, simply
checking the given string matches any of the elements is not sufficient.

Fixes: 586ad210368b7 ("libnftables: Implement JSON parser")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Wrong check in json_parse_ct_timeout_policy()

The conditional around json_unpack() was meant to accept a missing
policy attribute. But the accidentally inverted check made the function
either ignore a given policy or access uninitialized memory.

Fixes: c82a26ebf7e9f ("json: Add ct timeout support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix synproxy object mss/wscale parsing

The fields are 16 and 8 bits in size, introduce temporary variables to
parse into.

Fixes: f44ab88b1088e ("src: add synproxy stateful object support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix limit object burst value parsing

The field is of type uint32_t, use lower case 'i' format specifier.

Fixes: c36288dbe2ba3 ("JSON: Fix parsing and printing of limit objects")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix flowtable prio value parsing

Using format specifier 'I' requires a 64bit variable to write into. The
temporary variable 'prio' is of type int, though.

Fixes: 586ad210368b7 ("libnftables: Implement JSON parser")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Proper ct expectation attribute parsing

Parts of the code were unsafe (parsing 'I' format into uint32_t), the
rest just plain wrong (parsing 'o' format into char *tmp). Introduce a
temporary int variable to parse into.

Fixes: 1dd08fcfa07a4 ("src: add ct expectations support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Fix typo in json_parse_cmd_add_object()

A case of bad c'n'p in the fixed commit broke ct timeout objects
parsing.

Fixes: c7a5401943df8 ("parser_json: Fix for ineffective family value checks")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_json: Catch wrong "reset" payload

The statement happily accepted any valid expression as payload and
assumed it to be a tcpopt expression (actually, a special case of
exthdr). Add a check to make sure this is the case.

Standard syntax does not provide this flexibility, so no need to have
the check there as well.

Fixes: 5d837d270d5a8 ("src: add tcp option reset support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: add feature probe for sctp chunk matching

Skip the relavant parts of the test if nft_exthdr lacks sctp support.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: add feature probe for sets with more than one element

Kernels < 5.11 can handle only one expression per element, e.g.
its possible to attach a counter per key, or a rate limiter,
or a quota, but not two at the same time.

Add a probe file and skip the relevant tests if the feature is absent.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: skip adding catchall elements if unuspported

The test fails on kernels without catchall support, so elide this
small part.

No need to skip the test in this case, the dump file validates that
the added elements are no longer there after the timeout.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: honor NFT_TEST_FAIL_ON_SKIP variable to fail on any skipped tests

The test suite should pass with various kernels and build
configurations. Of course, that means, that some tests will be
gracefully skipped, and we don't treat that as an overall failure.

However, it should be possible to run a specific kernel (net-next?) and
build configuration, where we expect that all tests pass.

Add an option to fail the run, if any tests were skipped. This is to
ensure that we don't have broken tests that never pass.

This will make more sense with automated CI is running, to enable on a
test system and ensure that at least on that system, all tests pass.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

datatype: return const pointer from datatype_get()

"struct datatype" is for the most part immutable, and most callers deal
with const pointers. That's why datatype_get() accepts a const pointer
to increase the reference count (mutating the refcnt field).

It should also return a const pointer. In fact, all callers are fine
with that already.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

payload: use enum icmp_hdr_field_type in payload_may_dependency_kill_icmp()

Don't mix icmp_dep (enum icmp_hdr_field_type) and the uint8_t icmp_type.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: use "enum byteorder" instead of int in set_datatype_alloc()

Use the enum types as we have them.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: handle invalid etype in set_make_key()

It's not clear to me, what ensures that the etype is always valid.
Handle a NULL.

Fixes: 6e48df5329ea ('src: add "typeof" build/parse/print support')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>