git.ipfire.org Git - thirdparty/nftables.git/log

include: include <stdlib.h> in <nft.h>

It provides malloc()/free(), which is so basic that we need it
everywhere. Include via <nft.h>.

The ultimate purpose is to define more things in <nft.h>. While it has
not corresponding C sources, <nft.h> can contain macros and static
inline functions, and is a good place for things that we shall have
everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that
is a very basic dependency, that will be needed for that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: include <nft.h> for base C environment to "parser_bison.y"

All our C sources should include <nft.h> as first. This prepares an
environment of things that we expect to have available in all our C
sources (and indirectly in our internal header files, because internal
header files are always indirectly from a C source).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: typeof_integer/raw: prefer @nh for payload matching

@ih fails on kernels where payload expression doesn't support the 'inner'
base offset.

This test isn't about inner headers, so just use @nh which is
universally available.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: make delete_by_handle test work on older releases

This test fails on kernels that lack
05abe4456fa3 ("netfilter: nf_tables: allow to register flowtable with no devices")
v5.8-rc1~165^2~27^2~1

Just add lo as dummy device.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add "--quick" option to skip slow tests (via NFT_TEST_SKIP_slow=y)

It's important to run (a part) of the tests in a timely manner.
Add an option to skip long running tests.

Thereby, add a more general NFT_TEST_SKIP_* mechanism.

This is related and inverse from "NFT_TEST_HAVE_json", where a test
can require [ "$NFT_TEST_HAVE_json" != n ] to run, but is skipped when
[ "$NFT_TEST_SKIP_slow" = y ].

Currently only NFT_TEST_SKIP_slow is supported. The user can set such
environment variables (or use the -Q|--quick command line option). The
configuration is printed in the test info.

Tests should check for [ "$NFT_TEST_SKIP_slow" = y ] so that the
variable has to be explicitly set to opt-out. For convenience, tests can
also add a

# NFT_TEST_SKIP(NFT_TEST_SKIP_slow)

tag, which is evaluated by test-wrapper.sh. Or they can run a quick, reduced
part of the test, but then should still indicate to be skipped.

Mark 8 tests are as slow, that take longer than 5 seconds on my machine.
With this, a parallel wall time for the non-slow tests is only 7 seconds
(on my machine).

The ultimate point is to integrate a call to "tests/shell/run-tests.sh"
in a `make check` target. For development, you can then export
NFT_TEST_SKIP_slow=y and have a fast `make check`.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: skip tests if nft does not support JSON mode

We can build nft without JSON support, and some tests will fail without
it. Instead, they should be skipped. Also note, that the test accepts any
nft binary via the "NFT" environment variable. So it's not enough to
make the skipping dependent on build configuration, but on the currently
used $NFT variable.

Let "run-test.sh" detect and export a "NFT_TEST_HAVE_json=y|n" variable. This
is heavily inspired by Florian's feature probing patches.

Tests that require JSON can check that variable, and skip. Note that
they check in the form of [ "$NFT_TEST_HAVE_json" != n ], so the test is
only skipped, if we explicitly detect lack of support. That is, don't
check via [ "$NFT_TEST_HAVE_json" = y ].

Some of the tests still run parts of the tests that don't require JSON.
Only towards the end of such partial run, mark the test as skipped.

Some tests require JSON support throughout. For those, add  a mechanism
where tests can add a tag (in their first 10 lines):

  # NFT_TEST_REQUIRES(NFT_TEST_HAVE_json)

This will be checked by "test-wrapper.sh", which will skip the test.
The purpose of this is to make it low-effort to skip a test and to print
the reason in the text output as

    Test skipped due to NFT_TEST_HAVE_json=n (test has "NFT_TEST_REQUIRES(NFT_TEST_HAVE_json)" tag)

This is intentionally not shortened to NFT_TEST_REQUIRES(json), so that
we can grep for NFT_TEST_HAVE_json to find all relevant places.

Note that while NFT_TEST_HAVE_json is autodetected, the caller can also
force it by setting the environment variable. This allows to see what
would happen to such a test.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: print number of completed tests to show progress

Especially with VALGRIND=y, a full test run can take a long time. When
looking at the output, it's interesting to get a feel how far along we
are.

Print the number of completed jobs vs. the number of total jobs, in the
line showing the test result. It gives a nice progress status.

Example:

    I: [OK]           1/373 ./tests/shell/testcases/bitwise/0040mark_binop_1
    I: [OK]           2/373 ./tests/shell/testcases/bitwise/0040mark_binop_0
    ...

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: set valgrind's "--vgdb-prefix=" to orignal TMPDIR

"test-wrapper.sh" sets TMPDIR="$NFT_TEST_TESTTMPDIR". That is useful, so
that temporary files of the tests are placed inside the test result
data.

Sometimes tests miss to delete those files, which would result in piling
up /tmp/tmp.XXXXXXXXXX files. By setting $TMPDIR, those files are
clearly related to the test run that created them, and can be deleted
together.

However, valgrind likes to create files like
"vgdb-pipe-from-vgdb-to-68-by-thom-on-???" inside $TMPDIR. These are
pipes, so if you run `grep -R ^ /tmp/nft-test.latest` while
the test is still running (to inspect the results), then the process
hands reading from the pipe.

Instead, tell valgrind to put those files in the original TMPDIR. For
that purpose, export NFT_TEST_TMPDIR_ORIG from "run-tests.sh".

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add ".nft" dump files for tests without dumps/ directory

DUMPGEN=y mode skips tests that don't have a corresponding "dumps/"
directory.

Add the "dumps/" directory for tests that lacked it, and generate ".nft"
files by running `./tests/shell/run-tests.sh -g`.

Yes, they are all empty. Not very exciting, but why not check for that
too?

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: add missing ".nodump" file for tests without dumps

These files are generated by running `./tests/shell/run-tests.sh -g`.
Commit the .nodump files to git.

The point is to explicitly make it known that no dump file should be
there. This prevents `./tests/shell/run-tests.sh -g` from generating
the files and proposing (over and over) to add them to git.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: generate and add ".nft" dump files for existing tests

Several tests didn't have a ".nft" dump file committed. Generate one and
commit it to git.

While not all tests have a stable ruleset to compare, many have. Commit
the .nft files for the tests where the output appears to be stable.

This was generated by running `./tests/shell/run-tests.sh -g` twice, and
commit the files that were identical both times. Note that 7 tests on my
machine fail, so those are skipped.

Also skip the files

  tests/shell/testcases/maps/dumps/0004interval_map_create_once_0.nft
  tests/shell/testcases/nft-f/dumps/0011manydefines_0.nft
  tests/shell/testcases/sets/dumps/0011add_many_elements_0.nft
  tests/shell/testcases/sets/dumps/0030add_many_elements_interval_0.nft
  tests/shell/testcases/sets/dumps/0068interval_stack_overflow_0.nft

Those files are larger than 100KB, and I don't think we want to blow up
the git repository this way. Even if they are only text files and
compress well.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: honor .nodump file for tests without nft dumps

For some tests, the dump is not stable or useful to test. For example,
if they have an "expires" timestamps. Those tests don't have a .nft file
in the dumps directory, and don't have it checked.

DUMPGEN=y generates a new dump file, if the "dumps/" directory exists.
Omitting that directory is a way to prevent the generation of the file.
However, many such tests share their directory with tests that do have dumps.

When running tests with DUMPGEN=y, new files for old tests are generated.
Those files are not meant to be compared or committed to git because
it's known to not work.

Whether a test has a dump file, is part of the test. The absence of the
dump file should also be recorded and committed to git.

Add a way to opt-out from such generating such dumps by having .nodump
files instead of the .nft dump.

Later we should add unit tests that checks that no test has both a .nft
and a .nodump file in git, that the .nodump file is always empty, and
that every .nft/.nodump file has a corresponding test committed to git.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

datatype: rename "dtype_clone()" to datatype_clone()

The struct is called "datatype" and related functions have the fitting
"datatype_" prefix. Rename.

Also rename the internal "dtype_alloc()" to "datatype_alloc()".

This is a follow up to commit 01a13882bb59 ('src: add reference counter
for dynamic datatypes'), which started adding "datatype_*()" functions.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

cache: avoid accessing uninitialized varible in implicit_chain_cache()

$ ./tests/shell/run-tests.sh -V tests/shell/testcases/cache/0010_implicit_chain_0

Gives:

  ==59== Conditional jump or move depends on uninitialised value(s)
  ==59==    at 0x48A6A6B: mnl_nft_rule_dump (mnl.c:695)
  ==59==    by 0x48778EA: rule_cache_dump (cache.c:664)
  ==59==    by 0x487797D: rule_init_cache (cache.c:997)
  ==59==    by 0x4877ABF: implicit_chain_cache.isra.0 (cache.c:1032)
  ==59==    by 0x48785C9: cache_init_objects (cache.c:1132)
  ==59==    by 0x48785C9: nft_cache_init (cache.c:1166)
  ==59==    by 0x48785C9: nft_cache_update (cache.c:1224)
  ==59==    by 0x48ADBE1: nft_evaluate (libnftables.c:530)
  ==59==    by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:596)
  ==59==    by 0x402983: main (main.c:535)

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: set NFT_TEST_JOBS based on $(nproc)

Choose 150% of $(nproc) for the default vlaue of NFT_TEST_JOBS
(rounded up). The minimal value chosen by default is 2.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: record wall time of test run in result data

When running tests, it's useful to see how long it took. Keep track if
the timestamps in a "times" file.

Try:

    ( \
        for d in /tmp/nft-test.latest.*/test-*/ ; do \
               printf '%10.2f  %s\n' \
                      "$(sed '1!d' "$d/times")" \
                      "$(cat "$d/name")" ; \
        done \
        | sort -n \
        | awk '{print $0; s+=$1} END{printf("%10.2f\n", s)}' ; \
        printf '%10.2f wall time\n' "$(sed '1!d' /tmp/nft-test.latest.*/times)" \
    )

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: no longer enable verbose output when selecting a test

Previously, when selecting a test on the command line, it would also
enable verbose output (except if the "--" separator was used).

This convenience feature seems not great because the output from the
test badly clutters the "run-test.sh" output.

Now that the test results are all on disk, you can search them after the
run with great flexibility (grep).

Additionally, in previous versions, command line argument parsing was
more restrictive, requiring that "-v" always be placed first. Now, the
order does not matter, so it's easy to edit the command prompt and
append a "-v", if that is what you want. Or if you really like verbose
output, then `export VERBOSE=y`.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: print "kernel is tainted" separate from test result

Once the kernel is tainted, it stays until reboot. It would not be
useful to fail the entire test run based on that (and we don't do that).

But then, it seems odd to print this in the same style as the test
results, because a [FAILED] of a test counts as an overall failure.
Instead, print this warning in a different style.

Previously:

    $ ./tests/shell/run-tests.sh -- /usr/bin/true
    ...

    W: [FAILED]     kernel is tainted
    I: [OK]         /usr/bin/true

    I: results: [OK] 1 [SKIPPED] 0 [FAILED] 0 [TOTAL] 1

Now:

    $ ./tests/shell/run-tests.sh -- /usr/bin/true
    ...

    W: kernel is tainted

    I: [OK]         /usr/bin/true

    I: results: [OK] 1 [SKIPPED] 0 [FAILED] 0 [TOTAL] 1

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: redirect output of test script to file too

It's useful to keep around for later. Redirect to the temporary
directory.

Note that the file content may be colorized too. `less -R` helps with
that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: don't redirect error/warning messages to stderr

Writing some messages to stderr and some to stdout is not helpful.
Once they are written to separate streams, it's hard to be sure about
their relative order.

Use grep to filter messages.

Also, next we will redirect the entire output also to a file. There the
output is also not split in two files.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: print the NFT setting with the VALGRIND=y wrapper

With this we see in the info output

I: info: NFT=./tests/shell/helpers/nft-valgrind-wrapper.sh

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: fix handling failures with VALGRIND=y

With VALGRIND=y, on memleaks the tests did not fail. Fix that by passing
"--error-exitcode=122" to valgrind.

But just returning 122 from $NFT command may not correctly fail the test.
Instead, ensure to write a "rc-failed-valrind" file, which is picked up
by "test-wrapper.sh" to properly handle the valgrind failure (and fail
with error code 122 itself).

Also, accept NFT_TEST_VALGRIND_OPTS variable to a pass additional
arguments to valgrind. For example a "--suppressions" file.

Also show the special error code [VALGRIND] in "run-test.sh".

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: colorize terminal output with test result

Colors help to see what is important.

It honors the common NO_COLOR=<anything> to disable coloring. It also
does not colorize, if [ -t 1 ] indicates that stdout is not a terminal.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: cleanup print_test_result() and show TAINTED error code

We will add more special error codes (122 for VALGRIND). Minor refactor
of print_test_result() to make it easier to extend for that.

Also, we will soon colorize the output. This preparation patch makes
that easier too.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: cleanup result handling in "test-wrapper.sh"

The previous code was mostly correct, but hard to understand.
Rework it.

Also, on failure now always write "rc-failed-exit", which is the exit
code that "test-wrapper.sh" reports to "run-test.sh". Note that this
error code may not be the same as the one returned by the TEST binary.
The latter you can find in one of the files "rc-{ok,skipped,failed}".

In general, you can search the directory with test results for those
"rc-*" files. If you find a "rc-failed" file, it was counted as failure.
There might be other "rc-failed-*" files, depending on whether the diff
failed or kernel got tainted.

Also, reserve all the error codes 118 - 124 for the "test-wrapper.sh".
For example, 124 means a dump difference and 123 means kernel got
tainted. In the future, 122 will mean a valgrind error. Other numbers
are not reserved. If a test command fails with such an reserved code,
"test-wrapper.sh" modifies it to 125, so that "run-test.sh" does not get
the wrong idea about the failure reason. This is not new in this patch,
except that the reserved range was extended for future additions.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: return 77/skip for tests that fail to create dummy device

There are some existing tests, that skip operation when they fail to
create a dummy interface. Use the new exit code 77 to indicate
"SKIPPED".

I wonder why creating a dummy device would ever fail and why we don't
just fail the test altogether in that case. But the patch does not
change that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: set TMPDIR for tests in "test-wrapper.sh"

Various tests create additional temporary files. They really should just
use "$NFT_TEST_TESTTMPDIR" for that. However, they mostly use `mktemp`.

The scripts are supposed to cleanup those files afterwards. However,
often that does not work correctly and /tmp gets full of left over
temporary files.

Export "TMPDIR" so that they use the test-specific temporary directory.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: fix "0003includepath_0" for different TMPDIR

We are going to set $TMPDIR to another location. The previous code made
assumptions that the generated path would always be in /tmp. Fix that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: record the test duration (wall time) in the result data

Runtimes are important. Add a way to find out how long tests took.

  $ ./tests/shell/run-tests.sh
  ...
  $ for d in /tmp/nft-test.latest.*/test-*/ ; do \
         printf '%10.2f  %s\n' \
                "$(sed '1!d' "$d/times")" \
                "$(cat "$d/name")" ; \
    done \
    | sort -n \
    | awk '{print $0; s+=$1} END{printf("%10.2f\n", s)}'

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: skip test in rootless that hit socket buffer size limit

The socket buffer limits like /proc/sys/net/core/{rmem_max,wmem_max}
can cause tests to fail, when running rootless. That's because real-root
can override those limits, rootless cannot.

Add an environment variable NFT_TEST_HAS_SOCKET_LIMITS=*|n which is
automatically set by "run-tests.sh".

Certain tests will check for [ "$NFT_TEST_HAS_SOCKET_LIMITS" = y ] and
skip the test.

The user may manually bump those limits (requires root), and set
NFT_TEST_HAS_SOCKET_LIMITS=n to get the tests to pass even as rootless.

For example, the test passes with root:

  sudo ./tests/shell/run-tests.sh -- tests/shell/testcases/sets/automerge_0

Without root, it would fail. Skip it instead:

  ./tests/shell/run-tests.sh -- tests/shell/testcases/sets/automerge_0
  ...
  I: [SKIPPED]    tests/shell/testcases/sets/automerge_0

Or bump the limit:

  $ echo 3000000 | sudo tee /proc/sys/net/core/wmem_max
  $ NFT_TEST_HAS_SOCKET_LIMITS=n ./tests/shell/run-tests.sh -- tests/shell/testcases/sets/automerge_0
  ...
  I: [OK]         tests/shell/testcases/sets/automerge_0

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: bind mount private /var/run/netns in test container

Some tests want to run `ip netns add`, which requires write permissions
to /var/run/netns. Also, /var/run/netns would be a systemwide mount
path, and shared between the tests. We would want to isolate that.

Fix that by bind mount a tmpfs inside the test wrapper, if we appear to
have a private mount namespace.

Fixes

$ ./tests/shell/run-tests.sh -- tests/shell/testcases/netns/0001nft-f_0

Optimally, `ip netns add` would allow to specify a private
location for those bind mounts.

It seems that iproute2 is build with /var/run/netns, instead the more
common /run/netns. Hence, handle /var/run instead of /run.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: support running tests in parallel

Add option to enable running jobs in parallel. The purpose is to speed
up the run time of the tests.

The global cleanup (removal of kernel modules) interferes with parallel
jobs (or even with, unrelated jobs on the system). By setting
NFT_TEST_JOBS= to a positive number, that cleanup is skipped.

This option is too good to miss. Hence parallel execution is enabled by
default, and you have to opt-out from it.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: move valgrind wrapper script to separate script

Previously, in valgrind mode we would generate one script, which had
"$NFT" variable and the temp directory hard coded.

Soon, we will run jobs in parallel, so they would need at least
different temp directories. Also, we want to put the valgrind results
are inside "$NFT_TEST_TESTTMPDIR", along the test data.

Extract the wrapper script to a separate script. It does not need to be
generated ad-hoc, instead it uses the environment variables "$NFT_REAL" and
"$NFT_TEST_TESTTMPDIR", as "run-tests.sh" prepares them.

Also, add a "$NFT_REAL" variable for the actual NFT binary. We wrap the
"$NFT" variable with VALGRIND=y or the user may pass "NFT='valgrind
nft'". We should have access to the real binary. That might be useful
for example to call `ldd "$NFT_REAL" | grep libjansson` to check for
JSON support.

Also, we use libtool. So quite possible the nft binary is actually a
shell script. Calling valgrind on that script results in a lot of leak
reports from shell (and slows down the command). Instead, use `libtool
--mode=execute`.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: move taint check to "test-wrapper.sh"

We will run tests in parallel. That means, we have multiple tests data and results
in fly. That becomes simpler, if we move more result data to the
test-wrapper and out of "run-tests.sh".

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: rework printing of test results

- "test-wrapper.sh" no longer will print the test output to its stdout.
  Instead, it only writes the testout.log file.

- rework the loop "run-tests.sh" for printing the test results. It no
  longer captures the output of the test, as the wrapper is expected to
  be silent. Instead, they get the output from the result directory.
  The benefit is, that there is no duplication in what we print and the
  captured data in the result directory. The verbose mode is only for
  convenience, to safe looking at the test data. It's not essential
  otherwise.

- also move the evaluation of the test result (and printing of the
  information) to a separate function. Later we want to run tests in
  parallel, so the steps need to be clearly separated.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: move the dump diff handling inside "test-wrapper.sh"

This fits there better. At this point, we are  still inside the unshared
namespace and right after the test. The test-wrapper.sh should compare
(and generate) the dumps.

Also change behavior for DUMPGEN=y.

- Previously it would only rewrite the dump if the dumpfile didn't
  exist yet. Now instead, always rewrite the file with DUMPGEN=y.
  The mode of operation is anyway, that the developer afterwards
  checks `git diff|status` to pick up the changes. There should be
  no changes to existing files (as existing tests are supposed to
  pass). So a diff there either means something went wrong (and we
  should see it) or it just means the dumps correctly should be
  regenerated.

- also, only generate the file if the "dumps/" directory exists. This
  allows to write tests that don't have a dump file and don't get it
  automatically generated.

The test wrapper will return a special error code 124 to indicate that
the test passed, but the dumps file differed.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: support --keep-logs option (NFT_TEST_KEEP_LOGS=y) to preserve test output

The test output is now all collected in the temporary directory. On
success, that directory is deleted. Add an option to always preserve
that directory.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: interpret an exit code of 77 from scripts as "skipped"

Allow scripts to indicate that a test could not run by exiting 77.

"77" is chosen as exit code from automake's testsuites ([1]). Compare to
git-bisect which chooses 125 to indicate skipped.

[1] https://www.gnu.org/software/automake/manual/html_node/Scripts_002dbased-Testsuites.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: run each test in separate namespace and allow rootless

Don't unshare the entire shell script. Instead, call unshare each test
separately. That means, all tests use now a different sandbox and will
also allow (with further changes) to run them in parallel.

Also, allow to run rootless/unprivileged.

The script first tries to run a separate PID+USER+NET namespace. If that
fails, it downgrades to USER+NET. If that fails, it downgrades to a
separate NET namespace. If unshare still fails, the script fails
entirely. That differs from before, where the script would proceed
without sandboxing. The script will now always require that unsharing
works, unless the user opts-out.

If the user cannot unshare, they can set NFT_TEST_UNSHARE_CMD to the
command used for unsharing. It may be empty for no unshare. The command
line arguments -U/--no-unshare are a shortcut for setting
NFT_TEST_UNSHARE_CMD="".

If we are able to create a separate USER namespace, then this mode
allows to run the test as rootless/unprivileged. We no longer require
[ `id -u` = 0 ]. Some tests may not work as rootless. For example, the
socket buffers is limited by /proc/sys/net/core/{wmem_max,rmem_max}
which real-root can override, but rootless tests cannot. Such tests
should check for [ "$NFT_TEST_HAS_REALROOT" != y ] and skip gracefully.

Usually, the user doesn't need to tell the script whether they have
real-root. The script will autodetect it via [ `id -u` = 0 ]. But that
won't work when run inside a rootless container already. In that case,
the user would want to tell the script that there is no real-root. They
can do so via the -R/--without-root option or NFT_TEST_HAS_REALROOT=n.

If tests wish, the can know whether they run inside "unshare"
environment by checking for [ "$NFT_TEST_HAS_UNSHARED" = y ].

When setting NFT_TEST_UNSHARE_CMD to override the unshare command, users
may want to also set NFT_TEST_HAS_UNSHARED= and NFT_TEST_HAS_REALROOT=
correctly.

As we run each test in a separate unshare environment, we need a wrapper
"tests/shell/helpers/test-wrapper.sh" around the test, which executes
inside the tested environment. Also, each test gets its own temp
directory prepared in NFT_TEST_TESTTMPDIR. This is also the place, where
test artifacts and results will be collected.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: print test configuration

As the script can be configured via environment variables or command
line option, it's useful to show the environment variables that we
received or set during the test setup.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: normalize boolean configuration in environment variables

Previously, we would honor "y" as opt-in, and all other values meant
false.

- accept alternatives to "y", like "1" or "true".

- normalize the value, to either be "y" or "n".

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: export NFT_TEST_BASEDIR and NFT_TEST_TMPDIR for tests

Let the test wrapper prepare and export two environment variables for
the test:

- "$NFT_TEST_BASEDIR" is just the top directory where the test scripts
  lie.

- "$NFT_TEST_TMPDIR" is a `mktemp` directory created by "run-tests.sh"
  and removed at the end. Tests may use that to leave data there.
  This directory will be used for various things, like the "nft" wrapper
  in valgrind mode, the results of the tests and possibly as cache for
  feature detection.

The "$NFT_TEST_TMPDIR" was already used before with the "VALGRIND=y"
mode. It's only renamed and got an extended purpose.

Also drop the unnecessary first detection of "$DIFF" and the "$SRC_NFT"
variable.

Also, note that the mktemp creates the temporary directory under /tmp.
Which is commonly a tempfs. The user can override that by exporting
TMPDIR.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: check test names before start and support directories

Check for valid test names early. That's useful because we treat any
unrecognized options as test names. We should detect a mistake early.

While at it, also support specifying directory names instead of
executable files.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: rework finding tests and add "--list-tests" option

Cleanup finding the test files. Also add a "--list-tests" option to see
which tests are found and would run.

Also get rid of the FIND="$(which find)" detection. Which system doesn't
have a working find? Also, we can just fail when we try to use find, and
don't need a check first.

This is still after "unshare", which will be addressed next.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: rework command line parsing in "run-tests.sh"

Parse the arguments in a loop, so that their order does not matter.
Also, soon more command line arguments will be added, and this way of
parsing seems more maintainable and flexible.

Currently this is still after the is-root check and after unshare. That
will be addressed later.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: 0043concatenated_ranges_0: re-enable all tests

This script suppressed a few tests when ran via run-tests.sh,
don't do that, it would have caught the previous 'get' bug
years ago.

Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: fix get element for concatenated set

given:
table ip filter {
set test {
type ipv4_addr . ether_addr . mark
flags interval
elements = { 198.51.100.0/25 . 00:0b:0c:ca:cc:10-c1:a0:c1:cc:10:00 . 0x0000006f, }
}
}

We get lookup failure:

nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f }
Error: Could not process rule: No such file or directory

Its possible to work around this via dummy range somewhere in the key, e.g.
nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f-0x6f }

but that shouldn't be needed, so make sure the INTERVAL flag is enabled
for the queried element if the set is of interval type.

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: describe behaviour of {ip,ip6} length

This field exposes internal kernel GRO/GSO packet aggregation
implementation details to userspace, provide a hint to the user to
understand better when matching on this field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: revisit anonymous set with single element optimization

This patch reworks it to perform this optimization from the evaluation
step of the relational expression. Hence, when optimizing for protocol
flags, use OP_EQ instead of OP_IMPLICIT, that is:

tcp flags { syn }

becomes (to represent an exact match):

tcp flags == syn

given OP_IMPLICIT and OP_EQ are not equivalent for flags.

01167c393a12 ("evaluate: do not remove anonymous set with protocol flags
and single element") disabled this optimization, which is enabled again
after this patch.

Fixes: 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: place byteorder conversion after numgen for IP address datatypes

The numgen extension generates numbers in little-endian.
This can be very tricky when trying to combine it with IP addresses, which use big endian.
This change adds a new byteorder operation to convert data type endianness.

Before this patch:
$ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001
ip nat snat_chain
  [ numgen reg 1 = inc mod 7 offset 167772161 ]
  [ nat snat ip addr_min reg 1 ]

After this patch:
$ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001
ip nat snat_chain
  [ numgen reg 1 = inc mod 7 offset 167772161 ]
  [ byteorder reg 1 = hton(reg 1, 4, 4) ]
  [ nat snat ip addr_min reg 1 ]

Regression tests have been modified to include these new cases.

Signed-off-by: Jorge Ortiz Escribano <jorge.ortiz.escribano@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: set internal_location for table and chain

JSON parser does not seem to set on this, better provide a default
location.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: simplify chain_alloc()

Remove parameter to set the chain name which is only used from netlink
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove check for NULL before calling expr_free()

expr_free() already handles NULL pointer, remove redundant check.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: use internal_location for unspecified location at allocation time

Set location to internal_location instead of NULL to ensure this is
always set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

xt: avoid "-Wmissing-field-initializers" for "original_opts"

Avoid this warning with clang:

      CC       src/xt.lo
    src/xt.c:353:9: error: missing field 'has_arg' initializer [-Werror,-Wmissing-field-initializers]
            { NULL },
                   ^

The warning seems not very useful, because it's well understood that
specifying only some initializers leaves the remaining fields
initialized with the default. However, as this warning is only hit once
in the code base, it doesn't seem that we violate this style frequently.
Hence, fix it instead of disabling the warning.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: silence "implicit-fallthrough" warnings

Gcc with "-Wextra" warns:

    CC       segtree.lo
  segtree.c: In function 'get_set_interval_find':
  segtree.c:129:28: error: this statement may fall through [-Werror=implicit-fallthrough=]
    129 |                         if (expr_basetype(i->key)->type != TYPE_STRING)
        |                            ^
  segtree.c:134:17: note: here
    134 |                 case EXPR_PREFIX:
        |                 ^~~~

    CC       optimize.lo
  optimize.c: In function 'rule_collect_stmts':
  optimize.c:396:28: error: this statement may fall through [-Werror=implicit-fallthrough=]
    396 |                         if (stmt->expr->left->etype == EXPR_CONCAT) {
        |                            ^
  optimize.c:400:17: note: here
    400 |                 case STMT_VERDICT:
        |                 ^~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

utils: call abort() after BUG() macro

Otherwise, we get spurious warnings. The compiler should be aware that there is
no return from BUG(). Call abort() there, which is marked as __attribute__
((__noreturn__)).

    In file included from ./include/nftables.h:6,
                     from ./include/rule.h:4,
                     from src/payload.c:26:
    src/payload.c: In function 'icmp_dep_to_type':
    ./include/utils.h:39:34: error: this statement may fall through [-Werror=implicit-fallthrough=]
       39 | #define BUG(fmt, arg...)        ({ fprintf(stderr, "BUG: " fmt, ##arg); assert(0); })
          |                                 ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    src/payload.c:791:17: note: in expansion of macro 'BUG'
      791 |                 BUG("Invalid map for simple dependency");
          |                 ^~~
    src/payload.c:792:9: note: here
      792 |         case PROTO_ICMP_ECHO: return ICMP_ECHO;
          |         ^~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: fix "const static" declaration

Gcc warns against this with "-Wextra":

    src/rule.c:869:1: error: static is not at beginning of declaration [-Werror=old-style-declaration]
      869 | const static struct prio_tag std_prios[] = {
          | ^~~~~
    src/rule.c:878:1: error: static is not at beginning of declaration [-Werror=old-style-declaration]
      878 | const static struct prio_tag bridge_std_prios[] = {
          | ^~~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: debloat frag.t.payload.netdev

This bytecode output file contains many duplicated entries, remove them.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: extend ip frag-off coverage

Cover matching on DF and MF bits and fragments.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: use hexadecimal to display ip frag-off field

The ip frag-off field in the protocol definition is 16-bits long
and it contains the DF (0x2000) and MF (0x4000) bits too.

iptables-translate also suggests:

ip frag-off & 0x1ffff != 0

to match on fragments. Use hexadecimal for listing this header field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: do not remove anonymous set with protocol flags and single element

Set lookups with flags search for an exact match, however:

tcp flags { syn }

gets transformed into:

tcp flags syn

which is matching on the syn flag only (non-exact match).

This optimization is safe for ct state though, because only one bit is
ever set on in the ct state bitmask.

Since protocol flags allow for combining flags, skip this optimization
to retain exact match semantics.

Another possible solution is to turn OP_IMPLICIT into OP_EQ for exact
flag match to re-introduce this optimization and deal with this corner
case.

Fixes: fee6bda06403 ("evaluate: remove anon sets with exactly one element")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: drop "format" attribute from nft_gmp_print()

nft_gmp_print() passes the format string and arguments to
gmp_vfprintf(). Note that the format string is then interpreted
by gmp, which also understand special specifiers like "%Zx".

Note that with clang we get various compiler warnings:

  datatype.c:299:26: error: invalid conversion specifier 'Z' [-Werror,-Wformat-invalid-specifier]
          nft_gmp_print(octx, "0x%Zx [invalid type]", expr->value);
                                 ~^

gcc doesn't warn, because to gcc 'Z' is a deprecated alias for 'z' and
because the 3rd argument of the attribute((format())) is zero (so gcc
doesn't validate the arguments). But Z specifier in gmp expects a
"mpz_t" value and not a size_t. It's really not the same thing.

The correct solution is not to mark the function to accept a printf format
string.

Fixes: 2535ba7006f2 ('src: get rid of printf')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: suppress "-Wunused-but-set-variable" warning with "parser_bison.c"

Clang warns:

    parser_bison.c:7606:9: error: variable 'nft_nerrs' set but not used [-Werror,-Wunused-but-set-variable]
        int yynerrs = 0;
            ^
    parser_bison.c:72:25: note: expanded from macro 'yynerrs'
    #define yynerrs         nft_nerrs
                            ^

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't needlessly clear full string buffer in stmt_evaluate_log_prefix()

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: rework SNPRINTF_BUFFER_SIZE() and handle truncation

Before, the macro asserts against truncation. This is despite the
callers still checked for truncation and tried to handle it. Probably
for good reason. With stmt_evaluate_log_prefix() it's not clear that the
code ensures that truncation cannot happen, so we must not assert
against it, but handle it.

Also,

- wrap the macro in "do { ... } while(0)" to make it more
  function-like.

- evaluate macro arguments exactly once, to make it more function-like.

- take pointers to the arguments that are being modified.

- use assert() instead of abort().

- use size_t type for arguments related to the buffer size.

- drop "size". It was mostly redundant to "offset". We can know
  everything we want based on "len" and "offset" alone.

- "offset" previously was incremented before checking for truncation.
  So it would point somewhere past the buffer. This behavior does not
  seem useful. Instead, on truncation "len" will be zero (as before) and
  "offset" will point one past the buffer (one past the terminating
  NUL).

Thereby, also fix a warning from clang:

    evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable]
            size_t size = 0;
                   ^

    meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable]
            size_t size;
                   ^

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: fix check for truncation in stmt_evaluate_log_prefix()

Otherwise, nft crashes with prefix longer than 127 bytes:

# nft add rule x y log prefix \"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\"

==159385==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffed5bf4a10 at pc 0x7f3134839269 bp 0x7ffed5bf48b0 sp 0x7ffed5bf4060
WRITE of size 129 at 0x7ffed5bf4a10 thread T0
    #0 0x7f3134839268 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:778
    #1 0x7f3133e3074e in __mpz_export_data /tmp/nftables/src/gmputil.c:110
    #2 0x7f3133d21d3c in expr_to_string /tmp/nftables/src/expression.c:192
    #3 0x7f3133ded103 in netlink_gen_log_stmt /tmp/nftables/src/netlink_linearize.c:1148
    #4 0x7f3133df33a1 in netlink_gen_stmt /tmp/nftables/src/netlink_linearize.c:1682
    [...]

Fixes: e76bb3794018 ('src: allow for variables in the log prefix string')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: avoid cast-align warning with struct sockaddr result from getaddrinfo()

With CC=clang we get

    datatype.c:625:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
                    addr = ((struct sockaddr_in *)ai->ai_addr)->sin_addr;
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    datatype.c:690:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in6 *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
                    addr = ((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    datatype.c:826:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
                    port = ((struct sockaddr_in *)ai->ai_addr)->sin_port;
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix that by casting to (void*) first. Also, add an assertion that the
type is as expected.

For inet_service_type_parse(), differentiate between AF_INET and
AF_INET6. It might not have been a problem in practice, because the
struct offsets of sin_port/sin6_port are identical.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: avoid "-Wenum-conversion" warning in parser_bison.y

Clang warns:

    parser_bison.y:3658:83: error: implicit conversion from enumeration type 'enum nft_nat_types' to different enumeration type 'enum nft_nat_etypes' [-Werror,-Wenum-conversion]
                                            { (yyval.stmt) = nat_stmt_alloc(&(yyloc), NFT_NAT_SNAT); }
                                                             ~~~~~~~~~~~~~~           ^~~~~~~~~~~~
    parser_bison.y:3659:83: error: implicit conversion from enumeration type 'enum nft_nat_types' to different enumeration type 'enum nft_nat_etypes' [-Werror,-Wenum-conversion]
                                            { (yyval.stmt) = nat_stmt_alloc(&(yyloc), NFT_NAT_DNAT); }
                                                             ~~~~~~~~~~~~~~           ^~~~~~~~~~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: avoid "-Wenum-conversion" warning in dtype_map_from_kernel()

Clang warns:

    netlink.c:806:26: error: implicit conversion from enumeration type 'enum nft_data_types' to different enumeration type 'enum datatypes' [-Werror,-Wenum-conversion]
                    return datatype_lookup(type);
                           ~~~~~~~~~~~~~~~ ^~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: expand vmap test case to also cause batch abort

Let the last few batches also push an update that contains
elements twice.

This is expected to cause the batch to be aborted,
which increases code coverage on kernel side.

Signed-off-by: Florian Westphal <fw@strlen.de>

parser: permit gc-interval in map declarations

Maps support timeouts, so allow to set the gc interval as well.

Fixes: 949cc39eb93f ("parser: support of maps with timeout")
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: monitor: Fix for wrong ordering in expected JSON output

Adjust JSON for delete before add for replace after respective kernel
fix, too.

Fixes: ba786ac758fba ("tests: monitor: update insert and replace commands")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix for wrong syntax in set-interval.t

Expected JSON output must be prefixed 'J'.

Fixes: 7ab453a033c9a ("monitor: do not call interval_map_decompose() for concat intervals")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix time format in ct timeout test

The old "plain" numbers are still accepted (and assumed to be in
seconds), but output will use units which is unexpected due to 'O -'.
Adjust input instead of adding this subtly different output line.

Fixes: 5c25c5a35cbd2 ("parser: allow ct timeouts to use time_spec values")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix monitor JSON output for insert command

Looks like commit ba786ac758fba ("tests: monitor: update insert and
replace commands") missed to also fix expected JSON output.

Fixes: 48d20b8cf162e ("monitor: honor NLM_F_APPEND flag for rules")
Signed-off-by: Phil Sutter <phil@nwl.cc>

evaluate: Drop dead code from expr_evaluate_mapping()

Since commit 343a51702656a ("src: store expr, not dtype to track data in
sets"), set->data is allocated for object maps in set_evaluate(), all
other map types have set->data initialized by the parser already,
set_evaluate() also checks that.

Drop the confusing check, later in the function set->data is
dereferenced unconditionally.

Fixes: 343a51702656a ("src: store expr, not dtype to track data in sets")
Signed-off-by: Phil Sutter <phil@nwl.cc>

configure: drop AM_PROG_CC_C_O autoconf check

This macro is obsolete since automake 1.14 (2013). It might have been
unnecessary even before, in practice only gcc/clang are supported
compilers.

[1] https://www.gnu.org/software/automake/manual/html_node/Public-Macros.html#index-AM_005fPROG_005fCC_005fC_005fO
[2] https://lists.gnu.org/archive/html/info-gnu/2013-06/msg00009.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: include <std{bool,int}.h> via <nft.h>

There is a minimum base that all our sources will end up needing. This
is what <nft.h> provides.

Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to
implement anything, without having "bool" and "uint32_t" types
available.

Yes, this means the internal headers are not self-contained, with
respect to what <nft.h> provides. This is the exception to the rule, and
our internal headers should rely to have <nft.h> included for them.
They should not include <nft.h> themselves, because <nft.h> needs always
be included as first. So when an internal header would include <nft.h>
it would be unnecessary, because the header is *always* included
already.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

configure: use AC_USE_SYSTEM_EXTENSIONS to get _GNU_SOURCE

Let "configure" detect which features are available. Also, nftables is a
Linux project, so portability beyond gcc/clang and glibc/musl is less
relevant. And even if it were, then feature detection by "configure"
would still be preferable.

Use AC_USE_SYSTEM_EXTENSIONS ([1]).

Available since autoconf 2.60, from 2006 ([2]).

[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Posix-Variants.html#index-AC_005fUSE_005fSYSTEM_005fEXTENSIONS-1046
[2] https://lists.gnu.org/archive/html/autoconf/2006-06/msg00111.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: don't define _GNU_SOURCE in public header

_GNU_SOURCE is supposed to be defined as first thing, before including any
libc headers. Defining it in the public header of nftables is wrong, because
it would only (somewhat) work if the user includes the nftables header as first
thing too. But that is not what users commonly would do, in particular with
autotools projects, where users would include <config.h> first.

It's also unnecessary. Nothing in "nftables/libnftables.h" itself
requires _GNU_SOURCE. Drop it.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add <nft.h> header and include it as first

<config.h> is generated by the configure script. As it contains our
feature detection, it want to use it everywhere.

Likewise, in some of our sources, we define _GNU_SOURCE. This defines
the C variant we want to use. Such a define need to come before anything
else, and it would be confusing if different source files adhere to a
different C variant. It would be good to use autoconf's
AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure
that <config.h> is always included as first.

Instead of going through all source files and include <config.h> as
first, add a new header "include/nft.h", which is supposed to be
included in all our sources (and as first).

This will also allow us later to prepare some common base, like include
<stdbool.h> everywhere.

We aim that headers are self-contained, so that they can be included in
any order. Which, by the way, already didn't work because some headers
define _GNU_SOURCE, which would only work if the header gets included as
first. <nft.h> is however an exception to the rule: everything we compile
shall rely on having <nft.h> header included as first. This applies to
source files (which explicitly include <nft.h>) and to internal header
files (which are only compiled indirectly, by being included from a source
file).

Note that <config.h> has no include guards, which is at least ugly to
include multiple times. It doesn't cause problems in practice, because
it only contains defines and the compiler doesn't warn about redefining
a macro with the same value. Still, <nft.h> also ensures to include
<config.h> exactly once.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

meta: define _GNU_SOURCE to get strptime() from <time.h>

To use `strptime()`, the documentation indicates

  #define _XOPEN_SOURCE
  #include <time.h>

However, previously this was done wrongly.

For example, when building with musl we got a warning:

      CC       meta.lo
    meta.c:40: warning: "_XOPEN_SOURCE" redefined
       40 | #define _XOPEN_SOURCE
          |
    In file included from /usr/include/errno.h:8,
                     from meta.c:13:
    /usr/include/features.h:16: note: this is the location of the previous definition
       16 | #define _XOPEN_SOURCE 700
          |

Defining "__USE_XOPEN" is wrong. This is a glibc internal define not for
the user.

Note that if we just set _XOPEN_SOURCE (or _XOPEN_SOURCE=700), we won't
get other things like "struct tm.tm_gmtoff".

Instead, we already define _GNU_SOURCE at other places. Do that here
too, it will give us strptime() and all is good.

Also, those directives should be defined as first thing (or via "-D"
command line). See [1].

This is also important, because to use "time_t" in a header, we would
need to include <time.h>. That only works, if we get the feature test
macros right. That is, define the _?_SOURCE macro as first thing.

[1] https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: 30s-stress: add failslab and abort phase tests

Pablo suggested to also cover abort phase by intentionally deleting
non-existent or adding clashing keys.

While at it:
add rules with anon sets and jumps to anonymous chains and a few
constant sets.

Pick different key sizes so there is a higher chance kernel picks
different backend storages such as bitmap or hash_fast.

add failslab support, this also covers unlikely or "impossible" cases like
failing GFP_KERNEL allocations.

randomly spawn 'nft monitor' in the background for a random duration
to cover notification path.

Try to randomly delete a set or chain from control plane.

Randomly set a table as dormant (and back to normal).

Allow to pass the test runtime as argument, so one can now do

./30s-stress 3600

to have the test run for one hour.

For such long test durations, make sure the ruleset
gets regenerated periodically.

Signed-off-by: Florian Westphal <fw@strlen.de>

py: add Nftables.{get,set}_input_flags() API

Similar to the existing Nftables.{get,set}_debug() API.

Only notable (internal) difference is that nft_ctx_input_set_flags()
returns the old value already, so we don't need to call
Nftables.get_input_flags() first.

The benefit of this API, is that it follows the existing API for debug
flags. Also, when future flags are added it requires few changes to the
python code.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: extract flags helper functions for set_debug()/get_debug()

Will be re-used for nft_ctx_input_set_flags() and
nft_ctx_input_get_flags().

There are changes in behavior here.

- when passing an unrecognized string (e.g. `ctx.set_debug('foo')` or
  `ctx.set_debug(['foo'])`), a ValueError is now raised instead of a
  KeyError.

- when passing an out-of-range integer, now a ValueError is no raised.
  Previously the integer was truncated to 32bit.

Changing the exception is an API change, but most likely nobody will
care or try to catch a KeyError to find out whether a flag is supported.
Especially, since such a check would be better performed via `'foo' in
ctx.debug_flags`.

In other cases, a TypeError is raised as before.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: fix exception during cleanup of half-initialized Nftables

When we create a Nftables instance against an older library version,
we might not find a symbol and fail with an exception when initializing
the context object.

Then, __del__() is still called, but resulting in a second exception
because self.__ctx is not set. Avoid that second exception.

    $ python -c 'import nftables; nftables.Nftables()'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/data/src/nftables/py/nftables.py", line 90, in __init__
        self.nft_ctx_input_get_flags = lib.nft_ctx_input_get_flags
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/ctypes/__init__.py", line 389, in __getattr__
        func = self.__getitem__(name)
               ^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/ctypes/__init__.py", line 394, in __getitem__
        func = self._FuncPtr((name_or_ordinal, self))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: /lib64/libnftables.so.1: undefined symbol: nft_ctx_input_get_flags
    Exception ignored in: <function Nftables.__del__ at 0x7f6315a2c540>
    Traceback (most recent call last):
      File "/data/src/nftables/py/nftables.py", line 166, in __del__
        self.nft_ctx_free(self.__ctx)
        ^^^^^^^^^^^^^^^^^
    AttributeError: 'Nftables' object has no attribute 'nft_ctx_free'

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add input flag NFT_CTX_INPUT_JSON to enable JSON parsing

By default, the input is parsed using the nftables grammar. When setting
NFT_CTX_OUTPUT_JSON flag, nftables will first try to parse the input as
JSON before falling back to the nftables grammar.

But NFT_CTX_OUTPUT_JSON flag also turns on JSON for the output. Add a
flag NFT_CTX_INPUT_JSON which allows to treat only the input as JSON,
but keep the output mode unchanged.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking

getaddrinfo() blocks while trying to resolve the name. Blocking the
caller of the library is in many cases undesirable. Also, while
reconfiguring the firewall, it's not clear that resolving names via
the network will work or makes sense.

Add a new input flag NFT_CTX_INPUT_NO_DNS to opt-out from getaddrinfo()
and only accept plain IP addresses.

We could also use AI_NUMERICHOST with getaddrinfo() instead of
inet_pton(). By parsing via inet_pton(), we are better aware of
what we expect and can generate a better error message in case of
failure.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add input flags for nft_ctx

Similar to the existing output flags, add input flags. No flags are yet
implemented, that will follow.

One difference to nft_ctx_output_set_flags(), is that the setter for
input flags returns the previously set flags.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

gitignore: ignore cscope files

Cscope is useful. Ignore the files it creates.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: use minutes granularity in sets/0036add_set_element_expiration_0

Use minute granularity to fix bogus failures of this test on slow testbed.

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: error out on meter overlap with an existing set/map declaration

One of the problems with meters is that they use the set/map
infrastructure behind the scenes which might be confusing to users.

This patch errors out in case user declares a meter whose name overlaps
with an existing set/map:

meter.nft:15:18-91: Error: File exists; meter ‘syn4-meter’ overlaps an existing set ‘syn4-meter’ in family inet
tcp dport 22 meter syn4-meter { ip saddr . tcp dport timeout 5m limit rate 20/minute } counter accept
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An old 5.10 kernel bails out simply with EEXIST, with this patch a
better hint is provided.

Dynamic sets are preferred over meters these days.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: chain listing implicitly sets on terse option

If user specifies a chain to be listed (which is internally handled via
filtering options), then toggle NFT_CACHE_TERSE to skip fetching set
content from kernel for non-anonymous sets.

With a large IPv6 set with bogons, before this patch:

# time nft list chain inet raw x
table inet raw {
        chain x {
                ip6 saddr @bogons6
                ip6 saddr { aaaa::, bbbb:: }
        }
}

real    0m2,913s
user    0m1,345s
sys     0m1,568s

After this patch:

# time nft list chain inet raw prerouting
table inet raw {
        chain x {
                ip6 saddr @bogons6
                ip6 saddr { aaaa::, bbbb:: }
        }
}

real    0m0,056s
user    0m0,018s
sys     0m0,039s

This speeds up chain listing in the presence of a large set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: Stabilize sets/0043concatenated_ranges_0 test

On a slow system, one of the 'delete element' commands would
occasionally fail. Assuming it can only happen if the 2s timeout passes
"too quickly", work around it by adding elements with a 2m timeout
instead and when wanting to test the element expiry just drop and add
the element again with a short timeout.

Fixes: 6231d3fa4af1e ("tests: shell: Fix for unstable sets/0043concatenated_ranges_0")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: Stabilize sets/reset_command_0 test

Timeout/expiry value testing based on seconds is way too fragile,
especially with slow debug kernels. Rewrite the unit to test
minute-based values. This means it is no longer feasible to wait for
values to sufficiently change, so instead specify an 'expires' value
when creating the ruleset and drop the 'sleep' call.

While being at it:

- Combine 'get element' and 'reset element' calls into one, assert the
relevant (sanitized) line appears twice in output instead of comparing
with 'diff'.
- Turn comments into 'echo' calls to help debugging if the test fails.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>

meta: use reentrant localtime_r()/gmtime_r() functions

These functions are POSIX.1-2001. We should have them in all
environments we care about.

Use them as they are thread-safe.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

meta: don't assume time_t is 64 bit in date_type_print()

time_t on 32bit arch is not uint64_t. Even if it always were, it would
be ugly to make such an assumption (without a static assert). Copy the
value to a time_t variable first.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

INSTALL: provide examples to install python bindings

Provide a bit more details on how to install python bindings with legacy
setup.py and pip with .toml file.

Original text from Jeremy Sowden <jeremy@azazel.net>.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

nftutils: add and use wrappers for getprotoby{name,number}_r(), getservbyport_r()

We should aim to use the thread-safe variants of getprotoby{name,number}
and getservbyport(). However, they may not be available with other libc,
so it requires a configure check. As that is cumbersome, add wrappers
that do that at one place.

These wrappers are thread-safe, if libc provides the reentrant versions.
Use them.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>