git.ipfire.org Git - thirdparty/nftables.git/log

rule: skip CMD_OBJ_SETELEMS with no elements after set flush

Set declaration + set flush results in a crash because CMD_OBJ_SETELEMS
does not expect an empty element list. This internal command only shows
up if the set contains elements; however, evaluation flushes the set
content after the set expansion. Skip CMD_OBJ_SETELEMS if the set is empty.

Fixes: d3c8051cb767 ("rule: rework CMD_OBJ_SETELEMS logic")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: move fuzzer functionality to separate tool

This means some loss of functionality since you can no longer combine
--fuzzer with options like --debug, --define, --include.

On the upside, this adds new --random-outflags mode which will randomly
switch --terse, --numeric, --echo ... on/off.

Update README to reflect this change.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tunnel: add missing tunnel object list support

Tunnel object listing support was missing. Now it is possible to list
tunnels. Example:

sudo nft list tunnel netdev x y
table netdev x {
        tunnel y {
                id 10
                ip saddr 192.168.2.10
                ip daddr 192.168.2.11
                sport 10
                dport 20
                ttl 10
                erspan {
                        version 1
                        index 2
                }
        }
}

Fixes: a937a5dc02db ("src: add tunnel statement and expression support")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

support for afl++ (american fuzzy lop++) fuzzer

afl comes with a compiler frontend that can add instrumentation suitable
for running nftables via the "afl-fuzz" fuzzer.

This change adds a "--with-fuzzer" option to configure script and enables
specific handling in nftables and libnftables to speed up the fuzzing process.
It also adds the "--fuzzer" command line option.

afl-fuzz initialisation gets delayed until after the netlink context is set up
and symbol tables (e.g. route marks) have been parsed.

When afl-fuzz restarts the process with a new input round, it will
resume *after* this point (see __AFL_INIT macro in main.c).

With --fuzzer <stage>, nft will perform multiple fuzzing rounds per
invocation: this increases processing rate by an order of magnitude.
The argument to '--fuzzer' specifies the last stage to run:

1: 'parser':
    Only run / exercise the flex/bison parser.

2: 'eval': stop after the evaluation phase.
    This attempts to build a complete ruleset in memory, does
    symbol resolution, adds needed shift/masks to payload instructions
    etc.

3: 'netlink-ro':
    'netlink-ro' builds the netlink buffer to send to the kernel,
    without actually doing so.

4: 'netlink-rw':
    The generated command/ruleset will be passed to the kernel.
    You can combine it with the '--check' option to send data to the kernel
    but without actually committing any changes.
    This could still end up triggering a kernel crash if there are bugs
    in the validation / transaction / abort phases.

Use 'netlink-ro' if you want to prevent nft from ever submitting any
changes to the kernel or if you are only interested in fuzzing nftables
and its libraries.

In case a kernel splat is detected, the fuzzing process stops and all further
fuzzer attempts are blocked until reboot.
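
Illustration only, not code from this change: the "last stage to run" argument can be pictured as an ordered enum, where a stage runs if it is not past the selected one. All identifiers below are hypothetical; only the stage names and their numbering come from the list above.

```c
#include <string.h>

/* Hypothetical names; the numbering follows the stage list in the message. */
enum sketch_fuzzer_stage {
	SKETCH_STAGE_INVALID = 0,
	SKETCH_STAGE_PARSER = 1,
	SKETCH_STAGE_EVAL = 2,
	SKETCH_STAGE_NETLINK_RO = 3,
	SKETCH_STAGE_NETLINK_RW = 4,
};

/* Map a stage name as given on the command line to its enum value. */
static enum sketch_fuzzer_stage sketch_stage_from_name(const char *name)
{
	static const char *const names[] = {
		[SKETCH_STAGE_PARSER] = "parser",
		[SKETCH_STAGE_EVAL] = "eval",
		[SKETCH_STAGE_NETLINK_RO] = "netlink-ro",
		[SKETCH_STAGE_NETLINK_RW] = "netlink-rw",
	};
	int i;

	for (i = SKETCH_STAGE_PARSER; i <= SKETCH_STAGE_NETLINK_RW; i++) {
		if (strcmp(name, names[i]) == 0)
			return (enum sketch_fuzzer_stage)i;
	}
	return SKETCH_STAGE_INVALID;
}

/* A stage is exercised when it does not come after the last selected one. */
static int sketch_stage_enabled(enum sketch_fuzzer_stage last,
				enum sketch_fuzzer_stage stage)
{
	return stage != SKETCH_STAGE_INVALID && stage <= last;
}
```

With 'eval' selected, the parser and evaluation stages are enabled in this sketch while both netlink stages are not.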

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: libnftables-json: Describe RULESET object

Document the syntax of this meta-object used by "list" and "flush"
commands only.

Fixes: 872f373dc50f7 ("doc: Add JSON schema documentation")
Signed-off-by: Phil Sutter <phil@nwl.cc>

rule: add missing documentation for cmd_obj enum

In the cmd_obj enum, documentation for the hooks, tunnel and tunnels
elements was missing.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: don't suggest to disable GSO

The kernel can form aggregate packets whether or not GSO is enabled.
Disabling GSO is not a useful suggestion in this case.

Fixes: 05628cdd677d (doc: describe behaviour of {ip,ip6} length)
Signed-off-by: Florian Westphal <fw@strlen.de>

build: don't install ancillary files without systemd service file

If the systemd service file is not installed, currently the related man-page
and example nft file are still installed. Instead only install them when the
service file is installed.

Fixes: 107580cfa85c ("build: disable --with-unitdir by default")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: fix some man-page mistakes

Correct one typo and two non-native usages.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: minor improvements the `reject` statement

Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: fix/improve documentation of verdicts

- Clarify that a terminating statement also prevents the execution of later
  statements in the same rule and give an example about that.
- Correct that `accept` won’t terminate the evaluation of the ruleset (which is
  generally used for the whole set of all chains, rules, etc.) but only that of
  the current base chain (and any regular chains called from that).
  Indicate that `accept` only accepts the packet from the current base chain’s
  point of view.
  Clarify that not only chains of a later hook could still drop the packet, but
  also ones from the same hook if they have a higher priority.
- Various other minor improvements/clarifications to wording.

Link: https://lore.kernel.org/netfilter-devel/3c7ddca7029fa04baa2402d895f3a594a6480a3a.camel@scientia.org/T/#t
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: add overall description of the ruleset evaluation

Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

utils: Cover for missing newline after BUG() messages

Relieve callers from having to suffix their messages with a newline
escape sequence, have the macro append it to the format string instead.

This is mostly a fix for (the many) calls to BUG() without a newline
suffix. Adjust the previously correct ones since they emit an extra
newline now.
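
Illustration only, not the patched macro: a minimal C sketch of appending the newline inside a macro through string-literal concatenation. SKETCH_BUG_MSG and the "BUG: " prefix are assumptions made for this example, and it formats into a buffer instead of printing and aborting.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for a BUG()-style macro. Adjacent string literals
 * are concatenated by the compiler, so the "\n" becomes part of the format
 * string and callers no longer supply it themselves.
 * ##__VA_ARGS__ is a GNU extension accepted by gcc and clang.
 */
#define SKETCH_BUG_MSG(buf, len, fmt, ...) \
	snprintf((buf), (len), "BUG: " fmt "\n", ##__VA_ARGS__)
```

A caller that still passes a trailing "\n" in its own format string gets two newlines, which is why the previously correct call sites had to be adjusted.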

Signed-off-by: Phil Sutter <phil@nwl.cc>

src: add refcount asserts

_get() functions must not be used when refcnt is 0, as expr_free()
releases expressions on 1 -> 0 transition.

Also, check that a refcount would not overflow from UINT_MAX to 0.
Use INT_MAX to also catch refcount leaks sooner, we don't expect
2**31 get()s on same object.

This helps catch use-after-free refcounting bugs even when nft
is built without ASAN support.

v3: use a macro + BUG to get more info without a coredump.
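
Illustration only: a minimal C sketch of the two checks described above, using plain assert() instead of the macro + BUG mentioned for v3. The struct and function names are hypothetical and do not appear in nftables.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical refcounted object, standing in for an expression. */
struct sketch_obj {
	unsigned int refcnt;
};

/* Take a reference; refuse objects that were already released or leaked. */
static struct sketch_obj *sketch_obj_get(struct sketch_obj *obj)
{
	assert(obj->refcnt > 0);			/* 0 means already freed */
	assert(obj->refcnt < (unsigned int)INT_MAX);	/* catch leaks early */
	obj->refcnt++;
	return obj;
}

/* Drop a reference; returns 1 when the caller should free the object. */
static int sketch_obj_put(struct sketch_obj *obj)
{
	assert(obj->refcnt > 0);
	return --obj->refcnt == 0;
}
```

Checking against INT_MAX rather than UINT_MAX trips long before a wrap to 0 could happen, which matches the reasoning given for catching leaks sooner.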

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: remove queue from verdict list

While it's correct that the queue statement is internally implemented
via the queue verdict, this is an implementation detail.
We don't list "stolen" as a verdict either.

nft ... queue will always use the nft_queue statement, so move the
reinject detail from statements to queue statement and remove this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: fix typo in vmap_timeout test script

While executing the test suite from tests/shell folder, the following error
is displayed many times:

tests/shell/testcases/maps/vmap_timeout: line 48: [: : integer expected

Looking at the script, a non-existing variable (expires) is tested instead of
the existing one (expire).

Reproduction:
tests/shell/run-tests.sh -v

Fixes: db80037c0279 ("tests: shell: extend vmap test with updates")
Signed-off-by: Gyorgy Sarvari <skandigraun@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: add more documentation on bitmasks and sets

Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: fix/improve documentation of jump/goto/return

Overhaul the description of `jump`/`goto`/`return`.
`jump` only explains what the statement causes from the point of view of the
new chain (that is: not, how the returning works), which includes that an
implicit `return` is issued at the end of the chain.
`goto` is explained in reference to `jump`.
`return` describes abstractly how the return position is determined and what
happens if there’s no position to return to (but not for example where an
implicit `return` is issued).

List and explain verdict-like statements like `reject` which internally imply
`accept` or `drop`.
Further explain that with respect to evaluation these behave like their
respectively implied verdicts.

Link: https://lore.kernel.org/netfilter-devel/3c7ddca7029fa04baa2402d895f3a594a6480a3a.camel@scientia.org/T/#t
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: follow prefix expression recursively if needed

Included bogons assert:
Assertion `!expr_is_constant(*expr) || expr_is_singleton(*expr)' failed

This is because the "foo*" + prefix combination causes expr_evaluate
to replace the binop + string expression with another prefix that
gets allocated while handling "foo*" (wildcard).

This causes expr_evaluate_prefix to build
a prefix -> prefix -> binop chain.

After this, we get:

Error: Right hand side of relational expression ((null)) must be constant
a b ct helper "2.2.2.2.3*1"/80
~~~~~~~~~~^^^^^^^^^^^^^^^^
Error: Binary operation (&) is undefined for prefix expressions
a b ct helper "2.2.2.****02"/80
^^^^^^^^^^^^^^^^^

for those inputs rather than hitting assert() in byteorder_conversion()
later on.

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink: Zero nft_data_linearize objects when populating

Callers of netlink_gen_{key,data}() pass an uninitialized auto-variable,
avoid misinterpreting garbage in fields "left blank".
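
Illustration only: a minimal C sketch of zeroing an output object before populating it, so fields the generator does not set read as 0 instead of stack garbage. The struct layout and function name are hypothetical and are not the real nft_data_linearize definition.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical output object; "flags" stands for a field left blank. */
struct sketch_linearize {
	uint32_t len;
	uint32_t value[4];
	uint32_t flags;
};

/* Populate *out from raw data, clearing every field first. */
static void sketch_gen_value(struct sketch_linearize *out,
			     const void *data, uint32_t len)
{
	memset(out, 0, sizeof(*out));
	if (len > sizeof(out->value))
		len = sizeof(out->value);
	out->len = len;
	memcpy(out->value, data, len);
}
```

A caller can then pass an uninitialized automatic variable without a reader of the untouched fields seeing leftover stack contents.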

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: ip6/vmap.t: Drop double whitespace in rule

Just a harmless typo.

Signed-off-by: Phil Sutter <phil@nwl.cc>

datatype: Increase symbolic constant printer robustness

Do not segfault if passed symbol table is NULL.

Signed-off-by: Phil Sutter <phil@nwl.cc>

netlink: No need to reference array when passing as pointer

Struct nft_data_linearize::value is an array, drop the reference
operator.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: Do not rely upon '[end]' marker

Set element lines reliably start with whitespace followed by the word "element"
and are separated by the same pattern. Use it instead of '[end]' (or anything
enclosed in brackets).

While at it, recognize payload lines as starting with ' [ ' and avoid
searching for the closing bracket.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: Implement payload_record()

This is a helper function to store payload records (and JSON
equivalents) in .got files. The code it replaces failed to insert a
newline before the new entry and also did not check for existing records
in all spots.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: Fix for using wrong payload path

If one family has a per-family payload record, following families used
it by accident for a .got file when they actually should use the generic
name.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: inet/osf.t: Fix element ordering in JSON equivalents

The original rules order set elements differently. Stick to that and add
entries to inet/osf.t.json.output to cover for nftables reordering
entries.

Fixes: 92029c1282958 ("src: osf: add json support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: any/ct.t.json.output: Drop leftover entry

The rule with single element anonymous set was replaced, drop this
leftover.

Fixes: 27f6a4c68b4fd ("tests: replace single element sets")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: any/tcpopt.t.json: Fix JSON equivalent

Set element ordering differed from the rule in standard syntax.

Fixes: d199cca92f9eb ("expression: expr_build_udata_recurse should recurse")
Signed-off-by: Phil Sutter <phil@nwl.cc>

optimize: Fix verdict expression comparison

In verdict expression, 'chain' points at a constant expression of
verdict_type, not a symbol expression. Therefore 'chain->identifier'
points eight bytes (on 64bit systems) into the mpz_t 'value' holding the
chain name. This matches the '_mp_d' data pointer, so works by accident.

Fix this by copying what verdict_jump_chain_print() does and export
chain names before comparing.

Fixes: fb298877ece27 ("src: add ruleset optimization infrastructure")
Signed-off-by: Phil Sutter <phil@nwl.cc>

datatype: Fix boolean type on Big Endian

Pass a reference to a variable with correct size when creating the
expression, otherwise mpz_import_data() will read only the always zero
upper byte on Big Endian hosts.

Fixes: afb6a8e66a111 ("datatype: clamp boolean value to 0 and 1")
Signed-off-by: Phil Sutter <phil@nwl.cc>

src: parser_json: fix format string bugs

After adding fmt attribute annotation:
warning: format not a string literal and no format arguments [-Wformat-security]
131 | erec_queue(error(&loc, err->text), ctx->msgs);
In function 'json_events_cb':
warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type '__u32' {aka 'unsigned int'} [-Wformat=]

Fix that up too.

Fixes: 586ad210368b ("libnftables: Implement JSON parser")
Signed-off-by: Florian Westphal <fw@strlen.de>

src: fix fmt string warnings

For some reason several functions had a __gmp_fmtstring annotation,
but that was an empty macro.

After fixing it up, we get several new warnings:

In file included from src/datatype.c:28:
src/datatype.c:174:24: note: in expansion of macro 'error'
  174 |                 return error(&sym->location,
      |                        ^~~~~
src/datatype.c:405:24: note: in expansion of macro 'error'
  405 |                 return error(&sym->location, "Could not parse %s; did you mean `%s'?",
      |                        ^~~~~

The format string says '%s', but the argument is an unqualified void *.
Add a 'const char *' cast; it is safe in both cases.

In file included from src/evaluate.c:29:
src/evaluate.c: In function 'byteorder_conversion':
src/evaluate.c:232:35: warning: format '%s' expects a matching 'char *' argument [-Wformat=]
  232 |                                   "Byteorder mismatch: %s expected %s, %s got %s",
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Actual bug, fmt string has one '%s' too many, remove it.

All other warnings were due to '%u' instead of '%lu' / '%zu'.
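
Illustration only: a minimal C sketch of a working printf-style format annotation, which is what lets the compiler emit warnings like the ones quoted above. SKETCH_FMT and sketch_error() are hypothetical names; the real annotation in nftables is __gmp_fmtstring and may be defined differently.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical annotation macro; an empty definition would silence all checks. */
#define SKETCH_FMT(fmtidx, first) \
	__attribute__((format(printf, fmtidx, first)))

/* Format an error message into buf; argument 3 is the format string. */
SKETCH_FMT(3, 4)
static int sketch_error(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, passing a 'const void *' for a '%s' conversion needs an explicit cast such as (const char *)ptr, and a format string with more conversions than arguments is reported at compile time.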

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: describe include’s collation order to be that of the C locale

Currently, `nft` doesn’t call `setlocale(3)` and thus `glob(3)` uses the `C`
locale.

Document this as it’s possibly relevant to the ordering of included rules.

This also makes the collation order “official” so any future localisation would
need to adhere to that.
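
Illustration only: in the C locale, collation is a plain byte-wise comparison, so the resulting order of file names can be imitated with strcmp(). The sketch below does not call glob(3); the function names and file names are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: byte-wise order, as collation behaves in the C locale. */
static int sketch_cmp_c_locale(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/* Sort an array of path strings in place. */
static void sketch_sort_includes(const char **paths, size_t n)
{
	qsort(paths, n, sizeof(*paths), sketch_cmp_c_locale);
}
```

Under this ordering digits sort before uppercase letters and uppercase before lowercase, so "10.nft" comes before "2.nft" and "A.nft" before "a.nft".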

Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: minor improvements with respect to the term “ruleset”

Statements are elements of rules. Non-terminal statements are in particular
passive with respect to their rules (and thus automatically with respect to the
whole ruleset).

In “Continue ruleset evaluation”, it’s not necessary to mention the ruleset as
it’s obvious that the evaluation of the current chain will be continued.

Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: reject tunnel section if another one is already present

Included bogon causes a crash because the list head isn't initialised
due to tunnel->type == VXLAN.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>

src: parser_bison: prevent multiple ip daddr/saddr definitions

minor change to the bogon makes it assert because symbolic expression
will have wrong refcount (2) at scope teardown.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>

src: tunnel src/dst must be a symbolic expression

Included bogons crash with segfault and assertion.  After fix:

tunnel_with_garbage_dst:3:12-14: Error: syntax error, unexpected tcp, expecting string or quoted string or string with a trailing asterisk or '$'
  ip saddr tcp dport { }
           ^^^
The parser change restricts the grammar to no longer allow this,
we would crash here because we enter payload evaluation path that
tries to insert a dependency into the rule, but we don't have one
(ctx->rule and ctx->stmt are NULL as expected here).

The eval stage change makes sure we will reject non-value symbols:

tunnel_with_anon_set_assert:1:12-31: Error: must be a value, not set
define s = { 1.2.3.4, 5.6.7.8 }
           ^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>

evaluate: tunnel: don't assume src is set

Included bogon crashes, after fix:

empty_geneve_definition_crash:2:9-16: Error: Could not process rule: Invalid argument

Since this feature is undocumented (hint, hint) I don't know
if there are cases where ip daddr can be elided.

If not, a followup patch should reject empty dst upfront
so users get a more verbose error message.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>

tests: shell: add packetpath test for reject statement

Test case for:
91a79b792204 ("netfilter: nf_reject: don't leak dst refcount for loopback packets")
and
db99b2f2b3e2 ("netfilter: nf_reject: don't reply to icmp error messages")

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: clarify evaluation of chains

In particular:
- Mention that grouping of chains in tables is irrelevant to the evaluation
  order.
- Clarify that priorities only define the ordering of chains per hook.
- Improved potentially ambiguous wording “lower priority values have precedence
  over higher ones”, which could be mistaken as that rules from lower priority
  chains might “win” over such from higher ones (which is however only the case
  if they drop/reject packets).
  The new wording merely describes which chains are evaluated first, implicitly
  referring the question which verdict “wins” to the section where verdicts are
  described, and also should work when lower priority chains mangle packets (in
  which case they might actually be considered as having “precedence”).

Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: add packetpath test for meta ibrhwaddr

The test checks that the packets are processed by the bridge device and
not forwarded.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

meta: introduce meta ibrhwaddr support

Can be used in bridge prerouting hook to redirect the packet to the
receiving physical device for processing.

table bridge nat {
        chain PREROUTING {
                type filter hook prerouting priority 0; policy accept;
                ether daddr de:ad:00:00:be:ef meta pkttype set host ether daddr set meta ibrhwaddr accept
        }
}

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: fix tcpdump example

The expression needs to be enclosed in a single string and combined with
a logical AND to have the desired effect.

Fixes: 1188a69604c3 ("src: introduce SYNPROXY matching")
Signed-off-by: Georg Pfuetzenreuter <mail@georg-pfuetzenreuter.net>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: type_route_chain: use in-tree nftables, not system-wide one

Switch this to $NFT, which contains the locally-compiled binary.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: fix name based checks with CONFIG_MODULES=n

Don't include a trailing space, it's only there if nftables is a module:

  hook ingress device foo2 {
     0000000000 chain netdev t c [nf_tables]
  }

with CONFIG_NF_TABLES=y, this gets listed as:
'0000000000 chain netdev t c\n'.

Signed-off-by: Florian Westphal <fw@strlen.de>

mnl: Drop asterisk from end of NFTA_DEVICE_PREFIX strings

The asterisk left in place becomes part of the prefix by accident and is thus
both included when matching interface names as well as dumped back to user
space.

Fixes: c31e887504a90 ("mnl: Support simple wildcards in netdev hooks")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: add regression tests for set flush+add bugs

Create a helper file to:
1. create client <-> router <-> server topology
2. floodping from client to server
3. add a chain + set that contains both client and server
addresses
4. a control counter that should never match
5. then, flush the set (not the ruleset) and re-add the
addresses in one transaction

Report failure when counter had a match.

The test cases for the set types are done in separate files to take
advantage of run-tests.sh parallelization.

The expected behavior is that every ping packet is matched by the set.
The packet path should either match the old state, right before flush,
or the new state, after re-add.

As the flushed addresses are re-added in the same transaction we must
not observe in-limbo state where existing elements are deactivated but
new elements are not found.

Signed-off-by: Florian Westphal <fw@strlen.de>

src: tunnel: handle tunnel delete command

'delete tunnel foo bar' causes nft to bug out.

Fixes: 35d9c77c5745 ("src: add tunnel template support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: objects.t: must use input, not output

synproxy must never be used in output rules, doing so results in kernel
crash due to infinite recursive calls back to nf_hook_slow() for the
emitted reply packet.

Up until recently kernel lacked this validation, and now that the kernel
rejects this the test fails. Use input to make this pass again.

A new test to ensure we reject synproxy in output should be added
in the near future.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: Test ifname-based hooks

Assert that:
- Non-matching interface specs are accepted
- Existing interfaces are hooked into upon flowtable/chain creation
- A new device matching the spec is hooked into immediately
- No stale hooks remain in 'nft list hooks' output
- Wildcard hooks basically work

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: Accept ASTERISK_STRING in flowtable_expr_member

All clauses are identical, so instead of adding a third one for
ASTERISK_STRING, use a single one for 'string' (which combines all three
variants).

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: Support simple wildcards in netdev hooks

When building NFTA_{FLOWTABLE_,}HOOK_DEVS attributes, detect trailing
asterisks in interface names and transmit the leading part in a
NFTA_DEVICE_PREFIX attribute.

Deserialization (i.e., appending an asterisk to interface prefixes returned
in NFTA_DEVICE_PREFIX attributes) happens in libnftnl.
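
Illustration only: a minimal C sketch of detecting a trailing asterisk and separating the leading part of the name. The function name is hypothetical, the netlink attribute handling is left out, and escaped asterisks are not considered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `name` into `out` without a trailing '*', if there is one.
 * Returns true when the name was a wildcard, i.e. `out` holds a prefix.
 * `outlen` must be at least 1; longer names are truncated to fit.
 */
static bool sketch_ifname_prefix(const char *name, char *out, size_t outlen)
{
	size_t len = strlen(name);
	bool wildcard = len > 0 && name[len - 1] == '*';

	if (wildcard)
		len--;			/* the asterisk is not part of the prefix */
	if (len >= outlen)
		len = outlen - 1;
	memcpy(out, name, len);
	out[len] = '\0';
	return wildcard;
}
```

In this sketch "eth*" yields the prefix "eth", while "eth0" is passed through unchanged as an exact name.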

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: do not re-add default include directory in include search path

Otherwise globbing might duplicate included files because
include_path_glob() is called twice.

Fixes: 7eb950a8e8fa ("libnftables: include canonical path to avoid duplicates")
Tested-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fib: Fix for existence check on Big Endian

Adjust the expression size to 1B so cmp expression value is correct.
Without this, the rule 'fib saddr . iif check exists' generates
following byte code on BE:

|  [ fib saddr . iif oif present => reg 1 ]
|  [ cmp eq reg 1 0x00000001 ]

Though with NFTA_FIB_F_PRESENT flag set, nft_fib.ko writes to the first
byte of reg 1 only (using nft_reg_store8()). With this patch in place,
byte code is correct:

|  [ fib saddr . iif oif present => reg 1 ]
|  [ cmp eq reg 1 0x01000000 ]
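
Illustration only: a minimal C sketch of why the comparison length matters. It assumes the one-byte store clears the register and then writes its first byte; the helper names are hypothetical. The 32-bit value 1 in Big Endian byte layout does not match such a register, while a one-byte comparison does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Imitates a one-byte register store: clear the register, set its first byte. */
static void sketch_reg_store8(uint32_t *reg, uint8_t val)
{
	*reg = 0;
	memcpy(reg, &val, sizeof(val));
}

/* Compare the first `len` bytes of the register against `data`. */
static int sketch_cmp_eq(const uint32_t *reg, const void *data, size_t len)
{
	return memcmp(reg, data, len) == 0;
}
```

The bytes 00 00 00 01 are how a Big Endian host lays out the 4-byte constant 1; compared over four bytes they differ from the register contents 01 00 00 00, which is the mismatch the 1B expression size avoids.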

Fixes: f686a17eafa0b ("fib: Support existence check")
Cc: Yi Chen <yiche@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

Makefile: Enable support for 'make check'

With all test suites running all variants by default, add the various
testsuite runners to TESTS variable so 'make check' will execute them.

Introduce --enable-distcheck configure flag for internal use during
builds triggered by 'make distcheck'. This flag will force TESTS
variable to remain empty, so 'make check' run as part of distcheck will
not call any test suite: Most of the test suites require privileged
execution, 'make distcheck' usually doesn't and probably shouldn't.
Assuming the latter is used during the release process, it may even not
run on a machine which is up to date enough to generate meaningful test
suite results. Hence spare the release process from the likely pointless
delay imposed by 'make check'.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: build: Avoid a recursive 'make check' run

When called by 'make check', the test suite runs with a MAKEFLAGS
variable in environment which defines TEST_LOGS variable with the test
suites' corresponding logs as value. This in turn causes the called
'make distcheck' to run test suites although it is not supposed to.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: build: Do not assume caller's CWD

Cover for being called from a different directory by changing into the
test suite's directory first.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: Skip packetpath/nat_ftp in fake root env

The script relies upon a call to modprobe which does not work in
fake root environments.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: json_echo: Skip if run as non-root

The test suite manipulates the kernel ruleset. Use the well-known return
code 77 to indicate test execution being skipped.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: Prepare exit codes for automake

Make the test suite runners exit 77 when requiring root and running as
regular user, exit 99 for internal errors (unrelated to test cases) and
exit 1 (or any free non-zero value) to indicate test failures.
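
Illustration only: the exit code convention above written out as a small C function. The names and the order of the checks are assumptions; the actual test suite runners are scripts and may decide differently.

```c
/* Exit codes as interpreted by the automake test harness. */
enum sketch_exit_code {
	SKETCH_EXIT_PASS = 0,
	SKETCH_EXIT_FAIL = 1,	/* test failures */
	SKETCH_EXIT_SKIP = 77,	/* test was skipped */
	SKETCH_EXIT_ERROR = 99,	/* internal error, unrelated to test cases */
};

/* Pick the exit code for a test suite run. */
static int sketch_runner_exit(int needs_root, int is_root,
			      int internal_error, int failed_tests)
{
	if (needs_root && !is_root)
		return SKETCH_EXIT_SKIP;
	if (internal_error)
		return SKETCH_EXIT_ERROR;
	return failed_tests ? SKETCH_EXIT_FAIL : SKETCH_EXIT_PASS;
}
```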

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: Enable JSON and JSON schema by default

Introduce -J/--disable-json and -S/--no-schema to explicitly disable
them if desired.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Excercise all syntaxes and variants by default

Introduce -s/--standard flag to restrict execution to standard syntax
and let users select a specific variant by means of -e/--echo and
-m/--monitor flags. Run all four possible combinations by default.

To keep indenting sane, introduce run_testcase() executing tests in a
single test case for a given syntax and variant.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Extend testcases a bit

Try to cover for reduced table and chain deletion notifications by
creating them with data which is omitted by the kernel during deletion.

Also try to expose the difference in reported flowtable hook deletion
vs. flowtable deletion.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

monitor: Inform JSON printer when reporting an object delete event

Since kernel commit a1050dd07168 ("netfilter: nf_tables: Reintroduce
shortened deletion notifications"), type-specific data is no longer
dumped when notifying for a deleted object. JSON output was not aware of
this and tried to print bogus data.

Fixes: 9e88aae28e9f4 ("monitor: Use libnftables JSON output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: Allow for updating devices on existing inet ingress hook chains

Complete commit a66b5ad9540dd ("src: allow for updating devices on
existing netdev chain") in supporting inet family ingress hook chains as
well. The kernel does already but nft has to add a proper hooknum
attribute to pass the checks.

Calling chain_evaluate() for populating the hook.num field is a bit over
the top and has potentially unwanted side-effects. Introduce a minimal
chain_del_evaluate() for this purpose.

Signed-off-by: Phil Sutter <phil@nwl.cc>

Makefile: Fix for 'make CFLAGS=...'

Appending to CFLAGS from configure.ac like this was too naive, passing
custom CFLAGS in make arguments overwrites it. Extend AM_CFLAGS instead.

Fixes: 64c07e38f0494 ("table: Embed creating nft version into userdata")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip two bitwise tests if multi-register support isn't available

These tests fail in case kernel requires bitwise RHS to be a constant
value.

Fixes: 67d2a8d4c86f ("tests: shell: add parser and packetpath test")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: monitor: Extend debug output a bit

Dump echo output and output file, surrounded by markers to highlight
empty files and extra newlines.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Test JSON echo mode as well

Reuse the expected JSON monitor output for --echo testing as it is
supposed to be "identical" - apart from formatting differences. To match
lines of commands (monitor output) against a single line of JSON object
(echo output), join the former's lines and drop the surrounding object
in the latter since this seems to be the simplest way.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix regex collecting expected echo output

No input triggered this bug, but the match would accept "insert" and
"replace" keywords anywhere in the line not just at the beginning as was
intended.

Fixes: b2506e5504fed ("tests: Merge monitor and echo test suites")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Label diffs to help users

Clarify what was expected and what was actually received.

Signed-off-by: Phil Sutter <phil@nwl.cc>

monitor: Quote device names in chain declarations, too

Fixed commit missed the fact that there are two routines printing chain
declarations.

Fixes: eb30f236d91a8 ("rule: print chain and flowtable devices in quotes")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tools: gitignore nftables.service file

Fixes: c4b17cf830510 ("tools: add a systemd unit for static rulesets")
Signed-off-by: Phil Sutter <phil@nwl.cc>

parser_bison: remove leftover utf-8 character in error

replace "‘" (UTF-8, 0xe280 0x98) with "'" (ASCII 0x27).

Fixes: c92ec3b21979 ("src: remove utf-8 character in printf lines")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

table: Embed creating nft version into userdata

Upon listing a table which was created by a newer version of nftables,
warn about the potentially incomplete content.

Suggested-by: Florian Westphal <fw@strlen.de>
Cc: Dan Winship <danwinship@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: combine flowtable devices with variable expression

Expand test with flowtable devices defined with variables to improve
coverage.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: simplify set to list normalisation for device expressions

When evaluating the list of devices, two expressions are possible:

- EXPR_LIST, which is the expected expression type to store the list of
  chain/flowtable devices.

- EXPR_SET, in case that a variable is used to express the device list.
  This is because it is not possible to know if the variable defines
  set elements or devices. Since sets are more common, EXPR_SET is used.

In the latter case, this list expressed as EXPR_SET gets translated to
EXPR_LIST. Before such translation, the EXPR_VARIABLE is evaluated,
therefore all variables are gone and only EXPR_SET_ELEM are possible in
expr_set_to_list().

Remove the EXPR_VALUE and EXPR_VARIABLE cases in expr_set_to_list()
since those are never seen. Add BUG() in case any expression other than
EXPR_SET_ELEM is seen.
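
The resulting logic can be modelled roughly as follows. This is a
Python sketch, not the C implementation: the Expr class, its fields and
the assumption that the list receives each element's key are all
illustrative, and an AssertionError stands in for BUG().

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class ExprType(Enum):
    LIST = auto()
    SET = auto()
    SET_ELEM = auto()
    VALUE = auto()
    VARIABLE = auto()


@dataclass
class Expr:
    etype: ExprType
    key: Optional["Expr"] = None       # payload of a SET_ELEM (assumed)
    value: str = ""                    # e.g. a device name
    children: list = field(default_factory=list)


def set_to_list(set_expr):
    """Translate a SET of SET_ELEM into a LIST of the element keys."""
    if set_expr.etype is not ExprType.SET:
        raise AssertionError("BUG: expected a set expression")
    out = Expr(ExprType.LIST)
    for elem in set_expr.children:
        # Variables were evaluated earlier, so only SET_ELEM can show up.
        if elem.etype is not ExprType.SET_ELEM:
            raise AssertionError("BUG: unexpected expression type %s"
                                 % elem.etype.name)
        out.children.append(elem.key)
    return out


devices = Expr(ExprType.SET, children=[
    Expr(ExprType.SET_ELEM, key=Expr(ExprType.VALUE, value="eth0")),
    Expr(ExprType.SET_ELEM, key=Expr(ExprType.VALUE, value="eth1")),
])
print([dev.value for dev in set_to_list(devices).children])
```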

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: replace compound_expr_alloc() by type safe function

Replace compound_expr_alloc() by {set,list,concat}_expr_alloc() to
validate expression type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: replace compound_expr_print() by type safe function

Replace compound_expr_print() by {list,set,concat}_expr_print() to
validate expression type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: replace compound_expr_destroy() by type safe function

Replace it by {set,list,concat}_expr_destroy() to validate type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: replace compound_expr_remove() by type safe function

Replace this function by {list,concat,set}_expr_remove() to validate
expression type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: remove compound_expr_add()

No more users of this function after conversion to type safe variant,
remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: replace compound_expr_clone() by type safe function

Replace compound_expr_clone() by:

- concat_expr_clone()
- list_expr_clone()
- set_expr_clone()

to validate type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: rename set_compound_expr_add() to set_expr_add_splice()

To avoid confusion when performing git grep to search for compound_expr_add().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: replace compound_expr_add() by type safe list_expr_add()

Replace compound_expr_add() by list_expr_add() to validate type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: replace compound_expr_add() by type safe concat_expr_add()

Replace compound_expr_add() by concat_expr_add() to validate type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: replace compound_expr_add() by type safe set_expr_add()

Replace compound_expr_add() by set_expr_add() to validate type.

Add __set_expr_add() to skip size updates in src/intervals.c
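
The idea can be sketched in Python as below. The class, the field names
and raw_set_expr_add() (standing in for __set_expr_add()) are
assumptions for illustration only; whether the C helper also checks the
type is not stated in this log.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ExprType(Enum):
    SET = auto()
    LIST = auto()
    CONCAT = auto()


@dataclass
class CompoundExpr:
    etype: ExprType
    size: int = 0
    expressions: list = field(default_factory=list)


def check_type(expr, expected):
    """Stand-in for the type validation the commit describes."""
    if expr.etype is not expected:
        raise TypeError("expected %s, got %s"
                        % (expected.name, expr.etype.name))


def raw_set_expr_add(set_expr, elem):
    """Model of __set_expr_add(): append without touching the size."""
    check_type(set_expr, ExprType.SET)
    set_expr.expressions.append(elem)


def set_expr_add(set_expr, elem):
    """Model of set_expr_add(): append and update the size."""
    raw_set_expr_add(set_expr, elem)
    set_expr.size += 1


s = CompoundExpr(ExprType.SET)
set_expr_add(s, "10.0.0.1")
raw_set_expr_add(s, "10.0.0.2")
print(s.size, len(s.expressions))
```

Passing a list or concatenation expression to either helper raises an
error instead of silently modifying the wrong kind of expression.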

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add expr_type_catchall() helper and use it

Add helper function to check if this is a catchall expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: add tunnel shell and python tests

Add tests for tunnel statement and object support. Shell and python
tests both cover standard nft output and json.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tunnel: add tunnel object and statement json support

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tunnel: add geneve support

This patch extends the tunnel metadata object to define geneve tunnel
specific configurations:

table netdev x {
        tunnel y {
                id 10
                ip saddr 192.168.2.10
                ip daddr 192.168.2.11
                sport 10
                dport 20
                ttl 10
                geneve {
                        class 0x1010 opt-type 0x1 data "0x12345678"
                        class 0x1020 opt-type 0x2 data "0x87654321"
                        class 0x2020 opt-type 0x3 data "0x87654321abcdeffe"
                }
        }
}
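
For orientation, each geneve entry corresponds to one option in the
Geneve header. The Python sketch below packs such an entry into the
option layout from RFC 8926 (16-bit class, 8-bit type, 5-bit length in
4-byte words, then the data). It assumes the data string is plain hex;
it is not taken from the nftables sources and says nothing about how
nft itself parses or sends the option.

```python
import struct


def geneve_option(opt_class, opt_type, data_hex):
    """Pack one Geneve option TLV as laid out in RFC 8926."""
    digits = data_hex[2:] if data_hex.startswith("0x") else data_hex
    data = bytes.fromhex(digits)
    if len(data) % 4 != 0 or len(data) // 4 > 31:
        raise ValueError("option data must be a multiple of 4 bytes, "
                         "at most 124 bytes")
    # The low 5 bits of the last header byte hold the data length in
    # 4-byte words; the 3 reserved bits above them stay zero.
    header = struct.pack("!HBB", opt_class, opt_type, len(data) // 4)
    return header + data


print(geneve_option(0x1010, 0x1, "0x12345678").hex())
print(geneve_option(0x2020, 0x3, "0x87654321abcdeffe").hex())
```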

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tunnel: add vxlan support

This patch extends the tunnel metadata object to define vxlan tunnel
specific configurations:

table netdev x {
        tunnel y {
                id 10
                ip saddr 192.168.2.10
                ip daddr 192.168.2.11
                sport 10
                dport 20
                ttl 10
                vxlan {
                        gbp 200
                }
        }
}

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add tunnel statement and expression support

This patch allows you to attach tunnel metadata through the tunnel
statement.

The following example shows how to redirect traffic to the erspan0
tunnel device which will take the tunnel configuration that is
specified by the ruleset.

table netdev x {
        tunnel y {
                id 10
                ip saddr 192.168.2.10
                ip daddr 192.168.2.11
                sport 10
                dport 20
                ttl 10
                erspan {
                        version 1
                        index 2
                }
        }

        chain x {
                type filter hook ingress device veth0 priority 0;

                ip daddr 10.141.10.123 tunnel name y fwd to erspan0
        }
}

This patch also allows matching on tunnel metadata via the tunnel expression.

Joint work with Fernando.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tunnel: add erspan support

This patch extends the tunnel metadata object to define erspan tunnel
specific configurations:

table netdev x {
        tunnel y {
                id 10
                ip saddr 192.168.2.10
                ip daddr 192.168.2.11
                sport 10
                dport 20
                ttl 10
                erspan {
                        version 1
                        index 2
                }
        }
}

Joint work with Fernando.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add tunnel template support

This patch adds tunnel template support, which allows attaching a
metadata template that provides the configuration for the tunnel driver.

Example of generic tunnel configuration:

table netdev x {
        tunnel y {
                id 10
                ip saddr 192.168.2.10
                ip daddr 192.168.2.11
                sport 10
                dport 20
                ttl 10
        }
}

Attaching this metadata template still requires the tunnel statement,
which comes in a follow-up patch.

Joint work with Fernando.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: Bump version to 1.1.5

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: disable --with-unitdir by default

Same behaviour as in the original patch:

--with-unitdir auto-detects the systemd unit path.
--with-unitdir=PATH uses the PATH

no --with-unitdir means this does not install the systemd unit file.
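
The three cases can be summarised with a small Python model. It relies
on the autoconf convention that a bare --with-foo sets the option value
to "yes"; resolve_unitdir() and the detect_unitdir callback are
hypothetical names, not part of the build system.

```python
def resolve_unitdir(withval, detect_unitdir):
    """Return the unit install directory, or None to skip installation.

    withval is None when --with-unitdir is absent, "yes" for a bare
    --with-unitdir, and the given PATH for --with-unitdir=PATH.
    """
    if withval is None or withval == "no":
        return None                 # default: do not install the unit
    if withval == "yes":
        return detect_unitdir()     # auto-detect the systemd unit path
    return withval                  # explicit PATH


def fake_detect():
    # Placeholder for the real detection, which this log does not show.
    return "/usr/lib/systemd/system"


print(resolve_unitdir(None, fake_detect))
print(resolve_unitdir("yes", fake_detect))
print(resolve_unitdir("/etc/systemd/system", fake_detect))
```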

INSTALL file description looks fine for what this does after this
patch.

While at it, extend tests/build/ to cover this new option.

Fixes: c4b17cf830510 ("tools: add a systemd unit for static rulesets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Makefile: Fix for 'make distcheck'

Make sure the files in tools/ are added to the tarball and that the
created nftables.service file is removed upon 'make clean'.

Fixes: c4b17cf830510 ("tools: add a systemd unit for static rulesets")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: continue on ENOBUFS errors when processing batch

A user reports that:

  nft -f ruleset.nft

fails with:

  netlink: Error: Could not process rule: No buffer space available

This was triggered by:

table ip6 fule {
  set domestic_ip6 {
    type ipv6_addr
    flags dynamic,interval
    elements = $domestic_ip6
  }
  chain prerouting {
    type filter hook prerouting priority 0;
    ip6 daddr @domestic_ip6 counter
  }
}

where $domestic_ip6 contains a large number of IPv6 addresses.

This set declaration is currently not supported because dynamic sets
with intervals are not supported. Every IPv6 address that is added
therefore triggers an error, overrunning the userspace socket buffer with
lots of NLMSG_ERROR messages (or with an NLMSG_ERROR message too big to
fit into the socket buffer).

In the particular context of batch processing, ENOBUFS is just an
indication that too many errors have occurred. The kernel cannot store
any more NLMSG_ERROR messages into the userspace socket buffer.

However, there are still NLMSG_ERROR messages in the socket buffer to be
processed that can provide a hint on what is going on.

Instead of breaking on ENOBUFS in batches, continue error processing.
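
The change in control flow can be modelled with the Python sketch
below. It is not the mnl code: recv() is a hypothetical callback that
returns the errno of the next queued error message, returns None once
the buffer is drained, and raises OSError on a socket error.

```python
import errno


def collect_batch_errors(recv):
    """Drain queued error messages, continuing past ENOBUFS."""
    errors = []
    overrun = False
    while True:
        try:
            msg = recv()
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                # Too many errors for the socket buffer. Messages that
                # were already queued can still be read, so keep going.
                overrun = True
                continue
            raise
        if msg is None:
            break
        errors.append(msg)
    return errors, overrun


def make_recv(events):
    """Build a fake recv() that replays a fixed sequence of events."""
    pending = iter(events)

    def recv():
        event = next(pending)
        if isinstance(event, Exception):
            raise event
        return event

    return recv


recv = make_recv([
    OSError(errno.ENOBUFS, "No buffer space available"),
    errno.EOPNOTSUPP,
    errno.ENOENT,
    None,
])
print(collect_batch_errors(recv))
```

Breaking out of the loop on ENOBUFS would discard the two queued
errors, which are the ones that explain the failure.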

After this patch, the ruleset above displays:

ruleset.nft:2367:7-18: Error: Could not process rule: Operation not supported
  set domestic_ip6 {
      ^^^^^^^^^^^^
ruleset.nft:2367:7-18: Error: Could not process rule: No such file or directory
  set domestic_ip6 {
      ^^^^^^^^^^^^

Fixes: a72315d2bad4 ("src: add rule batching support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>