git.ipfire.org Git - thirdparty/nftables.git/log

optimize: do not merge raw payload expressions

Merging raw expressions results in a valid concatenation which throws:

Error: can not use variable sized data types (integer) in concat expressions

Disable merging raw expressions until this is supported by skipping raw
expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: check for payload base and offset when searching for mergers

Extend the existing checks to cover the payload base and offset.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: merge verdict maps with same lookup key

Merge two consecutive verdict maps with the same lookup key.

For instance, merge the following:

table inet x {
        chain filter_in_tcp {
                tcp dport vmap {
                           80 : accept,
                           81 : accept,
                          443 : accept,
                          931 : accept,
                         5001 : accept,
                         5201 : accept,
                }
                tcp dport vmap {
                         6800-6999  : accept,
                        33434-33499 : accept,
                }
        }
}

into:

table inet x {
        chain filter_in_tcp {
                tcp dport vmap {
                           80 : accept,
                           81 : accept,
                          443 : accept,
                          931 : accept,
                         5001 : accept,
                         5201 : accept,
                         6800-6999  : accept,
                        33434-33499 : accept,
                }
}
}

This patch updates statement comparison routine to inspect the verdict
expression type to detect possible merger.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: add __expr_cmp()

Add helper function to compare expression to allow for reuse.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: Use abort() in case of netlink_abi_error

Library functions should not use exit(), application that uses the
library may contain error handling path, that cannot be executed if
library functions calls exit(). For truly fatal errors, using abort() is
more acceptable than exit().

Signed-off-by: Eugene Crosser <crosser@average.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: missing synproxy support in map declarations

Update parser to allow for maps with synproxy.

Fixes: f44ab88b1088 ("src: add synproxy stateful object support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: remove redundant payload expressions

Now that we keep track of more payload dependencies, more redundant
payloads are eliminated. Remove these from the shell test-cases.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: remove redundant payload expressions

Now that we keep track of more payload dependencies, more redundant
payloads are eliminated. Remove these from the Python test-cases.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

src: store more than one payload dependency

Change the payload-dependency context to store a dependency for every
protocol layer. This allows us to eliminate more redundant protocol
expressions.

Signed-off-by: Florian Westphal <fw@strlen.de>

src: add a helper that returns a payload dependency for a particular base

Currently, with only one base and dependency stored this is superfluous,
but it will become more useful when the next commit adds support for
storing a payload for every base.

Remove redundant `ctx->pbase` check.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: fix inet/ip.t bridge payload

Correct the statement used to load the protocol in the bridge payload
of one of the ip tests.

A previous commit was supposed, in part, to do this, but the update got
lost.

Fixes: 4b8e51ea5fc8 ("tests: py: fix inet/ip.t payloads")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

src: silence compiler warnings

cache.c:504:22: warning: ‘chain’ may be used uninitialized in this function [-Wmaybe-uninitialized]
cache.c:504:22: warning: ‘table’ may be used uninitialized in this function [-Wmaybe-uninitialized]
erec.c:128:16: warning: ‘line’ may be used uninitialized in this function [-Wmaybe-uninitialized]
optimize.c:524:9: warning: ‘line’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Fixes: 8ad4056e9182 ("erec: expose print_location() and line_location()")
Fixes: afbd102211dc ("src: do not use the nft_cache_filter object from mnl.c")
Fixes: fb298877ece2 ("src: add ruleset optimization infrastructure")
Signed-off-by: Florian Westphal <fw@strlen.de>

libnftables: use xrealloc()

Instead of realloc(), so process stops execution in case memory
allocation fails.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: 'nft list chain' prints anonymous chains correctly

If the user is requesting a chain listing, e.g. nft list chain x y
and a rule refers to an anonymous chain that cannot be found in the cache,
then fetch such anonymous chain and its ruleset.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1577
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: add helper function to fill up the rule cache

Add a helper function to dump the rules and add them to the
corresponding chain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: do not set error code twice

The 'ret' variable is already set to a negative value to report an
error, do not set it again to a negative value.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: do not use the nft_cache_filter object from mnl.c

Pass the table and chain strings to mnl_nft_rule_dump() instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: merge several selectors with different verdict into verdict map

Transform:

  ip saddr 1.1.1.1 ip daddr 2.2.2.2 accept
  ip saddr 2.2.2.2 ip daddr 3.3.3.3 drop

into:

  ip saddr . ip daddr vmap { 1.1.1.1 . 2.2.2.2 : accept, 2.2.2.2 . 3.3.3.3 : drop }

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: merge same selector with different verdict into verdict map

Transform:

  ct state invalid drop
  ct state established,related accept

into:

  ct state vmap { established : accept, related : accept, invalid : drop }

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: merge rules with same selectors into a concatenation

This patch extends the ruleset optimization infrastructure to collapse
several rules with the same selectors into a concatenation.

Transform:

  meta iifname eth1 ip saddr 1.1.1.1 ip daddr 2.2.2.3 accept
  meta iifname eth1 ip saddr 1.1.1.2 ip daddr 2.2.2.5 accept
  meta iifname eth2 ip saddr 1.1.1.3 ip daddr 2.2.2.6 accept

into:

  meta iifname . ip saddr . ip daddr { eth1 . 1.1.1.1 . 2.2.2.6, eth1 . 1.1.1.2 . 2.2.2.5 , eth1 . 1.1.1.3 . 2.2.2.6 } accept

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add ruleset optimization infrastructure

This patch adds a new -o/--optimize option to enable ruleset
optimization.

You can combine this option with the dry run mode (--check) to review
the proposed ruleset updates without actually loading the ruleset, e.g.

# nft -c -o -f ruleset.test
Merging:
ruleset.nft:16:3-37:           ip daddr 192.168.0.1 counter accept
ruleset.nft:17:3-37:           ip daddr 192.168.0.2 counter accept
ruleset.nft:18:3-37:           ip daddr 192.168.0.3 counter accept
into:
        ip daddr { 192.168.0.1, 192.168.0.2, 192.168.0.3 } counter packets 0 bytes 0 accept

This infrastructure collects the common statements that are used in
rules, then it builds a matrix of rules vs. statements. Then, it looks
for common statements in consecutive rules which allows to merge rules.

This ruleset optimization always performs an implicit dry run to
validate that the original ruleset is correct. Then, on a second pass,
it performs the ruleset optimization and add the rules into the kernel
(unless --check has been specified by the user).

From libnftables perspective, there is a new API to enable
this feature:

  uint32_t nft_ctx_get_optimize(struct nft_ctx *ctx);
  void nft_ctx_set_optimize(struct nft_ctx *ctx, uint32_t flags);

This patch adds support for the first optimization: Collapse a linear
list of rules matching on a single selector into a set as exposed in the
example above.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove '$' in symbol_expr_print

This is used in --debug=eval mode to annotate symbols that have not yet
been evaluated, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: error reporting with -f and read from stdin

Reading from stdin requires to store the ruleset in a buffer so error
reporting works accordingly, eg.

# cat ruleset.nft | nft -f -
/dev/stdin:3:13-13: Error: unknown identifier 'x'
ip saddr $x
^

The error reporting infrastructure performs a fseek() on the file
descriptor which does not work in this case since the data from the
descriptor has been already consumed.

This patch adds a new stdin input descriptor to perform this special
handling which consists on re-routing this request through the buffer
functions.

Fixes: 935f82e7dd49 ("Support 'nft -f -' to read from stdin")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

erec: expose print_location() and line_location()

Add a few helper functions to reuse code in the new rule optimization
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: simplify logic governing storing payload dependencies

There are several places where we check whether `ctx->pdctx.pbase`
equal to `PROTO_BASE_INVALID` and don't bother trying to free the
dependency if so. However, these checks are redundant.

In `payload_match_expand` and `trace_gen_stmts`, we skip a call to
`payload_dependency_kill`, but that calls `payload_dependency_exists` to check a
dependency exists before doing anything else.

In `ct_meta_common_postprocess`, we skip an open-coded equivalent to
`payload_dependency_kill` which performs some different checks, but the
first is the same: a call to `payload_dependency_exists`.

Therefore, we can drop the redundant checks and simplify the flow-
control in the functions.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

src: reduce indentation

Re-arrange some switch-cases and conditionals to reduce levels of
indentation.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

src: remove arithmetic on booleans

Instead of subtracting a boolean from the protocol base for stacked
payloads, just decrement the base variable itself.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinearize: fix typo

Correct spelling in comment.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: fix inet/ip_tcp.t test

Contrary to the comment and expected output, nft does _not_ eliminate
the redundant `ip protocol` expression from the second test. Dependency
elimination requires a higher level expression. `ip saddr` cannot lead
to the elimination of `ip protocol` since they are both L3 expressions.
`tcp dport` cannot because although `ip saddr` and `ip protocol` both
imply that the L3 protocol is `ip`, only protocol matches are stored as
dependencies, so the redundancy is not apparent, and in fact,
`payload_may_dependency_kill` explicitly checks for the combination of
inet, bridge or netdev family, L4 expression and L3 ipv4 or ipv6
dependency and returns false.

Correct the expected output and comment.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: fix inet/ip.t payloads

In one of the bridge payloads, the wrong command is given to load the
protocol.

[ fw@strlen.de: remove the duplicated netdev payload ]

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: fix inet/sets.t netdev payload

The netdev payload for one of the inet/sets.t tests was cut-and-pasted
from the inet payload without being properly updated.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: attempt to set_eval flag if dynamic updates requested

When passing no upper size limit, the dynset expression forces
an internal 64k upperlimit.

In some cases, this can result in 'nft -f' to restore the ruleset.
Avoid this by always setting the EVAL flag on a set definition when
we encounter packet-path update attempt in the batch.

Reported-by: Yi Chen <yiche@redhat.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

parser: allow quoted string in flowtable_expr_member

Devices with interface names starting with a digit can not be configured
in flowtables. Trying to do so throws the following error:

Error: syntax error, unexpected number, expecting comma or '}'
devices = { eth0, 6in4-wan6 };

This is however a perfectly valid interface name. Solve the issue by
allowing the use of quoted strings.

Suggested-by: Jo-Philipp Wich <jo@mein.io>
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: remove scanner.c and parser_bison.c with `maintainer-clean`

automake recommends shipping the output of bison and lex in distribution
tar-balls and runs bison and lex during `make dist` (this has the
advantage that end-users don't need to have bison or lex installed to
compile the software). Accordingly, automake also recommends removing
these files with `make maintainer-clean` and generates rules to do so.
Therefore, remove scanner.c and parser_bison.c from `CLEANFILES`.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: revisit short-circuit loops over upper protocols

Move the check for NULL protocol description away from the loop to avoid
too long line.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: extend catchall tests for maps

Add a few tests for the catchall features and maps.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: fix autoconf warnings

autoconf complains about three obsolete macros.

`AC_CONFIG_HEADER` has been superseded by `AC_CONFIG_HEADERS`, so
replace it.

`AM_PROG_LEX` calls `AC_PROG_LEX` with no arguments, but this usage is
deprecated. The only difference between `AM_PROG_LEX` and `AC_PROG_LEX`
is that the former defines `$LEX` as "./build-aux/missing lex" if no lex
is found to ensure a useful error is reported when make is run. How-
ever, the configure script checks that we have a working lex and exits
with an error if none is available, so `$LEX` will never be called and
we can replace `AM_PROG_LEX` with `AC_PROG_LEX`.

`AM_PROG_LIBTOOL` has been superseded by `LT_INIT`, which is already in
configure.ac, so remove it.

We can also replace `AC_DISABLE_STATIC` with an argument to `LT_INIT`.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: remove stray debug flag.

0040mark_shift_0 was passing --debug=eval to nft. Remove it.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: reject: support ethernet as L2 protocol for inet table

When we are evaluating a `reject` statement in the `inet` family, we may
have `ether` and `ip` or `ip6` as the L2 and L3 protocols in the
evaluation context:

  table inet filter {
    chain input {
      type filter hook input priority filter;
      ether saddr aa:bb:cc:dd:ee:ff ip daddr 192.168.0.1 reject
    }
  }

Since no `reject` option is given, nft attempts to infer one and fails:

  BUG: unsupported familynft: evaluate.c:2766:stmt_evaluate_reject_inet_family: Assertion `0' failed.
  Aborted

The reason it fails is that the ethernet protocol numbers for IPv4 and
IPv6 (`ETH_P_IP` and `ETH_P_IPV6`) do not match `NFPROTO_IPV4` and
`NFPROTO_IPV6`.  Add support for the ethernet protocol numbers.

Replace the current `BUG("unsupported family")` error message with
something more informative that tells the user to provide an explicit
reject option.

Add a Python test case.

Fixes: 5fdd0b6a0600 ("nft: complete reject support")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1001360
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: correct typo's

There are a couple of mistakes in comments. Fix them.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: short-circuit loops over upper protocols

Each `struct proto_desc` contains a fixed-size array of higher layer
protocols. Only the first few are not NULL. Therefore, we can stop
iterating over the array once we reach a NULL member.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: zero shift removal

Remove shifts-by-0.  These can occur after binop postprocessing
has adjusted the RHS value to account for a mask operation.

Example: frag frag-off @s4

Is internally represented via:

  [ exthdr load ipv6 2b @ 44 + 2 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000f8ff ) ^ 0x00000000 ]
  [ bitwise reg 1 = ( reg 1 >> 0x00000003 ) ]
  [ lookup reg 1 set s ]

First binop masks out unwanted parts of the 16-bit field.
Second binop needs to left-shift so that lookups in the set will work.

When decoding, the first binop is removed after the exthdr load
has been adjusted accordingly.  Constant propagation adjusts the
shift-value to 0 on removal.  This change then gets rid of the
shift-by-0 entirely.

After this change, 'frag frag-off @s4' input is shown as-is.

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinearize: and/shift postprocessing

Before this patch:
in: frag frag-off @s4
in: ip version @s8

out: (@nh,0,8 & 0xf0) >> 4 == @s8
out: (frag unknown & 0xfff8 [invalid type]) >> 3 == @s4

after:
out: frag frag-off >> 0 == @s4
out: ip version >> 0 == @s8

Next patch adds support for zero-shift removal.

Signed-off-by: Florian Westphal <fw@strlen.de>

payload: skip templates with meta key set

meta templates are only there for ease of use (input/parsing).

When listing, they should be ignored:
set s4 { typeof ip version elements = { 1, } }
chain c4 { ip version @s4 accept }

gets listed as 'ip l4proto ...' which is nonsensical.

after this patch we get:
in: ip version @s4
out: (@nh,0,8 & 0xf0) >> 4 == @s4

.. which is (marginally) better.

Next patch adds support for payload decoding.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add shift+and typeof test cases

These tests work, but I omitted a few lines that do not:

in: frag frag-off @s4 accept
in: ip version @s8

out: (frag unknown & 0xfff8 [invalid type]) >> 3 == @s4
out: (ip l4proto & pfsync) >> 4 == @s8

Next patches resolve this.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: better parameters for the interval stack overflow test

Wider testing has shown that 128 kB stack is too low (e.g. for systems
with 64 kB page size), leading to false failures in some environments.

Based on results from a matrix of RHEL 8 and RHEL 9 systems across
x86_64, aarch64, ppc64le and s390x architectures as well as some
anecdotal testing of other Linux distros on x86_64 machines, 400 kB
seems safe: the normal nft stack (which should stay constant during
this test) on all tested systems doesn't exceed 200 kB (stays around
100 kB on typical systems with 4 kB page size), while always growing
beyond 500 kB in the failing case (nftables before baecd1cf2685) with
the increased set size.

Fixes: d8ccad2a2b73 ("tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>

iptopt: fix crash with invalid field/type combo

% nft describe ip option rr value
segmentation fault

after this fix, this exits with 'Error: unknown ip option type/field'.

Problem is that 'rr' doesn't have a value template, so the template struct is
all-zeroes, so we crash when trying to use tmpl->dtype (its NULL).

Furthermore, expr_describe tries to print expr->identifier but expr is
exthdr, not symbol: ->identifier contains garbage.

Signed-off-by: Florian Westphal <fw@strlen.de>

exthdr: support ip/tcp options and sctp chunks in typeof expressions

This did not store the 'op' member and listing always treated this as ipv6
extension header.

Add test cases for this.

Signed-off-by: Florian Westphal <fw@strlen.de>

ipopt: drop unused 'ptr' argument

Its always 0, so remove it.
Looks like this was intended to support variable options that have
array-like members, but so far this isn't implemented, better remove
dead code and implement it properly when such support is needed.

Signed-off-by: Florian Westphal <fw@strlen.de>

cache: Support filtering for a specific flowtable

Extend nft_cache_filter to hold a flowtable name so 'list flowtable'
command causes fetching the requested flowtable only.

Dump flowtables just once instead of for each table, merely assign
fetched data to tables inside the loop.

Signed-off-by: Phil Sutter <phil@nwl.cc>

cache: Filter set list on server side

Fetch either all tables' sets at once, a specific table's sets or even a
specific set if needed instead of iterating over the list of previously
fetched tables and fetching for each, then ignoring anything returned
that doesn't match the filter.

Signed-off-by: Phil Sutter <phil@nwl.cc>

cache: Filter chain list on kernel side

When operating on a specific chain, add payload to NFT_MSG_GETCHAIN so
kernel returns only relevant data. Since ENOENT is an expected return
code, do not treat this as error.

While being at it, improve code in chain_cache_cb() a bit:
- Check chain's family first, it is a less expensive check than
comparing table names.
- Do not extract chain name of uninteresting chains.

Signed-off-by: Phil Sutter <phil@nwl.cc>

cache: Filter rule list on kernel side

Instead of fetching all existing rules in kernel's ruleset and filtering
in user space, add payload to the dump request specifying the table and
chain to filter for.

Since list_rule_cb() no longer needs the filter, pass only netlink_ctx
to the callback and drop struct rule_cache_dump_ctx.

Signed-off-by: Phil Sutter <phil@nwl.cc>

cache: Filter tables on kernel side

Instead of requesting a dump of all tables and filtering the data in
user space, construct a non-dump request if filter contains a table so
kernel returns only that single table.

This should improve nft performance in rulesets with many tables
present.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: py: add tcp subtype match test cases

Signed-off-by: Florian Westphal <fw@strlen.de>

exthdr: fix tcpopt_find_template to use length after mask adjustment

Unify binop handling for ipv6 extension header, ip option and tcp option
processing.

Pass the real offset and length expected, not the one used in the kernel.
This was already done for extension headers and ip options, but tcp
option parsing did not do this.

This was fine before because no existing tcp option template
had a non-byte sized member.

With mptcp addition this isn't the case anymore, subtype field is
only 4 bits wide, but tcp option delinearization passed 8bits instead.

Pass the offset and mask delta, just like ip option/ipv6 exthdr.

This makes nft show 'tcp option mptcp subtype 1' instead of
'tcp option mptcp unknown & 240 == 16'.

Signed-off-by: Florian Westphal <fw@strlen.de>

mptcp: add subtype matching

MPTCP multiplexes the various mptcp signalling data using the
first 4 bits of the mptcp option.

This allows to match on the mptcp subtype via:

tcp option mptcp subtype 1

This misses delinearization support. mptcp subtype is the first tcp
option field that has a length of less than one byte.

Serialization processing will add a binop for this, but netlink
delinearization can't remove them, yet.

Also misses a new datatype/symbol table to allow to use mnemonics like
'mp_join' instead of raw numbers.

For this reason, no tests are added yet.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: add test cases for md5sig, fastopen and mptcp mnemonics

Signed-off-by: Florian Westphal <fw@strlen.de>

tcpopt: add md5sig, fastopen and mptcp options

Allow to use "fastopen", "md5sig" and "mptcp" mnemonics rather than the
raw option numbers.

These new keywords are only recognized while scanner is in tcp state.

Signed-off-by: Florian Westphal <fw@strlen.de>

parser: split tcp option rules

At this time the parser will accept nonsensical input like

tcp option mss left 2

which will be treated as 'tcp option maxseg size 2'.
This is because the enum space overlaps.

Split the rules so that 'tcp option mss' will only
accept field names specific to the mss/maxseg option kind.

Signed-off-by: Florian Westphal <fw@strlen.de>
(cherry picked from commit 46168852c03d73c29b557c93029dc512ca6e233a)

scanner: add tcp flex scope

This moves tcp options not used anywhere else (e.g. in synproxy) to a
distinct scope. This will also allow to avoid exposing new option
keywords in the ruleset context.

Signed-off-by: Florian Westphal <fw@strlen.de>

tcpopt: remove KIND keyword

tcp option <foo> kind ... never makes any sense, as "tcp option <foo>"
already tells the kernel to look for the foo <kind>.

"tcp option sack kind 5" matches if the sack option is present; its a
more complicated form of the simpler "tcp option sack exists".

"tcp option sack kind 1" (or any other value than 5) will never match.

So remove this.

Test cases are converted to "exists".

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinearize: binop: make accesses to expr->left/right conditional

This function can be called for different expression types, including
some (EXPR_MAP) where expr->left/right alias to different member
variables.

This makes accesses to those members conditional by checking the
expression type ahead of the access.

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinearize: rename misleading variable

relational_binop_postprocess() is called for EXPR_RELATIONAL,
so "expr->right" is safe to use.

But the RHS can be something other than a value.
This has been extended to handle other types, so rename to 'right'.

No code changes intended.

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinearize: use correct member type

expr is a map, so this should use expr->map, not expr->left.
These fields are aliased, so this would break if that is ever changed.

Signed-off-by: Florian Westphal <fw@strlen.de>

cli: save history on ctrl-d with editline

Missing call to cli_exit() to save the history when ctrl-d is pressed in
nft -i.

Moreover, remove call to rl_callback_handler_remove() in cli_exit() for
editline cli since it does not call rl_callback_handler_install().

Fixes: bc2d5f79c2ea ("cli: use plain readline() interface with libedit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: Fix for escaped asterisk strings on Big Endian

The original nul-char detection was not functional on Big Endian.
Instead, go a simpler route by exporting the string and working on the
exported data to check for a nul-char and escape a trailing asterisk if
present. With the data export already happening in the caller, fold
escaped_string_wildcard_expr_alloc() into it as well.

Fixes: b851ba4731d9f ("src: add interface wildcard matching")
Signed-off-by: Phil Sutter <phil@nwl.cc>

ct: Fix ct label value parser

Size of array to export the bit value into was eight times too large, so
on Big Endian the data written into the data reg was always zero.

Fixes: 2fcce8b0677b3 ("ct: connlabel matching support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

datatype: Fix size of time_type

Used by 'ct expiration', time_type is supposed to be 32bits. Passing a
64bits variable to constant_expr_alloc() causes the value to be always
zero on Big Endian.

Fixes: 0974fa84f162a ("datatype: seperate time parsing/printing from time_type")
Signed-off-by: Phil Sutter <phil@nwl.cc>

meta: Fix hour_type size

In kernel as well as when parsing, hour_type is assumed to be 32bits.
Having the struct datatype field set to 64bits breaks Big Endian and so
does passing a 64bit value and 32 as length to constant_expr_alloc() as
it makes it import the upper 32bits. Fix this by turning 'result' into a
uint32_t and introduce a temporary uint64_t just for the call to
time_parse() which expects that.

Fixes: f8f32deda31df ("meta: Introduce new conditions 'time', 'day' and 'hour'")
Signed-off-by: Phil Sutter <phil@nwl.cc>

meta: Fix {g,u}id_type on Big Endian

Using a 64bit variable to temporarily hold the parsed value works only
on Little Endian. uid_t and gid_t (and therefore also pw->pw_uid and
gr->gr_gid) are 32bit.
To fix this, use uid_t/gid_t for the temporary variable but keep the
64bit one for numeric parsing so values exceeding 32bits are still
detected.

Fixes: e0ed4c45d9ad2 ("meta: relax restriction on UID/GID parsing")
Signed-off-by: Phil Sutter <phil@nwl.cc>

src: Fix payload statement mask on Big Endian

The mask used to select bits to keep must be exported in the same
byteorder as the payload statement itself, also the length of the
exported data must match the number of bytes extracted earlier.

Signed-off-by: Phil Sutter <phil@nwl.cc>

mnl: Fix for missing info in rule dumps

Commit 0e52cab1e64ab improved error reporting by adding rule's table and
chain names to netlink message directly, prefixed by their location
info. This in turn caused netlink dumps of the rule to not contain table
and chain name anymore. Fix this by inserting the missing info before
dumping and remove it afterwards to not cause duplicated entries in
netlink message.

Signed-off-by: Phil Sutter <phil@nwl.cc>

exthdr: Fix for segfault with unknown exthdr

Unknown exthdr type with NFT_EXTHDR_F_PRESENT flag set caused
NULL-pointer deref. Fix this by moving the conditional exthdr.desc deref
atop the function and use the result in all cases.

Fixes: e02bd59c4009b ("exthdr: Implement existence check")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests/py: Avoid duplicate records in *.got files

If payloads don't contain family-specific bits, they may sit in a single
*.payload file for all tested families. In such case, nft-test.py will
consequently write dissenting payloads into a single *.got file. To
avoid the duplicate entries, check if a matching record exists already
before writing it out.

Signed-off-by: Phil Sutter <phil@nwl.cc>

exthdr: fix type number saved in udata

This should store the index of the protocol template, but
&x[i] - &x[0] is always i, so remove the divide. Also add test case.

Fixes: 01fbc1574b9e ("exthdr: add parse and build userdata interface")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>

cli: remove #include <editline/history.h>

This header is not required to compile nftables with editline, remove
it, this unbreak compilation in several distros which have no symlink
from history.h to editline.h

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: different signedness compilation warning

mnl.c: In function ‘mnl_batch_talk’:
mnl.c:417:17: warning: comparison of integer expressions of different signedness: ‘unsigned in’ and ‘long int’ [-Wsign-compare]
if (rcvbufsiz < NFT_MNL_ECHO_RCVBUFF_DEFAULT)
^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: do not skip populating anonymous set with -t

--terse does not apply to anonymous set, add a NFT_CACHE_TERSE bit
to skip named sets only.

Moreover, prioritize specific listing filter over --terse to avoid a
bogus:

netlink: Error: Unknown set '__set0' in lookup expression

when invoking:

# nft -ta list set inet filter example

Extend existing test to improve coverage.

Fixes: 9628d52e46ac ("cache: disable NFT_CACHE_SETELEM_BIT on --terse listing only")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: Bump version to 1.0.1

Requires libnftnl 1.2.1

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

monitor: do not call interval_map_decompose() for concat intervals

Without this, nft monitor will either print garbage or even segfault
when encountering a concat set because we pass expr->value to libgmp
helpers for concat (non-value) expressions.

Also, for concat case, we need to call concat_range_aggregate() helper.
Add a test case for this. Without this patch, it gives:

tests/monitor/run-tests.sh: line 98: 1163 Segmentation fault
(core dumped) $nft -nn -e -f $command_file > $echo_output

Signed-off-by: Florian Westphal <fw@strlen.de>

parser_json: add raw payload inner header match support

Add missing "ih" base raw payload and extend tests/py to cover this new
usecase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser: allow for string raw payload base

Remove new 'ih' token, allow to represent the raw payload base with a
string instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: remove netdev coverage in ip/ip_tcp.t

The following tests shows a warning in the netdev family:

ip/ip_tcp.t: WARNING: line 9: 'add rule netdev test-netdev ingress ip protocol tcp tcp dport 22': 'tcp dport 22' mismatches 'ip protocol 6 tcp dport 22'

'ip protocol tcp' can be removed in the ip family, but not in netdev.

This test is specific of the ip family, remove the netdev lines.

Fixes: 510c4fad7e78 ("src: Support netdev egress hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: missing json output update in ip6/meta.t

Update json output for 'meta protocol ip6 udp dport 67'.

Fixes: 646c5d02a5db ("rule: remove redundant meta protocol from the evaluation step")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: missing ip/snat.t json updates

Missing json update for new tests added recently.

Fixes: 50780456a01a ("evaluate: check for missing transport protocol match in nat map with concatenations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: missing ip/dnat.t json updates

Missing json update for three new tests added recently.

Fixes: 640dc0c8a3da ("tests: py: extend coverage for dnat with classic range representation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: filter out rules by chain

With an autogenerated ruleset with ~20k chains.

# time nft list ruleset &> /dev/null

real    0m1,712s
user    0m1,258s
sys     0m0,454s

Speed up listing of a specific chain:

# time nft list chain nat MWDG-UGR-234PNG3YBUOTS5QD &> /dev/null

real    0m0,542s
user    0m0,251s
sys     0m0,292s

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: missing family in cache filtering

Check family when filtering out listing of tables and sets.

Fixes: 3f1d3912c3a6 ("cache: filter out tables that are not requested")
Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: do not populate cache if it is going to be flushed

Skip set element netlink dump if set is flushed, this speeds up
set flush + add element operation in a batch file for an existing set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: move list filter under struct

Wrap the table and set fields for list filtering to prepare for the
introduction element filters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: update ct timeout section with the state names

docs are too terse and did not have the list of valid timeout states.
While at it, adjust default stream timeout of udp to 120, this is the
current kernel default.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: update rawpayload.t.json

Missing update of json test.

Fixes: 6ad2058da66a ("datatype: add xinteger_type alias to print in hexadecimal")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: grab reference in set expression evaluation

Do not clone expression when evaluation a set expression, grabbing the
reference counter to reuse the object is sufficient.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: clone variable expression if there is more than one reference

Clone the expression that defines the variable value if there are
multiple references to it in the ruleset. This saves heap memory
consumption in case the variable defines a set with a huge number of
elements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: do not build nftnl_set element list

Do not call alloc_setelem_cache() to build the set element list in
nftnl_set. Instead, translate one single set element expression to
nftnl_set_elem object at a time and use this object to build the netlink
header.

Using a huge test set containing 1.1 million element blocklist, this
patch is reducing userspace memory consumption by 40%.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: remove verdict from closing end interval

Kernel does not allow for NFT_SET_ELEM_INTERVAL_END flag and
NFTA_SET_ELEM_DATA. The closing end interval represents a mismatch,
therefore, no verdict can be applied. The existing payload files show
the drop verdict when this is unset (because NF_DROP=0).

This update is required to fix payload warnings in tests/py after
libnftnl's ("set: use NFTNL_SET_ELEM_VERDICT to print verdict").

Fixes: 6671d9d137f6 ("mnl: Set NFTNL_SET_DATA_TYPE before dumping set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: raw payload match and mangle on inner header / payload data

This patch adds support to match on inner header / payload data:

# nft add rule x y @ih,32,32 0x14000000 counter

you can also mangle payload data:

# nft add rule x y @ih,32,32 set 0x14000000 counter

This update triggers a checksum update at the layer 4 header via
csum_flags, mangling odd bytes is also aligned to 16-bits.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: $NFT needs to be invoked unquoted

The variable has to undergo word splitting, otherwise the shell tries
to find the variable value as an executable, which breaks in cases that
7c8a44b25c22 ("tests: shell: Allow wrappers to be passed as nft command")
intends to support.

Mention this in the shell tests README.

Fixes: d8ccad2a2b73 ("tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: README: clarify test file name convention

Since commit 4d26b6dd3c4c, test file name suffix no longer reflects
expected exit code in all cases.

Move the sentence "Since they are located with `find', test files can
be put in any subdirectory." to a separate paragraph.

Fixes: 4d26b6dd3c4c ("tests: shell: change all test scripts to return 0")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>