git.ipfire.org Git - thirdparty/nftables.git/log
thirdparty/nftables.git
3 years ago tests: shell: remove stray debug flag.
Jeremy Sowden [Wed, 15 Dec 2021 18:43:41 +0000 (18:43 +0000)] 
tests: shell: remove stray debug flag.

0040mark_shift_0 was passing --debug=eval to nft.  Remove it.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago evaluate: reject: support ethernet as L2 protocol for inet table
Jeremy Sowden [Sat, 11 Dec 2021 18:55:25 +0000 (18:55 +0000)] 
evaluate: reject: support ethernet as L2 protocol for inet table

When we are evaluating a `reject` statement in the `inet` family, we may
have `ether` and `ip` or `ip6` as the L2 and L3 protocols in the
evaluation context:

  table inet filter {
    chain input {
      type filter hook input priority filter;
      ether saddr aa:bb:cc:dd:ee:ff ip daddr 192.168.0.1 reject
    }
  }

Since no `reject` option is given, nft attempts to infer one and fails:

  BUG: unsupported familynft: evaluate.c:2766:stmt_evaluate_reject_inet_family: Assertion `0' failed.
  Aborted

The reason it fails is that the ethernet protocol numbers for IPv4 and
IPv6 (`ETH_P_IP` and `ETH_P_IPV6`) do not match `NFPROTO_IPV4` and
`NFPROTO_IPV6`.  Add support for the ethernet protocol numbers.
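The mismatch can be illustrated with a minimal sketch. This is not the nft implementation: the constant values are the ones defined in the Linux headers, while the helper name `infer_reject_family` and its error text are hypothetical and exist only for this example.

```python
# Values as defined in the Linux UAPI headers.
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
NFPROTO_IPV4 = 2
NFPROTO_IPV6 = 10


def infer_reject_family(l3_protocol):
    """Hypothetical helper: map an L3 protocol number to 'ip' or 'ip6'.

    Accepts both the netfilter family numbers and the ethernet
    protocol numbers, since either may be in the evaluation context.
    """
    if l3_protocol in (NFPROTO_IPV4, ETH_P_IP):
        return "ip"
    if l3_protocol in (NFPROTO_IPV6, ETH_P_IPV6):
        return "ip6"
    # Assumed wording; the real message is defined in the patch itself.
    raise ValueError("cannot infer reject type, specify an explicit reject option")
```

Checking only the `NFPROTO_*` values would miss `0x0800` and `0x86DD`, which is the case that triggered the assertion.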

Replace the current `BUG("unsupported family")` error message with
something more informative that tells the user to provide an explicit
reject option.

Add a Python test case.

Fixes: 5fdd0b6a0600 ("nft: complete reject support")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1001360
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago evaluate: correct typo's
Jeremy Sowden [Sat, 11 Dec 2021 18:55:24 +0000 (18:55 +0000)] 
evaluate: correct typo's

There are a couple of mistakes in comments.  Fix them.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago proto: short-circuit loops over upper protocols
Jeremy Sowden [Sat, 11 Dec 2021 18:55:23 +0000 (18:55 +0000)] 
proto: short-circuit loops over upper protocols

Each `struct proto_desc` contains a fixed-size array of higher layer
protocols.  Only the first few are not NULL.  Therefore, we can stop
iterating over the array once we reach a NULL member.
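The idea can be sketched as follows. The table layout, the size constant and the function name are assumptions made for illustration; they are not taken from the nftables source.

```python
PROTO_UPPER_SLOTS = 16  # illustrative array size


def find_upper(protocols, num):
    """Look up a protocol name in a fixed-size table.

    `protocols` is a list of (number, name) tuples padded with None.
    Used slots are contiguous at the front, so the first None ends the search.
    """
    for entry in protocols:
        if entry is None:
            break  # remaining slots are unused, stop early
        if entry[0] == num:
            return entry[1]
    return None


table = [(6, "tcp"), (17, "udp")] + [None] * (PROTO_UPPER_SLOTS - 2)
print(find_upper(table, 17))  # udp
```

This only works because the used entries are packed at the start of the array, which is the property the commit message relies on.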

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago netlink_delinearize: zero shift removal
Florian Westphal [Fri, 3 Dec 2021 19:19:10 +0000 (20:19 +0100)] 
netlink_delinearize: zero shift removal

Remove shifts-by-0.  These can occur after binop postprocessing
has adjusted the RHS value to account for a mask operation.

Example: frag frag-off @s4

Is internally represented via:

  [ exthdr load ipv6 2b @ 44 + 2 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000f8ff ) ^ 0x00000000 ]
  [ bitwise reg 1 = ( reg 1 >> 0x00000003 ) ]
  [ lookup reg 1 set s ]

First binop masks out unwanted parts of the 16-bit field.
Second binop needs to right-shift so that lookups in the set will work.

When decoding, the first binop is removed after the exthdr load
has been adjusted accordingly.  Constant propagation adjusts the
shift-value to 0 on removal.  This change then gets rid of the
shift-by-0 entirely.

After this change, 'frag frag-off @s4' input is shown as-is.
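A minimal sketch of the simplification, using a tuple-based expression tree invented for this example (nft's own expression structures are different):

```python
def strip_zero_shift(expr):
    """Recursively drop ('>>', inner, 0) nodes from a tuple-based tree.

    Leaves (plain strings or integers) are returned unchanged.
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [strip_zero_shift(a) for a in args]
    if op == ">>" and args[1] == 0:
        return args[0]  # shifting by zero is a no-op
    return (op, *args)


expr = ("==", (">>", "frag frag-off", 0), "@s4")
print(strip_zero_shift(expr))  # ('==', 'frag frag-off', '@s4')
```

Shifts by a non-zero amount are left in place, so only the no-op node disappears from the listing.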

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago netlink_delinearize: and/shift postprocessing
Florian Westphal [Fri, 3 Dec 2021 19:04:31 +0000 (20:04 +0100)] 
netlink_delinearize: and/shift postprocessing

Before this patch:
in:  frag frag-off @s4
in:  ip version @s8

out: (@nh,0,8 & 0xf0) >> 4 == @s8
out: (frag unknown & 0xfff8 [invalid type]) >> 3 == @s4

after:
out: frag frag-off >> 0 == @s4
out: ip version >> 0 == @s8

Next patch adds support for zero-shift removal.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago payload: skip templates with meta key set
Florian Westphal [Tue, 30 Nov 2021 20:11:23 +0000 (21:11 +0100)] 
payload: skip templates with meta key set

meta templates are only there for ease of use (input/parsing).

When listing, they should be ignored:
 set s4 { typeof ip version elements = { 1, } }
 chain c4 { ip version @s4 accept }

gets listed as 'ip l4proto ...' which is nonsensical.

 after this patch we get:
in: ip version @s4
out: (@nh,0,8 & 0xf0) >> 4 == @s4

.. which is (marginally) better.

Next patch adds support for payload decoding.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago tests: add shift+and typeof test cases
Florian Westphal [Fri, 3 Dec 2021 16:12:17 +0000 (17:12 +0100)] 
tests: add shift+and typeof test cases

These tests work, but I omitted a few lines that do not:

in: frag frag-off @s4 accept
in: ip version @s8

out: (frag unknown & 0xfff8 [invalid type]) >> 3 == @s4
out:  (ip l4proto & pfsync) >> 4 == @s8

Next patches resolve this.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago tests: shell: better parameters for the interval stack overflow test
Štěpán Němec [Wed, 1 Dec 2021 11:12:00 +0000 (12:12 +0100)] 
tests: shell: better parameters for the interval stack overflow test

Wider testing has shown that 128 kB stack is too low (e.g. for systems
with 64 kB page size), leading to false failures in some environments.

Based on results from a matrix of RHEL 8 and RHEL 9 systems across
x86_64, aarch64, ppc64le and s390x architectures as well as some
anecdotal testing of other Linux distros on x86_64 machines, 400 kB
seems safe: the normal nft stack (which should stay constant during
this test) on all tested systems doesn't exceed 200 kB (stays around
100 kB on typical systems with 4 kB page size), while always growing
beyond 500 kB in the failing case (nftables before baecd1cf2685) with
the increased set size.

Fixes: d8ccad2a2b73 ("tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago iptopt: fix crash with invalid field/type combo
Florian Westphal [Fri, 3 Dec 2021 16:07:55 +0000 (17:07 +0100)] 
iptopt: fix crash with invalid field/type combo

% nft describe ip option rr value
segmentation fault

after this fix, this exits with 'Error: unknown ip option type/field'.

The problem is that 'rr' doesn't have a value template, so the template struct is
all zeroes and we crash when trying to use tmpl->dtype (it's NULL).

Furthermore, expr_describe tries to print expr->identifier but expr is
exthdr, not symbol: ->identifier contains garbage.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago exthdr: support ip/tcp options and sctp chunks in typeof expressions
Florian Westphal [Fri, 3 Dec 2021 16:07:54 +0000 (17:07 +0100)] 
exthdr: support ip/tcp options and sctp chunks in typeof expressions

This did not store the 'op' member and listing always treated this as ipv6
extension header.

Add test cases for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago ipopt: drop unused 'ptr' argument
Florian Westphal [Fri, 3 Dec 2021 16:07:53 +0000 (17:07 +0100)] 
ipopt: drop unused 'ptr' argument

It's always 0, so remove it.
It looks like this was intended to support variable options that have
array-like members, but so far this isn't implemented. Better to remove
dead code and implement it properly when such support is needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago cache: Support filtering for a specific flowtable
Phil Sutter [Tue, 30 Nov 2021 19:06:09 +0000 (20:06 +0100)] 
cache: Support filtering for a specific flowtable

Extend nft_cache_filter to hold a flowtable name so that the 'list flowtable'
command fetches only the requested flowtable.

Dump flowtables just once instead of once per table; inside the loop,
merely assign the fetched data to tables.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago cache: Filter set list on server side
Phil Sutter [Tue, 30 Nov 2021 15:57:54 +0000 (16:57 +0100)] 
cache: Filter set list on server side

Fetch either all tables' sets at once, a specific table's sets or even a
specific set if needed instead of iterating over the list of previously
fetched tables and fetching for each, then ignoring anything returned
that doesn't match the filter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago cache: Filter chain list on kernel side
Phil Sutter [Mon, 29 Nov 2021 15:26:44 +0000 (16:26 +0100)] 
cache: Filter chain list on kernel side

When operating on a specific chain, add payload to NFT_MSG_GETCHAIN so
kernel returns only relevant data. Since ENOENT is an expected return
code, do not treat this as error.

While at it, improve code in chain_cache_cb() a bit:
- Check chain's family first, it is a less expensive check than
  comparing table names.
- Do not extract chain name of uninteresting chains.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago cache: Filter rule list on kernel side
Phil Sutter [Mon, 29 Nov 2021 14:36:45 +0000 (15:36 +0100)] 
cache: Filter rule list on kernel side

Instead of fetching all existing rules in kernel's ruleset and filtering
in user space, add payload to the dump request specifying the table and
chain to filter for.

Since list_rule_cb() no longer needs the filter, pass only netlink_ctx
to the callback and drop struct rule_cache_dump_ctx.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago cache: Filter tables on kernel side
Phil Sutter [Mon, 29 Nov 2021 14:28:33 +0000 (15:28 +0100)] 
cache: Filter tables on kernel side

Instead of requesting a dump of all tables and filtering the data in
user space, construct a non-dump request if filter contains a table so
kernel returns only that single table.

This should improve nft performance in rulesets with many tables
present.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago tests: py: add tcp subtype match test cases
Florian Westphal [Sun, 21 Nov 2021 22:33:22 +0000 (23:33 +0100)] 
tests: py: add tcp subtype match test cases

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago exthdr: fix tcpopt_find_template to use length after mask adjustment
Florian Westphal [Sun, 21 Nov 2021 22:33:19 +0000 (23:33 +0100)] 
exthdr: fix tcpopt_find_template to use length after mask adjustment

Unify binop handling for ipv6 extension header, ip option and tcp option
processing.

Pass the real offset and length expected, not the one used in the kernel.
This was already done for extension headers and ip options, but tcp
option parsing did not do this.

This was fine before because no existing tcp option template
had a non-byte sized member.

With mptcp addition this isn't the case anymore, subtype field is
only 4 bits wide, but tcp option delinearization passed 8bits instead.

Pass the offset and mask delta, just like ip option/ipv6 exthdr.

This makes nft show 'tcp option mptcp subtype 1' instead of
'tcp option mptcp unknown & 240 == 16'.
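The two output forms describe the same match. A small sketch, assuming the subtype sits in the upper four bits of the byte (as the mask 240, i.e. 0xf0, in the old output suggests); the function name is made up for this example:

```python
def mptcp_subtype(byte):
    """Return the 4-bit value stored in the upper nibble of `byte`."""
    return (byte & 0xF0) >> 4


byte = 0x10
# Old listing: compare the masked byte against the unshifted value.
print((byte & 240) == 16)        # True
# New listing: compare the 4-bit field itself.
print(mptcp_subtype(byte) == 1)  # True
```

Passing the real field length (4 bits) instead of the 8 bits loaded by the kernel is what lets the listing code recognise the masked comparison as a plain field match.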

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago mptcp: add subtype matching
Florian Westphal [Sun, 21 Nov 2021 22:33:16 +0000 (23:33 +0100)] 
mptcp: add subtype matching

MPTCP multiplexes the various mptcp signalling data using the
first 4 bits of the mptcp option.

This allows matching on the mptcp subtype via:

   tcp option mptcp subtype 1

This misses delinearization support. mptcp subtype is the first tcp
option field that has a length of less than one byte.

Serialization processing will add a binop for this, but netlink
delinearization can't remove them, yet.

It also lacks a new datatype/symbol table that would allow using mnemonics like
'mp_join' instead of raw numbers.

For this reason, no tests are added yet.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago tests: py: add test cases for md5sig, fastopen and mptcp mnemonics
Florian Westphal [Sun, 21 Nov 2021 22:33:14 +0000 (23:33 +0100)] 
tests: py: add test cases for md5sig, fastopen and mptcp mnemonics

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago tcpopt: add md5sig, fastopen and mptcp options
Florian Westphal [Sun, 21 Nov 2021 22:33:11 +0000 (23:33 +0100)] 
tcpopt: add md5sig, fastopen and mptcp options

Allow using the "fastopen", "md5sig" and "mptcp" mnemonics rather than the
raw option numbers.

These new keywords are only recognized while scanner is in tcp state.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago parser: split tcp option rules
Florian Westphal [Sun, 21 Nov 2021 22:33:09 +0000 (23:33 +0100)] 
parser: split tcp option rules

At this time the parser will accept nonsensical input like

 tcp option mss left 2

which will be treated as 'tcp option maxseg size 2'.
This is because the enum space overlaps.

Split the rules so that 'tcp option mss' will only
accept field names specific to the mss/maxseg option kind.

Signed-off-by: Florian Westphal <fw@strlen.de>
(cherry picked from commit 46168852c03d73c29b557c93029dc512ca6e233a)

3 years ago scanner: add tcp flex scope
Florian Westphal [Sun, 21 Nov 2021 22:33:05 +0000 (23:33 +0100)] 
scanner: add tcp flex scope

This moves tcp options not used anywhere else (e.g. in synproxy) to a
distinct scope.  This will also make it possible to avoid exposing new
option keywords in the ruleset context.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago tcpopt: remove KIND keyword
Florian Westphal [Sun, 21 Nov 2021 22:32:57 +0000 (23:32 +0100)] 
tcpopt: remove KIND keyword

tcp option <foo> kind ... never makes any sense, as "tcp option <foo>"
already tells the kernel to look for the foo <kind>.

"tcp option sack kind 5" matches if the sack option is present; it's a
more complicated form of the simpler "tcp option sack exists".

"tcp option sack kind 1" (or any other value than 5) will never match.

So remove this.

Test cases are converted to "exists".

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago netlink_delinearize: binop: make accesses to expr->left/right conditional
Florian Westphal [Tue, 30 Nov 2021 19:19:44 +0000 (20:19 +0100)] 
netlink_delinearize: binop: make accesses to expr->left/right conditional

This function can be called for different expression types, including
some (EXPR_MAP) where expr->left/right alias to different member
variables.

This makes accesses to those members conditional by checking the
expression type ahead of the access.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago netlink_delinearize: rename misleading variable
Florian Westphal [Tue, 30 Nov 2021 17:11:41 +0000 (18:11 +0100)] 
netlink_delinearize: rename misleading variable

relational_binop_postprocess() is called for EXPR_RELATIONAL,
so "expr->right" is safe to use.

But the RHS can be something other than a value.
This has been extended to handle other types, so rename to 'right'.

No code changes intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago netlink_delinearize: use correct member type
Florian Westphal [Tue, 30 Nov 2021 16:53:22 +0000 (17:53 +0100)] 
netlink_delinearize: use correct member type

expr is a map, so this should use expr->map, not expr->left.
These fields are aliased, so this would break if that is ever changed.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago cli: save history on ctrl-d with editline
Pablo Neira Ayuso [Wed, 24 Nov 2021 22:15:19 +0000 (23:15 +0100)] 
cli: save history on ctrl-d with editline

Missing call to cli_exit() to save the history when ctrl-d is pressed in
nft -i.

Moreover, remove call to rl_callback_handler_remove() in cli_exit() for
editline cli since it does not call rl_callback_handler_install().

Fixes: bc2d5f79c2ea ("cli: use plain readline() interface with libedit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago netlink_delinearize: Fix for escaped asterisk strings on Big Endian
Phil Sutter [Wed, 10 Mar 2021 18:46:08 +0000 (19:46 +0100)] 
netlink_delinearize: Fix for escaped asterisk strings on Big Endian

The original nul-char detection was not functional on Big Endian.
Instead, go a simpler route by exporting the string and working on the
exported data to check for a nul-char and escape a trailing asterisk if
present. With the data export already happening in the caller, fold
escaped_string_wildcard_expr_alloc() into it as well.

Fixes: b851ba4731d9f ("src: add interface wildcard matching")
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago ct: Fix ct label value parser
Phil Sutter [Wed, 10 Mar 2021 15:56:11 +0000 (16:56 +0100)] 
ct: Fix ct label value parser

Size of array to export the bit value into was eight times too large, so
on Big Endian the data written into the data reg was always zero.

Fixes: 2fcce8b0677b3 ("ct: connlabel matching support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago datatype: Fix size of time_type
Phil Sutter [Wed, 10 Mar 2021 13:38:37 +0000 (14:38 +0100)] 
datatype: Fix size of time_type

Used by 'ct expiration', time_type is supposed to be 32bits. Passing a
64bits variable to constant_expr_alloc() causes the value to be always
zero on Big Endian.

Fixes: 0974fa84f162a ("datatype: seperate time parsing/printing from time_type")
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago meta: Fix hour_type size
Phil Sutter [Wed, 10 Mar 2021 10:45:47 +0000 (11:45 +0100)] 
meta: Fix hour_type size

In kernel as well as when parsing, hour_type is assumed to be 32bits.
Having the struct datatype field set to 64bits breaks Big Endian and so
does passing a 64bit value and 32 as length to constant_expr_alloc() as
it makes it import the upper 32bits. Fix this by turning 'result' into a
uint32_t and introduce a temporary uint64_t just for the call to
time_parse() which expects that.
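Why a 64bit variable read with a 32bit length only breaks on Big Endian can be sketched with the `struct` module. The function name is hypothetical and the code merely models the memory layout; it is not how nft exports values.

```python
import struct


def read_first_32_bits(value, byteorder):
    """Store `value` as a 64-bit integer in the given byte order,
    then read back only its first four bytes as a 32-bit integer."""
    prefix = "<" if byteorder == "little" else ">"
    raw = struct.pack(prefix + "Q", value)
    return struct.unpack(prefix + "I", raw[:4])[0]


print(read_first_32_bits(3600, "little"))  # 3600
print(read_first_32_bits(3600, "big"))     # 0
```

On Little Endian the first four bytes hold the lower 32 bits, so small values survive; on Big Endian they hold the upper 32 bits, which are zero for any value that fits in 32 bits.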

Fixes: f8f32deda31df ("meta: Introduce new conditions 'time', 'day' and 'hour'")
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago meta: Fix {g,u}id_type on Big Endian
Phil Sutter [Tue, 9 Mar 2021 20:24:30 +0000 (21:24 +0100)] 
meta: Fix {g,u}id_type on Big Endian

Using a 64bit variable to temporarily hold the parsed value works only
on Little Endian. uid_t and gid_t (and therefore also pw->pw_uid and
gr->gr_gid) are 32bit.
To fix this, use uid_t/gid_t for the temporary variable but keep the
64bit one for numeric parsing so values exceeding 32bits are still
detected.

Fixes: e0ed4c45d9ad2 ("meta: relax restriction on UID/GID parsing")
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago src: Fix payload statement mask on Big Endian
Phil Sutter [Thu, 17 Dec 2020 17:19:18 +0000 (18:19 +0100)] 
src: Fix payload statement mask on Big Endian

The mask used to select bits to keep must be exported in the same
byteorder as the payload statement itself, also the length of the
exported data must match the number of bytes extracted earlier.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago mnl: Fix for missing info in rule dumps
Phil Sutter [Thu, 17 Dec 2020 14:52:03 +0000 (15:52 +0100)] 
mnl: Fix for missing info in rule dumps

Commit 0e52cab1e64ab improved error reporting by adding rule's table and
chain names to netlink message directly, prefixed by their location
info. This in turn caused netlink dumps of the rule to not contain table
and chain name anymore. Fix this by inserting the missing info before
dumping and remove it afterwards to not cause duplicated entries in
netlink message.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago exthdr: Fix for segfault with unknown exthdr
Phil Sutter [Wed, 17 Mar 2021 19:39:38 +0000 (20:39 +0100)] 
exthdr: Fix for segfault with unknown exthdr

Unknown exthdr type with NFT_EXTHDR_F_PRESENT flag set caused
NULL-pointer deref. Fix this by moving the conditional exthdr.desc deref
atop the function and use the result in all cases.

Fixes: e02bd59c4009b ("exthdr: Implement existence check")
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago tests/py: Avoid duplicate records in *.got files
Phil Sutter [Thu, 4 Feb 2021 14:58:25 +0000 (15:58 +0100)] 
tests/py: Avoid duplicate records in *.got files

If payloads don't contain family-specific bits, they may sit in a single
*.payload file for all tested families. In such case, nft-test.py will
consequently write dissenting payloads into a single *.got file. To
avoid the duplicate entries, check if a matching record exists already
before writing it out.

Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago exthdr: fix type number saved in udata
Florian Westphal [Mon, 29 Nov 2021 23:50:53 +0000 (00:50 +0100)] 
exthdr: fix type number saved in udata

This should store the index of the protocol template, but
&x[i] - &x[0] is always i, so remove the divide.  Also add test case.

Fixes: 01fbc1574b9e ("exthdr: add parse and build userdata interface")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
3 years ago cli: remove #include <editline/history.h>
Pablo Neira Ayuso [Mon, 22 Nov 2021 17:01:52 +0000 (18:01 +0100)] 
cli: remove #include <editline/history.h>

This header is not required to compile nftables with editline. Remove
it; this unbreaks compilation on several distros which have no symlink
from history.h to editline.h.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago mnl: different signedness compilation warning
Pablo Neira Ayuso [Fri, 19 Nov 2021 10:15:35 +0000 (11:15 +0100)] 
mnl: different signedness compilation warning

mnl.c: In function ‘mnl_batch_talk’:
mnl.c:417:17: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘long int’ [-Wsign-compare]
   if (rcvbufsiz < NFT_MNL_ECHO_RCVBUFF_DEFAULT)
                 ^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago cache: do not skip populating anonymous set with -t
Pablo Neira Ayuso [Thu, 18 Nov 2021 16:25:36 +0000 (17:25 +0100)] 
cache: do not skip populating anonymous set with -t

--terse does not apply to anonymous set, add a NFT_CACHE_TERSE bit
to skip named sets only.

Moreover, prioritize specific listing filter over --terse to avoid a
bogus:

  netlink: Error: Unknown set '__set0' in lookup expression

when invoking:

  # nft -ta list set inet filter example

Extend existing test to improve coverage.

Fixes: 9628d52e46ac ("cache: disable NFT_CACHE_SETELEM_BIT on --terse listing only")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago build: Bump version to 1.0.1 v1.0.1
Pablo Neira Ayuso [Thu, 18 Nov 2021 10:55:30 +0000 (11:55 +0100)] 
build: Bump version to 1.0.1

Requires libnftnl 1.2.1

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago monitor: do not call interval_map_decompose() for concat intervals
Florian Westphal [Wed, 17 Nov 2021 13:26:21 +0000 (14:26 +0100)] 
monitor: do not call interval_map_decompose() for concat intervals

Without this, nft monitor will either print garbage or even segfault
when encountering a concat set because we pass expr->value to libgmp
helpers for concat (non-value) expressions.

Also, for concat case, we need to call concat_range_aggregate() helper.
Add a test case for this.  Without this patch, it gives:

tests/monitor/run-tests.sh: line 98: 1163 Segmentation fault
(core dumped) $nft -nn -e -f $command_file > $echo_output

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago parser_json: add raw payload inner header match support
Pablo Neira Ayuso [Wed, 17 Nov 2021 10:10:06 +0000 (11:10 +0100)] 
parser_json: add raw payload inner header match support

Add missing "ih" base raw payload and extend tests/py to cover this new
usecase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago parser: allow for string raw payload base
Pablo Neira Ayuso [Tue, 16 Nov 2021 11:08:15 +0000 (12:08 +0100)] 
parser: allow for string raw payload base

Remove the new 'ih' token and allow the raw payload base to be represented
with a string instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago tests: py: remove netdev coverage in ip/ip_tcp.t
Pablo Neira Ayuso [Fri, 12 Nov 2021 11:44:44 +0000 (12:44 +0100)] 
tests: py: remove netdev coverage in ip/ip_tcp.t

The following test shows a warning in the netdev family:

ip/ip_tcp.t: WARNING: line 9: 'add rule netdev test-netdev ingress ip protocol tcp tcp dport 22': 'tcp dport 22' mismatches 'ip protocol 6 tcp dport 22'

'ip protocol tcp' can be removed in the ip family, but not in netdev.

This test is specific to the ip family, so remove the netdev lines.

Fixes: 510c4fad7e78 ("src: Support netdev egress hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago tests: py: missing json output update in ip6/meta.t
Pablo Neira Ayuso [Fri, 12 Nov 2021 11:23:33 +0000 (12:23 +0100)] 
tests: py: missing json output update in ip6/meta.t

Update json output for 'meta protocol ip6 udp dport 67'.

Fixes: 646c5d02a5db ("rule: remove redundant meta protocol from the evaluation step")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago tests: py: missing ip/snat.t json updates
Pablo Neira Ayuso [Fri, 12 Nov 2021 11:19:37 +0000 (12:19 +0100)] 
tests: py: missing ip/snat.t json updates

Missing json update for new tests added recently.

Fixes: 50780456a01a ("evaluate: check for missing transport protocol match in nat map with concatenations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago tests: py: missing ip/dnat.t json updates
Pablo Neira Ayuso [Fri, 12 Nov 2021 10:07:55 +0000 (11:07 +0100)] 
tests: py: missing ip/dnat.t json updates

Missing json update for three new tests added recently.

Fixes: 640dc0c8a3da ("tests: py: extend coverage for dnat with classic range representation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago cache: filter out rules by chain
Pablo Neira Ayuso [Wed, 10 Nov 2021 17:08:41 +0000 (18:08 +0100)] 
cache: filter out rules by chain

With an autogenerated ruleset with ~20k chains, listing the full ruleset takes:

 # time nft list ruleset &> /dev/null

 real    0m1,712s
 user    0m1,258s
 sys     0m0,454s

Speed up listing of a specific chain:

 # time nft list chain nat MWDG-UGR-234PNG3YBUOTS5QD &> /dev/null

 real    0m0,542s
 user    0m0,251s
 sys     0m0,292s

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago cache: missing family in cache filtering
Pablo Neira Ayuso [Tue, 9 Nov 2021 11:15:44 +0000 (12:15 +0100)] 
cache: missing family in cache filtering

Check family when filtering out listing of tables and sets.

Fixes: 3f1d3912c3a6 ("cache: filter out tables that are not requested")
Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago cache: do not populate cache if it is going to be flushed
Pablo Neira Ayuso [Tue, 9 Nov 2021 09:44:46 +0000 (10:44 +0100)] 
cache: do not populate cache if it is going to be flushed

Skip set element netlink dump if set is flushed, this speeds up
set flush + add element operation in a batch file for an existing set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago cache: move list filter under struct
Pablo Neira Ayuso [Tue, 9 Nov 2021 09:35:05 +0000 (10:35 +0100)] 
cache: move list filter under struct

Wrap the table and set fields for list filtering to prepare for the
introduction of element filters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago doc: update ct timeout section with the state names
Florian Westphal [Thu, 28 Oct 2021 15:36:06 +0000 (17:36 +0200)] 
doc: update ct timeout section with the state names

The docs were too terse and did not list the valid timeout states.
While at it, adjust the default udp stream timeout to 120, which is the
current kernel default.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years ago tests: py: update rawpayload.t.json
Pablo Neira Ayuso [Fri, 5 Nov 2021 15:47:57 +0000 (16:47 +0100)] 
tests: py: update rawpayload.t.json

Missing update of json test.

Fixes: 6ad2058da66a ("datatype: add xinteger_type alias to print in hexadecimal")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago evaluate: grab reference in set expression evaluation
Pablo Neira Ayuso [Fri, 5 Nov 2021 14:55:20 +0000 (15:55 +0100)] 
evaluate: grab reference in set expression evaluation

Do not clone the expression when evaluating a set expression; grabbing a
reference to reuse the object is sufficient.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago evaluate: clone variable expression if there is more than one reference
Pablo Neira Ayuso [Fri, 5 Nov 2021 13:43:17 +0000 (14:43 +0100)] 
evaluate: clone variable expression if there is more than one reference

Clone the expression that defines the variable value only if there are
multiple references to it in the ruleset. This reduces heap memory
consumption when the variable defines a set with a huge number of
elements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago mnl: do not build nftnl_set element list
Pablo Neira Ayuso [Thu, 4 Nov 2021 11:53:11 +0000 (12:53 +0100)] 
mnl: do not build nftnl_set element list

Do not call alloc_setelem_cache() to build the set element list in
nftnl_set. Instead, translate one single set element expression to
nftnl_set_elem object at a time and use this object to build the netlink
header.

Using a huge test set containing a 1.1 million element blocklist, this
patch reduces userspace memory consumption by 40%.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago tests: py: remove verdict from closing end interval
Pablo Neira Ayuso [Thu, 4 Nov 2021 19:28:51 +0000 (20:28 +0100)] 
tests: py: remove verdict from closing end interval

Kernel does not allow for NFT_SET_ELEM_INTERVAL_END flag and
NFTA_SET_ELEM_DATA. The closing end interval represents a mismatch,
therefore, no verdict can be applied. The existing payload files show
the drop verdict when this is unset (because NF_DROP=0).

This update is required to fix payload warnings in tests/py after
libnftnl's ("set: use NFTNL_SET_ELEM_VERDICT to print verdict").

Fixes: 6671d9d137f6 ("mnl: Set NFTNL_SET_DATA_TYPE before dumping set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago src: raw payload match and mangle on inner header / payload data
Pablo Neira Ayuso [Tue, 2 Nov 2021 13:01:58 +0000 (14:01 +0100)] 
src: raw payload match and mangle on inner header / payload data

This patch adds support to match on inner header / payload data:

 # nft add rule x y @ih,32,32 0x14000000 counter

you can also mangle payload data:

 # nft add rule x y @ih,32,32 set 0x14000000 counter

This update triggers a checksum update at the layer 4 header via
csum_flags; mangling of odd bytes is also aligned to 16 bits.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years ago tests: shell: $NFT needs to be invoked unquoted
Štěpán Němec [Fri, 5 Nov 2021 11:39:11 +0000 (12:39 +0100)] 
tests: shell: $NFT needs to be invoked unquoted

The variable has to undergo word splitting, otherwise the shell tries
to find the variable value as an executable, which breaks in cases that
7c8a44b25c22 ("tests: shell: Allow wrappers to be passed as nft command")
intends to support.
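The difference can be modelled outside the shell. In this sketch `shlex.split` stands in for the shell's word splitting (an approximation, since the two do not behave identically for quoted input), and the wrapper value 'valgrind nft' is only an example:

```python
import shlex

nft = "valgrind nft"  # example wrapper command

# Like "$NFT" list ruleset: the whole value becomes one word,
# so the shell would look for an executable named "valgrind nft".
quoted = [nft, "list", "ruleset"]

# Like $NFT list ruleset: the value is split into separate words.
unquoted = shlex.split(nft) + ["list", "ruleset"]

print(quoted[0])    # valgrind nft
print(unquoted[0])  # valgrind
```

With a plain path in $NFT both forms yield the same argument list, which is why the problem only shows up once a wrapper is used.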

Mention this in the shell tests README.

Fixes: d8ccad2a2b73 ("tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago tests: shell: README: clarify test file name convention
Štěpán Němec [Fri, 5 Nov 2021 11:39:10 +0000 (12:39 +0100)] 
tests: shell: README: clarify test file name convention

Since commit 4d26b6dd3c4c, test file name suffix no longer reflects
expected exit code in all cases.

Move the sentence "Since they are located with `find', test files can
be put in any subdirectory." to a separate paragraph.

Fixes: 4d26b6dd3c4c ("tests: shell: change all test scripts to return 0")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years ago tests: shell: README: $NFT does not have to be a path to a binary
Štěpán Němec [Fri, 5 Nov 2021 11:39:09 +0000 (12:39 +0100)] 
tests: shell: README: $NFT does not have to be a path to a binary

Since commit 7c8a44b25c22, $NFT can contain an arbitrary command,
e.g. 'valgrind nft'.

Fixes: 7c8a44b25c22 ("tests: shell: Allow wrappers to be passed as nft command")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years agotests: shell: README: copy edit
Štěpán Němec [Fri, 5 Nov 2021 11:39:08 +0000 (12:39 +0100)] 
tests: shell: README: copy edit

Grammar, wording, formatting fixes (no substantial change of meaning).

Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years agodatatype: add xinteger_type alias to print in hexadecimal
Pablo Neira Ayuso [Tue, 2 Nov 2021 13:07:04 +0000 (14:07 +0100)] 
datatype: add xinteger_type alias to print in hexadecimal

Add an alias of the integer type to print raw payload expressions in
hexadecimal.

Update tests/py.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoevaluate: postpone transport protocol match check after nat expression evaluation
Pablo Neira Ayuso [Tue, 2 Nov 2021 10:31:40 +0000 (11:31 +0100)] 
evaluate: postpone transport protocol match check after nat expression evaluation

Fix bogus error report when using transport protocol as map key.

Fixes: 50780456a01a ("evaluate: check for missing transport protocol match in nat map with concatenations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoparser: extend limit syntax
Jeremy Sowden [Fri, 29 Oct 2021 20:40:09 +0000 (21:40 +0100)] 
parser: extend limit syntax

The documentation describes the syntax of limit statements thus:

  limit rate [over] packet_number / TIME_UNIT [burst packet_number packets]
  limit rate [over] byte_number BYTE_UNIT / TIME_UNIT [burst byte_number BYTE_UNIT]

  TIME_UNIT := second | minute | hour | day
  BYTE_UNIT := bytes | kbytes | mbytes

From this one might infer that a limit may be specified by any of the
following:

  limit rate 1048576/second
  limit rate 1048576 mbytes/second

  limit rate 1048576 / second
  limit rate 1048576 mbytes / second

However, the last does not currently parse:

  $ sudo /usr/sbin/nft add filter input limit rate 1048576 mbytes / second
  Error: wrong rate format
  add filter input limit rate 1048576 mbytes / second
                   ^^^^^^^^^^^^^^^^^^^^^^^^^

Extend the `limit_rate_bytes` parser rule to support it, and add some
new Python test-cases.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoparser: add `limit_rate_pkts` and `limit_rate_bytes` rules
Jeremy Sowden [Fri, 29 Oct 2021 20:40:08 +0000 (21:40 +0100)] 
parser: add `limit_rate_pkts` and `limit_rate_bytes` rules

Factor the `N / time-unit` and `N byte-unit / time-unit` expressions
from limit expressions out into separate `limit_rate_pkts` and
`limit_rate_bytes` rules respectively.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoparser: add new `limit_bytes` rule
Jeremy Sowden [Fri, 29 Oct 2021 20:40:07 +0000 (21:40 +0100)] 
parser: add new `limit_bytes` rule

Refactor the `N byte-unit` expression out of the `limit_bytes_burst`
rule into a separate `limit_bytes` rule.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agotests: run-tests.sh: ensure non-zero exit when $failed != 0
Štěpán Němec [Wed, 20 Oct 2021 12:44:09 +0000 (14:44 +0200)] 
tests: run-tests.sh: ensure non-zero exit when $failed != 0

POSIX [1] does not specify the behavior of `exit' with arguments
outside the 0-255 range, but what generally (bash, dash, zsh, OpenBSD
ksh, busybox) seems to happen is the shell exiting with status & 255
[2], which results in zero exit for certain non-zero arguments.
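
The truncation can be illustrated without relying on the unspecified
`exit' behavior itself. The helper names below are hypothetical, and
mapping every non-zero failure count to 1 is just one possible way to
guarantee a non-zero exit, not necessarily what run-tests.sh does.

```shell
# Status that a shell truncating to 8 bits would report for 'exit N'.
truncated_status() {
    echo $(( $1 & 255 ))
}

# Map any failure count to a plain 0/1 status instead.
safe_status() {
    if [ "$1" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

truncated_status 256   # prints 0: 256 failures would look like success
truncated_status 3     # prints 3
safe_status 256        # prints 1
```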

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#exit
[2] https://git.savannah.gnu.org/cgit/bash.git/tree/builtins/common.c#n579

Fixes: 0c6592420586 ("tests: fix return codes")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years agotests: shell: Fix bogus testsuite failure with 250Hz
Phil Sutter [Tue, 2 Nov 2021 19:53:53 +0000 (20:53 +0100)] 
tests: shell: Fix bogus testsuite failure with 250Hz

The previous fix for HZ=100 was not sufficient: a kernel with HZ=250
seems to round the 10ms to 8ms. Do as Lukas suggests and accept the
occasional input/output asymmetry instead of continuing the hide'n'seek game.
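
The 8ms figure is consistent with rounding down to whole jiffies. The
sketch below assumes plain truncation, which may not match the kernel's
actual conversion, and the helper name is made up for the example.

```shell
# Convert a duration to whole jiffies (rounding down) and back to ms.
# Assumption: plain truncation; the kernel's real conversion may differ.
ms_after_jiffies() {
    hz=$1
    ms=$2
    jiffies=$(( ms * hz / 1000 ))
    echo $(( jiffies * 1000 / hz ))
}

ms_after_jiffies 100 10    # prints 10
ms_after_jiffies 250 10    # prints 8
ms_after_jiffies 1000 10   # prints 10
```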

Fixes: c9c5b5f621c37 ("tests: shell: Fix bogus testsuite failure with 100Hz")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agosrc: Support netdev egress hook
Lukas Wunner [Wed, 11 Mar 2020 12:20:06 +0000 (13:20 +0100)] 
src: Support netdev egress hook

Add userspace support for the netdev egress hook which is queued up for
v5.16-rc1, complete with documentation and tests.  Usage is identical to
the ingress hook.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agotests: py: Move netdev-specific tests to appropriate subdirectory
Lukas Wunner [Sun, 24 Oct 2021 07:37:35 +0000 (09:37 +0200)] 
tests: py: Move netdev-specific tests to appropriate subdirectory

The fwd and dup statements are specific to netdev hooks, so move their
tests to the appropriate subdirectory.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agotests: shell: add testcase for --terse
Pablo Neira Ayuso [Wed, 27 Oct 2021 23:50:41 +0000 (01:50 +0200)] 
tests: shell: add testcase for --terse

Compare listing with and without --terse for:

 nft list ruleset
 nft list set x y

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: disable NFT_CACHE_SETELEM_BIT on --terse listing only
Pablo Neira Ayuso [Wed, 27 Oct 2021 23:14:30 +0000 (01:14 +0200)] 
cache: disable NFT_CACHE_SETELEM_BIT on --terse listing only

Instead of NFT_CACHE_SETELEM which also disables set dump.

Fixes: 6bcd0d576a60 ("cache: unset NFT_CACHE_SETELEM with --terse listing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: ensure evaluate_cache_list flags are set correctly
Chris Arges [Tue, 26 Oct 2021 22:09:28 +0000 (00:09 +0200)] 
cache: ensure evaluate_cache_list flags are set correctly

This change ensures that the terse flag is maintained when listing
rulesets with --terse.

Fixes: 6bcd0d576a60 ("cache: unset NFT_CACHE_SETELEM with --terse listing")
Signed-off-by: Chris Arges <carges@cloudflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: honor table in set filtering
Pablo Neira Ayuso [Mon, 25 Oct 2021 21:46:36 +0000 (23:46 +0200)] 
cache: honor table in set filtering

Check for a table mismatch, in case the same set name is used in
different tables.

Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: honor filter in set listing commands
Pablo Neira Ayuso [Mon, 25 Oct 2021 21:34:07 +0000 (23:34 +0200)] 
cache: honor filter in set listing commands

Fetch table, set and set elements only for set listing commands, e.g.
nft list set inet filter ipv4_bogons.

Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: always set on NFT_CACHE_REFRESH for listing
Pablo Neira Ayuso [Mon, 25 Oct 2021 21:32:34 +0000 (23:32 +0200)] 
cache: always set on NFT_CACHE_REFRESH for listing

This flag forces a refresh of the cache on list commands. Several
object types are missing this flag; setting it fixes nft --interactive
mode.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoconfigure: default to libedit for cli
Pablo Neira Ayuso [Mon, 25 Oct 2021 20:46:13 +0000 (22:46 +0200)] 
configure: default to libedit for cli

readline support only compiles with libreadline5, so set libedit as the
default library.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agotests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval...
Štěpán Němec [Wed, 20 Oct 2021 12:42:20 +0000 (14:42 +0200)] 
tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")

Test inspired by [1] with both the set and stack size reduced by the
same power of 2, to preserve the (pre-baecd1cf2685) segfault on one
hand, and make the test successfully complete (post-baecd1cf2685) in a
few seconds even on weaker hardware on the other.

(The reason I stopped at 128kB stack size is that with 64kB I was
getting segfaults even with baecd1cf2685 applied.)

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1908127

Signed-off-by: Štěpán Němec <snemec@redhat.com>
Helped-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years agomain: _exit() if setuid
Florian Westphal [Sat, 16 Oct 2021 22:56:23 +0000 (00:56 +0200)] 
main: _exit() if setuid

Apparently some people think it's a good idea to make nft setuid so
unprivileged users can change settings.

"nft -f /etc/shadow" is just one example of why this is a bad idea.
Disable this.  Do not print anything, fd cannot be trusted.

This change intentionally doesn't affect libnftables, on the off-chance
that somebody creates an suid program and knows what they're doing.

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years agotests: shell: auto-removal of chain hook on netns removal
Florian Westphal [Tue, 19 Oct 2021 12:07:25 +0000 (14:07 +0200)] 
tests: shell: auto-removal of chain hook on netns removal

This is the nft equivalent of the syzbot report that led to
kernel commit 68a3765c659f8
("netfilter: nf_tables: skip netdev events generated on netns removal").

Signed-off-by: Florian Westphal <fw@strlen.de>
3 years agorule: replace three conditionals with one
Jeremy Sowden [Thu, 7 Oct 2021 20:12:22 +0000 (21:12 +0100)] 
rule: replace three conditionals with one

When outputting set definitions, merge three consecutive
`if (!list_empty(&set->stmt_list))` conditionals.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agorule: fix stateless output after listing sets containing counters
Jeremy Sowden [Thu, 7 Oct 2021 20:12:21 +0000 (21:12 +0100)] 
rule: fix stateless output after listing sets containing counters

Before outputting counters in set definitions the
`NFT_CTX_OUTPUT_STATELESS` flag was set to suppress output of the
counter state and unconditionally cleared afterwards, regardless of
whether it had been originally set.  Record the original set of flags
and restore it.
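
The save-and-restore pattern can be sketched in shell arithmetic. The
actual fix is in C; the bit value and function names here are
hypothetical and only illustrate why an unconditional clear loses the
caller's setting.

```shell
# Hypothetical bit value standing in for NFT_CTX_OUTPUT_STATELESS.
STATELESS=1

# Buggy variant: sets the bit, then clears it unconditionally.
print_set_buggy() {
    flags=$(( flags | STATELESS ))
    # ... counters would be printed without state here ...
    flags=$(( flags & ~STATELESS ))
}

# Fixed variant: remembers the caller's flags and restores them.
print_set_fixed() {
    saved=$flags
    flags=$(( flags | STATELESS ))
    # ... counters would be printed without state here ...
    flags=$saved
}

# The caller asked for stateless output before listing the set.
flags=$STATELESS; print_set_buggy; buggy_result=$flags
flags=$STATELESS; print_set_fixed; fixed_result=$flags
echo "buggy: $buggy_result, fixed: $fixed_result"
```

With the buggy variant the caller's stateless bit is gone afterwards
(0); with the fixed variant it is still set (1).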

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994273
Fixes: 6d80e0f15492 ("src: support for counter in set definition")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agorule: remove fake stateless output of named counters
Jeremy Sowden [Thu, 7 Oct 2021 20:12:20 +0000 (21:12 +0100)] 
rule: remove fake stateless output of named counters

When `-s` is passed, no state is output for named quotas and counter and
quota rules, but fake zero state is output for named counters.  Remove
the output of named counters to match the remaining stateful objects.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agodoc: libnftables-json: make the example valid libnftables JSON input
Štěpán Němec [Mon, 11 Oct 2021 11:59:04 +0000 (13:59 +0200)] 
doc: libnftables-json: make the example valid libnftables JSON input

- Add missing comma between array elements.
- Fix chain 'name' property.
- Match 'op' property is mandatory.

Fixes: 2e56f533b36a ("doc: Improve example in libnftables-json(5)")
Fixes: 90d4ee087171 ("JSON: Make match op mandatory, introduce 'in' operator")
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
3 years agocache: unset NFT_CACHE_SETELEM with --terse listing
Pablo Neira Ayuso [Sat, 2 Oct 2021 11:49:53 +0000 (13:49 +0200)] 
cache: unset NFT_CACHE_SETELEM with --terse listing

Skip populating the set element cache in this case to speed up listing.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: filter out sets and maps that are not requested
Pablo Neira Ayuso [Wed, 29 Sep 2021 16:01:47 +0000 (18:01 +0200)] 
cache: filter out sets and maps that are not requested

Do not fetch set content for list commands that specify a
set name.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: filter out tables that are not requested
Pablo Neira Ayuso [Wed, 29 Sep 2021 11:09:03 +0000 (13:09 +0200)] 
cache: filter out tables that are not requested

Do not fetch table content for list commands that specify a
table name, e.g.

 # nft list table filter

This speeds up listing of a given table by not populating the
cache with tables that are not needed.

 - Full ruleset (huge with ~100k lines).

 # sudo nft list ruleset &> /dev/null
 real    0m3,049s
 user    0m2,080s
 sys     0m0,968s

 - Listing per table is now faster:

 # nft list table nat &> /dev/null
 real    0m1,969s
 user    0m1,412s
 sys     0m0,556s

 # nft list table filter &> /dev/null
 real    0m0,697s
 user    0m0,478s
 sys     0m0,220s

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1326
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: finer grain cache population for list commands
Pablo Neira Ayuso [Wed, 29 Sep 2021 09:57:41 +0000 (11:57 +0200)] 
cache: finer grain cache population for list commands

Skip full cache population for list commands to speed up listing.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agocache: set on cache flags for nested notation
Pablo Neira Ayuso [Wed, 29 Sep 2021 08:55:19 +0000 (10:55 +0200)] 
cache: set on cache flags for nested notation

Set the cache flags for the nested notation too. This fixes nft -f
with two files, one that contains the set declaration and another that
adds a rule that refers to that set.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1474
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoevaluate: check for missing transport protocol match in nat map with concatenations
Pablo Neira Ayuso [Tue, 28 Sep 2021 20:34:10 +0000 (22:34 +0200)] 
evaluate: check for missing transport protocol match in nat map with concatenations

Restore this error with NAT maps:

 # nft add rule 'ip ipfoo c dnat to ip daddr map @y'
 Error: transport protocol mapping is only valid after transport protocol match
 add rule ip ipfoo c dnat to ip daddr map @y
                     ~~~~    ^^^^^^^^^^^^^^^

Allow for transport protocol match in the map too, which is implicitly
pulling in a transport protocol dependency.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoevaluate: check for concatenation in set data datatype
Pablo Neira Ayuso [Tue, 28 Sep 2021 12:09:54 +0000 (14:09 +0200)] 
evaluate: check for concatenation in set data datatype

Adding this rule with an existing map:

  add rule nat x y meta l4proto { tcp, udp } dnat ip to ip daddr . th dport map @fwdtoip_th

reports a bogus error:

Error: datatype mismatch: expected IPv4 address, expression has type
concatenation of (IPv4 address, internet network service)

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agomonitor: honor NLM_F_EXCL netlink flag
Pablo Neira Ayuso [Sun, 26 Sep 2021 10:27:45 +0000 (12:27 +0200)] 
monitor: honor NLM_F_EXCL netlink flag

This allows reporting the create command.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agotests: monitor: update insert and replace commands
Pablo Neira Ayuso [Fri, 24 Sep 2021 23:34:36 +0000 (01:34 +0200)] 
tests: monitor: update insert and replace commands

Adjust test after these two kernel fixes:

("netfilter: nf_tables: reverse order in rule replacement expansion")
("netfilter: nf_tables: add position handle in event notification")

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agomonitor: honor NLM_F_APPEND flag for rules
Pablo Neira Ayuso [Mon, 20 Sep 2021 21:39:17 +0000 (23:39 +0200)] 
monitor: honor NLM_F_APPEND flag for rules

Print 'add' or 'insert' according to this netlink flag.
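
The flag test can be sketched as follows. The monitor code itself is C;
this shell fragment only illustrates the bit check, and the helper name
is hypothetical. The flag values are those defined in <linux/netlink.h>.

```shell
# Value of NLM_F_APPEND from <linux/netlink.h>.
NLM_F_APPEND=$(( 0x800 ))

# Hypothetical helper: pick the command word for a rule event
# from the netlink message flags passed as the first argument.
rule_command() {
    if [ $(( $1 & NLM_F_APPEND )) -ne 0 ]; then
        echo add
    else
        echo insert
    fi
}

rule_command $(( 0x800 | 0x400 ))   # NLM_F_APPEND|NLM_F_CREATE: prints add
rule_command $(( 0x400 ))           # NLM_F_CREATE only: prints insert
```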

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agomonitor: display rule position handle
Pablo Neira Ayuso [Mon, 20 Sep 2021 16:52:18 +0000 (18:52 +0200)] 
monitor: display rule position handle

This allows locating the incremental update in the ruleset.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetlink: dynset: set compound expr dtype based on set key definition
Florian Westphal [Tue, 28 Sep 2021 19:34:30 +0000 (21:34 +0200)] 
netlink: dynset: set compound expr dtype based on set key definition

"nft add rule ... add @t { ip saddr . 22 ..." will be listed as
"ip saddr . 0x16  [ invalid type]".

This is a display bug: the compound expression created during netlink
deserialization lacks correct datatypes for the value expression.

Avoid this by setting the individual expressions' datatype.
The set key has the needed information, so walk over the types and set
them in the dynset statement.

Also add a test case.

Reported-by: Paulo Ricardo Bruck <paulobruck1@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>