git.ipfire.org Git - thirdparty/nftables.git/log
20 months ago build: Bump version to 1.0.5 v1.0.5
Pablo Neira Ayuso [Tue, 9 Aug 2022 18:44:51 +0000 (20:44 +0200)] 
build: Bump version to 1.0.5

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months ago tests/py: disable arp family for queue statement
Pablo Neira Ayuso [Tue, 9 Aug 2022 08:55:15 +0000 (10:55 +0200)] 
tests/py: disable arp family for queue statement

Kernel commit:

  commit 47f4f510ad586032b85c89a0773fbb011d412425
  Author: Florian Westphal <fw@strlen.de>
  Date:   Tue Jul 26 19:49:00 2022 +0200

    netfilter: nft_queue: only allow supported familes and hooks

restricts supported families, excluding arp.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
20 months ago meta: don't use non-POSIX formats in strptime()
Jo-Philipp Wich [Mon, 8 Aug 2022 22:18:42 +0000 (00:18 +0200)] 
meta: don't use non-POSIX formats in strptime()

The current strptime() invocations in meta.c use the `%F` format which
is not specified by POSIX and thus unimplemented by some libc flavors
such as musl libc.

Replace all occurrences of `%F` with an equivalent `%Y-%m-%d` format
in order to be able to properly parse user supplied dates in such
environments.
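
For illustration only, the explicit format can be tried out in Python rather than in the C code of meta.c. The sketch below parses a date with the spelled-out `%Y-%m-%d` directives; the helper name and the sample timestamp are made up for this example, and the time part of the format is an assumption.

```python
from datetime import datetime

# Explicit directives instead of the %F shorthand.
PORTABLE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' using only the spelled-out directives."""
    return datetime.strptime(text, PORTABLE_FORMAT)


parsed = parse_timestamp("2022-08-09 18:44:51")
print(parsed.isoformat())
```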

Signed-off-by: Jo-Philipp Wich <jo@mein.io>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
21 months ago src: allow anon set concatenation with ether and vlan
Florian Westphal [Mon, 25 Jul 2022 19:34:52 +0000 (21:34 +0200)] 
src: allow anon set concatenation with ether and vlan

vlan id uses integer type (which has a length of 0).

Using it was possible, but listing would assert:
python: mergesort.c:24: concat_expr_msort_value: Assertion `ilen > 0' failed.

There are two reasons for this.
First reason is that the udata/typeof information lacks the 'vlan id'
part, because internally this is 'payload . binop(payload AND mask)'.

binop lacks an udata store.  It makes little sense to store it,
'typeof' keyword expects normal match syntax.

So, when storing udata, store the left hand side of the binary
operation, i.e. the load of the 2-byte key.

With that resolved, delinearization could work, but concat_elem_expr()
would splice 12 bits off the element's value when it should be 16 (on
a byte boundary).
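
The 12 versus 16 bit arithmetic can be checked with a short Python sketch. This is not nftables code: both helper names are hypothetical, and the 0xfff mask is the one visible in the raw listings elsewhere in this log.

```python
BITS_PER_BYTE = 8


def round_up_to_byte_boundary(bits: int) -> int:
    """Round a bit length up to the next multiple of 8."""
    return (bits + BITS_PER_BYTE - 1) // BITS_PER_BYTE * BITS_PER_BYTE


def vlan_id_from_word(word: int) -> int:
    """Keep the low 12 bits of the 16-bit word that carries the vlan id."""
    return word & 0x0FFF


print(round_up_to_byte_boundary(12))
print(vlan_id_from_word(0x2002))
```

The 12-bit vlan id does not end on a byte boundary, so the concatenation has to reserve the full 2-byte load and mask afterwards.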

Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago evaluate: search stacked header list for matching payload dep
Florian Westphal [Mon, 25 Jul 2022 18:02:28 +0000 (20:02 +0200)] 
evaluate: search stacked header list for matching payload dep

"ether saddr 0:1:2:3:4:6 vlan id 2" works, but reverse fails:

"vlan id 2 ether saddr 0:1:2:3:4:6" will give
Error: conflicting protocols specified: vlan vs. ether

After "proto: track full stack of seen l2 protocols, not just cumulative offset",
we have a list of all l2 headers, so search those to see if we had this
proto base in the past before rejecting this.

Reported-by: Eric Garver <eric@garver.life>
Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago netlink_delinearize: also postprocess OP_AND in set element context
Florian Westphal [Mon, 1 Aug 2022 11:03:18 +0000 (13:03 +0200)] 
netlink_delinearize: also postprocess OP_AND in set element context

Pablo reports:
add rule netdev nt y update @macset { vlan id timeout 5s }

listing still shows the raw expression:
 update @macset { @ll,112,16 & 0xfff timeout 5s }

so also cover the 'set element' case.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago tests: add a test case for ether and vlan listing
Florian Westphal [Mon, 25 Jul 2022 17:31:22 +0000 (19:31 +0200)] 
tests: add a test case for ether and vlan listing

before this patch series, test fails dump validation:
-               update @macset { ether saddr . vlan id timeout 5s } counter packets 0 bytes 0
-               ether saddr . vlan id @macset
+               update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s } counter packets 0 bytes 0
+               @ll,48,48 . @ll,112,16 & 0xfff @macset

Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago debug: dump the l2 protocol stack
Florian Westphal [Mon, 25 Jul 2022 14:42:23 +0000 (16:42 +0200)] 
debug: dump the l2 protocol stack

Previously we used to print the cumulative size of the headers,
update this to print the tracked l2 stack.

Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago proto: track full stack of seen l2 protocols, not just cumulative offset
Florian Westphal [Mon, 25 Jul 2022 12:32:13 +0000 (14:32 +0200)] 
proto: track full stack of seen l2 protocols, not just cumulative offset

For input, a cumulative size counter of all pushed l2 headers is enough,
because we have the full expression tree available to us.

For delinearization we need to track all seen l2 headers, else we lose
information that we might need at a later time.

Consider:

rule netdev nt nc set update ether saddr . vlan id

during delinearization, the vlan proto_desc replaces the ethernet one,
and by the time we try to split the concatenation apart we will search
the ether saddr offset vs. the templates for proto_vlan.

This replaces the offset with an array that stores the protocol
descriptions seen.

Then, if the payload offset is larger than our description, search the
l2 stack and adjust the offset until we're within the expected offset
boundary.
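
A rough Python sketch of the offset adjustment described above, not the actual C implementation. The function name is hypothetical, and the header lengths (112 bits for Ethernet, 32 bits for the vlan header) as well as the outermost-first walk are assumptions of this sketch.

```python
# (name, header length in bits); the lengths are assumptions of this sketch
L2_STACK = [("ether", 112), ("vlan", 32)]


def resolve_l2_offset(stack, offset):
    """Return (protocol name, offset relative to the start of that header)."""
    for name, length in stack:
        if offset < length:
            return name, offset
        offset -= length  # skip this header and look at the next one
    raise ValueError("offset lies beyond the tracked l2 headers")


print(resolve_l2_offset(L2_STACK, 48))
print(resolve_l2_offset(L2_STACK, 112))
```

With the whole stack available, a link-layer offset of 48 resolves to the ethernet header and 112 to the start of the vlan header, which a single cumulative offset cannot express.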

Reported-by: Eric Garver <eric@garver.life>
Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago netlink_delinearize: postprocess binary ands in concatenations
Florian Westphal [Tue, 14 Jun 2022 19:56:48 +0000 (21:56 +0200)] 
netlink_delinearize: postprocess binary ands in concatenations

Input:
update ether saddr . vlan id timeout 5s @macset
ether saddr . vlan id @macset

Before this patch, gets rendered as:
update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s }
@ll,48,48 . @ll,112,16 & 0xfff @macset

After this, listing will show:
update @macset { @ll,48,48 . vlan id timeout 5s }
@ll,48,48 . vlan id @macset

The @ll, ... is due to vlan description replacing the ethernet one,
so payload decode fails to take the concatenation apart (the ethernet
header payload info is matched vs. vlan template).

This will be adjusted by a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago netlink_delinearize: allow postprocessing on concatenated elements
Florian Westphal [Tue, 14 Jun 2022 19:57:58 +0000 (21:57 +0200)] 
netlink_delinearize: allow postprocessing on concatenated elements

Currently there is no case where the individual expressions inside a
mapped concatenation need to be munged.

However, to support proper delinearization for an input like
'rule netdev nt nc set update ether saddr . vlan id timeout 5s @macset'

we need to allow this.

Right now, this gets listed as:

update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s }

because the ethernet protocol is replaced by vlan beforehand,
so we fail to map @ll,48,48 to a vlan protocol.

Likewise, we can't map the vlan info either because we cannot
cope with the 'and' operation properly, nor is it removed.

Prepare for this by deleting and re-adding so that we do not
corrupt the linked list.

After this, the list can be safely changed and a followup patch
can start to delete/reallocate expressions.

Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago parser_json: fix device parsing in netdev family
Pablo Neira Ayuso [Mon, 1 Aug 2022 14:15:08 +0000 (16:15 +0200)] 
parser_json: fix device parsing in netdev family

json_unpack() function is not designed to take a pre-allocated buffer.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1612
Fixes: 3fdc7541fba0 ("src: add multidevice support for netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
21 months ago src: proto: support DF, LE PHB, VA for DSCP
Oleksandr Natalenko [Mon, 11 Jul 2022 10:47:09 +0000 (12:47 +0200)] 
src: proto: support DF, LE PHB, VA for DSCP

Add a couple of aliases for well-known DSCP values.

As per RFC 4594, add "df" as an alias of "cs0" with 0x00 value.

As per RFC 5865, add "va" for VOICE-ADMIT with 0x2c value.

As per RFC 8622, add "lephb" for Lower-Effort Per-Hop Behavior with 0x01 value.

tc-cake(8) in diffserv8 mode would benefit from having "lephb" defined since
it corresponds to "Tin 0".

https://www.iana.org/assignments/dscp-registry/dscp-registry.xhtml
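
The three aliases and their values can be summarized as a small lookup table. This Python sketch only restates the values listed above; the table and function names are hypothetical and do not mirror the symbol table in the nftables source.

```python
# Alias -> DSCP value, as listed in the commit message above.
DSCP_ALIASES = {
    "cs0": 0x00,
    "df": 0x00,     # RFC 4594, alias of cs0
    "lephb": 0x01,  # RFC 8622, Lower-Effort Per-Hop Behavior
    "va": 0x2C,     # RFC 5865, VOICE-ADMIT
}


def dscp_value(name: str) -> int:
    """Look up a DSCP alias, case-insensitively."""
    return DSCP_ALIASES[name.lower()]


print(hex(dscp_value("va")))
```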

Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
21 months ago doc: Document limitations of ipsec expression with xfrm_interface
Phil Sutter [Thu, 23 Jun 2022 15:49:20 +0000 (17:49 +0200)] 
doc: Document limitations of ipsec expression with xfrm_interface

Point at a possible solution to match IPsec info of locally generated
traffic routed to an xfrm-type interface.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
21 months ago cache: report an error message if cache initialization fails
Pablo Neira Ayuso [Mon, 18 Jul 2022 15:17:37 +0000 (17:17 +0200)] 
cache: report an error message if cache initialization fails

cache initialization failure (which should not ever happen) is not
reported to the user.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
21 months ago cache: validate handle string length
Pablo Neira Ayuso [Mon, 18 Jul 2022 14:18:33 +0000 (16:18 +0200)] 
cache: validate handle string length

The maximum supported string length for a handle is NFT_NAME_MAXLEN; report an
error if the user exceeds this limit.

By validating from the cache evaluation phase, input is validated for the
native and json parsers.
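
A minimal Python sketch of such a check, placed in one shared step so that it does not matter which parser produced the handle. The value 256 for NFT_NAME_MAXLEN is an assumption of this sketch, and the function name and error wording are hypothetical.

```python
NFT_NAME_MAXLEN = 256  # assumed value; check the kernel uapi header


def validate_handle(handle: dict) -> list:
    """Return one error message per name that exceeds the length limit."""
    errors = []
    for kind, name in handle.items():
        if name is not None and len(name) > NFT_NAME_MAXLEN:
            errors.append(f"{kind} name is too long ({len(name)} > {NFT_NAME_MAXLEN})")
    return errors


print(validate_handle({"table": "filter", "chain": "x" * 300}))
```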

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
21 months ago cache: prepare nft_cache_evaluate() to return error
Pablo Neira Ayuso [Mon, 18 Jul 2022 13:56:00 +0000 (15:56 +0200)] 
cache: prepare nft_cache_evaluate() to return error

Move flags as parameter reference and add list of error messages to prepare
for sanity checks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago rule: crash when uncollapsing command with unexisting table or set
Pablo Neira Ayuso [Thu, 7 Jul 2022 13:11:35 +0000 (15:11 +0200)] 
rule: crash when uncollapsing command with unexisting table or set

If a ruleset update refers to a nonexistent table or set, then
cmd->elem.set is NULL.

Fixes: 498a5f0c219d ("rule: collapse set element commands")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago cache: release pending rules when chain binding lookup fails
Pablo Neira Ayuso [Wed, 6 Jul 2022 11:21:34 +0000 (13:21 +0200)] 
cache: release pending rules when chain binding lookup fails

If the implicit chain is not in the cache, release pending rules in
ctx->list and report EINTR to let the cache core retry to populate a
consistent cache.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1402
Fixes: c330152b7f77 ("src: support for implicit chain bindings")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago evaluate: report missing interval flag when using prefix/range in concatenation
Pablo Neira Ayuso [Wed, 29 Jun 2022 16:40:00 +0000 (18:40 +0200)] 
evaluate: report missing interval flag when using prefix/range in concatenation

If the set declaration is missing the interval flag and the user specifies an
element with either a prefix or a range, then bail out.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1592
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago scanner: allow prefix in ip6 scope
Florian Westphal [Wed, 6 Jul 2022 21:49:21 +0000 (23:49 +0200)] 
scanner: allow prefix in ip6 scope

'ip6 prefix' is valid syntax, so make sure scanner recognizes it
also in ip6 context.

Also add test case.

Fixes: a67fce7ffe7e ("scanner: nat: Move to own scope")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1619
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months ago segtree: fix map listing with interface wildcard
Pablo Neira Ayuso [Mon, 27 Jun 2022 10:54:23 +0000 (12:54 +0200)] 
segtree: fix map listing with interface wildcard

 # nft -f - <<'EOF'
 table inet filter {
    chain INPUT {
        iifname vmap {
            "eth0" : jump input_lan,
            "wg*" : jump input_vpn
        }
    }
    chain input_lan {}
    chain input_vpn {}
 }
 EOF
 # nft list ruleset
 nft: segtree.c:578: interval_map_decompose: Assertion `low->len / 8 > 0' failed.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1617
Fixes: 5e393ea1fc0a ("segtree: add string "range" reversal support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago scanner: don't pop active flex scanner scope
Florian Westphal [Thu, 23 Jun 2022 17:56:19 +0000 (19:56 +0200)] 
scanner: don't pop active flex scanner scope

Currently we can pop a flex scope that is still active, i.e. the
scanner_pop_start_cond() for the scope has not been done.

Example:
  counter ipsec out ip daddr 192.168.1.2 counter name "ipsec_out"

Here, parser fails because 'daddr' is parsed as STRING, not as DADDR token.

Bug is as follows:
COUNTER changes scope to COUNTER, stack is: COUNTER.
Next, IPSEC scope gets pushed, stack is: COUNTER, IPSEC.

Then, the 'COUNTER' scope close happens.  Because active scope has changed,
we cannot pop (we would pop the 'ipsec' scope in flex).
The pop operation gets delayed accordingly.

Next, IP gets pushed, stack is: COUNTER, IPSEC, IP, plus the information
that one scope closure/pop was delayed.

Then, the IP scope is closed.  Because a pop operation was delayed, we pop again,
which brings us back to COUNTER state.

This is bogus: The pop operation CANNOT be done yet, because the ipsec scope
is still open, but the existing code lacks the information to detect this.

After popping the IP scope, we must remain in IPSEC scope until bison
parser calls scanner_pop_start_cond(, IPSEC).

This adds a counter per flex scope so that we can detect this case.
In above case, after the IP scope gets closed, the "new" (previous)
scope (IPSEC) will be treated as active and its close is attempted again
on the next call to scanner_pop_start_cond().

After this patch, transition in above rule is:

push counter (COUNTER)
push IPSEC (COUNTER, IPSEC)
pop COUNTER (delayed: COUNTER, IPSEC, pending-pop for COUNTER),
push IP (COUNTER, IPSEC, IP, pending-pop for COUNTER)
pop IP (COUNTER, IPSEC, pending-pop for COUNTER)
parse DADDR (we're in IPSEC scope, it's a valid token)
pop IPSEC (pops all remaining scopes).
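
The transition list above can be replayed with a toy model. This Python sketch is a simplification written for illustration, not the flex/bison code in nftables: the class and attribute names are hypothetical, and it models only a per-scope "still open" counter plus a count of delayed pops.

```python
class ScopeStack:
    """Toy model of nested scanner scopes with delayed pops."""

    def __init__(self):
        self.stack = []    # open scopes, innermost last
        self.active = {}   # scope -> pushes the parser has not closed yet
        self.pending = 0   # closes that could not be applied immediately

    def current(self):
        return self.stack[-1] if self.stack else "INITIAL"

    def push(self, scope):
        self.stack.append(scope)
        self.active[scope] = self.active.get(scope, 0) + 1

    def pop(self, scope):
        self.active[scope] -= 1
        if self.current() != scope:
            self.pending += 1  # another scope is on top: delay this pop
            return
        self.stack.pop()
        # Apply delayed pops, but never pop a scope that is still active.
        while self.pending and self.stack and self.active[self.stack[-1]] == 0:
            self.stack.pop()
            self.pending -= 1


scopes = ScopeStack()
scopes.push("COUNTER")
scopes.push("IPSEC")
scopes.pop("COUNTER")     # delayed, IPSEC is on top
scopes.push("IP")
scopes.pop("IP")
print(scopes.current())   # scope in which DADDR is scanned
scopes.pop("IPSEC")
print(scopes.current())
```

Without the per-scope counter, the delayed COUNTER pop would be applied right after the IP pop and drop the model out of the IPSEC scope too early.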

We could also resurrect the commit:
"scanner: flags: move to own scope", the test case passes with the
new scope closure logic.

Fixes: bff106c5b277 ("scanner: add support for scope nesting")
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months ago parser: add missing synproxy scope closure
Florian Westphal [Thu, 23 Jun 2022 16:28:14 +0000 (18:28 +0200)] 
parser: add missing synproxy scope closure

Fixes: 232f2c3287fc ("scanner: synproxy: Move to own scope")
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months ago tests/py: Add a test for failing ipsec after counter
Phil Sutter [Thu, 23 Jun 2022 14:28:42 +0000 (16:28 +0200)] 
tests/py: Add a test for failing ipsec after counter

This is a bug in parser/scanner due to scoping:

| Error: syntax error, unexpected string, expecting saddr or daddr
| add rule ip ipsec-ip4 ipsec-forw counter ipsec out ip daddr 192.168.1.2
|                                                       ^^^^^

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months ago evaluate: fix segfault when adding elements to invalid set
Peter Tirsek [Sun, 26 Jun 2022 05:47:07 +0000 (00:47 -0500)] 
evaluate: fix segfault when adding elements to invalid set

Adding elements to a set or map with an invalid definition causes nft to
segfault. The following nftables.conf triggers the crash:

    flush ruleset
    create table inet filter
    set inet filter foo {}
    add element inet filter foo { foobar }

Simply parsing and checking the config will trigger it:

    $ nft -c -f nftables.conf.crash
    Segmentation fault

The error in the set/map definition is correctly caught and queued, but
because the set is invalid and does not contain a key type, adding to it
causes a NULL pointer dereference of set->key within setelem_evaluate().

I don't think it's necessary to queue another error since the underlying
problem is correctly detected and reported when parsing the definition
of the set. Simply checking the validity of set->key before using it
seems to fix it, causing the error in the definition of the set to be
reported properly. The element type error isn't caught, but that seems
reasonable since the key type is invalid or unknown anyway:

    $ ./nft -c -f ~/nftables.conf.crash
    /home/pti/nftables.conf.crash:3:21-21: Error: set definition does not specify key
    set inet filter foo {}
                        ^

[ Add tests to cover this case --pablo ]

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1597
Signed-off-by: Peter Tirsek <peter@tirsek.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago mnl: store netlink error location for set elements
Pablo Neira Ayuso [Mon, 27 Jun 2022 08:20:46 +0000 (10:20 +0200)] 
mnl: store netlink error location for set elements

Store set element location in the per-command netlink error location
array.  This allows for fine-grained error reporting when adding and
deleting elements.

 # nft -f test.nft
 test.nft:5:4-20: Error: Could not process rule: File exists
                        00:01:45:09:0b:26 : drop,
                        ^^^^^^^^^^^^^^^^^

test.nft contains a large map with one redundant entry.

Thus, users do not have to find the needle in the haystack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago src: remove NFT_NLATTR_LOC_MAX limit for netlink location error reporting
Pablo Neira Ayuso [Mon, 27 Jun 2022 08:16:48 +0000 (10:16 +0200)] 
src: remove NFT_NLATTR_LOC_MAX limit for netlink location error reporting

A set might have more than 16 elements, so use a runtime array to store
the netlink error location.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago parser_bison: fix error location for set elements
Pablo Neira Ayuso [Mon, 27 Jun 2022 08:15:30 +0000 (10:15 +0200)] 
parser_bison: fix error location for set elements

opt_newline causes interference since it points to the previous line.
Refer to set element key for error reporting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago intervals: check for EXPR_F_REMOVE in case of element mismatch
Pablo Neira Ayuso [Thu, 23 Jun 2022 16:41:21 +0000 (18:41 +0200)] 
intervals: check for EXPR_F_REMOVE in case of element mismatch

If auto-merge is disabled and the element to be deleted has no exact
match, then bail out.

Fixes: 3e8d934e4f72 ("intervals: support to partial deletion with automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago intervals: fix crash when trying to remove element in empty set
Pablo Neira Ayuso [Thu, 23 Jun 2022 12:20:17 +0000 (14:20 +0200)] 
intervals: fix crash when trying to remove element in empty set

The set deletion routine expects an initialized set, otherwise it crashes.

Fixes: 3e8d934e4f72 ("intervals: support to partial deletion with automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago netlink_delinearize: memleak when parsing concatenation data
Pablo Neira Ayuso [Thu, 23 Jun 2022 18:07:38 +0000 (20:07 +0200)] 
netlink_delinearize: memleak when parsing concatenation data

netlink_get_register() clones the expression in the register,
release after using it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago libnftables: release top level scope
Pablo Neira Ayuso [Fri, 17 Jun 2022 17:33:53 +0000 (19:33 +0200)] 
libnftables: release top level scope

Otherwise bogus variable redefinitions are reported via -o/--optimize:

  redefinition.conf:5:8-21: Error: redefinition of symbol 'interface_inet'
  define interface_inet = enp5s0
         ^^^^^^^^^^^^^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: limit statement is not supported yet
Pablo Neira Ayuso [Fri, 17 Jun 2022 17:03:05 +0000 (19:03 +0200)] 
optimize: limit statement is not supported yet

Revert support for the limit statement. The limit statement is stateful and
applies a ratelimit per rule, so a transformation that merges rules with
the limit statement needs to use anonymous sets with statements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: assume verdict is same when rules have no verdict
Pablo Neira Ayuso [Fri, 17 Jun 2022 16:51:40 +0000 (18:51 +0200)] 
optimize: assume verdict is same when rules have no verdict

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: only merge OP_IMPLICIT and OP_EQ relational
Pablo Neira Ayuso [Fri, 17 Jun 2022 16:17:49 +0000 (18:17 +0200)] 
optimize: only merge OP_IMPLICIT and OP_EQ relational

Add test to cover this case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago tests: shell: run -c -o on ruleset
Pablo Neira Ayuso [Fri, 17 Jun 2022 16:10:19 +0000 (18:10 +0200)] 
tests: shell: run -c -o on ruleset

Just run -o/--optimize on a ruleset.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add unsupported statement
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:49:59 +0000 (17:49 +0200)] 
optimize: add unsupported statement

Do not try to merge rules with unsupported statements. This patch adds a
dummy unsupported statement which is included in the statement
collection and the rule vs statement matrix.

When looking for possible rule mergers, rules using unsupported
statements are discarded, otherwise bogus rule mergers might occur.

Note that __stmt_type_eq() already returns false for unsupported
statements.

Add a test using meta mark statement, which is not yet supported.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add hash expression support
Pablo Neira Ayuso [Fri, 17 Jun 2022 16:05:15 +0000 (18:05 +0200)] 
optimize: add hash expression support

Extend expr_cmp() to compare hash expressions used in relational.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add numgen expression support
Pablo Neira Ayuso [Fri, 17 Jun 2022 16:02:56 +0000 (18:02 +0200)] 
optimize: add numgen expression support

Extend expr_cmp() to compare numgen expressions used in relational.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add binop expression support
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:48:43 +0000 (17:48 +0200)] 
optimize: add binop expression support

Do recursive call using left expression in the binop expression tree to
search for the primary expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add fib expression support
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:48:32 +0000 (17:48 +0200)] 
optimize: add fib expression support

Extend expr_cmp() to compare fib expressions used in relational.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add xfrm expression support
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:47:53 +0000 (17:47 +0200)] 
optimize: add xfrm expression support

Extend expr_cmp() to compare xfrm expressions used in relational.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: add osf expression support
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:47:15 +0000 (17:47 +0200)] 
optimize: add osf expression support

Extend expr_cmp() to compare osf expressions used in relational.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: fix verdict map merging
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:42:58 +0000 (17:42 +0200)] 
optimize: fix verdict map merging

Skip comparison when collecting the statement and building the rule vs
statement matrix. Compare verdict type when merging rules.

When inferring rule mergers, honor the STMT_VERDICT with map (i.e. vmap).

Fixes: 561aa3cfa8da ("optimize: merge verdict maps with same lookup key")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: fix reject statement
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:28:00 +0000 (17:28 +0200)] 
optimize: fix reject statement

Add missing code to the statement collection routine. Compare reject
expressions when available. Add tests/shell.

Fixes: fb298877ece2 ("src: add ruleset optimization infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: remove comment after merging
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:26:38 +0000 (17:26 +0200)] 
optimize: remove comment after merging

Remove rule comment after merging rules, let the user decide if they want
to reintroduce the comment in the ruleset file.

Update optimizations/merge_stmt test.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: do not print stateful information
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:25:50 +0000 (17:25 +0200)] 
optimize: do not print stateful information

Do not print stateful information such as counters which are likely set
to zero.

Before this patch:

  Merging:
  packets.conf:10:3-29:                 ip protocol  4 counter drop
  packets.conf:11:3-29:                 ip protocol 41 counter drop
  packets.conf:12:3-29:                 ip protocol 47 counter drop
  into:
          ip protocol { 4, 41, 47 } counter packets 0 bytes 0 drop
                                            ^^^^^^^^^^^^^^^^^
After:

  Merging:
  packets.conf:10:3-29:                 ip protocol  4 counter drop
  packets.conf:11:3-29:                 ip protocol 41 counter drop
  packets.conf:12:3-29:                 ip protocol 47 counter drop
  into:
          ip protocol { 4, 41, 47 } counter drop
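
The kind of merge shown in this output can be sketched in Python as follows. This is a strong simplification made for illustration, not the optimizer's rule vs statement matrix: rules are reduced to hypothetical (selector, value, rest) tuples, only adjacent rules are merged, and the remaining statements are kept as plain text without any counter state.

```python
def merge_adjacent(rules):
    """Merge adjacent rules that differ only in the value they match on.

    rules: list of (selector, value, rest) tuples in ruleset order.
    """
    groups = []  # each entry: [selector, [values], rest]
    for selector, value, rest in rules:
        last = groups[-1] if groups else None
        if last and last[0] == selector and last[2] == rest:
            last[1].append(value)
        else:
            groups.append([selector, [value], rest])

    merged = []
    for selector, values, rest in groups:
        if len(values) == 1:
            merged.append(f"{selector} {values[0]} {rest}")
        else:
            # several values collapse into one anonymous set
            merged.append(f"{selector} {{ {', '.join(values)} }} {rest}")
    return merged


rules = [
    ("ip protocol", "4", "counter drop"),
    ("ip protocol", "41", "counter drop"),
    ("ip protocol", "47", "counter drop"),
]
print(merge_adjacent(rules)[0])
```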

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: do not merge rules with set reference in rhs
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:20:26 +0000 (17:20 +0200)] 
optimize: do not merge rules with set reference in rhs

Otherwise set reference ends up included in an anonymous set, as an
element, which is not supported.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago optimize: do not compare relational expression rhs when collecting statements
Pablo Neira Ayuso [Fri, 17 Jun 2022 15:20:17 +0000 (17:20 +0200)] 
optimize: do not compare relational expression rhs when collecting statements

When building the statement matrix, do not compare expression right hand
side, otherwise bogus mismatches might occur.

The fully compared flag is set when comparing rules to look for
possible mergers.

Fixes: 3f36cc6c3dcd ("optimize: do not merge unsupported statement expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago intervals: Do not sort cached set elements over and over again
Phil Sutter [Thu, 16 Jun 2022 08:56:12 +0000 (10:56 +0200)] 
intervals: Do not sort cached set elements over and over again

When adding element(s) to a non-empty set, code merged the two lists and
sorted the result. With many individual 'add element' commands this
causes substantial overhead. Make use of the fact that
existing_set->init is sorted already, sort only the list of new elements
and use list_splice_sorted() to merge the two sorted lists.

Add set_sort_splice() and use it for set element overlap detection and
automerge.

A test case adding ~25k elements in individual commands completes in
about 1/4th of the time with this patch applied.
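
The idea of sorting only the new elements and splicing them into the already sorted list can be sketched in Python. The function names are hypothetical and plain integers stand in for set elements; this is not the list_splice_sorted() or set_sort_splice() code itself.

```python
def splice_sorted(existing, new):
    """Merge two already sorted lists into one sorted list in linear time."""
    result, i, j = [], 0, 0
    while i < len(existing) and j < len(new):
        if new[j] < existing[i]:
            result.append(new[j])
            j += 1
        else:
            result.append(existing[i])
            i += 1
    result.extend(existing[i:])
    result.extend(new[j:])
    return result


def add_elements(existing_sorted, new_elements):
    """Sort only the new elements, then splice them into the sorted set."""
    return splice_sorted(existing_sorted, sorted(new_elements))


print(add_elements([1, 5, 9], [7, 2]))
```

Each `add element` command then costs a sort of the few new elements plus one linear pass, instead of a full sort of the combined list.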

Joint work with Pablo.

Fixes: 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago intervals: do not empty cache for maps
Pablo Neira Ayuso [Thu, 16 Jun 2022 08:53:56 +0000 (10:53 +0200)] 
intervals: do not empty cache for maps

Translate set element to range and sort in maps for the NFT_SET_MAP
case, which does not support automerge yet.

Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago intervals: do not report exact overlaps for new elements
Pablo Neira Ayuso [Mon, 13 Jun 2022 15:22:47 +0000 (17:22 +0200)] 
intervals: do not report exact overlaps for new elements

Two new elements that represent an exact overlap should not trigger an error.

   add table t
   add set t s { type ipv4_addr; flags interval; }
   add element t s { 1.0.1.0/24 }
   ...
   add element t s { 1.0.1.0/24 }

result in a bogus error.

 # nft -f set.nft
 set.nft:1002:19-28: Error: conflicting intervals specified
 add element t s { 1.0.1.0/24 }
                   ^^^^^^^^^^
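
The intended distinction between an exact overlap and a real conflict can be sketched with Python's ipaddress module. This is an illustration only, not the interval code in nftables: the function name is hypothetical, and the quadratic pairwise comparison is chosen for brevity.

```python
import ipaddress
from itertools import combinations


def find_conflicts(elements):
    """Return pairs of prefixes that overlap without being identical."""
    # A set drops exact duplicates, so they can never be reported.
    nets = sorted({ipaddress.ip_network(e) for e in elements})
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


print(find_conflicts(["1.0.1.0/24", "1.0.1.0/24"]))
print(find_conflicts(["1.0.0.0/23", "1.0.1.0/24"]))
```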

Fixes: 3da9643fb9ff ("intervals: add support to automerge with kernel elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago rule: collapse set element commands
Pablo Neira Ayuso [Mon, 13 Jun 2022 15:22:44 +0000 (17:22 +0200)] 
rule: collapse set element commands

Robots might generate a long list of singleton element commands such as:

  add element t s { 1.0.1.0/24 }
  ...
  add element t s { 1.0.2.0/23 }

collapse them into one single command before the evaluation step, ie.

  add element t s { 1.0.1.0/24, ..., 1.0.2.0/23 }

this speeds up overlap detection and set element automerge operations in
this worst case scenario.

Since 3da9643fb9ff9 ("intervals: add support to automerge with kernel
elements"), the new interval tracking relies on mergesort. The pattern
above triggers the set sorting for each element.

This patch adds a list to cmd objects that store collapsed commands.
Moreover, expressions also contain a reference to the original command,
to uncollapse the commands after the evaluation step.

These commands are uncollapsed after the evaluation step to ensure error
reporting works as expected (command and netlink message are mapped
1:1).
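
The collapse and uncollapse steps can be sketched in Python as a round trip. The data layout is hypothetical: a command is modeled as a ((table, set), [elements]) tuple, and each element remembers the index of the command it came from, standing in for the reference to the original command mentioned above.

```python
def collapse(commands):
    """Group elements per (table, set) and tag each with its command index."""
    collapsed = {}
    for index, (target, elements) in enumerate(commands):
        collapsed.setdefault(target, []).extend((e, index) for e in elements)
    return collapsed


def uncollapse(collapsed):
    """Rebuild the original one-command-per-entry list from the tags."""
    by_index = {}
    for target, tagged in collapsed.items():
        for element, index in tagged:
            by_index.setdefault(index, (target, []))[1].append(element)
    return [by_index[i] for i in sorted(by_index)]


commands = [
    (("t", "s"), ["1.0.1.0/24"]),
    (("t", "s"), ["1.0.2.0/23"]),
]
collapsed = collapse(commands)
print([e for e, _ in collapsed[("t", "s")]])
print(uncollapse(collapsed) == commands)
```

Evaluation works on the single collapsed list, while the restored commands keep the 1:1 mapping to netlink messages. Commands without elements are not preserved by this sketch.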

For the record:

- nftables versions <= 1.0.2 did not perform any kind of overlap
  check for the described scenario above (because set cache only contained
  elements in the kernel in this case). This is a problem for kernels < 5.7
  which rely on userspace to detect overlaps.

- the overlap detection could be skipped for kernels >= 5.7.

- The extended netlink error reporting available for set elements
  since 5.19-rc might allow removing the uncollapse step; in this case,
  error reporting does not rely on the netlink sequence to refer to the
  command triggering the problem.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago tests: shell: runtime set element automerge
Pablo Neira Ayuso [Mon, 13 Jun 2022 15:05:22 +0000 (17:05 +0200)] 
tests: shell: runtime set element automerge

Add a test to cover runtime set element automerge.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months ago Revert "scanner: flags: move to own scope"
Florian Westphal [Fri, 10 Jun 2022 11:01:46 +0000 (13:01 +0200)] 
Revert "scanner: flags: move to own scope"

Excess nesting of scanner scopes is very fragile and error prone:

rule `iif != lo ip daddr 127.0.0.1/8 counter limit rate 1/second log flags all prefix "nft_lo4 " drop`
fails with `Error: No symbol type information` hinting at `prefix`

Problem is that we nest via:
 counter
   limit
     log
    flags

By the time 'prefix' is scanned, state is still stuck in 'counter' due
to this nesting.  Working around "prefix" isn't enough, any other
keyword, e.g. "level" in 'flags all level debug' will be parsed as 'string' too.

So, revert this.

Fixes: a16697097e2b ("scanner: flags: move to own scope")
Reported-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
23 months ago build: Bump version to 1.0.4 v1.0.4
Pablo Neira Ayuso [Tue, 7 Jun 2022 14:14:09 +0000 (16:14 +0200)] 
build: Bump version to 1.0.4

Bump libnftnl dependency to fix --debug with new TCP reset support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago tests: shell: remove leftover modules on cleanup
Pablo Neira Ayuso [Thu, 2 Jun 2022 07:52:48 +0000 (09:52 +0200)] 
tests: shell: remove leftover modules on cleanup

After ./run-tests.sh no nf_tables modules are left in place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago evaluate: reset ctx->set after set interval evaluation
Pablo Neira Ayuso [Wed, 1 Jun 2022 17:09:31 +0000 (19:09 +0200)] 
evaluate: reset ctx->set after set interval evaluation

Otherwise bogus error reports on set datatype mismatch might occur, such as:

Error: datatype mismatch, expected Internet protocol, expression has type IPv4 address
    meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1
    ~~~~~~~~~~~~ ^^^^^^^^^^^^

with an unrelated set declaration.

table ip test {
       set set_with_interval {
               type ipv4_addr
               flags interval
       }

       chain prerouting {
               type nat hook prerouting priority dstnat; policy accept;
               meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1
       }
}

This bug has been introduced in the evaluation step.

Reported-by: Roman Petrov <nwhisper@gmail.com>
Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago tests: shell: sets_with_ifnames release netns on exit
Pablo Neira Ayuso [Wed, 1 Jun 2022 16:17:02 +0000 (18:17 +0200)] 
tests: shell: sets_with_ifnames release netns on exit

Missing ip netns del call from cleanup()

Fixes: d6fdb0d8d482 ("sets_with_ifnames: add test case for concatenated range")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago optimize: segfault when releasing unsupported statement
Pablo Neira Ayuso [Wed, 1 Jun 2022 08:14:22 +0000 (10:14 +0200)] 
optimize: segfault when releasing unsupported statement

Call xfree() instead since stmt_alloc() does not initialize the
statement type fields.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1609
Fixes: ea1f1c9ff608 ("optimize: memleak in statement matrix")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago build: Bump version to 1.0.3 v1.0.3
Pablo Neira Ayuso [Tue, 31 May 2022 08:21:44 +0000 (10:21 +0200)] 
build: Bump version to 1.0.3

Still requires libnftnl 1.2.1

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago nft: simplify chain lookup in do_list_chain
Chander Govindarajan [Wed, 25 May 2022 09:55:43 +0000 (15:25 +0530)] 
nft: simplify chain lookup in do_list_chain

Use the chain_cache_find() function for faster lookup of the chain
instead of iterating over all chains in the table.

Signed-off-by: ChanderG <mail@chandergovind.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago intervals: fix compilation --with-mini-gmp
Pablo Neira Ayuso [Mon, 30 May 2022 17:00:05 +0000 (19:00 +0200)] 
intervals: fix compilation --with-mini-gmp

Use pr_gmp_debug() instead to compile with minigmp.

intervals.c: In function ‘set_delete’:
intervals.c:489:25: warning: implicit declaration of function ‘gmp_printf’; did you mean ‘gmp_vfprintf’? [-Wimplicit-function-declaration]
  489 |                         gmp_printf("remove: [%Zx-%Zx]\n",
      |                         ^~~~~~~~~~
      |                         gmp_vfprintf

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago json: update json output ordering to place rules after chains
Chander Govindarajan [Mon, 23 May 2022 10:07:11 +0000 (15:37 +0530)] 
json: update json output ordering to place rules after chains

Currently the json output of `nft -j list ruleset` interleaves rules
with chains.

As reported in this bug:

 https://bugzilla.netfilter.org/show_bug.cgi?id=1580

the json cannot be fed into `nft -j -f <file>` since rules may
reference chains that are created later.

Instead create rules after all chains are output.
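
The reordering can be pictured with a small sketch. It is only an
illustration: the names sk_item and sk_chains_first are hypothetical, and
the real JSON output also carries tables, sets and other objects. The point
is that a rule such as "jump b" in chain a only loads once chain b exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item kinds; not the types used by nftables' json.c. */
enum sk_kind { SK_CHAIN, SK_RULE };

struct sk_item {
	enum sk_kind kind;
	const char *name;
};

/* Copy 'in' to 'out' so that every chain precedes every rule, keeping the
 * relative order inside each group. Returns the number of items copied. */
size_t sk_chains_first(const struct sk_item *in, size_t n, struct sk_item *out)
{
	size_t i, pos = 0;

	assert(in != NULL && out != NULL);

	for (i = 0; i < n; i++)
		if (in[i].kind == SK_CHAIN)
			out[pos++] = in[i];
	for (i = 0; i < n; i++)
		if (in[i].kind == SK_RULE)
			out[pos++] = in[i];
	return pos;
}
```

Two passes keep the sketch stable without sorting; 'out' must have room
for n items.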

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1580
Signed-off-by: ChanderG <mail@chandergovind.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago netlink_delinearize: release last register on exit
Pablo Neira Ayuso [Fri, 13 May 2022 14:46:31 +0000 (16:46 +0200)] 
netlink_delinearize: release last register on exit

netlink_release_registers() does not release the expression in the last
32-bit register.

struct netlink_parse_ctx {
...
        struct expr             *registers[MAX_REGS + 1];

This array is MAX_REGS + 1 (verdict register + 16 32-bit registers).
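
A minimal sketch of the off-by-one, assuming MAX_REGS is 16 as the array
description implies and that slot 0 holds the verdict register. The names
sk_parse_ctx and sk_release_registers are hypothetical, and free() stands
in for whatever release routine the real code uses.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_MAX_REGS 16	/* assumption: 16 32-bit registers */

struct sk_parse_ctx {
	void *registers[SK_MAX_REGS + 1];	/* verdict + 32-bit registers */
};

/* Release every populated slot and return how many were released. */
unsigned int sk_release_registers(struct sk_parse_ctx *ctx)
{
	unsigned int i, released = 0;

	assert(ctx != NULL);

	/* '<=' rather than '<': the array has SK_MAX_REGS + 1 entries, so
	 * stopping at SK_MAX_REGS - 1 would skip the last register. */
	for (i = 0; i <= SK_MAX_REGS; i++) {
		if (ctx->registers[i] == NULL)
			continue;
		free(ctx->registers[i]);
		ctx->registers[i] = NULL;
		released++;
	}
	return released;
}
```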

Fixes: 371c3a0bc3c2 ("netlink_delinearize: release expressions in context registers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
23 months ago sets_with_ifnames: add test case for concatenated range
Florian Westphal [Fri, 29 Apr 2022 18:32:39 +0000 (20:32 +0200)] 
sets_with_ifnames: add test case for concatenated range

Refactor existing test case for simple interface name ranges
(without concatenations) to also cover "addr . ifname".

Signed-off-by: Florian Westphal <fw@strlen.de>
23 months ago segtree: add pretty-print support for wildcard strings in concatenated sets
Florian Westphal [Fri, 29 Apr 2022 18:32:38 +0000 (20:32 +0200)] 
segtree: add pretty-print support for wildcard strings in concatenated sets

For concat ranges, something like 'ppp*' is translated as a range
from 'ppp\0\0\0...' to 'ppp\ff\ff\ff...'.

In order to display this properly, check for presence of string base
type and convert to symbolic expression, with appended '*' character.
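
The reverse translation can be sketched as follows. This is a hedged
illustration only: sk_wildcard_from_range and the fixed 16-byte key length
are assumptions made here, not the code added to segtree.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_KEY_LEN 16	/* assumption: 128-bit string key */

/* If [lo-hi] has the shape "prefix\0\0..." to "prefix\xff\xff...", write
 * "prefix*" to buf and return 1; otherwise leave buf alone and return 0. */
int sk_wildcard_from_range(const unsigned char lo[SK_KEY_LEN],
			   const unsigned char hi[SK_KEY_LEN],
			   char buf[SK_KEY_LEN + 2])
{
	size_t i, plen = 0;

	assert(lo != NULL && hi != NULL && buf != NULL);

	/* The shared prefix ends at the first byte where lo and hi differ. */
	while (plen < SK_KEY_LEN && lo[plen] == hi[plen] && lo[plen] != '\0')
		plen++;
	if (plen == 0 || plen == SK_KEY_LEN)
		return 0;

	/* The remainder must be all-zero in lo and all-ones in hi. */
	for (i = plen; i < SK_KEY_LEN; i++)
		if (lo[i] != 0x00 || hi[i] != 0xff)
			return 0;

	memcpy(buf, lo, plen);
	buf[plen] = '*';
	buf[plen + 1] = '\0';
	return 1;
}
```

A range whose bounds are identical (an exact name) is rejected, so it
would still be printed as a plain string.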

Signed-off-by: Florian Westphal <fw@strlen.de>
23 months ago netlink: swap byteorder for host-endian concat data
Florian Westphal [Fri, 29 Apr 2022 18:32:37 +0000 (20:32 +0200)] 
netlink: swap byteorder for host-endian concat data

All data must be passed in network byte order, otherwise matching
won't work, or the kernel will reject the interval because
it thinks that the start is after the end.

This is needed to allow use of 'ppp*' in interval sets with
concatenations.
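
Why the byte order matters can be shown with a small sketch, assuming the
interval bounds are compared byte by byte. The helpers sk_put_be32,
sk_put_le32 and sk_key_cmp are hypothetical and only illustrate the
ordering problem.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v in network (big-endian) byte order. */
void sk_put_be32(unsigned char out[4], uint32_t v)
{
	assert(out != NULL);
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Store v the way a little-endian host keeps it in memory. */
void sk_put_le32(unsigned char out[4], uint32_t v)
{
	assert(out != NULL);
	out[0] = (unsigned char)v;
	out[1] = (unsigned char)(v >> 8);
	out[2] = (unsigned char)(v >> 16);
	out[3] = (unsigned char)(v >> 24);
}

/* Byte-wise comparison of two 4-byte keys. */
int sk_key_cmp(const unsigned char a[4], const unsigned char b[4])
{
	return memcmp(a, b, 4);
}
```

With the range 1-256, the big-endian encoding keeps start before end,
while the little-endian encoding makes the start compare greater than
the end.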

Signed-off-by: Florian Westphal <fw@strlen.de>
2 years ago intervals: deletion should adjust range not yet in the kernel
Pablo Neira Ayuso [Fri, 6 May 2022 21:46:59 +0000 (23:46 +0200)] 
intervals: deletion should adjust range not yet in the kernel

Do not remove the range if it does not exist yet in the kernel, adjust it
instead.  Uncovered by a use-after-free error.

==276702==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d00190663c at pc 0x7ff310ab526f bp 0x7fffeb76f750 sp 0x7fffeb76f748
READ of size 4 at 0x60d00190663c thread T0
    #0 0x7ff310ab526e in __adjust_elem_right .../nftables/src/intervals.c:300
    #1 0x7ff310ab59a7 in adjust_elem_right .../nftables/src/intervals.c:311
    #2 0x7ff310ab6daf in setelem_adjust .../nftables/src/intervals.c:354
    #3 0x7ff310ab783a in setelem_delete .../nftables/src/intervals.c:411
    #4 0x7ff310ab80e6 in __set_delete .../nftables/src/intervals.c:451

Fixes: 3e8d934e4f72 ("intervals: support to partial deletion with automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago optimize: memleak in statement matrix
Pablo Neira Ayuso [Wed, 4 May 2022 10:02:43 +0000 (12:02 +0200)] 
optimize: memleak in statement matrix

Release clone object in case this statement is not supported.

Fixes: 743b0e81371f ("optimize: do not clone unsupported statement")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago optimize: merge nat rules with same selectors into map
Pablo Neira Ayuso [Tue, 3 May 2022 15:51:36 +0000 (17:51 +0200)] 
optimize: merge nat rules with same selectors into map

Verdict and nat are mutually exclusive, no need to support this
combination.

 # cat ruleset.nft
 table ip x {
        chain y {
                type nat hook postrouting priority srcnat; policy drop;
                ip saddr 1.1.1.1 tcp dport 8000 snat to 4.4.4.4:80
                ip saddr 2.2.2.2 tcp dport 8001 snat to 5.5.5.5:90
        }
 }

 # nft -o -c -f ruleset.nft
 Merging:
 ruleset.nft:4:3-52:                ip saddr 1.1.1.1 tcp dport 8000 snat to 4.4.4.4:80
 ruleset.nft:5:3-52:                ip saddr 2.2.2.2 tcp dport 8001 snat to 5.5.5.5:90
 into:
        snat to ip saddr . tcp dport map { 1.1.1.1 . 8000 : 4.4.4.4 . 80, 2.2.2.2 . 8001 : 5.5.5.5 . 90 }

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago optimize: do not clone unsupported statement
Pablo Neira Ayuso [Tue, 3 May 2022 15:49:56 +0000 (17:49 +0200)] 
optimize: do not clone unsupported statement

Skip unsupported statements when building the statement matrix,
otherwise clone remains uninitialized.

Fixes: fb298877ece2 ("src: add ruleset optimization infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago optimize: incorrect logic in verdict comparison
Pablo Neira Ayuso [Tue, 3 May 2022 09:30:57 +0000 (11:30 +0200)] 
optimize: incorrect logic in verdict comparison

Keep inspecting rule verdicts before assuming they are equal. Update
existing test to catch this bug.

Fixes: 1542082e259b ("optimize: merge same selector with different verdict into verdict map")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago src: fix always-true assertions
Florian Westphal [Tue, 26 Apr 2022 09:47:37 +0000 (11:47 +0200)] 
src: fix always-true assertions

assert(1) is a no-op, this should be assert(0). Use BUG() instead.
Add missing CATCHALL to avoid BUG().

Signed-off-by: Florian Westphal <fw@strlen.de>
2 years ago intervals: set on EXPR_F_KERNEL flag for new elements in set cache
Pablo Neira Ayuso [Mon, 18 Apr 2022 13:17:59 +0000 (15:17 +0200)] 
intervals: set on EXPR_F_KERNEL flag for new elements in set cache

So that follow-up commands in this batch that update the set assume this
element is already in the kernel.

Fixes: 3da9643fb9ff ("intervals: add support to automerge with kernel elements")
Fixes: 3ed9fadaab95 ("intervals: build list of elements to be added from cache")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago tests: add concat test case with integer base type subkey
Florian Westphal [Sun, 17 Apr 2022 22:00:18 +0000 (00:00 +0200)] 
tests: add concat test case with integer base type subkey

Signed-off-by: Florian Westphal <fw@strlen.de>
2 years ago src: allow use of base integer types as set keys in concatenations
Florian Westphal [Sun, 17 Apr 2022 20:27:41 +0000 (22:27 +0200)] 
src: allow use of base integer types as set keys in concatenations

"typeof ip saddr . ipsec in reqid" won't work because reqid uses
integer type, i.e. dtype->size is 0.

With "typeof", the size can be derived from the expression length,
via set->key.

This computes the concat length based either on dtype->size or
expression length.

It also updates concat evaluation to permit a zero datatype size
if the subkey expression has nonzero length (i.e., typeof was used).
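
The fallback described above can be sketched like this. It is an
assumption-laden illustration: sk_subkey and sk_concat_len are hypothetical
names, lengths are in bits, and any per-field padding the real code applies
is left out.

```c
#include <assert.h>
#include <stddef.h>

struct sk_subkey {
	unsigned int dtype_size;	/* 0 for the base integer type */
	unsigned int expr_len;		/* known when typeof was used */
};

/* Sum the subkey lengths, preferring the datatype size and falling back
 * to the expression length when the datatype has no fixed size. */
unsigned int sk_concat_len(const struct sk_subkey *keys, size_t n)
{
	unsigned int total = 0;
	size_t i;

	assert(keys != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned int len = keys[i].dtype_size ? keys[i].dtype_size
						      : keys[i].expr_len;

		/* A subkey with neither size nor length cannot be used. */
		assert(len > 0);
		total += len;
	}
	return total;
}
```

For "typeof ip saddr . ipsec in reqid", treating both fields as 32 bits
wide gives a 64-bit key in this sketch.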

Signed-off-by: Florian Westphal <fw@strlen.de>
2 years ago intervals: build list of elements to be added from cache
Pablo Neira Ayuso [Fri, 15 Apr 2022 09:40:09 +0000 (11:40 +0200)] 
intervals: build list of elements to be added from cache

Loop over the set cache and add elements that have no EXPR_F_KERNEL,
meaning that these are new elements in the set that have resulted
from adjusting or splitting existing ranges.

This fixes several partial deletions of the same interval in one
command.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago intervals: fix deletion of multiple ranges with automerge
Pablo Neira Ayuso [Thu, 14 Apr 2022 15:47:30 +0000 (17:47 +0200)] 
intervals: fix deletion of multiple ranges with automerge

Iterate over the list of elements to be deleted, then splice one
EXPR_F_REMOVE element at a time to update the list of existing sets
incrementally.

Fixes: 3e8d934e4f722 ("intervals: support to partial deletion with automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago intervals: add elements with EXPR_F_KERNEL to purge list only
Pablo Neira Ayuso [Thu, 14 Apr 2022 16:16:31 +0000 (18:16 +0200)] 
intervals: add elements with EXPR_F_KERNEL to purge list only

Do not add elements to purge list which are not in the kernel,
otherwise, bogus ENOENT is reported.

Fixes: 3e8d934e4f722 ("intervals: support to partial deletion with automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago netlink: remove unused argument from helper function
Florian Westphal [Sun, 17 Apr 2022 20:58:19 +0000 (22:58 +0200)] 
netlink: remove unused argument from helper function

Signed-off-by: Florian Westphal <fw@strlen.de>
2 years ago intervals: Simplify element sanity checks
Phil Sutter [Thu, 14 Apr 2022 11:39:24 +0000 (13:39 +0200)] 
intervals: Simplify element sanity checks

Since setelem_delete() assigns to 'prev' pointer only if it doesn't have
EXPR_F_REMOVE flag set, there is no need to check that flag in called
functions.

Fixes: 3e8d934e4f722 ("intervals: support to partial deletion with automerge")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago intervals: unset EXPR_F_KERNEL for adjusted elements
Pablo Neira Ayuso [Wed, 13 Apr 2022 13:37:19 +0000 (15:37 +0200)] 
intervals: unset EXPR_F_KERNEL for adjusted elements

This element has been adjusted, so reset its EXPR_F_KERNEL flag: it is a
new element, and the old one is purged from the kernel.

The existing list of elements in the kernel is spliced to the elements
to be removed, then merge-sorted. The EXPR_F_REMOVE flag specifies that
this element represents a deletion.

The EXPR_F_REMOVE and EXPR_F_KERNEL flags allow tracking the state of
each element: whether the element is in the kernel (EXPR_F_KERNEL), is
new (no flag), or represents a removal (EXPR_F_REMOVE).
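
The three states can be sketched as a small classifier. The flag values
and the names SK_F_KERNEL, SK_F_REMOVE, sk_elem_classify and sk_elem_adjust
are illustrative stand-ins, not the definitions used in nftables.

```c
#include <assert.h>

/* Illustrative bit values only. */
#define SK_F_KERNEL	(1u << 0)
#define SK_F_REMOVE	(1u << 1)

enum sk_elem_state {
	SK_ELEM_NEW,		/* no flag: must be added to the kernel */
	SK_ELEM_IN_KERNEL,	/* already present in the kernel */
	SK_ELEM_REMOVAL,	/* marker for a deletion request */
};

enum sk_elem_state sk_elem_classify(unsigned int flags)
{
	if (flags & SK_F_REMOVE)
		return SK_ELEM_REMOVAL;
	if (flags & SK_F_KERNEL)
		return SK_ELEM_IN_KERNEL;
	return SK_ELEM_NEW;
}

/* An adjusted element no longer matches what the kernel holds, so it is
 * treated as new. This sketch assumes removal markers are never adjusted. */
unsigned int sk_elem_adjust(unsigned int flags)
{
	assert(!(flags & SK_F_REMOVE));
	return flags & ~SK_F_KERNEL;
}
```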

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago src: restore interval sets work with string datatypes
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:30 +0000 (04:01 +0200)] 
src: restore interval sets work with string datatypes

Switch byteorder of string datatypes to host byteorder.

Partial revert of ("src: make interval sets work with string datatypes"),
otherwise the new interval code complains about conflicting intervals.

testcases/sets/sets_with_ifnames passes fine again.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago intervals: support to partial deletion with automerge
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:22 +0000 (04:01 +0200)] 
intervals: support to partial deletion with automerge

Splice the existing set element cache with the elements to be deleted
and merge sort it.  The elements to be deleted are identified by the
EXPR_F_REMOVE flag.

The set elements to be deleted are automerged first if the
automerge flag is set.

There are four possible deletion scenarios:

- Exact match, eg. delete [a-b] and there is a [a-b] range in the kernel set.
- Adjust left side of range, eg. delete [a-b] from range [a-x] where x > b.
- Adjust right side of range, eg. delete [a-b] from range [x-b] where x < a.
- Split range, eg. delete [a-b] from range [x-y] where x < a and b < y.
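
The four scenarios can be sketched as a classifier over inclusive 32-bit
ranges. This is an illustration only: the names below are hypothetical and
the real code in intervals.c works on expression lists rather than plain
integers.

```c
#include <assert.h>
#include <stdint.h>

enum sk_del_action {
	SK_DEL_NO_MATCH,	/* [a-b] is not contained in [x-y] */
	SK_DEL_EXACT,		/* remove [x-y] entirely */
	SK_DEL_ADJUST_LEFT,	/* [x-y] shrinks to start at b + 1 */
	SK_DEL_ADJUST_RIGHT,	/* [x-y] shrinks to end at a - 1 */
	SK_DEL_SPLIT,		/* [x-y] becomes two ranges */
};

struct sk_rng {
	uint32_t lo, hi;	/* inclusive bounds */
};

/* Classify the deletion of 'del' from the existing range 'cur'. On
 * return, out[0..*n-1] holds the ranges that replace 'cur'. */
enum sk_del_action sk_range_delete(struct sk_rng cur, struct sk_rng del,
				   struct sk_rng out[2], int *n)
{
	assert(cur.lo <= cur.hi && del.lo <= del.hi);
	*n = 0;

	if (del.lo < cur.lo || del.hi > cur.hi)
		return SK_DEL_NO_MATCH;
	if (del.lo == cur.lo && del.hi == cur.hi)
		return SK_DEL_EXACT;
	if (del.lo == cur.lo) {
		out[(*n)++] = (struct sk_rng){ del.hi + 1, cur.hi };
		return SK_DEL_ADJUST_LEFT;
	}
	if (del.hi == cur.hi) {
		out[(*n)++] = (struct sk_rng){ cur.lo, del.lo - 1 };
		return SK_DEL_ADJUST_RIGHT;
	}
	out[(*n)++] = (struct sk_rng){ cur.lo, del.lo - 1 };
	out[(*n)++] = (struct sk_rng){ del.hi + 1, cur.hi };
	return SK_DEL_SPLIT;
}
```

Deleting [12-15] from [10-20], for instance, is the split case and leaves
[10-11] and [16-20].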

Update nft_evaluate() to use the safe list variant since new commands
are dynamically registered to the list to update ranges.

This patch also restores the set element existence check for Linux
kernels <= 5.7.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago evaluate: allow for zero length ranges
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:19 +0000 (04:01 +0200)] 
evaluate: allow for zero length ranges

Allow for ranges such as 30-30.

This is required by the new intervals.c code, which normalizes constant
and prefix set elements to ranges.
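
A sketch of that normalization for 32-bit values, under the assumption
that a prefix covers every address sharing its leading bits. The names
sk_range_from_value and sk_range_from_prefix are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sk_range {
	uint32_t lo, hi;	/* inclusive bounds */
};

/* A single value v becomes the zero-length range v-v, e.g. 30-30. */
struct sk_range sk_range_from_value(uint32_t v)
{
	return (struct sk_range){ v, v };
}

/* A prefix such as 10.0.0.0/8 becomes 10.0.0.0-10.255.255.255. */
struct sk_range sk_range_from_prefix(uint32_t addr, unsigned int prefix_len)
{
	uint32_t host_mask;

	assert(prefix_len <= 32);

	/* Shift in 64 bits so that prefix_len == 0 does not shift a
	 * 32-bit value by 32, which is undefined in C. */
	host_mask = (uint32_t)((UINT64_C(1) << (32 - prefix_len)) - 1);
	return (struct sk_range){ addr & ~host_mask, addr | host_mask };
}
```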

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago intervals: add support to automerge with kernel elements
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:17 +0000 (04:01 +0200)] 
intervals: add support to automerge with kernel elements

Extend the interval codebase to support merging elements in the
kernel with userspace element updates.

Add a list of elements to be purged to cmd and set objects. These
elements representing outdated intervals are deleted before adding the
updated ranges.

This routine splices the list of userspace and kernel elements, then it
mergesorts to identify overlapping and contiguous ranges. This splice
operation is undone so the set userspace cache remains consistent.

Incrementally update the elements in the cache; this allows removing
dd44081d91ce ("segtree: Fix add and delete of element in same batch").
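
The purge list idea can be sketched as follows, assuming the spliced list
is already sorted by range start. Everything here (sk_kelem,
sk_merge_with_kernel, the in_kernel field) is a hypothetical simplification
of the real expression lists and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_kelem {
	uint32_t start, end;	/* inclusive */
	int in_kernel;		/* nonzero if the kernel already has it */
};

/* Merge overlapping and contiguous ranges of the sorted array 'v' in
 * place. Kernel elements absorbed by a merge are appended to 'purge'
 * (room for n entries) so they can be deleted before the merged range
 * is added. Returns the new element count. */
size_t sk_merge_with_kernel(struct sk_kelem *v, size_t n,
			    struct sk_kelem *purge, size_t *npurge)
{
	size_t i, out = 0;

	assert(npurge != NULL);
	*npurge = 0;
	if (n == 0)
		return 0;
	assert(v != NULL && purge != NULL);

	for (i = 1; i < n; i++) {
		/* A gap between the two ranges: nothing to merge. */
		if (v[out].end != UINT32_MAX && v[i].start > v[out].end + 1) {
			v[++out] = v[i];
			continue;
		}
		/* Outdated kernel ranges go to the purge list. */
		if (v[out].in_kernel)
			purge[(*npurge)++] = v[out];
		if (v[i].in_kernel)
			purge[(*npurge)++] = v[i];
		if (v[i].end > v[out].end)
			v[out].end = v[i].end;
		v[out].in_kernel = 0;	/* the merged range is new */
	}
	return out + 1;
}
```

With kernel range 1-4 and a new range 5-10, the sketch yields a new range
1-10 and puts 1-4 on the purge list. It re-adds a kernel range even when a
new element is fully contained in it, which real code may well avoid.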

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago mnl: update mnl_nft_setelem_del() to allow for more reuse
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:16 +0000 (04:01 +0200)] 
mnl: update mnl_nft_setelem_del() to allow for more reuse

Pass handle and element list as parameters to allow for code reuse.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago src: remove rbtree datastructure
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:14 +0000 (04:01 +0200)] 
src: remove rbtree datastructure

Not used by anyone anymore, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago src: replace interval segment tree overlap and automerge
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:13 +0000 (04:01 +0200)] 
src: replace interval segment tree overlap and automerge

This is a rewrite of the segtree interval codebase.

This patch now splits the original set_to_interval() function into three
routines:

- add set_automerge() to merge overlapping and contiguous ranges.
  The elements, expressed as single values, prefixes or ranges, are
  all first normalized to ranges. These ranges are then mergesorted,
  followed by a linear list inspection to check for merge candidates.
  This code only merges elements in the same batch, i.e. it does not
  merge elements in the kernel with those in the userspace batch.

- add set_overlap() to check for overlapping set elements. Linux
  kernel >= 5.7 already checks for overlaps, older kernels still need
  this code. This code checks for two conflict types:

  1) between elements in this batch.
  2) between elements in this batch and kernelspace.

  The elements in the kernel are temporarily merged into the list of
  elements in the batch to check for these overlaps. The EXPR_F_KERNEL
  flag allows us to restore the set cache after the overlap check has
  been performed.

- set_to_interval() now only transforms set elements, expressed as range
  e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END
  flag notation to represent e.g. [a,b+1), where b+1 has the
  EXPR_F_INTERVAL_END flag set on.
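
The automerge step described in the first item can be sketched on plain
32-bit ranges. This is not the set_automerge() implementation: the names
are hypothetical, and qsort() is swapped in for the mergesort the patch
uses, which does not change the result of the linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sk_ival {
	uint32_t start, end;	/* inclusive */
};

static int sk_ival_cmp(const void *a, const void *b)
{
	const struct sk_ival *x = a, *y = b;

	if (x->start != y->start)
		return x->start < y->start ? -1 : 1;
	if (x->end != y->end)
		return x->end < y->end ? -1 : 1;
	return 0;
}

/* Sort the ranges, then merge overlapping and contiguous neighbours in
 * place with one linear pass. Returns the new element count. */
size_t sk_ival_automerge(struct sk_ival *v, size_t n)
{
	size_t i, out = 0;

	if (n == 0)
		return 0;
	assert(v != NULL);

	qsort(v, n, sizeof(*v), sk_ival_cmp);

	for (i = 1; i < n; i++) {
		/* Overlapping, or adjacent with no gap: extend the range. */
		if (v[out].end == UINT32_MAX || v[i].start <= v[out].end + 1) {
			if (v[i].end > v[out].end)
				v[out].end = v[i].end;
		} else {
			v[++out] = v[i];
		}
	}
	return out + 1;
}
```

For the input 1-5, 10-20, 6-8, 15-30 the sketch produces 1-8 and 10-30:
6-8 is contiguous with 1-5, and 15-30 overlaps 10-20.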

More relevant updates:

- The overlap and automerge routines are now performed in the evaluation
  phase.

- The userspace set object representation now stores a reference to the
  existing kernel set object (in case there is already a set with this
  same name in the kernel). This is required by the new overlap and
  automerge approach.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago src: add EXPR_F_KERNEL to identify expression in the kernel
Pablo Neira Ayuso [Wed, 13 Apr 2022 02:01:09 +0000 (04:01 +0200)] 
src: add EXPR_F_KERNEL to identify expression in the kernel

This allows identifying the set elements that reside in the kernel.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago segtree: add support for get element with sets that contain ifnames
Florian Westphal [Sat, 9 Apr 2022 13:58:32 +0000 (15:58 +0200)] 
segtree: add support for get element with sets that contain ifnames

nft get element inet filter s { bla, prefixfoo }
table inet filter {
        set s {
                type ifname
                flags interval
                elements = { "prefixfoo*",
                             "bla" }
        }
}

Also add test cases for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago segtree: use correct byte order for 'element get'
Florian Westphal [Sat, 9 Apr 2022 13:58:31 +0000 (15:58 +0200)] 
segtree: use correct byte order for 'element get'

Fails when the argument / set contains strings: we need to use
host byte order if element has string base type.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago tests: add testcases for interface names in sets
Florian Westphal [Sat, 9 Apr 2022 13:58:30 +0000 (15:58 +0200)] 
tests: add testcases for interface names in sets

Add initial test case, sets with names and interfaces,
anonymous and named ones.

Check match+no-match.
netns with ppp1 and ppq veth, send packets via both interfaces.
Rule counters should have incremented on the three rules.
(that match on set that have "abcdef1" or "abcdef*" strings in them).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago segtree: add string "range" reversal support
Florian Westphal [Sat, 9 Apr 2022 13:58:29 +0000 (15:58 +0200)] 
segtree: add string "range" reversal support

Previous commits allow using a set key as a range, i.e.

key ifname
flags interval
elements = { eth* }

and then have it match on any interface starting with 'eth'.

Listing is broken however, we need to reverse-translate the (128bit)
number back to a string.

'eth*' is stored as interval
00687465 0000000 ..  00697465 0000000, i.e. "eth-eti",
this adds the needed endianness fixups.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago src: make interval sets work with string datatypes
Florian Westphal [Sat, 9 Apr 2022 13:58:28 +0000 (15:58 +0200)] 
src: make interval sets work with string datatypes

Allows use of interface names in interval sets:

table inet filter {
        set s {
                type ifname
                flags interval
                elements = { eth*, foo }
        }
}

Concatenations are not yet supported, also, listing is broken,
those strings will not be printed back because the values will remain
in big-endian order.  Followup patch will extend segtree to translate
this back to host byte order.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago evaluate: string prefix expression must retain original length
Florian Westphal [Sat, 9 Apr 2022 13:58:27 +0000 (15:58 +0200)] 
evaluate: string prefix expression must retain original length

To make something like "eth*" work for interval sets (match
eth0, eth1, and so on...) we must treat the string as a 128 bit
integer.

Without this, segtree will do the wrong thing when applying the prefix,
because we generate the prefix based on 'eth*' as input, with a length of 3.

The correct import needs to be done on "eth\0\0\0\0\0\0\0...", i.e., if
the input buffer were an ipv6 address, it should look like "eth\0::",
not "::eth".
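
A minimal sketch of that buffer layout, assuming a 16-byte (128-bit) key
as used for interface names. The function sk_ifname_wildcard_bounds is
hypothetical; the upper bound of all-ones bytes follows the 'ppp*' range
described for concatenated sets and is included here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_IFNAME_LEN 16	/* 128 bits, the size of IFNAMSIZ on Linux */

/* Expand a wildcard such as "eth*" into the lowest and highest 16-byte
 * keys it covers: "eth\0\0..." and "eth\xff\xff...". */
void sk_ifname_wildcard_bounds(const char *pattern,
			       unsigned char lo[SK_IFNAME_LEN],
			       unsigned char hi[SK_IFNAME_LEN])
{
	size_t plen;

	assert(pattern != NULL && lo != NULL && hi != NULL);
	plen = strlen(pattern);
	assert(plen > 0 && pattern[plen - 1] == '*');
	plen--;				/* drop the trailing '*' */
	assert(plen < SK_IFNAME_LEN);

	memset(lo, 0x00, SK_IFNAME_LEN);
	memset(hi, 0xff, SK_IFNAME_LEN);
	/* The prefix occupies the leading bytes, like "eth\0::". */
	memcpy(lo, pattern, plen);
	memcpy(hi, pattern, plen);
}
```

The prefix length stays 3 characters (24 bits), but it is applied to the
full 128-bit buffer rather than to a 3-byte value.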

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago segtree: split prefix and range creation to a helper function
Florian Westphal [Sat, 9 Apr 2022 13:58:26 +0000 (15:58 +0200)] 
segtree: split prefix and range creation to a helper function

No functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years ago evaluate: keep prefix expression length
Florian Westphal [Sat, 9 Apr 2022 13:58:25 +0000 (15:58 +0200)] 
evaluate: keep prefix expression length

Else, range_expr_value_high() will see a 0 length when doing:

mpz_init_bitmask(tmp, expr->len - expr->prefix_len);

This wasn't a problem so far because prefix expressions generated
from "string*" were never passed down to the prefix->range conversion
functions.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>