git.ipfire.org Git - thirdparty/nftables.git/log

cache: don't crash when filter is NULL

a delete request will cause a crash in obj_cache_dump, move the deref
into the filter block.

Fixes: dbff26bfba83 ("cache: consolidate reset command")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: incorrect assert() list_expr_to_binop

assert() logic is reversed, all expressions in the list are handled,
including the first.

src/expression.c:1285: list_expr_to_binop: Assertion `first' failed.

Fixes: 53d6bb992445 ("expression: initialize list of expression to silence gcc compile warning")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: only allow stateful statements in set and map definitions

The bison parser doesn't allow this to happen due to grammar
restrictions, but the json input has no such issues.

The bogon input assigns 'notrack' which triggers:
BUG: unknown stateful statement type 19
nft: src/netlink_linearize.c:1061: netlink_gen_stmt_stateful: Assertion `0' failed.

After patch, we get:
Error: map statement must be stateful

Fixes: 07958ec53830 ("json: add set statement list support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: compact STMT_F_STATEFUL checks

We'll gain another F_STATEFUL check in a followup patch,
so lets condense the pattern into a helper to reduce copypaste.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: don't try to import empty string

The bogon will trigger the assertion in mpz_import_data:
src/expression.c:418: constant_expr_alloc: Assertion `(((len) + (8) - 1) / (8)) > 0' failed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: initialize list of expression to silence gcc compile warning

The helper function to translate flagcmp expression to binop expression
results in the following compile warning.

  src/expression.c: In function 'list_expr_to_binop':
  src/expression.c:1286:16: warning: 'last' may be used uninitialized [-Wmaybe-uninitialized]
  1286 |         return last;

While at it, add assert() to validate the premises where this function
can be called.

Fixes: 4d5990c92c83 ("src: transform flag match expression to binop expression from parser")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: fix error propagation when parsing binop lhs/rhs

Malformed input returns NULL when decoding left/right side of binop.
This causes a NULL dereference in expr_evaluate_binop; left/right must
point to a valid expression.

Fix this in the parser, else would have to sprinkle NULL checks all over
the evaluation code.

After fix, loading the bogon yields:
internal:0:0-0: Error: Malformed object (too many properties): '{}'.
internal:0:0-0: Error: could not decode binop rhs, '<<'.
internal:0:0-0: Error: Invalid mangle statement value
internal:0:0-0: Error: Parsing expr array at index 1 failed.
internal:0:0-0: Error: Parsing command array at index 3 failed.

Fixes: 0ac39384fd9e ("json: Accept more than two operands in binary expressions")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: Fix owner/0002-persist on aarch64

Not sure if arch-specific, but for some reason src/nft wrapper script
would call src/.libs/lt-nft and thus the owner appeared as 'lt-nft'
instead of the expected 'nft'. Cover for that by extracting the expected
program name from /proc.

Fixes: b5205165bd708 ("tests: shell: Extend table persist flag test a bit")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Florian Westphal <fw@strlen.de>

tests: shell: Add socat availability feature test

Several tests did this manually and skipped if unavail, others just
implicitly depended on the tool.

Note that for the sake of simplicity, this will skip
packetpath/tcp_options test entirely when it did a partial run before.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>

json: don't BUG when asked to list synproxies

"-j list synproxys" triggers a BUG().

Rewrite this so that all enum values are handled so the compiler can alert
us to a missing value in case there are more commands in the future.

While at it, implement a few low-hanging fruites as well.

Not-yet-supported cases are simply ignored.

v2: return EOPNOTSUPP for unsupported commands (Pablo Neira Ayuso)

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: disallow empty concatenation

Disallow empty concatenation in set declaration in json.

internal:0:0-0: Error: Empty concatenation
internal:0:0-0: Error: Invalid set type.
internal:0:0-0: Error: Parsing command array at index 1 failed.

Joint work with Florian Westphal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove flagcmp expression

This expression is not used anymore, since:

("src: transform flag match expression to binop expression from parser")

remove it.

This completes the revert of c3d57114f119 ("parser_bison: add shortcut
syntax for matching flags without binary operations"), except the parser
chunk for backwards compatibility.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: transform flag match expression to binop expression from parser

Transform flagcmp expression to a relational with binop on the left hand
side, ie.

         relational
          /      \
       binop    value
       /   \
payload  mask

Add list_expr_to_binop() to make this transformation.

Goal is two-fold:

- Allow -o/--optimize to pick up on this representation.
- Remove the flagcmp expression in a follow up patch.

This prepare for the removal of the flagcmp expression added by:

  c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations")

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: compact bitmask matching in set/map

Check if right hand side of relational is a bitmask, ie.

     relational
       /   \
    ...     or
           /  \
       value   or
              /  \
         value    value

then, if left hand side is a binop expression, compare left and right
hand sides (not only left hand of this binop expression) to check for
redundant matches in consecutive rules, ie.

        relational
          /   \
       and     ...
      /   \
payload  value

before this patch, only payload in the binop expression was compared.

This allows to compact several rules matching tcp flags in a set/map, eg.

# nft -c -o -f ruleset.nft
Merging:
ruleset.nft:7:17-76:                 tcp flags & (fin | syn | rst | ack | urg) == fin | ack | urg
ruleset.nft:8:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == fin | ack
ruleset.nft:9:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == fin
ruleset.nft:10:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == syn | ack
ruleset.nft:11:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == syn
ruleset.nft:12:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == rst | ack
ruleset.nft:13:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == rst
ruleset.nft:14:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == ack | urg
ruleset.nft:15:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == ack
into:
        tcp flags & (fin | syn | rst | ack | urg) == { fin | ack | urg, fin | ack, fin, syn | ack, syn, rst | ack, rst, ack | urg, ack }
Merging:
ruleset.nft:17:17-61:                 tcp flags & (ack | urg) == ack jump ack_chain
ruleset.bft:18:17-61:                 tcp flags & (ack | urg) == urg jump urg_chain
into:
        tcp flags & (ack | urg) vmap { ack : jump ack_chain, urg : jump urg_chain }

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: incorrect comparison for reject statement

Logic is reverse, this should returns false if the compared reject
expressions are not the same.

Fixes: 38d48fe57fff ("optimize: fix reject statement")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: add __EXPR_MAX and use it to define EXPR_MAX

EXPR_MAX was never updated to the newest expression, add __EXPR_MAX and
use it to define EXPR_MAX.

Add case to expr_ops() other gcc complains with a warning on the
__EXPR_MAX case is not handled.

Fixes: 347039f64509 ("src: add symbol range expression to further compact intervals")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: return error if table does not exist

Identical bug and thus same fix as
853d3a2d3cbd ("rule: return error if table does not exist"),
but this time for json.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: missing ct count elements in new set_stmt test

Add missing entries to dump file.

Reported-by: Florian Westphal <fw@strlen.de>
Fixes: 1f3d0b9cf9cc ("tests: shell: extend coverage for set element statements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: don't update cache for anonymous chains

Chain lookup needs a name, not a numerical id.
After patch, loading bogon gives following errors:

Error: No symbol type information a b index 1 10.1.26.a

v2: Don't return an error, just make it a no-op (Pablo Neira Ayuso)

Fixes: c330152b7f77 ("src: support for implicit chain bindings")
Signed-off-by: Florian Westphal <fw@strlen.de>

json: make sure timeout list is initialised

On parser error, obj_free will iterate this list.
Included json bogon crashes due to null deref because
list head initialisation did not yet happen.

Fixes: c82a26ebf7e9 ("json: Add ct timeout support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: extend coverage for set element statements

Add a test to cover the existing set element statements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: consolidate connlimit grammar rule for set elements

Define ct_limit_stmt_alloc and ct_limit_args to follow similar idiom
that is used for counters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: consolidate last grammar rule for set elements

Define last_stmt_alloc and last_args to follow similar idiom that is
used for counters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: consolidate quota grammar rule for set elements

Define quota_stmt_alloc and quota_args to follow similar idiom that is
used for counters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: consolidate limit grammar rule for set elements

Define limit_stmt_alloc and limit_args to follow similar idiom that is
used for counters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: consolidate counter grammar rule for set elements

Use existing grammar rules to parse counters to simplify parser.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: fix assertion failure with malformed map definitions

Included bogon triggers:
nft: src/evaluate.c:2267: expr_evaluate_mapping: Assertion `set->data != NULL' failed.

After this fix, following errors will be shown:
Error: unqualified type invalid specified in map definition. Try "typeof expression" instead of "type datatype".
map m {
^
map m {
^
Error: map has no mapping data

Fixes: 343a51702656 ("src: store expr, not dtype to track data in sets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: return error if table does not exist

The bogon triggers segfault due to NULL dereference. Error out and set
errno to ENOENT; caller uses strerror() in the errmsg.

After fix, loading reproducer results in:
/tmp/A:2:1-18: Error: Could not process rule: No such file or directory
list table inet p
^^^^^^^^^^^^^^^^^^

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't allow nat map with specified protocol

Included bogon asserts:
src/netlink_linearize.c:1305: netlink_gen_nat_stmt: Assertion `stmt->nat.proto == NULL' failed.

The comment right above the assertion says:
  nat_stmt evaluation step doesn't allow
  STMT_NAT_F_CONCAT && stmt->nat.proto.

... except it does allow it.  Disable this.

Fixes: c68314dd4263 ("src: infer NAT mapping with concatenation from set")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: tolerate named set protocol dependency

Included test will fail with:
/dev/stdin:8:38-52: Error: Transparent proxy support requires transport protocol match
   meta l4proto @protos tproxy to :1088
                        ^^^^^^^^^^^^^^^
Tolerate a set reference too.  Because the set can be empty (or there
can be removals later), add a fake 0-rhs value.

This will make pctx_update assign proto_unknown as the transport protocol
in use, Thats enough to avoid 'requires transport protocol' error.

v2: restrict it to meta lhs for now (Pablo Neira Ayuso)

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1686
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinerize: add more restrictions on meta nfproto removal

We can't remove 'meta nfproto' dependencies for all cases.
Its removed for ip/ip6 families, this works fine.

But for others, e.g. inet, removal is not as simple.
For example

meta nfproto ipv4 ct protocol tcp

is listed as 'ct protocol tcp', even when this is uses in the inet
table.

Meta L4PROTO removal checks were correct, but refactor this
into a helper function to split meta/ct checks from the common
calling function.

Ct check was lacking, we need to examine ct keys more closely
to figure out if they need to retain the network protocol depenency
or not. Elide for NFT_CT_SRC/DST and its variants, as those imply
the network protocol to use, all others must keep it as-is.

Also extend test coverage for this.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1783
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: reject non-serializeable typeof expressions

Included bogon asserts with:
BUG: unhandled key type 13
nft: src/intervals.c:73: setelem_expr_to_range: Assertion `0' failed.

This should be rejected at parser stage, but the check for udata
support was only done on the first item in a concatenation.

After fix, parser rejects this with:
Error: primary expression type 'symbol' lacks typeof serialization

Fixes: 6e48df5329ea ("src: add "typeof" build/parse/print support")
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: py: remove unknown fields

Amend tests/py after libnftnl fixes:

a7dfa49d34c7 ("expr: ct: print key name of id field")
dba1b687a9a7 ("expr: payload: print tunnel header")

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: fix stack buffer overrun when emitting ranged expressions

Included bogon input generates following Sanitizer splat:

AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7...
WRITE of size 2 at 0x7fffffffcbe4 thread T0
    #0 0x0000003a68b8 in __asan_memset (src/nft+0x3a68b8) (BuildId: 3678ff51a5405c77e3e0492b9a985910efee73b8)
    #1 0x0000004eb603 in __mpz_export_data src/gmputil.c:108:2
    #2 0x0000004eb603 in netlink_export_pad src/netlink.c:256:2
    #3 0x0000004eb603 in netlink_gen_range src/netlink.c:471:2
    #4 0x0000004ea250 in __netlink_gen_data src/netlink.c:523:10
    #5 0x0000004e8ee3 in alloc_nftnl_setelem src/netlink.c:205:3
    #6 0x0000004d4541 in mnl_nft_setelem_batch src/mnl.c:1816:11

Problem is that the range end is emitted to the buffer at the *padded*
location (rounded up to next register size), but buffer sizing is
based of the expression length, not the padded length.

Also extend the test script: Capture stderr and if we see
AddressSanitizer warning, make it fail.

Same bug as the one fixed in 600b84631410 ("netlink: fix stack buffer overflow with sub-reg sized prefixes"),
just in a different function.

Apply same fix: no dynamic array + add a range check.

Joint work with Pablo Neira Ayuso.

Signed-off-by: Florian Westphal <fw@strlen.de>

src: replace struct stmt_ops by type field in struct stmt

Shrink struct stmt in 8 bytes.

__stmt_ops_by_type() provides an operation for STMT_INVALID since this
is required by -o/--optimize.

There are many checks for stmt->ops->type, which is the most accessed
field, that can be trivially replaced.

BUG() uses statement type enum instead of name.

Similar to:

68e76238749f ("src: expr: add and use expr_name helper").
72931553828a ("src: expr: add expression etype")
2cc91e6198e7 ("src: expr: add and use internal expr_ops helper")

Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: print set element with multi-word description in single one line

If the set element:

- represents a mapping
- has a timeout
- has a comment
- has counter/quota/limit
- concatenation (already printed in a single line before this patch)

ie. if the set element requires several words, then print it in one
single line.

Before this patch:

table ip x {
      set y {
            typeof ip saddr
            counter
            elements = { 192.168.10.35 counter packets 0 bytes 0, 192.168.10.101 counter packets 0 bytes 0,
                         192.168.10.135 counter packets 0 bytes 0 }
      }
}

After this patch:

table ip x {
      set y {
            typeof ip saddr
            counter
            elements = { 192.168.10.35 counter packets 0 bytes 0,
192.168.10.101 counter packets 0 bytes 0,
                         192.168.10.135 counter packets 0 bytes 0 }
      }
}

Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: move interval flag compat check after set key evaluation

Without this, included bogon asserts with:
BUG: unhandled key type 13
nft: src/intervals.c:73: setelem_expr_to_range: Assertion `0' failed.

... because we no longer evaluate set->key/data.

Move the check to the tail of the function, right before assiging
set->existing_set, so that set->key has been evaluated.

Fixes: ceab53cee499 ("evaluate: don't allow merging interval set/map with non-interval one")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't allow merging interval set/map with non-interval one

Included bogon asserts with:
BUG: invalid data expression type range_value

Pablo says: "Reject because flags interval is lacking".
Make it so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: fix expression data corruption

Sometimes nftables will segfault when doing error-unwind of the included
afl-generated bogon.

The problem is the unconditional write access to expr->set_flags in
expr_evaluate_map():

mappings->set_flags |= NFT_SET_MAP;

... but mappings can point to EXPR_VARIABLE (legal), where this will flip
a bit in unused, but allocated memory (i.e., has no effect).

In case of the bogon, mapping is EXPR_RANGE_SYMBOL, and the store can flip
a bit in identifier_range[1], this causes crash when the pointer is freed.

We can't use expr->set_flags unconditionally, so rework this to pass
set_flags as argument and place all read and write accesses in places where
we've made sure we are dealing with EXPR_SET.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: skip interval size tests on kernel that lack rbtree size fix

Skip these tests for older kernels.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_linearize: reduce register waste with non-constant binop expressions

Register use is not good with bitwise operations that involve three or
more selectors, eg.

mark set ip dscp and 0x3 or ct mark or meta mark
  [ payload load 1b @ network header + 1 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
  [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
  [ bitwise reg 1 = ( reg 1 & 0x00000003 ) ^ 0x00000000 ]
  [ ct load mark => reg 2 ]
  [ bitwise reg 1 = ( reg 1 | reg 2 ) ]
  [ meta load mark => reg 3 ]   <--- this could use register 2 instead!
  [ bitwise reg 1 = ( reg 1 | reg 3 ) ]
  [ meta set mark with reg 1 ]

register 3 is used to store meta mark, however, register 2 can be
already use since register 1 already stores the partial result of the
bitwise operation for this expression.

After this fix:

  [ payload load 1b @ network header + 1 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
  [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
  [ bitwise reg 1 = ( reg 1 & 0x00000003 ) ^ 0x00000000 ]
  [ ct load mark => reg 2 ]
  [ bitwise reg 1 = ( reg 1 | reg 2 ) ]
  [ meta load mark => reg 2 ]            <--- recycle register 2
  [ bitwise reg 1 = ( reg 1 | reg 2 ) ]
  [ meta set mark with reg 1 ]

Release source register in bitwise operation given destination register
already stores the partial result of the expression.

Extend tests/py to cover this.

Fixes: 54bfc38c522b ("src: allow binop expressions with variable right-hand operands")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't crash if range has same start and end interval

In this case, evaluation step replaces the range expression with a
single value and we'd crash as range->left/right contain garbage
values.

Simply replace the input expression with the evaluation result.

Also add a test case modeled on the afl reproducer.

Fixes: fe6cc0ad29cd ("evaluate: consolidate evaluation of symbol range expression")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: extend reset test case to cover interval set and map type

Make sure segtree processing doesn't drop associated stateful elements.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: incomplete output in get element command with maps

get element command displays an incomplete range.

Using this simple test ruleset:

table ip x {
        map y {
                typeof ip saddr : meta mark
                counter
                flags interval,timeout
                elements = { 1.1.1.1-1.1.1.10 timeout 10m : 20, 2.2.2.2-2.2.2.5 timeout 10m : 30}
        }

then, invoking the get element command:

# nft get element x y { 1.1.1.2 }

results in, before (incomplete output):

table ip x {
        map y {
                type ipv4_addr : mark
                flags interval,timeout
                elements = { 1.1.1.1 counter packets 0 bytes 0 timeout 10m expires 1m24s160ms : 0x00000014 }
        }
}

Note that it displays 1.1.1.1, instead of 1.1.1.1-1.1.1.10.

After this fix:

table ip x {
        map y {
                type ipv4_addr : mark
                flags interval,timeout
                elements = { 1.1.1.1-1.1.1.10 counter packets 0 bytes 0 timeout 10m expires 1m24s160ms : 0x00000014 }
        }
}

Fixes: a43cc8d53096 ("src: support for get element command")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: fix reset element support for interval set type

Running reset command yields on an interval (rbtree) set yields:
nft reset element inet filter rbtreeset {1.2.3.4}
BUG: unhandled op 8

This is easy to fix, CMD_RESET doesn't add or remove so it should be
treated like CMD_GET.

Unfortunately, this still doesn't work properly:

nft get element inet filter rbset {1.2.3.4}
returns:
... elements = { 1.2.3.4 }

but its expected that "get" and "reset" also return stateful objects
associated with the element. This works for other set types, but for
rbtree, the list of statements gets lost during segtree processing.

After fix, get/reset returns:
elements = { 1.2.3.4 counter packets 10 ...

A follow up patch will add a test case.

Fixes: 83e0f4402fb7 ("Implement 'reset {set,map,element}' commands")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: support for bitfield payload statement with binary operation

Add a new function to deal with payload statement delinearization with
binop expression.

Infer the payload offset from the mask, then walk the template list to
determine if estimated offset falls within a matching header field. If
so, then validate that this is not a raw expression but an actual
bitfield matching. Finally, trim the payload expression length
accordingly and adjust the payload offset.

instead of:

@nh,8,5 set 0x0

it displays:

ip dscp and 0x1

Update tests/py to cover for this enhancement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: support for bitfield payload statement with binary operation

Update bitfield payload statement support to allow for bitwise
and/or/xor updates. Adjust payload expression to fetch 16-bits for
mangling while leaving unmodified bits intact.

# nft --debug=netlink add rule x y ip dscp set ip dscp or 0x1
ip x y
   [ payload load 2b @ network header + 0 => reg 1 ]
   [ bitwise reg 1 = ( reg 1 & 0x0000fbff ) ^ 0x00000400 ]
   [ payload write reg 1 => 2b @ network header + 0 csum_type 1 csum_off 10 csum_flags 0x0 ]

Skip expr_evaluate_bits() transformation since these are only useful
for payload matching and set lookups.

Listing still shows a raw expression:

  # nft list ruleset
    ...
                    @nh,8,5 set 0x0

The follow up patch completes it:

  ("netlink_delinearize: support for bitfield payload statement with binary operation")

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1698
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: reject unsupported expressions in payload statement for bitfields

The payload statement evaluation pretends that it can handle any
expression for bitfields, but the existing evaluation code only knows
how to handle value expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: simplify payload statement evaluation for bitfields

Instead of allocating a lshift expression and relying on the binary
operation transfer propagate this to the mask value, lshift the mask
value immediately.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: release existing datatype when evaluating unary expression

Use __datatype_set() to release the existing datatype before assigning
the new one, otherwise ASAN reports the following memleak:

Direct leak of 104 byte(s) in 1 object(s) allocated from:
    #0 0x7fbc8a2b89cf in __interceptor_malloc ../../../../src/libsa
    #1 0x7fbc898c96c2 in xmalloc src/utils.c:31
    #2 0x7fbc8971a182 in datatype_clone src/datatype.c:1406
    #3 0x7fbc89737c35 in expr_evaluate_unary src/evaluate.c:1366
    #4 0x7fbc89758ae9 in expr_evaluate src/evaluate.c:3057
    #5 0x7fbc89726bd9 in byteorder_conversion src/evaluate.c:243
    #6 0x7fbc89739ff0 in expr_evaluate_bitwise src/evaluate.c:1491
    #7 0x7fbc8973b4f8 in expr_evaluate_binop src/evaluate.c:1600
    #8 0x7fbc89758b01 in expr_evaluate src/evaluate.c:3059
    #9 0x7fbc8975ae0e in stmt_evaluate_arg src/evaluate.c:3198
    #10 0x7fbc8975c51d in stmt_evaluate_payload src/evaluate.c:330

Fixes: faa6908fad60 ("evaluate: clone unary expression datatype to deal with dynamic datatype")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: add mptcp to tcp option matching list

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

segtree: fix string data initialisation

This uses the wrong length.  This must re-use the length of the datatype,
not the string length.

The added test cases will fail without the fix due to erroneous
overlap detection, which in itself is due to incorrect sorting of
the elements.

Example error:
netlink: Error: interval overlaps with an existing one
add element inet testifsets simple_wild {  "2-1" } failed.
table inet testifsets {
      ...       elements = { "1-1", "abcdef*", "othername", "ppp0" }

... but clearly "2-1" doesn't overlap with any existing members.
The false detection is because of the "acvdef*" wildcard getting sorted
at the beginning of the list which is because its erronously initialised
as a 64bit number instead of 128 bits (16 bytes / IFNAMSIZ).

Fixes: 5e393ea1fc0a ("segtree: add string "range" reversal support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: expr_build_udata_recurse should recurse

If we see EXPR_BINOP, recurse: ->left can be another EXPR_BINOP.

This is irrelevant for 'typeof' named sets, but for anonymous sets, the
key is derived from the concat expression that builds the lookup key for
the anonymous set.

tcp option mptcp subtype . ip daddr { mp-join. 10.0.0.1, ..

needs two binops back-to-back:

  [ exthdr load tcpopt 1b @ 30 + 2 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x000000f0 ) ^ 0x00000000 ]
  [ bitwise reg 1 = ( reg 1 >> 0x00000004 ) ]

This bug prevents concat_expr_build_udata() from creating the userdata key
at load time.

When listing the rules, we get an assertion:
nft: src/mergesort.c:23: concat_expr_msort_value: Assertion `ilen > 0' failed.

because the set has a key with 0-length integers.

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinearize: also consider exthdr type when trimming binops

This allows trimming the binop for exthdrs, this will make nft render
   (tcp option mptcp unknown & 240) >> 4 . ip saddr @s1

as
    tcp option mptcp subtype . ip saddr @s1

Also extend the typeof set tests with a set concatenating a
sub-byte-sized exthdr expression with a payload one.

The additional call to expr_postprocess() is needed, without this,
typeof_sets_0.nft fails because
  frag frag-off @s4 accept

is shown as
meta nfproto ipv6 frag frag-off @s4 accept

Previouly, EXPR_EXTHDR would cause payload_binop_postprocess()
to return false which will then make the caller invoke
expr_postprocess(), but after handling EXPR_EXTHDR this doesn't happen
anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>

expression: propagate key datatype for anonymous sets

set s {
  typeof tcp option mptcp subtype
  elements = { mp-join, dss }
}

is listed correctly. The set key provides the 'mptcpopt_subtype'
information and listing can print all elements with symbolic names.

In anon set case this doesn't work:
  tcp option mptcp subtype { mp-join, dss }

is printed as "... subtype { 1, 2}" because the anon set only provides
plain integer type.

This change propagates the datatype to the individual members of the
anon set.

After this change, multiple existing data types such as TYPE_ICMP_TYPE
could theoretically be replaced by integer-type aliases.

However, those datatypes are already exposed to userspace via the
'set type' keyword.  Thus removing them will break set definitions that
use them.

Signed-off-by: Florian Westphal <fw@strlen.de>

tcpopt: add symbol table for mptcp suboptions

nft can be used t match on specific multipath tcp subtypes:

  tcp option mptcp subtype 0

However, depending on which subtype to match, users need to look up the
type/value to use in rfc8684. Add support for mnemonics and
"nft describe tcp option mptcp subtype" to get the subtype list.

Because the number of unique 'enum datatypes' is limited by ABI contraints
this adds a new mptcp suboption type as integer alias.

After this patch, nft supports all of the following:
add element t s { mp-capable }
add rule t c tcp option mptcp subtype mp-capable
add rule t c tcp option mptcp subtype { mp-capable, mp-fail }

For the 3rd case, listing will break because unlike for named sets, nft
lacks the type information needed to pretty-print the integer values,
i.e. nft will print the 3rd rule as 'subtype { 0, 6 }'.

This is resolved in a followup patch.

Other problematic constructs are:
  set s1 {
    typeof tcp option mptcp subtype . ip saddr
    elements = { mp-fail . 1.2.3.4 }
  }

Followed by:
  tcp option mptcp subtype . ip saddr @s1

nft will print this as:
  tcp option mptcp unknown & 240) >> 4 . ip saddr @s1

All of these issues are not related to this patch, however, they also occur
with other bit-sized extheader fields.

Signed-off-by: Florian Westphal <fw@strlen.de>

payload: don't kill dependency for proto_th

proto_th carries no information about the proto number, we need to
preserve the L4 protocol expression unless we can be sure that

For example, if "meta l4proto 91 @th,0,16 0" is simplified to
"th sport 0", the information of protocol number is lost.

Based on initial patch from Xiao Liang.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: remove temporary file

0002-relative leaves a temporary file in the current working
directory, at the time the "trap" argument is expanded, tmpfile2
isn't set.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: add atomic chain replace test

Add a test that replaces one base chain and check that no
filtered packets make it through, i.e. that the 'old chain'
doesn't disappear before new one is active.

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

build: add hint for a2x error message

Display:

a2x not found, please install asciidoc, or pass --disable-man-doc

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

payload: honor inner payload description in payload_expr_cmp()

payload comparison must consider inner_desc.

No test update because I could not find any specific bug related to
this. I found it through source code inspection.

Fixes: 772892a018b4 ("src: add vxlan matching support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

payload: return early if dependency is not a payload expression

if (dep->left->payload.base != PROTO_BASE_TRANSPORT_HDR)

is legal only after checking that ->left points to an
EXPR_PAYLOAD expression. The dependency store can also contain
EXPR_META, in this case we access a bogus part of the union.

The payload_may_dependency_kill_icmp helper can't handle a META
dep either, so return early.

Fixes: 533565244d88 ("payload: check icmp dependency before removing previous icmp expression")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

payload: remove double-store

This assignment was duplicated.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: consolidate evaluation of symbol range expression

Expand symbol_range to range expression to consolidate evaluation.

I found a bug when testing for negative ranges:

  test.nft:5:16-30: Error: Could not process rule: File exists
                  elements = { 1.1.1.1-1.1.1.0 }
                               ^^^^^^^^^^^^^^^

after this patch, error reporting has been restored:

  test.nft:5:16-30: Error: Range negative size
                  elements = { 1.1.1.1-1.1.1.0 }
                               ^^^^^^^^^^^^^^^

Fixes: 347039f64509 ("src: add symbol range expression to further compact intervals")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: optimize zero length range

A rule like the following:

... tcp dport 22-22 ...

results in a range expression to match from 22 to 22.

Simplify to singleton value so a cmp is used instead.

This optimization already exists in set elements which might explain
this overlook.

Fixes: 7a6e16040d65 ("evaluate: allow for zero length ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: random interval set with size

Generate a random set with intervals to validate set size, try one that
should fit and then subtract one to set size and retry.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: interval sets with size

Exercise size in set with intervals (rbtree), including corner cases
such as 0.0.0.0 and 255.255.255.255 (half-open interval).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fib: Change data type of fib oifname to "ifname"

Change data type of fib oifname from "string" to "ifname", so that it
can be matched against a set of ifnames:

    set x {
            type ifname
    }
    chain y {
            fib saddr oifname @x drop
    }

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

parser_bison: get rid of unneeded statement

Was used for the legacy flow statement, but that was removed in
2ee93ca27ddc ("parser_bison: remove deprecated flow statement")

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: auto-merge is only available for singleton interval sets

auto-merge is only available to interval sets with one value only,
untoggle this flag for concatenation with intervals.

Later, this can be hardened to reject it.

Fixes: 30f667920601 ("src: add 'auto-merge' option to sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_linearize: use range expression for OP_EQ and OP_IMPLICIT

range expression is available since v4.9-rc1~127^2~67^2~3, replace the
two cmp expression when generating netlink bytecode.

Code to delinearize the two cmp expressions to represent the range
remains in place for backwards compatibility.

The delinearize path to parse range expressions with NFT_OP_EQ is
already present since:

3ed932917cc7 ("src: use new range expression for != [a,b] intervals")

Update tests/py payload accordingly, json tests need no update since
they already use the range to represent them.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add symbol range expression to further compact intervals

Update parser to use a new symbol range expression with smaller memory
footprint than range expression + two symbol expressions.

The evaluation step translates this into EXPR_RANGE_VALUE for interval
sets.

Note that maps or concatenations still use the less compact range
expressions representation, those require more work to use this new
symbol range expression. The parser also uses the classic range
expression if variables are used.

Testing with a 100k intervals, worst case scenario: no prefix or
singleton elements. This shows a reduction from 49.58 Mbytes to
35.47 Mbytes (-29.56% memory footprint for this case).

This follow up work to previous commits:

91dc281a82ea ("src: rework singleton interval transformation to reduce memory consumption")
c9ee9032b0ee ("src: add EXPR_RANGE_VALUE expression and use it")

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: compact and simplify list and reset syntax

Works:
list sets
list sets inet
list sets table inet foo

Doesn't work:
list sets inet foo

Same for "list counters", "list quotas", etc.

"reset" keyword however supports this:
reset counters inet foo

and aliased this to
reset counters table inet foo

This is inconsistent and not inuitive.

Moreover, unlike "list sets", "list maps" only supported "list maps" and
"list maps inet", without the ability to only list maps of a given table.

Compact this to unify the syntax so it becomes possible to omit the "table"
keyword for either reset or list mode.

flowtables, secmarks and synproxys keywords are updated too. "flow table"
and "meters" are NOT changed since both of these are deprecated in favor
of standard nft sets.

Reported-by: Slavko <linux@slavino.sk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: turn redundant ip option type field match into boolean

The ip option expression allows for non-sense matching like:

ip option lsrr type 1

because 'lsrr' already provides the type field, this never results in a
matching.

Turn this expression into:

ip option lsrr exists

And update documentation to hide this redundant type field.

Fixes: 226a0e072d5c ("exthdr: add support for matching IPv4 options")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipopt: use ipv4 address datatype for address field in ip options

So user does not have to play integer arithmetics to match on IPv4
address.

Before:

# nft describe ip option lsrr addr
exthdr expression, datatype integer (integer), 32 bits

After:

# nft describe ip option lsrr addr
exthdr expression, datatype ipv4_addr (IPv4 address) (basetype integer), 32 bits

Fixes: 226a0e072d5c ("exthdr: add support for matching IPv4 options")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: clamp boolean value to 0 and 1

If user provides a numeric value larger than 0 or 1, match never
happens:

# nft --debug=netlink add rule x y tcp option sack-perm 4
ip x y
  [ exthdr load tcpopt 1b @ 4 + 0 present => reg 1 ]
  [ cmp eq reg 1 0x00000004 ]

After this update:

# nft --debug=netlink add rule x y tcp option sack-perm 4
ip x y
  [ exthdr load tcpopt 1b @ 4 + 0 present => reg 1 ]
  [ cmp eq reg 1 0x00000001 ]

This is to address a rare corner case, in case user specifies the
boolean value through the integer base type.

Fixes: 9fd9baba43c8 ("Introduce boolean datatype and boolean expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

exthdr: incomplete type 2 routing header definition

Add missing type 2 routing header definition.

Listing is not correct because these IPv6 extension header are still
lacking context to properly delinearize the listing, but at least this
does not crash anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: delete netdev chain after test

This update is needed for kernel patch:

("netfilter: nf_tables: Tolerate chains with no remaining hooks")

otherwise this hits DUMP FAILED in newer kernels.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: extend raw payload match tests

Add more test cases to exercise binop elimination for raw
payload matches.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add and use payload_expr_trim_force

Previous commit fixed erroneous handling of raw expressions when RHS sets
a zero value.

Input: @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
Output:@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set \
@ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000

After this patch, this will instead display:

@ih,58,6 set 0x0 @ih,86,6 set 0x0 @ih,170,22 set 0x0

payload_expr_trim_force() only works when the payload has no known
protocol (template) attached, i.e. will be printed as raw payload syntax.

It performs sanity checks on @mask and then adjusts the payload expression
length and offset according to the mask.

Also add this check in __binop_postprocess() so we can also discard masks
when matching, e.g.

'@ih,7,5 2' becomes '@ih,7,5 0x2', not '@ih,0,16 & 0xffc0 == 0x20'.

binop_postprocess now returns if it performed an action or not; if this
returns true then arguments might have been freed so callers must no longer
refer to any of the expressions attached to the binop.

Next patch adds test cases for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinarize: fix bogus munging of mask value

Given following input:
table ip t {
chain c {
  @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
}
}

nft will produce following output:
chain c {
@ih,48,16 set @ih,48,16 & 0x3f @ih,80,16 set @ih,80,16 & 0x3f0 @ih,160,32 set @ih,160,32 & 0x3fffff
}

The input side is correct, the generated expressions sent to kernel are:

1  [ payload load 2b @ inner header + 6 => reg 1 ]
2  [ bitwise reg 1 = ( reg 1 & 0x0000c0ff ) ^ 0x00000000 ]
3  [ payload write reg 1 => 2b @ inner header + 6 .. ]
4  [ payload load 2b @ inner header + 10 => reg 1 ]
5  [ bitwise reg 1 = ( reg 1 & 0x00000ffc ) ^ 0x00000000 ]
6  [ payload write reg 1 => 2b @ inner header + 10 .. ]
7  [ payload load 4b @ inner header + 20 => reg 1 ]
8  [ bitwise reg 1 = ( reg 1 & 0x0000c0ff ) ^ 0x00000000 ]
9  [ payload write reg 1 => 4b @ inner header + 20 .. ]

@ih,58,6 set 0 <- Zero 6 bits, starting with bit 58

Changes to inner header mandate a checksum update, which only works for
even byte counts (except for last byte in the payload).

Thus, we load 2b at offet 6. (16bits, offset 48).

Because we want to zero 6 bits, we need a mask that retains 10 bits and
clears 6: b1111111111000000 (first 8 bit retains 48-57, last 6 bit clear
58-63).  The '0xc0ff' is not correct, but thats because debug output comes
from libnftnl which prints values in host byte order, the value will be
interpreted as big endian on kernel side, so this will do the right thing.

Next, same problem:

@ih,86,6 set 0 <- Zero 6 bits, starting with bit 86.

nft needs to round down to even-sized byte offset, 10, then retain first
6 bits (80 + 6 == 86), then clear 6 bits (86-91), then keep 4 more as-is
(92-95).

So mask is 0xfc0f (in big endian) would be correct (b1111110000001111).

Last expression, @ih,170,22 set 0, asks to clear 22 bits starting with bit
170, nft correctly rounds this down to a 32 bit read at offset 160.

Required mask keeps first 10 bits, then clears 22
(b11111111110000000000000000000000).  Required mask would be 0xffc00000,
which corresponds to the wrong-endian-printed value in line 8 above.

Now that we convinced ourselves that the input side is correct, fix up
netlink delinearize to undo the mask alterations if we can't find a
template to print a human-readable payload expression.

With this patch, we get this output:

  @ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set @ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000

... which isn't ideal.  We should fixup the payload expression to display
the same output as the input, i.e. adjust payload->len and offset as per
mask and discard the mask instead.

This will be done in a followup patch.

Fixes: 50ca788ca4d0 ("netlink: decode payload statment")
Reported-by: Sunny73Cr <Sunny73Cr@protonmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: allow to re-use existing metered set

Blamed commit translates old meter syntax (which used to allocate an
anonymous set) to dynamic sets.

A side effect of this is that re-adding a meter rule after chain was
flushed results in an error, unlike anonymous sets named sets are not
impacted by the flush.

Refine this: if a set of the same name exists and is compatible, then
re-use it instead of returning an error.

Also pick up the reproducer kindly provided by the reporter and place it
in the shell test directory.

Fixes: b8f8ddfff733 ("evaluate: translate meter into dynamic set")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: use mount --bind to change cgroupsv2 root

Instead of remount which fails with SKIP in one of my test boxes because
cgroupsv2 rootfs is busy.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: remove variable shadowing

unsigned int i is already declared in resolve_ll_protocol_conflict(),
remove it.

Fixes: 3a734d608131 ("evaluate: don't assert on net/transport header conflict")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

scanner: better error reporting for CRLF line terminators

Provide a hint to users that file is coming with CRLF line terminators,
maybe from a non-Linux OS.

Extend scanner.l to provide hint on CRLF in files:

# file test.nft
test.nft: ASCII text, with CRLF, LF line terminators
# nft -f test.nft
test.nft:1:13-14: Error: syntax error, unexpected CRLF line terminators
table ip x {
^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: rework singleton interval transformation to reduce memory consumption

set_to_intervals() expands range expressions into a list of singleton
elements before building the netlink message that is sent to userspace.
This is because the kernel expects this list of singleton elements where
EXPR_F_INTERVAL_END denotes a closing interval. This expansion
significantly increases memory consumption in userspace.

This patch updates the logic to transform the range expression up to two
temporary singleton element expressions through setelem_to_interval().
Then, these two elements are used to allocate the nftnl_set_elem objects
through alloc_nftnl_setelem_interval() to build the netlink message,
finally all these temporary objects are released. For anonymous sets,
when adjacent ranges are found, the end element is not added to the set
to pack the set representation as in the original set_to_intervals()
routine.

After this update, set_to_intervals() only deals with adding the
non-matching all zero element to the interval set when it is not there
as the kernel expects.

In combination with the new EXPR_RANGE_VALUE expression, this shrinks
runtime userspace memory consumption from 70.50 Mbytes to 43.38 Mbytes
for a 100k intervals set sample.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: do not send set size when set is constant set

When turning element range into the interval representation based on
singleton elements for the rbtree tree set backend, userspace adjusts
the size to the internal kernel implementation.

For constant sets, this is leaking an internal kernel implementation
detail that is fixed by kernel patch ("netfilter: nf_tables: fix set
size with rbtree backend"). For non-constant sets, set size is just
broken.

This patch is required by the follow up patch ("src: rework singleton
interval transformation to reduce memory consumption").

On top of this, constant sets cannot be updated once they are bound, set
size is not useful in this case. Remove this implicit set size for
constant sets.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mnl: rename list of expression in mnl_nft_setelem_batch()

Rename set to init to prepare to pass struct set to this function in
the follow up patch. No functional changes are intended.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: constify set_is_non_concat_range()

This is read-only, constify it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add EXPR_RANGE_VALUE expression and use it

set element with range takes 4 instances of struct expr:

  EXPR_SET_ELEM -> EXPR_RANGE -> (2) EXPR_VALUE

where EXPR_RANGE represents two references to struct expr with constant
value.

This new EXPR_RANGE_VALUE trims it down to two expressions:

  EXPR_SET_ELEM -> EXPR_RANGE_VALUE

with two direct low and high values that represent the range:

  struct {
      mpz_t low;
      mpz_t high;
  };

this two new direct values in struct expr do not modify its size.

setelem_expr_to_range() translates EXPR_RANGE to EXPR_RANGE_VALUE, this
conversion happens at a later stage.

constant_range_expr_print() translates this structure to constant values
to reuse the existing datatype_print() which relies in singleton values.

The automerge routine has been updated to use EXPR_RANGE_VALUE.

This requires a follow up patch to rework the conversion from range
expression to singleton element to provide a noticeable memory
consumption reduction.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

intervals: do not merge intervals with different timeout

If timeout/expiration of contiguous intervals is different, then do not
merge them.

Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

intervals: add helper function to set previous element

Add helper function to set previous element during the automerge
iteration. No functional changes are intended.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: make cmd_free(NULL) valid

bison uses cmd_free($$) as destructor, but base_cmd can
set it to NULL, e.g.

  |       ELEMENT         set_spec        set_block_expr
  {
    if (nft_cmd_collapse_elems(CMD_ADD, state->cmds, &$2, $3)) {
       handle_free(&$2);
       expr_free($3);
       $$ = NULL;   // cmd set to NULL
       break;
    }
    $$ = cmd_alloc(CMD_ADD, CMD_OBJ_ELEMENTS, &$2, &@$, $3);

expr_free(NULL) is legal, cmd_free() causes crash.  So just allow
this to avoid cluttering parser_bison.y with "if ($$)".

Also add the afl-generated bogon input to the test files.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: fix UaF when reporting table parse error

It passed already-freed memory to erec function. Found with afl++ and asan.

Fixes: 4955ae1a81b7 ("Add support for table's persist flag")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: add cgroupv2 socket match test case

This is a test case for nft_socket cgroupv2 matching, including
support for matching inside a cgroupv2 mount space added in kernel
commit 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces").

Test is thus run twice, once in the initial namespace and once with
a changed cgroupv2 root.

In case we can't create a cgroup or the 2nd half (unshared re-run)
fails, indicate SKIP.

Signed-off-by: Florian Westphal <fw@strlen.de>

main: prepend error tag to printed errors when parsing options

Prepend "Error: " tag to printed errors in the option parsing step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: include canonical path to avoid duplicates

A recent commit adds base directory of -f/--filename to include paths by
default to address a silly use of -I/--include to make this work:

  # nft -I /path/to -f /path/to/main.nft

instead users can simply invoke:

  # nft -f /path/to/main.nft

because /path/to/ is added at the end of the list of include paths.

This example above assumes main.nft includes more files that are
contained in /path/to/.

However, globbing can cause duplicates after this recent update, eg.

  # cat test/main
  table inet test {
        chain test {
                include "include/*";
        }
  }
  # nft -I /tmp/test/ -f test/main

because /tmp/test and test/ twice refer to the same directory and both
are added to the list of include path.

Use realpath() to canonicalize include paths. Then, search and skip
duplicated include paths.

Fixes: 302e9f8b3a13 ("libnftables: add base directory of -f/--filename to include path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: shrink line_offset in struct location to 4 bytes

line_offset of 2^32 bytes should be enough.

This requires the removal of the last_line field (in a previous patch) to
shrink struct expr to 112 bytes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove last_line from struct location

This 4 bytes field is never used, remove it.

This does not shrink struct location in x86_64 due to alignment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove unused token_offset from struct location

This saves 8 bytes in x86_64 in struct location which is embedded in
every expression.

This shrinks struct expr to 120 bytes according to pahole.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>