git.ipfire.org Git - thirdparty/nftables.git/log

evaluate: move interval flag compat check after set key evaluation

commit 3e50cd6b063d64c2e72b0e32bc36dd5a22f75c06 upstream.

Without this, included bogon asserts with:
BUG: unhandled key type 13
nft: src/intervals.c:73: setelem_expr_to_range: Assertion `0' failed.

... because we no longer evaluate set->key/data.

Move the check to the tail of the function, right before assiging
set->existing_set, so that set->key has been evaluated.

Fixes: ceab53cee499 ("evaluate: don't allow merging interval set/map with non-interval one")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't allow merging interval set/map with non-interval one

commit ceab53cee4999debd64ab29414b918746209ba7b upstream.

Included bogon asserts with:
BUG: invalid data expression type range_value

Pablo says: "Reject because flags interval is lacking".
Make it so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: do not allow to chain more than 16 binops

commit dcb199544563ded462cb7151134278f82a9e6cfd upstream.

netlink_linearize.c has never supported more than 16 chained binops.
Adding more is possible but overwrites the stack in
netlink_gen_bitwise().

Add a recursion counter to catch this at eval stage.

Its not enough to just abort once the counter hits
NFT_MAX_EXPR_RECURSION.

This is because there are valid test cases that exceed this.
For example, evaluation of 1 | 2 will merge the constans, so even
if there are a dozen recursive eval calls this will not end up
with large binop chain post-evaluation.

v2: allow more than 16 binops iff the evaluation function
did constant-merging.

Signed-off-by: Florian Westphal <fw@strlen.de>

parser_bison: ensure all timeout policy names are released

commit 86a496928420046e9d32317f09db050e8351b10e upstream.

We need to add a custom destructor for this structure, it
contains the dynamically allocated names.

a:5:55-55: Error: syntax error, unexpected '}', expecting string
policy = { estabQisheestablished : 2m3s, cd : 2m3s, }

==562373==ERROR: LeakSanitizer: detected memory leaks

Indirect leak of 160 byte(s) in 2 object(s) allocated from:
    #1 0x5a565b in xmalloc src/utils.c:31:8
    #2 0x5a565b in xzalloc src/utils.c:70:8
    #3 0x3d9352 in nft_parse_bison_filename src/libnftables.c:520:8
[..]

Fixes: c7c94802679c ("src: add ct timeout support")
Signed-off-by: Florian Westphal <fw@strlen.de>

json: make sure timeout list is initialised

commit 0298bc012e020b2fca8ecc60b0091798d091e1fd upstream.

On parser error, obj_free will iterate this list.
Included json bogon crashes due to null deref because
list head initialisation did not yet happen.

Fixes: c82a26ebf7e9 ("json: Add ct timeout support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: fix stack buffer overrun when emitting ranged expressions

commit 37dfb1972cae061c09f278933af998a7c4fc2696 upstream.

Included bogon input generates following Sanitizer splat:

AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7...
WRITE of size 2 at 0x7fffffffcbe4 thread T0
    #0 0x0000003a68b8 in __asan_memset (src/nft+0x3a68b8) (BuildId: 3678ff51a5405c77e3e0492b9a985910efee73b8)
    #1 0x0000004eb603 in __mpz_export_data src/gmputil.c:108:2
    #2 0x0000004eb603 in netlink_export_pad src/netlink.c:256:2
    #3 0x0000004eb603 in netlink_gen_range src/netlink.c:471:2
    #4 0x0000004ea250 in __netlink_gen_data src/netlink.c:523:10
    #5 0x0000004e8ee3 in alloc_nftnl_setelem src/netlink.c:205:3
    #6 0x0000004d4541 in mnl_nft_setelem_batch src/mnl.c:1816:11

Problem is that the range end is emitted to the buffer at the *padded*
location (rounded up to next register size), but buffer sizing is
based of the expression length, not the padded length.

Also extend the test script: Capture stderr and if we see
AddressSanitizer warning, make it fail.

Same bug as the one fixed in 600b84631410 ("netlink: fix stack buffer overflow with sub-reg sized prefixes"),
just in a different function.

Apply same fix: no dynamic array + add a range check.

Joint work with Pablo Neira Ayuso.

Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: don't allow nat map with specified protocol

commit 43cf4a2973ee9e3ab20edce47c6a054485707592 upstream.

Included bogon asserts:
src/netlink_linearize.c:1305: netlink_gen_nat_stmt: Assertion `stmt->nat.proto == NULL' failed.

The comment right above the assertion says:
  nat_stmt evaluation step doesn't allow
  STMT_NAT_F_CONCAT && stmt->nat.proto.

... except it does allow it.  Disable this.

Fixes: c68314dd4263 ("src: infer NAT mapping with concatenation from set")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: move flowtable with bogus priority to correct location

commit 156b5936b3b7a0b1ee590a02874beaba5235f758 upstream.

This is an input file to be processed by "assert_failures" script.

Fixes: b40bebbcee36 ("rule: do not crash if to-be-printed flowtable lacks priority")
Signed-off-by: Florian Westphal <fw@strlen.de>

netlink: Avoid crash upon missing NFTNL_OBJ_CT_TIMEOUT_ARRAY attribute

commit 2a38f458f12bc032dac1b3ba63f95ca5a3c03fbd upstream.

If missing, the memcpy call ends up reading from address zero.

Fixes: c7c94802679cd ("src: add ct timeout support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: Fix for potential crash parsing a flowtable

commit d5ef04441eb1de3efc27aa70193fe3d7f0b5c408 upstream.

Kernel's flowtable message might not contain the
NFTA_FLOWTABLE_HOOK_DEVS attribute. In that case, nftnl_flowtable_get()
will return NULL for the respective nftnl attribute.

Fixes: db0697ce7f602 ("src: support for flowtable listing")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: Do not allocate a bogus flowtable priority expr

commit 10b9a85b3278e0933bf47226588fede8c9fcbcc8 upstream.

Code accidentally treats missing NFTNL_FLOWTABLE_PRIO attribute as zero
prio value which may not be correct.

Fixes: db0697ce7f602 ("src: support for flowtable listing")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: prevent null deref if chain->policy is not set

commit 69b90023c7220fe283ee38686c758e3494e853d9 upstream.

The two commits mentioned below resolved null dererence crashes when the
policy resp. priority keyword was missing in the chain/flowtable
specification.

Same issue exists in the json output path, so apply similar fix there
and extend the existing test cases.

Fixes: 5b37479b42b3 ("nftables: don't crash in 'list ruleset' if policy is not set")
Fixes: b40bebbcee36 ("rule: do not crash if to-be-printed flowtable lacks priority")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>

optimize: invalidate merge in case of duplicated key in set/map

commit tests/shell/testcases/optimizations/nomerge_vmap upstream.

-o/--optimize results in EEXIST error when merging two rules that lead
to ambiguous set/map, for instance:

table ip x {
        chain v4icmp {}
        chain v4icmpc {}

        chain y {
                ip protocol icmp jump v4icmp
                ip protocol icmp goto v4icmpc
        }
}

which is not possible because duplicated keys are not possible in
set/map. This is how it shows when running a test:

Merging:
testcases/sets/dumps/sets_with_ifnames.nft:56:3-30:            ip protocol icmp jump v4icmp
testcases/sets/dumps/sets_with_ifnames.nft:57:3-31:            ip protocol icmp goto v4icmpc
into:
       ip protocol vmap { icmp : jump v4icmp, icmp : goto v4icmpc }
internal:0:0-0: Error: Could not process rule: File exists

Add a new step to compare rules that are candidate to be merged to
detect colissions in set/map keys in order to skip them in the next
final merging step.

Add tests/shell unit to improve coverage.

Fixes: fb298877ece2 ("src: add ruleset optimization infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: expand expression list when merging into concatenation

commit 0d17d28bb06bf2a04862d5cd879a14bcb9a2d2dc upstream.

The following rules:

    udp dport 137 ct state new,untracked accept
    udp dport 138 ct state new,untracked accept

results in:

  nft: src/optimize.c:670: __merge_concat: Assertion `0' failed.

The logic to expand to the new,untracked list in the concatenation is
missing.

Fixes: 187c6d01d357 ("optimize: expand implicit set element when merging into concatenation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: compact bitmask matching in set/map

commit 447ac8a3e13f4706b0900d26c5c89dfcaa6773aa upstream.

Check if right hand side of relational is a bitmask, ie.

     relational
       /   \
    ...     or
           /  \
       value   or
              /  \
         value    value

then, if left hand side is a binop expression, compare left and right
hand sides (not only left hand of this binop expression) to check for
redundant matches in consecutive rules, ie.

        relational
          /   \
       and     ...
      /   \
payload  value

before this patch, only payload in the binop expression was compared.

This allows to compact several rules matching tcp flags in a set/map, eg.

# nft -c -o -f ruleset.nft
Merging:
ruleset.nft:7:17-76:                 tcp flags & (fin | syn | rst | ack | urg) == fin | ack | urg
ruleset.nft:8:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == fin | ack
ruleset.nft:9:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == fin
ruleset.nft:10:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == syn | ack
ruleset.nft:11:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == syn
ruleset.nft:12:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == rst | ack
ruleset.nft:13:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == rst
ruleset.nft:14:17-70:                 tcp flags & (fin | syn | rst | ack | urg) == ack | urg
ruleset.nft:15:17-64:                 tcp flags & (fin | syn | rst | ack | urg) == ack
into:
        tcp flags & (fin | syn | rst | ack | urg) == { fin | ack | urg, fin | ack, fin, syn | ack, syn, rst | ack, rst, ack | urg, ack }
Merging:
ruleset.nft:17:17-61:                 tcp flags & (ack | urg) == ack jump ack_chain
ruleset.bft:18:17-61:                 tcp flags & (ack | urg) == urg jump urg_chain
into:
        tcp flags & (ack | urg) vmap { ack : jump ack_chain, urg : jump urg_chain }

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: incorrect comparison for reject statement

commit abab6e60c755aef7e1ab9d3320effa714a0b49e2 upstream.

Logic is reverse, this should returns false if the compared reject
expressions are not the same.

Fixes: 38d48fe57fff ("optimize: fix reject statement")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

expression: tolerate named set protocol dependency

commit b00fc8cd1379f6e403538943d55d297b624f185b upstream.

Included test will fail with:
/dev/stdin:8:38-52: Error: Transparent proxy support requires transport protocol match
   meta l4proto @protos tproxy to :1088
                        ^^^^^^^^^^^^^^^
Tolerate a set reference too.  Because the set can be empty (or there
can be removals later), add a fake 0-rhs value.

This will make pctx_update assign proto_unknown as the transport protocol
in use, Thats enough to avoid 'requires transport protocol' error.

v2: restrict it to meta lhs for now (Pablo Neira Ayuso)

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1686
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: bogus concatenated set ranges with netlink message overrun

commit 2fbade3cd9900fe7f87ac660b6ac44544e238206 upstream.

When building each component of the set element key, a late byteorder
switch is performed to ensure that all components in the interval are
represented in big endian, as required by the pipapo backend.

In case that the set element does not fit into the netlink message, the
byteorder switch happens twice, leading to inserting an element with a
bogus component with large sets, so instead:

"lo" . 00:11:22:33:44:55 . 10.1.2.3 comment "123456789012345678901234567890"

listing reports:

16777216 . 00:11:22:33:44:55 . 10.1.2.3 comment "123456789012345678901234567890"

Note that 16777216 is 0x1000000, which should instead be 0x00000001 to
represent "lo" as u32.

Fix this by switching the value in a temporary variable and use it to
set the set element key attribute in the netlink message.

Later, revisit this to perform this byteorder switch from evaluation
step.

Add tests/shell unit to cover for this bug.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1792
Fixes: 8ac2f3b2fca3 ("src: Add support for concatenated set ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: incomplete output in get element command with maps

commit 6db28b2d71e7f61c64338787be5d82edfdb62a21 upstream.

get element command displays an incomplete range.

Using this simple test ruleset:

table ip x {
        map y {
                typeof ip saddr : meta mark
                counter
                flags interval,timeout
                elements = { 1.1.1.1-1.1.1.10 timeout 10m : 20, 2.2.2.2-2.2.2.5 timeout 10m : 30}
        }

then, invoking the get element command:

# nft get element x y { 1.1.1.2 }

results in, before (incomplete output):

table ip x {
        map y {
                type ipv4_addr : mark
                flags interval,timeout
                elements = { 1.1.1.1 counter packets 0 bytes 0 timeout 10m expires 1m24s160ms : 0x00000014 }
        }
}

Note that it displays 1.1.1.1, instead of 1.1.1.1-1.1.1.10.

After this fix:

table ip x {
        map y {
                type ipv4_addr : mark
                flags interval,timeout
                elements = { 1.1.1.1-1.1.1.10 counter packets 0 bytes 0 timeout 10m expires 1m24s160ms : 0x00000014 }
        }
}

Fixes: a43cc8d53096 ("src: support for get element command")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: fix string data initialisation

commit 63e3d5953c144abbc4ead2665ad7cec799c4cb64 upstream.

This uses the wrong length.  This must re-use the length of the datatype,
not the string length.

The added test cases will fail without the fix due to erroneous
overlap detection, which in itself is due to incorrect sorting of
the elements.

Example error:
netlink: Error: interval overlaps with an existing one
add element inet testifsets simple_wild {  "2-1" } failed.
table inet testifsets {
      ...       elements = { "1-1", "abcdef*", "othername", "ppp0" }

... but clearly "2-1" doesn't overlap with any existing members.
The false detection is because of the "acvdef*" wildcard getting sorted
at the beginning of the list which is because its erronously initialised
as a 64bit number instead of 128 bits (16 bytes / IFNAMSIZ).

Fixes: 5e393ea1fc0a ("segtree: add string "range" reversal support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: release existing datatype when evaluating unary expression

commit 494a6ed120065b764f07acd05789b816625e8e13 upstream.

Use __datatype_set() to release the existing datatype before assigning
the new one, otherwise ASAN reports the following memleak:

Direct leak of 104 byte(s) in 1 object(s) allocated from:
    #0 0x7fbc8a2b89cf in __interceptor_malloc ../../../../src/libsa
    #1 0x7fbc898c96c2 in xmalloc src/utils.c:31
    #2 0x7fbc8971a182 in datatype_clone src/datatype.c:1406
    #3 0x7fbc89737c35 in expr_evaluate_unary src/evaluate.c:1366
    #4 0x7fbc89758ae9 in expr_evaluate src/evaluate.c:3057
    #5 0x7fbc89726bd9 in byteorder_conversion src/evaluate.c:243
    #6 0x7fbc89739ff0 in expr_evaluate_bitwise src/evaluate.c:1491
    #7 0x7fbc8973b4f8 in expr_evaluate_binop src/evaluate.c:1600
    #8 0x7fbc89758b01 in expr_evaluate src/evaluate.c:3059
    #9 0x7fbc8975ae0e in stmt_evaluate_arg src/evaluate.c:3198
    #10 0x7fbc8975c51d in stmt_evaluate_payload src/evaluate.c:330

Fixes: faa6908fad60 ("evaluate: clone unary expression datatype to deal with dynamic datatype")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

payload: return early if dependency is not a payload expression

commit 50f45c004adbab6a077609088becf62d2651101f upstream.

if (dep->left->payload.base != PROTO_BASE_TRANSPORT_HDR)

is legal only after checking that ->left points to an
EXPR_PAYLOAD expression. The dependency store can also contain
EXPR_META, in this case we access a bogus part of the union.

The payload_may_dependency_kill_icmp helper can't handle a META
dep either, so return early.

Fixes: 533565244d88 ("payload: check icmp dependency before removing previous icmp expression")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: optimize zero length range

commit deda274293f80f9718de4cbb416bd2b2bf296709 upstream.

A rule like the following:

... tcp dport 22-22 ...

results in a range expression to match from 22 to 22.

Simplify to singleton value so a cmp is used instead.

This optimization already exists in set elements which might explain
this overlook.

Fixes: 7a6e16040d65 ("evaluate: allow for zero length ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fib: Change data type of fib oifname to "ifname"

commit 316d99246644268e5e0453afa3ade163fda21d7f upstream.

Change data type of fib oifname from "string" to "ifname", so that it
can be matched against a set of ifnames:

    set x {
            type ifname
    }
    chain y {
            fib saddr oifname @x drop
    }

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: auto-merge is only available for singleton interval sets

commit 65382b888e266e2e3d49a418073fd76dcc4815a7 upstream.

auto-merge is only available to interval sets with one value only,
untoggle this flag for concatenation with intervals.

Later, this can be hardened to reject it.

Fixes: 30f667920601 ("src: add 'auto-merge' option to sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: turn redundant ip option type field match into boolean

commit f9a48ce2f9c252bf74d98d10412b1f72585a45ec upstream.

The ip option expression allows for non-sense matching like:

ip option lsrr type 1

because 'lsrr' already provides the type field, this never results in a
matching.

Turn this expression into:

ip option lsrr exists

And update documentation to hide this redundant type field.

Fixes: 226a0e072d5c ("exthdr: add support for matching IPv4 options")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipopt: use ipv4 address datatype for address field in ip options

commit 5faccb0681acb3b0175c4190eeaecf62f0bd12d4 upstream.

So user does not have to play integer arithmetics to match on IPv4
address.

Before:

# nft describe ip option lsrr addr
exthdr expression, datatype integer (integer), 32 bits

After:

# nft describe ip option lsrr addr
exthdr expression, datatype ipv4_addr (IPv4 address) (basetype integer), 32 bits

Fixes: 226a0e072d5c ("exthdr: add support for matching IPv4 options")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: clamp boolean value to 0 and 1

commit afb6a8e66a11178cbdbfc152c4aa9dda961b2140 upstream.

If user provides a numeric value larger than 0 or 1, match never
happens:

# nft --debug=netlink add rule x y tcp option sack-perm 4
ip x y
  [ exthdr load tcpopt 1b @ 4 + 0 present => reg 1 ]
  [ cmp eq reg 1 0x00000004 ]

After this update:

# nft --debug=netlink add rule x y tcp option sack-perm 4
ip x y
  [ exthdr load tcpopt 1b @ 4 + 0 present => reg 1 ]
  [ cmp eq reg 1 0x00000001 ]

This is to address a rare corner case, in case user specifies the
boolean value through the integer base type.

Fixes: 9fd9baba43c8 ("Introduce boolean datatype and boolean expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

exthdr: incomplete type 2 routing header definition

commit c029dcb14940936dbeddc2947316c9dbc5b93656 upstream.

Add missing type 2 routing header definition.

Listing is not correct because these IPv6 extension header are still
lacking context to properly delinearize the listing, but at least this
does not crash anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add and use payload_expr_trim_force

commit 4f046ae450cbe2567022575c11dd65a9d9ea272d upstream.

Previous commit fixed erroneous handling of raw expressions when RHS sets
a zero value.

Input: @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
Output:@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set \
@ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000

After this patch, this will instead display:

@ih,58,6 set 0x0 @ih,86,6 set 0x0 @ih,170,22 set 0x0

payload_expr_trim_force() only works when the payload has no known
protocol (template) attached, i.e. will be printed as raw payload syntax.

It performs sanity checks on @mask and then adjusts the payload expression
length and offset according to the mask.

Also add this check in __binop_postprocess() so we can also discard masks
when matching, e.g.

'@ih,7,5 2' becomes '@ih,7,5 0x2', not '@ih,0,16 & 0xffc0 == 0x20'.

binop_postprocess now returns if it performed an action or not; if this
returns true then arguments might have been freed so callers must no longer
refer to any of the expressions attached to the binop.

Next patch adds test cases for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinarize: fix bogus munging of mask value

commit 44c803015a1e0bca54fb7b92fdc154d162f9dbfd upstream.

Given following input:
table ip t {
chain c {
  @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
}
}

nft will produce following output:
chain c {
@ih,48,16 set @ih,48,16 & 0x3f @ih,80,16 set @ih,80,16 & 0x3f0 @ih,160,32 set @ih,160,32 & 0x3fffff
}

The input side is correct, the generated expressions sent to kernel are:

1  [ payload load 2b @ inner header + 6 => reg 1 ]
2  [ bitwise reg 1 = ( reg 1 & 0x0000c0ff ) ^ 0x00000000 ]
3  [ payload write reg 1 => 2b @ inner header + 6 .. ]
4  [ payload load 2b @ inner header + 10 => reg 1 ]
5  [ bitwise reg 1 = ( reg 1 & 0x00000ffc ) ^ 0x00000000 ]
6  [ payload write reg 1 => 2b @ inner header + 10 .. ]
7  [ payload load 4b @ inner header + 20 => reg 1 ]
8  [ bitwise reg 1 = ( reg 1 & 0x0000c0ff ) ^ 0x00000000 ]
9  [ payload write reg 1 => 4b @ inner header + 20 .. ]

@ih,58,6 set 0 <- Zero 6 bits, starting with bit 58

Changes to inner header mandate a checksum update, which only works for
even byte counts (except for last byte in the payload).

Thus, we load 2b at offet 6. (16bits, offset 48).

Because we want to zero 6 bits, we need a mask that retains 10 bits and
clears 6: b1111111111000000 (first 8 bit retains 48-57, last 6 bit clear
58-63).  The '0xc0ff' is not correct, but thats because debug output comes
from libnftnl which prints values in host byte order, the value will be
interpreted as big endian on kernel side, so this will do the right thing.

Next, same problem:

@ih,86,6 set 0 <- Zero 6 bits, starting with bit 86.

nft needs to round down to even-sized byte offset, 10, then retain first
6 bits (80 + 6 == 86), then clear 6 bits (86-91), then keep 4 more as-is
(92-95).

So mask is 0xfc0f (in big endian) would be correct (b1111110000001111).

Last expression, @ih,170,22 set 0, asks to clear 22 bits starting with bit
170, nft correctly rounds this down to a 32 bit read at offset 160.

Required mask keeps first 10 bits, then clears 22
(b11111111110000000000000000000000).  Required mask would be 0xffc00000,
which corresponds to the wrong-endian-printed value in line 8 above.

Now that we convinced ourselves that the input side is correct, fix up
netlink delinearize to undo the mask alterations if we can't find a
template to print a human-readable payload expression.

With this patch, we get this output:

  @ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set @ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000

... which isn't ideal.  We should fixup the payload expression to display
the same output as the input, i.e. adjust payload->len and offset as per
mask and discard the mask instead.

This will be done in a followup patch.

Fixes: 50ca788ca4d0 ("netlink: decode payload statment")
Reported-by: Sunny73Cr <Sunny73Cr@protonmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

scanner: better error reporting for CRLF line terminators

commit 8c35615297983227cac1437edbe0cdedf4c2227b upstream.

Provide a hint to users that file is coming with CRLF line terminators,
maybe from a non-Linux OS.

Extend scanner.l to provide hint on CRLF in files:

# file test.nft
test.nft: ASCII text, with CRLF, LF line terminators
# nft -f test.nft
test.nft:1:13-14: Error: syntax error, unexpected CRLF line terminators
table ip x {
^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: make cmd_free(NULL) valid

commit 581e051ae26b503484b7634b8799a9b9b531e95d upstream.

bison uses cmd_free($$) as destructor, but base_cmd can
set it to NULL, e.g.

  |       ELEMENT         set_spec        set_block_expr
  {
    if (nft_cmd_collapse_elems(CMD_ADD, state->cmds, &$2, $3)) {
       handle_free(&$2);
       expr_free($3);
       $$ = NULL;   // cmd set to NULL
       break;
    }
    $$ = cmd_alloc(CMD_ADD, CMD_OBJ_ELEMENTS, &$2, &@$, $3);

expr_free(NULL) is legal, cmd_free() causes crash.  So just allow
this to avoid cluttering parser_bison.y with "if ($$)".

Also add the afl-generated bogon input to the test files.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>

intervals: set internal element location with the deletion trigger

commit 93077e35accccd8cc056b67f70bfb3182c819fd4 upstream.

set location of internal elements (already in the kernel) to the one
that partial or fully deletes it.

Otherwise, error reporting refers to internal location.

Before this patch:

# nft delete element x y { 1.1.1.3 }
Error: Could not process rule: Too many open files in system
delete element x y { 1.1.1.3 }
^^^^^^^

After this patch:

# nft delete element x y { 1.1.1.3 }
Error: Could not process rule: Too many open files in system
delete element x y { 1.1.1.3 }
^^^^^^^

This occurs after splitting an existing interval in two:

remove: [1010100-10101ff]
add: [1010100-1010102]
add: [1010104-10101ff]

which results in two additions after removing the existing interval
that is split.

Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: reset statement length context before evaluating statement

commit 8d3de823b622136e1d05a6fed11ff2dc0e804f8a upstream.

This patch consolidates ctx->stmt_len reset in stmt_evaluate() to avoid
this problem. Note that stmt_evaluate_meta() and stmt_evaluate_ct()
already reset it after the statement evaluation.

Moreover, statement dependency can be generated while evaluating a meta
and ct statement. Payload statement dependency already manually stashes
this before calling stmt_evaluate(). Add a new stmt_dependency_evaluate()
function to stash statement length context when evaluating a new statement
dependency and use it for all of the existing statement dependencies.

Florian also says:

'meta mark set vlan id map { 1 : 0x00000001, 4095 : 0x00004095 }' will
crash. Reason is that the l2 dependency generated here is errounously
expanded to a 32bit-one, so the evaluation path won't recognize this
as a L2 dependency. Therefore, pctx->stacked_ll_count is 0 and
__expr_evaluate_payload() crashes with a null deref when
dereferencing pctx->stacked_ll[0].

nft-test.py gains a fugly hack to tolerate '!map typeof vlan id : meta mark'.
For more generic support we should find something more acceptable, e.g.

!map typeof( everything here is a key or data ) timeout ...

tests/py update and assert(pctx->stacked_ll_count) by Florian Westphal.

Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

optimize: compare expression length

commit bc0311378285d41850e3508df905d75959ba4239 upstream.

do not merge raw payload expressions with different length.

Other expression rely on key comparison which is assumed to have the
same length already.

Fixes: 60dcc01d6351 ("optimize: add __expr_cmp()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: allow to map key to nfqueue number

commit 058246016188c8418cae1b3db70b16b935b1fe7c upstream.

Allow to specify a numeric queue id as part of a map.
The parser side is easy, but the reverse direction (listing) is not.

'queue' is a statement, it doesn't have an expression.

Add a generic 'queue_type' datatype as a shim to the real basetype with
constant expressions, this is used only for udata build/parse, it stores
the "key" (the parser token, here "queue") as udata in kernel and can
then restore the original key.

Add a dumpfile to validate parser & output.

JSON support is missing because JSON allow typeof only since quite
recently.

Joint work with Pablo.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1455
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables-json: fix raw payload expression documentation

commit 570320ab9a0752c7749a6c9cc85b34a5e7ab91b5 upstream.

Raw payload expression accesses payload data in bits, not bytes.

Fixes: 872f373dc50f7 ("doc: Add JSON schema documentation")
Signed-off-by: Eric Long <i@hack3r.moe>
Signed-off-by: Phil Sutter <phil@nwl.cc>

json: Support typeof in set and map types

commit bb6312484af93a83a9ec8716f3887a43566a775a upstream.

Implement this as a special "type" property value which is an object
with sole property "typeof". The latter's value is the JSON
representation of the expression in set->key, so for concatenated
typeofs it is a concat expression.

All this is a bit clumsy right now but it works and it should be
possible to tear it down a bit for more user-friendliness in a
compatible way by either replacing the concat expression by the array it
contains or even the whole "typeof" object - the parser would just
assume any object (or objects in an array) in the "type" property value
are expressions to extract a type from.

Signed-off-by: Phil Sutter <phil@nwl.cc>

cache: initialize filter when fetching implicit chains

commit e3d2a5e852ceea587bfff5878e6e5c569f15116a upstream.

ASAN reports:

src/cache.c:734:25: runtime error: load of value 189, which is not a valid value for type '_Bool'

because filter->reset.rule remains uninitialized.

Initialize filter and replace existing construct to initialize table and
chain which leaves remaining fields uninitialized.

Fixes: dbff26bfba83 ("cache: consolidate reset command")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: use NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag to mangle UDP checksum

commit f89abfb4068d31f7279cae298abf25e0c077d2d3 upstream.

There are two mechanisms to update the UDP checksum field:

1) _CSUM_TYPE and _CSUM_OFFSET which specify the type of checksum
(e.g. inet) and offset where it is located.
2) use NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag to use layer 4 kernel
protocol parser.

The problem with 1) is that it is inconditional, that is, csum_type and
csum_offset cannot deal with zero UDP checksum.

Use NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag instead since it relies on the
layer 4 kernel parser which skips updating zero UDP checksum.

Extend test coverage for the UDP mangling with and without zero
checksum.

Fixes: e6c9174e13b2 ("proto: add checksum key information to struct proto_desc")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: Zero ctx->vars after freeing it

commit d361be1f8734461e27117f6c569acf2189fcf81e upstream.

Leaving the invalid pointer value in place will cause a double-free when
users call nft_ctx_clear_vars() first, then nft_ctx_free(). Moreover,
nft_ctx_add_var() passes the pointer to mrealloc() and thus assumes it
to be either NULL or valid.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1772
Fixes: 9edaa6a51eab4 ("src: add --define key=value")
Signed-off-by: Phil Sutter <phil@nwl.cc>

cache: position does not require full cache

commit d414f756af9d638fe0c0002b2df31c8c17a15002 upstream.

position refers to the rule handle, it has similar cache requirements as
replace rule command, relax cache requirements.

Commit e5382c0d08e3 ("src: Support intra-transaction rule references")
uses position.id for index support which requires a full cache, but
only in such case.

Fixes: 01e5c6f0ed03 ("src: add cache level flags")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: relax requirement for replace rule command

commit 4984da8cc427974ea63796fa60a791b714a71440 upstream.

No need for full cache, this command relies on the rule handle which is
not validated from userspace. Cache requirements are similar to those
of add/create/delete rule commands.

This speeds up incremental updates with large rulesets.

Extend tests/coverage for rule replacement.

Fixes: 01e5c6f0ed03 ("src: add cache level flags")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: remove full cache requirement when echo flag is set on

commit 53a503ad4a1abfa0374b3d12e884b69dc6df4b4f upstream.

The echo flag does not use the cache infrastructure yet, it relies on
the monitor cache which follows the netlink_echo_callback() path.

Fixes: 01e5c6f0ed03 ("src: add cache level flags")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: clean up evaluate_cache_del()

commit 19702ae3d5da18fef64248f95df471c6664dd08e upstream.

Move NFT_CACHE_TABLE flag to default case to disentangle this.

Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: assert filter when calling nft_cache_evaluate()

commit 4dd20f3bbd606eed4869ebe449debee8b2ac7900 upstream.

nft_cache_evaluate() always takes a non-null filter, remove superfluous
checks when calculating cache requirements via flags.

Note that filter is still option from netlink dump path, since this can
be called from error path to provide hints.

Fixes: 08725a9dc14c ("cache: filter out rules by chain")
Fixes: b3ed8fd8c9f3 ("cache: missing family in cache filtering")
Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested")
Fixes: 3f1d3912c3a6 ("cache: filter out tables that are not requested")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: only dump rules for the given table

commit ebd06f85a3257c294572005d0fa6b8ab0f213486 upstream.

Only family is set on in the dump request, set on table and chain
otherwise, rules for the given family are fetched for each existing
table.

Fixes: afbd102211dc ("src: do not use the nft_cache_filter object from mnl.c")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: accumulate flags in batch

commit 68c8fb5f7c988a38a694c77c65e789e0cb8dfd8a upstream.

Recent updates are relaxing cache requirements:

babc6ee8773c ("cache: populate chains on demand from error path")

Flags describe cache requirements for a given batch, accumulate flags
that are inferred from commands in this batch.

Fixes: 7df42800cf89 ("src: single cache_update() call to build cache before evaluation")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: reset filter for each command

commit 29cb49d0ca92b840938823dec697d8c5488d7253 upstream.

Inconditionally reset filter for each command in the batch, this is safer.

Fixes: 3f1d3912c3a6 ("cache: filter out tables that are not requested")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_json: fix crash in json_parse_set_stmt_list

commit 26d9cbefb10e6bc3765df7e9e7a4fc3b951a80f3 upstream.

Due to missing `NULL`-check, there will be a segfault for invalid statements.

Fixes: 07958ec53830 ("json: add set statement list support")
Signed-off-by: Sebastian Walz (sivizius) <sebastian.walz@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_json: fix handle memleak from error path

commit 47e18c0eba51a538e1110322d1a9248b0501d7c8 upstream.

Based on patch from Sebastian Walz.

Fixes: 586ad210368b ("libnftables: Implement JSON parser")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_json: fix several expression memleaks from error path

commit bae7b4d283826efbeb28c21aecd7b355e86da170 upstream.

Fixes: 586ad210368b ("libnftables: Implement JSON parser")
Signed-off-by: Sebastian Walz (sivizius) <sebastian.walz@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_json: release buffer returned by json_dumps

commit 46700fbdbbbaab0d7db716fce3a438334c58ac9e upstream.

The signature of `json_dumps` is:

`char *json_dumps(const json_t *json, size_t flags)`:

It will return a pointer to an owned string, the caller must free it.
However, `json_error` just borrows the string to format it as `%s`, but
after printing the formatted error message, the pointer to the string is
lost and thus never freed.

Fixes: 586ad210368b ("libnftables: Implement JSON parser")
Signed-off-by: Sebastian Walz (sivizius) <sebastian.walz@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: update outdated route and pkttype info

commit 5089d0f46676aa13ab679f6e4820a08957f2e7e6 upstream.

inet family supports route type.
unicast pkttype changed to host pkttype.

Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

parser_bison: allow 0 burst in limit rate byte mode

commit cea05ae5bdc50949d4c734796d6db5717187055a upstream.

Unbreak restoring elements in set with rate limit that fail with:

> /dev/stdin:3618:61-61: Error: limit burst must be > 0
> elements = { 1.2.3.4 limit rate over 1000 kbytes/second timeout 1s,

no need for burst != 0 for limit rate byte mode.

Add tests/shell too.

Fixes: 702eff5b5b74 ("src: allow burst 0 for byte ratelimit and use it as default")
Fixes: 285baccfea46 ("src: disallow burst 0 in ratelimits")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: do not fetch set inconditionally on delete

commit ba13acf4be081129d5c943db9f607a13954be5f6 upstream.

This is only required to remove elements, relax cache requirements for
anything else.

Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: populate flowtables on demand from error path

commit 52d99078521f0ae245ad0145348bebdba9f665ab upstream.

Flowtables are only required for error reporting hints if kernel reports
ENOENT. Populate the cache from this error path only.

Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: populate objects on demand from error path

commit aab2fe87a665c0cba2676096b49b5c8ea21910f8 upstream.

Objects are only required for error reporting hints if kernel reports
ENOENT. Populate the cache from this error path only.

Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: populate chains on demand from error path

commit babc6ee8773cfeed78167f78827b35e3141e04c6 upstream.

Updates on verdict maps that require many non-base chains are slowed
down due to fetching existing non-base chains into the cache.

Chains are only required for error reporting hints if kernel reports
ENOENT. Populate the cache from this error path only.

Similar approach already exists from rule ENOENT error path since:

  deb7c5927fad ("cmd: add misspelling suggestions for rule commands")

however, NFT_CACHE_CHAIN was toggled inconditionally for rule
commands, rendering this on-demand cache population useless.

before this patch, running Neels' nft_slew benchmark (peak values):

  created idx 4992 in 52587950 ns   (128 in 7122 ms)
  ...
  deleted idx  128 in 43542500 ns   (127 in 6187 ms)

after this patch:

  created idx 4992 in 11361299 ns   (128 in 1612 ms)
  ...
  deleted idx 1664 in  5239633 ns   (128 in 733 ms)

Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cmd: provide better hint if chain is already declared with different type/hook/priority

commit 1f321f86c45fce88a5bcd6f8eafa0157248c8b38 upstream.

Display the following error in such case:

  ruleset.nft:7:9-52: Error: Chain "input" already exists in table ip 'filter' with different declaration
          type filter hook postrouting priority filter;
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

instead of reporting a misleading unsupported chain type when updating
an existing chain with different type/hook/priority.

Fixes: 573788e05363 ("src: improve error reporting for unsupported chain type")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

meta: stash context statement length when generating payload/meta dependency

commit 5f1676ac9f1aeb36d7695c3c354dade013a1e4f3 upstream.

... meta mark set ip dscp

generates an implicit dependency from the inet family to match on meta
nfproto ip.

The length of this implicit expression is incorrectly adjusted to the
statement length, ie. relational to compare meta nfproto takes 4 bytes
instead of 1 byte. The evaluation of 'ip dscp' under the meta mark
statement triggers this implicit dependency which should not consider
the context statement length since it is added before the statement
itself.

This problem shows when listing the ruleset, since netlink_parse_cmp()
where left->len < right->len, hence handling the implicit dependency as
a concatenation, but it is actually a bug in the evaluation step that
leads to incorrect bytecode.

Fixes: 3c64ea7995cb ("evaluate: honor statement length in integer evaluation")
Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
Tested-by: Brian Davidson <davidson.brian@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_linearize: use div_round_up in byteorder length

commit 25e7b99cc450490c38becb03d8bddd0199cfd3f9 upstream.

Use div_round_up() to calculate the byteorder length, otherwise fields
that take % BITS_PER_BYTE != 0 are not considered by the byteorder
expression.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: honor statement length in integer evaluation

commit 3c64ea7995cbbc4f1d9d7707f907667325eb62b9 upstream.

Otherwise, bogus error is reported:

# nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000'
Error: Value 4278190080 exceeds valid range 0-63
add rule ip x y ct mark set ip dscp & 0x0f << 1 | 0xff000000
^^^^^^^^^^

Use the statement length as the maximum value in the mark statement
expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: improve error reporting for unsupported chain type

commit 573788e053631a5c069f887caed7c62d521b022d upstream.

8c75d3a16960 ("Reject invalid chain priority values in user space")
provides error reporting from the evaluation phase. Instead, this patch
infers the error after the kernel reports EOPNOTSUPP.

test.nft:3:28-40: Error: Chains of type "nat" must have a priority value above -200
type nat hook prerouting priority -300;
^^^^^^^^^^^^^

This patch also adds another common issue for users compiling their own
kernels if they forget to enable CONFIG_NFT_NAT in their .config file.

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: rule by index requires full cache

commit 161beaeacd2e5218d66febc3db825bf6a27119c5 upstream.

In preparation for on-demand cache population with errors, set on
NFT_CACHE_FULL if rule index is used since this requires a full cache
with rules.

This is not a fix, index is already fetching a full cache before this
patch.

But follow up patches relax cache requirements, so add this patch in
first place to make sure index does not break.

Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: improve error reporting when time unit is not correct

commit 6bcaef6a1ea6dc60250ed6124f3b49a8cd29434c upstream.

Display:

Wrong unit format, expecting bytes or kbytes or mbytes

instead of:

Wrong rate format

Fixes: 6615676d825e ("src: add per-bytes limit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: reject rate in quota statement

commit 8ed6fa6d66b2df50d118423c1cb0e98cdd45cdbd upstream.

Bail out if rate are used:

ruleset.nft:5:77-106: Error: Wrong rate format, expecting bytes or kbytes or mbytes
add rule netdev firewall PROTECTED_IPS update @quota_temp_before { ip daddr quota over 45000 mbytes/second } add @quota_trigger { ip daddr }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

improve error reporting while at this.

Fixes: 6615676d825e ("src: add per-bytes limit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: skip variables in nat statements

commit bc1f910f502701f1a1d28c7bd723e4be3bac1d8c upstream.

Do not hit assert():

nft: optimize.c:486: rule_build_stmt_matrix_stmts: Assertion `k >= 0' failed.

variables are not supported by -o/--optimize at this stage.

Fixes: 9be404a153bc ("optimize: ignore existing nat mapping")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: skip useable checks for /dev/stdin

commit 477fd8218777b75bdfa3a5643f692adae4f002fe upstream.

/dev/stdin is a placeholder, read() from STDIN_FILENO is used to fetch
the standard input into a buffer.

Since 5c2b2b0a2ba7 ("src: error reporting with -f and read from stdin")
stdin is stored in a buffer to fix error reporting.

This patch requires: ("parser_json: use stdin buffer if available")

Fixes: 149b1c95d129 ("libnftables: refuse to open onput files other than named pipes or regular files")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_json: use stdin buffer if available

commit e48f32701ff65d522c2f29f34bf4f3ce8e562057 upstream.

Since 5c2b2b0a2ba7 ("src: error reporting with -f and read from stdin")
stdin is stored in a buffer, update json support to use it instead of
reading from /dev/stdin.

Some systems do not provide /dev/stdin symlink to /proc/self/fd/0
according to reporter (that mentions Yocto Linux as example).

Fixes: 935f82e7dd49 ("Support 'nft -f -' to read from stdin")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

optimize: clone counter before insertion into set element

commit ac77f3805c71f14c51730a9c5cb726ee67f14159 upstream.

The counter statement that is zapped from the rule needs to be cloned
before inserting it into each set element.

Fixes: 686ab8b6996e ("optimize: do not remove counter in verdict maps")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: recursive table declaration in deprecated meter statement

commit a70a217079ef83482fc093d8549f8cdeaeaa3cae upstream.

This is allowing for recursive table NAME declarations such as:

... table xyz1 table xyz2 { ... }

remove it.

Fixes: 3ed5e31f4a32 ("src: add flow statement")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

intervals: fix element deletions with maps

commit 551a4ad68b922fa6c942f5e79ac59f723a12e233 upstream.

Set element deletion in maps (including catchall elements) does not work.

# nft delete element ip x m { \* }
BUG: invalid range expression type catch-all set element
nft: src/expression.c:1472: range_expr_value_low: Assertion `0' failed.
Aborted

Call interval_expr_key() to fetch expr->left in the mapping but use the
expression that represents the mapping because it provides access to the
EXPR_F_REMOVE flags.

Moreover, assume maximum value for catchall expression by means of the
expr->len to reuse the existing code to check if the element to be
deleted really exists.

Fixes: 3e8d934e4f72 ("intervals: support to partial deletion with automerge")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: set on EXPR_F_KERNEL flag for catchall elements in the cache

commit dc6950a80110d6e6f63bd6f5c308d202db698f46 upstream.

Catchall set element deletion requires this flag to be set on,
otherwise it bogusly reports that such element does not exist
in the set.

Fixes: f1cc44edb218 ("src: add EXPR_F_KERNEL to identify expression in the kernel")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: set on expr->len for catchall set elements

commit b523008535f3de78ed5834a302ba07cda4b4c8fd upstream.

Catchall elements coming from the parser provide expr->len == 0.
However, the existing mergesort implementation requires expr->len to be
set up to the length of the set key to properly sort elements.

In particular, set element deletion leverages such list sorting to find
if elements exists in the set.

Fixes: 419d19688688 ("src: add set element catch-all support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

dynset: avoid errouneous assert with ipv6 concat data

commit aa791a12cb2b23fb49c233e40b6e12e0e3ce6b9b upstream.

nft add rule ip6 table-test chain-1 update @map-X { ip6 saddr : 1000::1 . 5001 }
nft: src/netlink_linearize.c:873: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.
Aborted (core dumped)

This is because we pass the EXPR_SET_ELEM expr to the register allocation,
which will make it reserve 1 128 bit register / 16 bytes.

This happens to be enough for most cases, but its not for ipv6 concat data.
Pass the actual key and data instead: This will reserve enough space to
hold a possible concat expression.

Also add test cases.

Signed-off-by: Son Dinh <dinhtrason@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

rule: do not crash if to-be-printed flowtable lacks priority

commit b40bebbcee3602e2d849e48f3a50676bd8987204 upstream.

Print an empty flowtable rather than crashing when dereferencing
flowtable->priority.expr (its NULL).

Signed-off-by: Florian Westphal <fw@strlen.de>

cmd: skip variable set elements when collapsing commands

ASAN reports an issue when collapsing commands that represent an element
through a variable:

include/list.h:60:13: runtime error: member access within null pointer of type 'struct list_head'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==11398==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7ffb77cf09c2 bp 0x7ffc818267c0 sp 0x7ffc818267a0 T0)
==11398==The signal is caused by a WRITE memory access.
==11398==Hint: address points to the zero page.
    #0 0x7ffb77cf09c2 in __list_add include/list.h:60
    #1 0x7ffb77cf0ad9 in list_add_tail include/list.h:87
    #2 0x7ffb77cf0e72 in list_move_tail include/list.h:169
    #3 0x7ffb77cf86ad in nft_cmd_collapse src/cmd.c:478
    #4 0x7ffb77da9f16 in nft_evaluate src/libnftables.c:531
    #5 0x7ffb77dac471 in __nft_run_cmd_from_filename src/libnftables.c:720
    #6 0x7ffb77dad703 in nft_run_cmd_from_filename src/libnftables.c:807

Skip such commands to address this issue.

This patch also extends tests/shell to cover for this bug.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1754
Fixes: 498a5f0c219d ("rule: collapse set element commands")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

monitor: too large shift exponent displaying payload expression

commit 016f37f1268fa1003c46c66655697d3f58d86598 upstream.

ASAN reports too large shift exponent when displaying traces for raw
payload expression:

trace id ec23e848 ip x y packet: oif "wlan0" src/netlink.c:2100:32: runtime error: shift exponent 1431657095 is too large for 32-bit type 'int'

skip if proto_unknown_template is set on in this payload expression.

Fixes: be5d9120e81e ("nft monitor [ trace ]")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

scanner: inet_pton() allows for broader IPv4-Mapped IPv6 addresses

commit fda913d712925e8a8d6460b361d49774dc5d34b8 upstream.

inet_pton() allows for broader IPv4-Mapped IPv6 address syntax than
those specified by rfc4291 Sect.2.5.5. This patch extends the scanner to
support them for compatibility reasons. This allows to represent the
last 4 bytes of an IPv6 address as an IPv4 address.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1730
Fixes: fd513de78bc0 ("scanner: IPv4-Mapped IPv6 addresses support")
Fixes: 3f82ef3d0dbf ("scanner: Support rfc4291 IPv4-compatible addresses")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: check for NFT_CACHE_REFRESH in current requested cache too

commit bbb0e944b59d355085e8f50b4b7b5057ae0d33a4 upstream.

NFT_CACHE_REFRESH is set on inconditionally by ruleset list commands to
deal with stateful information in this ruleset. This flag results in
dropping the existing cache and fully fetching all objects from the
kernel.

Set on this flag for reset commands too, this is missing.

List/reset commands allow for filtering by specific family and object,
therefore, NFT_CACHE_REFRESH also signals that the cache is partially
populated.

Check if this flag is requested by the current list/reset command, as
well as cache->flags which represents the cache after the _previous_
list of commands.

A follow up patch allows to recycle the existing cache if the flags
report that the same objects are already available in the cache,
NFT_CACHE_REFRESH is useful to report that cache cannot be recycled.

Fixes: 407c54f71255 ("src: cache gets out of sync in interactive mode")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: Fix incorrect checking the `base` variable in case of IPV6

commit f6b579344eee17e5587b6a7fcc444fe997cd8cb6 upstream.

Found by RASU JSC.

Fixes: 2b29ea5f3c3e ("src: ct: add eval part to inject dependencies for ct saddr/daddr")
Signed-off-by: Maks Mishin <maks.mishinFZ@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: bogus protocol conflicts in vlan with implicit dependencies

commit d1a7e74d1e065d244439fdb0f1c1cba83f921609 upstream.

The following command:

# nft add rule netdev x y ip saddr 10.1.1.1 icmp type echo-request vlan id set 321

fails with:

Error: conflicting link layer protocols specified: ether vs. vlan
netdev x y ip saddr 10.1.1.1 icmp type echo-request vlan id set 321
                                                    ^^^^^^^

Users can work around this issue by prepending an explicit match for
vlan ethertype field, that is:

ether type vlan ip saddr 10.1.1.1 ...
        ^-------------^

but nft should really handle this itself.

The error above is triggered by the following check in
resolve_ll_protocol_conflict():

       /* This payload and the existing context don't match, conflict. */
       if (pctx->protocol[base + 1].desc != NULL)
               return 1;

This check was added by 39f15c243912 ("nft: support listing expressions
that use non-byte header fields") and f7d5590688a6 ("tests: vlan tests")
to deal with conflicting link layer protocols, for instance:

ether type ip vlan id 1

this is matching ethertype ip at offset 12, but then it matches for vlan
id at offset 14 which is not present given the previous check.

One possibility is to remove such check, but nft does not bail out for
the example above and it results in bytecode that never matches:

# nft --debug=netlink netdev x y ether type ip vlan id 10
netdev x y
  [ meta load iiftype => reg 1 ]
  [ cmp eq reg 1 0x00000001 ]
  [ payload load 2b @ link header + 12 => reg 1 ] <---- ether type
  [ cmp eq reg 1 0x00000008 ]                     <---- ip
  [ payload load 2b @ link header + 12 => reg 1 ] <---- ether type
  [ cmp eq reg 1 0x00000081 ]                     <---- vlan
  [ payload load 2b @ link header + 14 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000ff0f ) ^ 0x00000000 ]
  [ cmp eq reg 1 0x00000a00 ]

This is due to resolve_ll_protocol_conflict() which deals with the
conflict by updating protocol context and emitting an implicit
dependency, but there is already an explicit match coming from the user.

This patch adds a new helper function to check if an implicit dependency
clashes with an existing statement, which results in:

# nft add rule netdev x y ether type ip vlan id 1
        Error: conflicting statements
        add rule netdev x y ether type ip vlan id 1
                            ^^^^^^^^^^^^^ ~~~~~~~

Theoretically, no duplicated implicit dependency should ever be emitted
if protocol context is correctly handled.

Only implicit payload expressions are considered at this stage for this
conflict check, this patch can be extended to deal with other dependency
types.

Fixes: 39f15c243912 ("nft: support listing expressions that use non-byte header fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: add support for variables in map expressions

commit c6127ff0c4480ccefc5c29548409898fb315a2ca upstream.

It is possible to use a variable to initialize a map, which is then used
in a map statement:

  define dst_map = { ::1234 : 5678 }

  table ip6 nat {
    map dst_map {
      typeof ip6 daddr : tcp dport;
      elements = $dst_map
    }
    chain prerouting {
      ip6 nexthdr tcp redirect to ip6 daddr map @dst_map
    }
  }

However, if one tries to use the variable directly in the statement:

  define dst_map = { ::1234 : 5678 }

  table ip6 nat {
    chain prerouting {
      ip6 nexthdr tcp redirect to ip6 daddr map $dst_map
    }
  }

nft rejects it:

  /space/azazel/tmp/ruleset.1067161.nft:5:47-54: Error: invalid mapping expression variable
      ip6 nexthdr tcp redirect to ip6 daddr map $dst_map
                                  ~~~~~~~~~     ^^^^^^^^
It also rejects variables in stateful object statements:

  define quota_map = { 192.168.10.123 : "user123", 192.168.10.124 : "user124" }

  table ip nat {
    quota user123 { over 20 mbytes }
    quota user124 { over 20 mbytes }
    chain prerouting {
      quota name ip saddr map $quota_map
    }
  }

thus:

  /space/azazel/tmp/ruleset.1067161.nft:15:29-38: Error: invalid mapping expression variable
      quota name ip saddr map $quota_map
                 ~~~~~~~~     ^^^^^^^^^^

Add support for these uses together with some test-cases.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067161
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: handle invalid mapping expressions in stateful object statements gracefully.

commit 52a7af9bec15a4fb4bfea86e40b70f96098f7dfd upstream.

Currently, they are reported as assertion failures:

  BUG: invalid mapping expression variable
  nft: src/evaluate.c:4618: stmt_evaluate_objref_map: Assertion `0' failed.
  Aborted

Instead, report them more informatively as errors:

  /space/azazel/tmp/ruleset.1067161.nft:15:29-38: Error: invalid mapping expression variable
      quota name ip saddr map $quota_map
                 ~~~~~~~~     ^^^^^^^^^^

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: nft.8: Highlight "hook" in flowtable description

commit 9da7b00aa886012c0e59e73aa19e05a8d1568540 upstream.

Lacking an explicit description of possible hook values, emphasising the
word in the description text should draw readers' attention in the right
direction.

Signed-off-by: Phil Sutter <phil@nwl.cc>

doc: nft.8: Fix markup in ct expectation synopsis

commit ef659df2a45c38ad18f703febeae8c4febf9dd04 upstream.

Just a missing asterisk somewhere.

Fixes: 1dd08fcfa07a4 ("src: add ct expectations support")
Signed-off-by: Phil Sutter <phil@nwl.cc>

json: Fix for memleak in __binop_expr_json

commit a0a15e4dd0576bc4efd9b01fdd4ee1c565effac9 upstream.

When merging the JSON arrays generated for LHS and RHS of nested binop
expressions, the emptied array objects leak if their reference is not
decremented.

Fix this and tidy up other spots which did it right already by
introducing a json_array_extend wrapper.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 0ac39384fd9e4 ("json: Accept more than two operands in binary expressions")
Signed-off-by: Phil Sutter <phil@nwl.cc>

mergesort: Avoid accidental set element reordering

commit bf52af188b306acf5a30134d6a670f41f16a9459 upstream.

In corner cases, expr_msort_cmp() may return 0 for two non-identical
elements. An example are ORed tcp flags: 'syn' and 'syn | ack' are
considered the same value since expr_msort_value() reduces the latter to
its LHS.

Keeping the above in mind and looking at how list_expr_sort() works: The
list in 'head' is cut in half, the first half put into the temporary
list 'list' and finally 'list' is merged back into 'head' considering
each element's position. Shall expr_msort_cmp() return 0 for two
elements, the one from 'list' ends up after the one in 'head', thus
reverting their previous ordering.

The practical implication is that output never matches input for the
sample set '{ syn, syn | ack }' as the sorting after delinearization in
netlink_list_setelems() keeps swapping the elements. Out of coincidence,
the commit this fixes itself illustrates the use-case this breaks,
namely tracking a ruleset in git: Each ruleset reload will trigger an
update to the stored dump.

This change breaks interval set element deletion because __set_delete()
implicitly relies upon this reordering of duplicate entries by inserting
a clone of the one to delete into the start (via list_move()) and after
sorting assumes the clone will end up right behind the original. Fix
this by calling list_move_tail() instead.

Fixes: 14ee0a979b622 ("src: sort set elements in netlink_get_setelems()")
Signed-off-by: Phil Sutter <phil@nwl.cc>

json: Accept more than two operands in binary expressions

commit 0ac39384fd9e48ff6bcc5605df2cbeb33af64b9e upstream.

The most common use case is ORing flags like

| syn | ack | rst

but nft seems to be fine with less intuitive stuff like

| meta mark set ip dscp << 2 << 3

so support all of them.

Signed-off-by: Phil Sutter <phil@nwl.cc>

src: disentangle ICMP code types

commit 5fecd2a6ef614eca7b0829e684449ee25982c233 upstream.

Currently, ICMP{v4,v6,inet} code datatypes only describe those that are
supported by the reject statement, but they can also be used for icmp
code matching. Moreover, ICMP code types go hand-to-hand with ICMP
types, that is, ICMP code symbols depend on the ICMP type.

Thus, the output of:

nft describe icmp_code

look confusing because that only displays the values that are supported
by the reject statement.

Disentangle this by adding internal datatypes for the reject statement
to handle the ICMP code symbol conversion to value as well as ruleset
listing.

The existing icmp_code, icmpv6_code and icmpx_code remain in place. For
backward compatibility, a parser function is defined in case an existing
ruleset relies on these symbols.

As for the manpage, move existing ICMP code tables from the DATA TYPES
section to the REJECT STATEMENT section, where this really belongs to.
But the icmp_code and icmpv6_code table stubs remain in the DATA TYPES
section because that describe that this is an 8-bit integer field.

After this patch:

# nft describe icmp_code
datatype icmp_code (icmp code) (basetype integer), 8 bits
# nft describe icmpv6_code
datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
# nft describe icmpx_code
datatype icmpx_code (icmpx code) (basetype integer), 8 bits

do not display the symbol table of the reject statement anymore.

icmpx_code_type is not used anymore, but keep it in place for backward
compatibility reasons.

And update tests/shell accordingly.

Fixes: 5fdd0b6a0600 ("nft: complete reject support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: unused code in reverse cross-day meta hour range

commit 3f776e8b37d8022d4492ed8be136e99f5a88ab9e upstream.

f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'")
reverses a cross-day range expressed as "22:00"-"02:00" UTC time into
!= "02:00"-"22:00" so meta hour ranges works.

Listing is however confusing, hence, 44d144cd593e ("netlink_delinearize:
reverse cross-day meta hour range") introduces code to reverse a cross-day.

However, it also adds code to reverse a range in == to-from form
(assuming OP_IMPLICIT) which is never exercised from the listing path
because the range expression is not currently used, instead two
instructions (cmp gte and cmp lte) are used to represent the range.
Remove this branch otherwise a reversed notation will be used to display
meta hour ranges once the range instruction is to represent this.

Add test for cross-day scenario in EADT timezone.

Fixes: 44d144cd593e ("netlink_delinearize: reverse cross-day meta hour range")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: use DTYPE_F_PREFIX only for IP address datatype

commit a2a7fbdfdd7f8dc5baa4cc8a23753b96bbc8a433 upstream.

DTYPE_F_PREFIX flag provides a hint to the netlink delinearize path to
use prefix notation.

It seems use of prefix notation in meta mark causes confusion, users
expect to see prefix in the listing only in IP address datatypes.

Untoggle this flag so (more lengthy) binop output such as:

meta mark & 0xffffff00 == 0xffffff00

is used instead.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1739
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: display "Range negative size" error

commit c0a5b8c6a6433ec1d4e41646dc42ccb8444c96be upstream.

zero length ranges now allowed, therefore, update error message to refer
to negative ranges which are not possible.

Fixes: 7a6e16040d65 ("evaluate: allow for zero length ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: reverse cross-day meta hour range

commit 44d144cd593e3af9f3b3618ea510ea02bba4bc4c upstream.

f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'")
reverses the hour range in case that a cross-day range is used, eg.

meta hour "03:00"-"14:00" counter accept

which results in (Sidney, Australia AEDT time):

meta hour != "14:00"-"03:00" counter accept

kernel handles time in UTC, therefore, cross-day range may not be
obvious according to local time.

The ruleset listing above is not very intuitive to the reader depending
on their timezone, therefore, complete netlink delinearize path to
reverse the cross-day meta range.

Update manpage to recommend to use a range expression when matching meta
hour range. Recommend range expression for meta time and meta day too.

Extend testcases/listing/meta_time to cover for this scenario.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1737
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: restore binop syntax when listing ruleset for flags

commit b11b6c68e61ea294eb4c313705ccfe3e7b0eda87 upstream.

c3d57114f119 ("parser_bison: add shortcut syntax for matching flags
without binary operations") provides a similar syntax to iptables using
a prefix representation for flag matching.

Restore original representation using binop when listing the ruleset.
The parser still accepts the prefix notation for backward compatibility.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: do not merge a set with a erroneous one

commit ea011231c06cbe828cf6056bc9c3d116e1f528d5 upstream.

The included sample causes a crash because we attempt to
range-merge a prefix expression with a symbolic expression.

The first set is evaluated, the symbol expression evaluation fails
and nft queues an error message ("Could not resolve hostname").

However, nft continues evaluation.

nft then encounters the same set definition again and merges the
new content with the preceeding one.

But the first set structure is dodgy, it still contains the
unresolved symbolic expression.

That then makes nft crash (assert) in the set internals.

There are various different incarnations of this issue, but the low
level set processing code does not allow for any partially transformed
expressions to still remain.

Before:
nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
BUG: invalid range expression type binop
nft: src/expression.c:1479: range_expr_value_low: Assertion `0' failed.

After:
nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
invalid_range_expr_type_binop:4:18-25: Error: Could not resolve hostname: Name or service not known
elements = { 1&.141.0.1 - 192.168.0.2}
^^^^^^^^

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser: json: Support for synproxy objects

commit 938a135f2200766661e42a20e02d87555f5bacfa upstream.

Parsing code was there already, merely the entry in json_parse_cmd_add()
missing.

To support maps with synproxy target, an entry in string_to_nft_object()
is required. While being at it, add other missing entries as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>

json: Support maps with concatenated data

commit c56d77cc82988391ab8f2514214c0088cbc7d89e upstream.

Dump such maps with an array of types in "map" property, make the parser
aware of this.

Signed-off-by: Phil Sutter <phil@nwl.cc>