git.ipfire.org Git - thirdparty/nftables.git/log

evaluate: check for NULL datatype in rhs in lookup expr

If we are evaluating an EXPR_SET_REF, check that right->dtype is not NULL.
We can hit a segmentation fault if, for whatever reason, the referenced
object does not exist.

Using this testfile (note the invalid set syntax):

% cat test.nft
flush ruleset
add table t
add chain t c
add set t s {type ipv4_addr\;}
add rule t c ip saddr @s

Without this patch:

% nft -f test.nft
Segmentation fault

With this patch:

% nft -f test.nft
t.nft:4:28-28: Error: syntax error, unexpected junk, expecting newline or semicolon
add set t s {type ipv4_addr\;}
                           ^
t.nft:4:13-29: Error: set definition does not specify key data type
add set t s {type ipv4_addr\;}
            ^^^^^^^^^^^^^^^^^
t.nft:5:23-24: Error: the referenced set does not exist
add rule t c ip saddr @s
             ~~~~~~~~ ^^

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: fix payload of dccp type in set elements

This value needs to be left-shifted by one bit to be correct.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: fix fragment-offset field

Set elements were miscalculated.

After this patch:

element 00000801 : 0 [end]
^^^^

Which looks correct according to my calculations:

>>> print hex(socket.htons(33 << 3))
0x801
^^^^^
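
The same check can be written so that it does not depend on the byte order of the host running it. This is a minimal Python 3 sketch; the helper name is made up for illustration, and it assumes the element dump above came from a little-endian machine.

```python
def as_little_endian_u16(value):
    """Reinterpret a 16-bit network-order value as a little-endian integer."""
    return int.from_bytes(value.to_bytes(2, "big"), "little")

# The fragment offset occupies the upper 13 bits of the 16-bit field,
# hence the shift by 3 before the byte swap.
print(hex(as_little_endian_u16(33 << 3)))
```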

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: add missing netdev ip dscp payload tests

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add ecn support

This supports both IPv4:

# nft --debug=netlink add rule ip filter forward ip ecn ce counter
ip filter forward
  [ payload load 1b @ network header + 1 => reg 1 ]
  [ bitwise reg 1 = (reg=1 & 0x00000003 ) ^ 0x00000000 ]
  [ cmp eq reg 1 0x00000003 ]
  [ counter pkts 0 bytes 0 ]

For IPv6:

# nft --debug=netlink add rule ip6 filter forward ip6 ecn ce counter
ip6 filter forward
  [ payload load 1b @ network header + 1 => reg 1 ]
  [ bitwise reg 1 = (reg=1 & 0x00000030 ) ^ 0x00000000 ]
  [ cmp eq reg 1 0x00000030 ]
  [ counter pkts 0 bytes 0 ]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add dscp support

This supports both IPv4:

# nft --debug=netlink add rule filter forward ip dscp cs1 counter
ip filter forward
  [ payload load 1b @ network header + 1 => reg 1 ]
  [ bitwise reg 1 = (reg=1 & 0x000000fc ) ^ 0x00000000 ]
  [ cmp neq reg 1 0x00000080 ]
  [ counter pkts 0 bytes 0 ]

And also IPv6, note that in this case we take two bytes from the payload:

# nft --debug=netlink add rule ip6 filter input ip6 dscp cs4 counter
ip6 filter input
  [ payload load 2b @ network header + 0 => reg 1 ]
  [ bitwise reg 1 = (reg=1 & 0x0000c00f ) ^ 0x00000000 ]
  [ cmp eq reg 1 0x00000008 ]
  [ counter pkts 0 bytes 0 ]

The DSCP is split across two bytes: the least significant nibble of the
first byte and the two most significant bits of the second byte.

The 8 bit traffic class in RFC2460 after the version field is used for
DSCP (6 bit) and ECN (2 bit). Support for ECN comes in a follow up
patch.
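
The IPv6 mask and constant shown above can be derived from the header layout. The sketch below is only an illustration in Python: the helper names are hypothetical, and it assumes the debug output was produced on a little-endian host, which is why the two loaded bytes appear swapped.

```python
DSCP_BITS = 6

def ipv6_dscp_field(dscp):
    """Place a 6-bit DSCP value in the first 16 bits of the IPv6 header.

    Layout, most significant bit first: version (4), DSCP (6), ECN (2),
    then the top 4 bits of the flow label.
    """
    return dscp << 6

def as_loaded(value):
    """Show a 2-byte network-order value as a little-endian integer."""
    return int.from_bytes(value.to_bytes(2, "big"), "little")

CS4 = 32  # class selector 4, DSCP 0b100000

mask = as_loaded(ipv6_dscp_field((1 << DSCP_BITS) - 1))
value = as_loaded(ipv6_dscp_field(CS4))
print(hex(mask), hex(value))
```

The mask covers the low nibble of the first byte and the top two bits of the second byte, matching the description of the split field.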

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: remove priority field definition from IPv6 header

This is actually part of the traffic class field according to RFC2460.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: update IPv6 flowlabel offset and length according to RFC2460

This is a 20 bit field according to Section 3. IPv6 Header Format.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: handle payload matching split in two bytes

When the bits are split between two bytes and the payload field is
smaller than one byte, we need to extend the expression length on both
sides (payload and constant) of the relational expression.

The existing trimming from the delinearization step handles the listing
for us, so no changes on that front.

This patch allows us to match the IPv6 DSCP field which falls into the
case that is described above.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: move payload sub-byte matching to the evaluation step

Generating the bitwise logic to match sub-byte payload fields from the
linearize step has several problems:

1) When the bits are split between two bytes and the payload field is
   smaller than one byte, we need to extend the expression length on
   both sides (payload and constant) of the relational expression.

2) Explicit bitmask operations on sub-byte payload fields need to be
   merged into the implicit bitmask operation, otherwise we generate two
   bitwise instructions. This is not resolved by this patch, but we
   should look at this at some point.

With this approach, we can benefit from the binary operation transfer
for shifts to provide a generic way to adjust the constant side of the
expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: transfer right shifts to set reference side

This provides a generic way to transfer shifts from the left hand side
to the right hand set reference side of a relational expression when
performing transformations from the evaluation step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: transfer right shifts to range side

This provides a generic way to transfer shifts from the left hand side
to the right hand range side of a relational expression when performing
transformations from the evaluation step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: transfer right shifts to constant side

This provides a generic way to transfer shifts from the left hand side
to the right hand constant side of a relational expression when
performing transformations from the evaluation step.
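
The idea behind the transfer can be sketched in a few lines of Python. This is not the nft implementation; the function name and the example mask, shift, and constant are assumptions chosen for illustration.

```python
def transfer_right_shift(shift, constant):
    """Rewrite `(lhs >> shift) == constant` as `lhs == (constant << shift)`.

    Only valid when lhs has already been masked so that the bits below
    `shift` are zero, which holds for a masked payload field.
    """
    return constant << shift

mask, shift = 0xfc, 2   # e.g. a 6-bit field in the upper bits of a byte
wanted = 8              # value to match inside that field
shifted = transfer_right_shift(shift, wanted)

# Check that both forms agree for every possible byte value.
equivalent = all(
    (((b & mask) >> shift) == wanted) == ((b & mask) == shifted)
    for b in range(256)
)
print(hex(shifted), equivalent)
```

Moving the shift to the constant means the left hand side needs only the bitwise mask, with no shift of its own.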

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

dist: include tests/ directory and files in tarball

If we include tests/ in the release tarball, downstream distributors
can run the testsuites themselves while developing the packages.

This way, tests can be run in a more integrated environment and they can
discover errors related to the integration with the given distribution itself.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: allow to run tests with other nft binaries

Allow running the tests with other nft binaries by reading an 'NFT'
environment variable, which may point to an arbitrary location of the
nft binary.

This is what the tests/shell/run-tests.sh script does.

Among other things, this allows us to properly hook this testsuite
into the Debian CI environment (https://ci.debian.net) where we can perform
tests for packages 'as installed'.

Examples:

# run with default config (ie src/nft)
% ./nft-test.py

# run with installed binary (ie /usr/sbin/nft)
% NFT=/usr/sbin/nft ./nft-test.py
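
The lookup described above amounts to an environment read with a fallback. A minimal sketch, assuming the default path named in the examples; the function name is hypothetical and this is not the actual nft-test.py code.

```python
import os

DEFAULT_NFT = "src/nft"  # default named in the examples above

def nft_binary(environ=os.environ):
    """Return the nft binary to test: $NFT if set, else the in-tree build."""
    return environ.get("NFT", DEFAULT_NFT)

print(nft_binary({}))
print(nft_binary({"NFT": "/usr/sbin/nft"}))
```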

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: add interval tests

Add some initial tests to cover dynamic interval sets.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: explicitly indication of set type and flags from test definitions

This patch adds explicit set type in test definitions, as well as flags.

This has triggered a rework that starts by introducing a Set class to
make this whole code more extensible and maintainable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: add more interval tests for anonymous sets

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: add testcases for named sets with intervals

Let's add some testcases for named sets with intervals and ranges.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: add interval overlap detection for dynamic updates

Make sure the new intervals that we want to add are not overlapping with
any of the existing ones.
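
As a rough illustration of the check, the sketch below tests new intervals against existing ones. It assumes intervals are plain half-closed [start, end) integer pairs and uses a simple quadratic scan; the names are hypothetical and the real segtree code may work differently.

```python
def overlaps(a, b):
    """True if half-closed intervals a=[a0, a1) and b=[b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def first_conflict(existing, new):
    """Return the first (new, existing) pair that overlaps, or None."""
    for n in new:
        for e in existing:
            if overlaps(n, e):
                return n, e
    return None

existing = [(10, 20), (30, 40)]
print(first_conflict(existing, [(20, 30)]))  # adjacent on both sides
print(first_conflict(existing, [(15, 25)]))  # intersects (10, 20)
```

With half-closed intervals, an interval that merely touches its neighbours is not reported as a conflict.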

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: rename set expression set_to_segtree()

This function is modified by a follow up patch to take the set object,
so rename it to init.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: add expr_to_intervals()

Refactor code to add the new expr_to_intervals(). This function takes
the list of set element expressions and converts them to a list of
half-closed intervals.

This is useful for different purposes, such as interval overlap
and conflict detection.
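
To make the conversion concrete, here is a small Python sketch that turns textual IPv4 elements into half-closed intervals. It only mirrors the idea; the function name and the string-based input are assumptions, since the real function operates on nft expression objects.

```python
import ipaddress

def to_interval(element):
    """Convert 'a.b.c.d', 'a.b.c.d/len' or 'a.b.c.d-e.f.g.h' to [start, end)."""
    if "-" in element:
        low, high = (int(ipaddress.IPv4Address(p)) for p in element.split("-"))
        return low, high + 1
    if "/" in element:
        # strict=False accepts host bits, as in 192.168.1.100/24
        net = ipaddress.IPv4Network(element, strict=False)
        return int(net.network_address), int(net.broadcast_address) + 1
    addr = int(ipaddress.IPv4Address(element))
    return addr, addr + 1

for elem in ("192.168.1.100/24", "192.168.1.100-192.168.1.200", "10.0.0.1"):
    print(elem, to_interval(elem))
```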

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: set expr->len for prefix expression from interval_map_decompose()

This field needs to be set for the new interval overlap detection.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: bail out on prefix or range to non-interval set

If you declare a set with no interval flag, you get this bug message:

# nft add element filter myset { 192.168.1.100/24 }
BUG: invalid data expression type prefix
nft: netlink.c:323: netlink_gen_data: Assertion `0' failed.
Aborted

After this patch, we provide a clue to the user:

# nft add element filter myset { 192.168.1.100/24 }
<cmdline>:1:23-38: Error: Set member cannot be prefix, missing interval flag on declaration
add element filter myset { 192.168.1.100/24 }
^^^^^^^^^^^^^^^^

# nft add element filter myset { 192.168.1.100-192.168.1.200 }
<cmdline>:1:23-49: Error: Set member cannot be range, missing interval flag on declaration
add element filter myset { 192.168.1.100-192.168.1.200 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: special handling for the first non-matching segment

Add the first non-matching segment if the set is empty or if the set
becomes empty after the element removal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: support for incremental set interval element updates

Introduce __do_add_setelems() and do_delete_setelems() to support
incremental set interval element updates.

From do_add_set(), use netlink_add_setelems() so that we do not try to
re-add the same elements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: explicit initialization via set_to_intervals()

Allow explicit compound expression to initialize the set intervals.
Incremental updates to interval sets require this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: handle adjacent interval nodes from expr_value_cmp()

Named sets may contain adjacent interval nodes. When two nodes are equal
in key, look at the flags: those with EXPR_F_INTERVAL_END should come first.
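
The ordering rule can be expressed as a two-part sort key. The sketch below uses a hypothetical Node tuple in place of the real expression type, with a boolean standing in for EXPR_F_INTERVAL_END.

```python
from collections import namedtuple

# Hypothetical stand-in for an interval node: a key plus an "end" flag.
Node = namedtuple("Node", "key interval_end")

def sort_key(node):
    # For equal keys the interval-end node must sort first, so negate the flag.
    return (node.key, not node.interval_end)

nodes = [Node(20, False), Node(10, False), Node(20, True), Node(30, True)]
for n in sorted(nodes, key=sort_key):
    print(n.key, "end" if n.interval_end else "start")
```

For the adjacent intervals [10, 20) and [20, 30), the end of the first is ordered before the start of the second even though both share key 20.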

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: clone full expression from interval_map_decompose()

Clone the full expression instead of just its value, since expr_value()
expects a set element or mapping.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

segtree: perform stricter expression type validation from expr_value()

This helper function returns an expression value type that represents the
set element key. This function currently expects two kinds of
expressions: set elements and mappings.

Bail out otherwise; if we see anything else, we have to fix our code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

nft monitor [ trace ]

... can now display nftables nftrace debug information.

$ nft filter input tcp dport 10000 nftrace set 1
$ nft filter input icmp type echo-request nftrace set 1
$ nft -nn monitor trace
trace id e1f5055f ip filter input packet: iif eth0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 ip saddr 192.168.122.1 ip daddr 192.168.122.83 ip tos 0 ip ttl 64 ip id 32315 ip length 84 icmp type echo-request icmp code 0 icmp id 10087 icmp sequence 1
trace id e1f5055f ip filter input rule icmp type echo-request nftrace set 1 (verdict continue)
trace id e1f5055f ip filter input verdict continue
trace id e1f5055f ip filter input
trace id 74e47ad2 ip filter input packet: iif vlan0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 1 vlan id 1000 ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 0 ip ttl 64 ip id 49030 ip length 84 icmp type echo-request icmp code 0 icmp id 10095 icmp sequence 1
trace id 74e47ad2 ip filter input rule icmp type echo-request nftrace set 1 (verdict continue)
trace id 74e47ad2 ip filter input verdict continue
trace id 74e47ad2 ip filter input
trace id 3030de23 ip filter input packet: iif vlan0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 1 vlan id 1000 ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 16 ip ttl 64 ip id 59062 ip length 60 tcp sport 55438 tcp dport 10000 tcp flags == syn tcp window 29200
trace id 3030de23 ip filter input rule tcp dport 10000 nftrace set 1 (verdict continue)
trace id 3030de23 ip filter input verdict continue
trace id 3030de23 ip filter input

Based on a patch from Florian Westphal, which again was based on a patch
from Markus Kötter.

Signed-off-by: Patrick McHardy <kaber@trash.net>

proto: add protocol header fields filter and ordering for packet decoding

The next patch introduces packet decoding for tracing messages based on
the proto definitions. In order to provide a readable output, add a filter
to suppress uninteresting header fields and allow specifying an explicit
output order.

Signed-off-by: Patrick McHardy <kaber@trash.net>

payload: add payload_is_stacked()

Add payload_is_stacked() to determine whether a protocol expression match defines
a stacked protocol on the same layer.

Signed-off-by: Patrick McHardy <kaber@trash.net>

payload: move payload depedency tracking to payload.c

Signed-off-by: Patrick McHardy <kaber@trash.net>

nft: resync kernel header files

Signed-off-by: Patrick McHardy <kaber@trash.net>

payload: fix stacked headers protocol context tracking

The code contains multiple scattered fragments that fiddle with the
protocol contexts to work around the fact that stacked headers update the
context for the incorrect layer.

Fix this by updating the correct layer in payload_expr_pctx_update() and
also take care of offset adjustments there and only there. Remove all
manual protocol context fiddling and change protocol context debugging to
also print the offset for stacked headers.

All previously successful testcases pass.

Signed-off-by: Patrick McHardy <kaber@trash.net>

payload: only merge if adjacent and combined size fits into a register

add rule ip6 filter input ip6 saddr ::1/128 ip6 daddr ::1/128 fails
because we ask to compare a 32-byte immediate, which is not supported:

  [ payload load 32b @ network header + 8 => reg 1 ]
  [ cmp eq reg 1 0x00000000 0x00000000 0x00000000 0x01000000 0x00000000 0x00000000 0x00000000 0x02000000 ]

We would need to use two cmps in this case, i.e.:

  [ payload load 32b @ network header + 8 => reg 1 ]
  [ cmp eq reg 1 0x00000000 0x00000000 0x00000000 0x01000000 ]
  [ cmp eq reg 2 0x00000000 0x00000000 0x00000000 0x02000000 ]

However, this seems to require more changes to how nft handles register
allocation; we would also need to undo the constant merge.

Let's disable merging for now so that we generate

  [ payload load 16b @ network header + 8 => reg 1 ]
  [ cmp eq reg 1 0x00000000 0x00000000 0x00000000 0x01000000 ]
  [ payload load 16b @ network header + 24 => reg 1 ]
  [ cmp eq reg 1 0x00000000 0x00000000 0x00000000 0x02000000 ]

... if merge would bring us over the 128 bit register size.
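
The merge condition in the subject line can be sketched as a small predicate. The names and the (offset, length) tuple representation are hypothetical; only the 128 bit register size comes from the text above.

```python
REGISTER_BITS = 128  # register size named in the commit message

def can_merge(first, second):
    """first/second are (offset_bits, length_bits) of two payload loads."""
    (off1, len1), (off2, len2) = first, second
    adjacent = off1 + len1 == off2
    return adjacent and len1 + len2 <= REGISTER_BITS

ip6_saddr, ip6_daddr = (8 * 8, 128), (24 * 8, 128)
tcp_sport, tcp_dport = (0, 16), (16, 16)
print(can_merge(ip6_saddr, ip6_daddr))  # adjacent, but 256 bits in total
print(can_merge(tcp_sport, tcp_dport))  # adjacent and only 32 bits
```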

Closes: http://bugzilla.netfilter.org/show_bug.cgi?id=1032
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: delete tempfile failover in testcases

It seems both Debian/Fedora (and derivatives) contain mktemp (from the coreutils
package) so it makes no sense to have this failover, which also looks buggy.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add testcases for Netfilter bug #965

Testcases for Netfilter bug #965:
* add rule at position
* insert rule at position
* replace rule with given handle
* delete rule with given handle
* don't allow to delete rules with position keyword

Netfilter Bugzilla: http://bugzilla.netfilter.org/show_bug.cgi?id=965
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: Use libnftnl user data TLV infrastructure

It is now possible to store multiple variable-length user data attributes
in a rule. Modify the parser to fill the nftnl_udata buffer with the comment,
and modify the print function to extract the comment and print it to the user.
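
A type-length-value buffer of this kind can be illustrated in Python. This sketch does not use libnftnl; the one-byte type and one-byte length layout, the COMMENT type number, and the function names are all assumptions made for the example.

```python
import struct

COMMENT = 0  # hypothetical type number for a rule comment

def tlv_put(buf, typ, value):
    """Append one type/length/value attribute; length is a single byte."""
    if len(value) > 255:
        raise ValueError("value too long for a one-byte length")
    return buf + struct.pack("BB", typ, len(value)) + value

def tlv_iter(buf):
    """Yield (type, value) pairs from a buffer built by tlv_put()."""
    pos = 0
    while pos < len(buf):
        typ, length = struct.unpack_from("BB", buf, pos)
        pos += 2
        yield typ, buf[pos:pos + length]
        pos += length

data = tlv_put(b"", COMMENT, b"allow ssh\0")
print(list(tlv_iter(data)))
```

Because each attribute carries its own type and length, further attributes can be appended later without changing how the comment is read back.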

Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

test: shell: also unload NAT modules

Also unload NAT modules between tests.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: evaluate: Show error for fanout without balance

The idea of the fanout option is to improve performance by using the CPU
ID as an index to map packets to the queues. This is used for load balancing.
The fanout option is not required when a single queue is specified.

According to iptables, queue balance should be specified in order to use
fanout. Following that, throw an error in nftables if the range of
queues for load balancing is not specified with the fanout option.

After this patch,

$ sudo nft add rule ip filter forward counter queue num 0 fanout
<cmdline>:1:46-46: Error: fanout requires a range to be specified
add rule ip filter forward counter queue num 0 fanout
^^^^^

Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: improve rule management checks

Improve checks (and error reporting) for basic rule management operations.
This includes a fix for netfilter bug #965.

Netfilter bug: http://bugzilla.netfilter.org/show_bug.cgi?id=965
Reported-by: Jesper Sander Lindgren <sander.contrib@gmail.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add some tests for network namespaces

Some basic tests to check we can perform operations in different network
namespaces.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add new testcases for commit/rollback

New simple testcases for kernel commit/rollback operations.

* ruleset A is loaded (good ruleset)
* ruleset B is loaded (bad ruleset): fail is expected
* ruleset A should remain in the kernel

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: store parser location for handle and position specifiers

Store the parser location structure for handle and position IDs so we
can use this information from the evaluation step, to provide better
error reporting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>

rule: don't print trailing statement whitespace

This trailing whitespace is annoying when working with the textual output
of nft.

Before:

table t {
chain c {
ct state new
^
}
}

After:

table t {
chain c {
ct state new
}
}

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add chain validations tests

Some basic tests regarding chains: jumps and validations.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell/run-tests.sh: tune kernel cleanup

The modprobe call can return != 0 if, for example, a module was builtin and
we are trying to remove it, so force a return code of 0 at the end of the
script.

This patch also adds the '-a' switch to modprobe so it doesn't stop unloading
modules if one of them fails (for example, it was builtin).

While at it, fix several module names, for example: 'nft_bridge_reject' vs
'nft_reject_bridge', delete bogus module names.

Reported-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Tested-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: unload modules between tests

This patch adjusts the main test script so it unloads all nftables
kernel modules between tests.

This way we achieve two interesting things:
* avoid false errors in some testcases due to module loading order
* test the module loading/unloading path itself

One example of such false errors is listing the ruleset per family, which
depends on the loading order of the nf_tables_xx modules.

We can later add more modules to unload incrementally (for
example nf_tables_switchdev).

This patch assumes we are working with a kernel which is compiled with
nf_tables =m, the case using =y is not supported and can still produce false
positives in some testcases due to module ordering.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: include/mini-gmp.h is not included at "make dist"

Added missing dist. file mini-gmp.h in include/Makefile.am

Signed-off-by: Magnus Öberg <magnus.oberg@westermo.se>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: release parsed type and hook name strings

The scanner allocates memory for this, so release them given that we
don't attach them to any object.

==6277== 42 bytes in 6 blocks are definitely lost in loss record 2 of 4
==6277==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==6277==    by 0x57AC9D9: strdup (strdup.c:42)
==6277==    by 0x41B82D: xstrdup (utils.c:64)
==6277==    by 0x41F510: nft_lex (scanner.l:511)
==6277==    by 0x427FD1: nft_parse (parser_bison.c:3690)
==6277==    by 0x4063AC: nft_run (main.c:231)
==6277==    by 0x40600C: main (main.c:361)

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: duplicate string returned by chain_type_name_lookup()

This chain type string is released via chain_free() since b7cb6915a88f,
so duplicate it so we don't try to release statically allocated memory.

Fixes: b7cb6915a88f ("rule: Remove memory leak")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: simplify hook_spec rule

Consolidate this rule by introducing dev_spec and prio_spec; we save
50 LOC with this patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: Remove memory leak

Added matching xfree calls in chain_free(), for the chain members 'type' and
'dev'.

It can be reproduced with:
nft add chain x y { type filter hook input priority 0; }

Then:
$ sudo valgrind --leak-check=full nft list tables

==2899== HEAP SUMMARY:
==2899==     in use at exit: 327 bytes in 10 blocks
==2899==   total heap usage: 145 allocs, 135 frees, 211,462 bytes allocated
==2899==
==2899== 63 bytes in 9 blocks are definitely lost in loss record 1 of 2
==2899==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2899==    by 0x57A3839: strdup (strdup.c:42)
==2899==    by 0x41C05D: xstrdup (utils.c:64)
==2899==    by 0x411E9B: netlink_delinearize_chain.isra.3 (netlink.c:717)
==2899==    by 0x411F70: list_chain_cb (netlink.c:748)
==2899==    by 0x504A943: nft_chain_list_foreach (chain.c:1015)
==2899==    by 0x4145AE: netlink_list_chains (netlink.c:771)
==2899==    by 0x40793F: cache_init_objects (rule.c:90)
==2899==    by 0x40793F: cache_init (rule.c:130)
==2899==    by 0x40793F: cache_update (rule.c:147)
==2899==    by 0x40FB59: cmd_evaluate (evaluate.c:2475)
==2899==    by 0x429A1C: nft_parse (parser_bison.y:655)
==2899==    by 0x40651C: nft_run (main.c:231)
==2899==    by 0x40618C: main (main.c:357)
==2899==
==2899== LEAK SUMMARY:
==2899==    definitely lost: 63 bytes in 9 blocks
==2899==    indirectly lost: 0 bytes in 0 blocks
==2899==      possibly lost: 0 bytes in 0 blocks
==2899==    still reachable: 264 bytes in 1 blocks
==2899==         suppressed: 0 bytes in 0 blocks
==2899== Reachable blocks (those to which a pointer was found) are not shown.
==2899== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2899==
==2899== For counts of detected and suppressed errors, rerun with: -v
==2899== Use --track-origins=yes to see where uninitialised values come from
==2899== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 0 from 0)

Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: use table_lookup_global() from expr_evaluate_symbol()

If there's already a table 'test' defined in the kernel and you load
another table 'test' via `nft -f', table_lookup() returns the table
that already exists in the kernel, so if you look up for objects that
are defined in the file, nft bails out with 'Set does not exist'.

Use table_lookup_global() instead. This function returns the table that
is defined in the file and that is set as context via
ctx->handle->table.

This is not a complete fix, we should splice the existing kernel objects
into the userspace declaration. We just need some way to identify what
objects are already in the kernel so we don't send them again (otherwise
we will hit EEXIST errors). I'll follow up with this full fix asap.

Anyway, this patch fixes this shell test:

I: [OK] ./testcases/sets/cache_handling_0

So at least for now all shell tests return OK. I'll add more
tests to catch the case I describe above once it is fixed too.

Cc: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: revisit cache population logic

We get a partial cache (tables, chains and sets) when:

* We see a set reference from a rule, since this set object may be
  already defined in kernelspace and we need to fetch the datatype
  for evaluation.

* We add/delete a set element, we need this to evaluate if the
  element datatype is correct.

* We rename a chain, since we need to know the chain handle.

* We add a chain/set. This isn't needed for simple command line
  invocations. However, since the existing codepath is also exercised
  from `nft -f' context, we need to know if the object exists in the
  kernel. Thus, if this is a newly declared object (not yet in the kernel) we
  add it to the cache, otherwise, we will not find follow up references to
  this object in our cache.

We get a full cache when:

* We list the ruleset. We can provide finer grain listing though,
  via partial cache, later.

* We monitor updates, since this displays incremental updates based on
  the existing objects.

* We export the ruleset, since this dumps all of the existing objects.

* We push updates via `nft -f'. We need to know what objects are
  already in the kernel for incremental updates. Otherwise,
  cache_update() hits a bogus 'set doesn't exist' error message for
  a set just declared in this batch.  To avoid this problem, we need a
  way to differentiate between objects in the lists that are already
  defined in the kernel and objects that are just declared in this
  batch (hint: the location structure information is set for just
  declared objects).

We don't get a cache at all when:

* We flush the ruleset, this is important in case of delinearize
  bugs, so you don't need to reboot or manually flush the ruleset via
  libnftnl examples/nft-table-flush.

* We delete any object, except for set elements (as we describe above).

* We add a rule, so you can generate via --debug=netlink the expression
  without requiring a table and chain in place.

* We describe an expression.

This patch also includes some intentional adjustments to the shell tests
so we don't get bogus errors due to changes in the list printing.

BTW, this patch also includes a revert for 97493717e738 ("evaluate: check
if table and chain exists when adding rules") since that check is not
possible anymore with this logic.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: frag: enable more tests

A couple of tests were disabled since nft did not support this.

Signed-off-by: Florian Westphal <fw@strlen.de>

netlink_delinarize: shift constant for ranges too

... else rule like vlan pcp 1-3 won't work and will be displayed
as 0-0 (reverse direction already works since range is represented
as two lte/gte compare expressions).
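
The effect of the missing shift can be illustrated with plain integers. This sketch assumes the PCP field is the top 3 bits of a one-byte load; the helper names are hypothetical. The last line shows that truncating the unshifted values to the field width gives 0 for both ends, which is consistent with the 0-0 output but is an inference rather than something this commit states.

```python
PCP_SHIFT = 5   # PCP assumed to be the top 3 bits of the first TCI byte
PCP_WIDTH = 3

def to_wire(value):
    """Shift a field value into its position inside the loaded byte."""
    return value << PCP_SHIFT

def from_wire(value):
    """Undo the shift when translating back for display."""
    return value >> PCP_SHIFT

low, high = to_wire(1), to_wire(3)
print(hex(low), hex(high))               # range ends as sent to the kernel
print(from_wire(low), from_wire(high))   # with the shift applied: 1 3

field_mask = (1 << PCP_WIDTH) - 1
print(low & field_mask, high & field_mask)  # without the shift: 0 0
```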

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

nft-test: don't zap remainder of rule after handling a set

Don't delete the part after the set, i.e. given

chain input {
type filter hook input priority 0; policy accept;
vlan id { 1, 2, 4, 100, 4095} vlan pcp 1-3
}

don't remove the vlan pcp 1-3 part.

This exposes following bug:

bridge/vlan.t: WARNING: line: 32:
'nft add rule --debug=netlink bridge test-bridge input vlan id { 1, 2, 4, 100, 4095 } vlan pcp 1-3': 'vlan id { 1, 2, 4, 100, 4095 } vlan pcp 1-3' mismatches 'vlan id { 4, 1, 2, 4095, 100} vlan pcp 0-0'

We do not shift the range, so on reverse translation we get a 0-0 output.
The bug will be fixed in a followup commit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: fix bogus offset w exthdr expressions

Need to fetch the offset from the exthdr template.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: frag: enable more tests

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: add tests for router-advertisement and router-solicitation icmp types

Introduced by 039f818fc88010 ("proto: Add router advertisement and solicitation
icmp types").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: Add router advertisement and solicitation icmp types

Enable support for router-advertisement and router-solicitation icmp types in nft.

Example:
$ sudo nft add rule ip filter input icmp type router-advertisement counter accept
$ sudo nft add rule ip filter input icmp type router-solicitation counter accept

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser_bison: allow 'snat' and 'dnat' keywords from the right-hand side

Parse 'snat' and 'dnat' reserved keywords from the right-hand side as
symbols. Thus, we can use them as values from ct status.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=950
Reported-by: Ana Rey <anarey@gmail.com>
Reported-by: Karol Babioch <karol@babioch.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

nft: Modified punctuation used in nft's show_help

Replaced '/' between shortopt and longopt with ',', as used by other utilities.

Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: nft: Fixed a typo and added/changed punctuation

In nft's man page, instead of using '/' between shortopt and longopt in the
"SYNOPSIS" and "OPTIONS" sections, use '|' and ',' respectively
(just like the man pages of iptables, etc.).

Fixed a typo and added a missing ','.

Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: enable tests for dccp types

This patch makes sure we test dccp types.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

parser: remove 'reset' as reserve keyword

The 'reset' keyword can be used as a dccp type, so don't qualify it as a
reserved keyword to avoid a conflict with this.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1055
Reported-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: simplify ("rule: delete extra space in sets printing")

This simplifies bd23f7628570 ("rule: delete extra space in sets printing")
by passing the whitespace from set_print_plain() called from the monitoring
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>

tests/py: extend masquerade to cover ports too

Test the new masquerade port range support (available since kernel 4.6-rc).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/listing: add some listing tests

Let's test what is shown with the 'list' command, for ruleset, tables and sets.

To ease debugging in case of failure, a textual diff is printed if the diff
tool is available on the system.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: add first `nft -f' tests

This patch adds some basic initial tests.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: delete extra space in sets printing

The extra space is printed when sets are printed in tabulated format.

table inet test {
set test {
^
type ipv4_addr
}
}

However, the space is still required when printing in plain format (i.e., monitor).

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: Add support for masquerade port selection

Provide full support for masquerading by allowing port range selection, eg.

# nft add rule nat postrouting ip protocol tcp masquerade to :1024-10024

Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: handle extension header templates with odd sizes

This enables nft to display
frag frag-off 33

... by considering a mask during binop postprocess in case
the initial template lookup done when the exthdr expression was
created did not yield a match.

In the above example, the kernel netlink data specifies 16 bits,
but the frag-off field is only 13 bits wide.

We use the implicit binop mask to re-do the template lookup with
corrected offset and size information.
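
A rough illustration of the idea, not the code in netlink_delinearize.c:
the mask alone tells us where the field sits inside the loaded bytes.
mask_to_field() and decode_field() are hypothetical names, and the mask is
taken in network byte order (0xfff8 for frag-off).

```python
def mask_to_field(mask, width_bits):
    """Derive (bit offset, bit length, right shift) from a contiguous mask.

    The offset is counted from the first bit of the loaded data.
    """
    if mask <= 0 or mask >> width_bits:
        raise ValueError("mask does not fit the loaded width")
    shift = (mask & -mask).bit_length() - 1      # trailing zero bits
    length = (mask >> shift).bit_length()        # width of the run of ones
    if (mask >> shift) != (1 << length) - 1:
        raise ValueError("mask is not contiguous")
    return width_bits - shift - length, length, shift


def decode_field(raw_bytes, mask):
    """Undo the mask and shift on bytes received in network order."""
    _, _, shift = mask_to_field(mask, 8 * len(raw_bytes))
    return (int.from_bytes(raw_bytes, "big") & mask) >> shift


print(mask_to_field(0xfff8, 16))             # (0, 13, 3)
print(decode_field(b"\x01\x08", 0xfff8))     # 33
```

With offset 0 and length 13 relative to the 2-byte load, the template
lookup can be retried with the real field size instead of 16 bits.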

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: prepare binop_postprocess for exthdr demux

binop_postprocess takes care of removing masks if we're dealing
with payload expressions that have non-byte divisible sizes
or offsets.

The same can happen when matching some extension header fields, i.e.
this also needs to handle exthdr expressions, not just payload.

So rename payload to left and move test for left type to
binop_postprocess.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

exthdr: store offset for later use

It's possible that we cannot find the template without also
considering an implicit mask. For this we need to store the offset.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

exthdr: remove implicit dependencies

The exthdr expression requires a dependency on ipv6; we can
thus remove an ipv6 protocol test if it's present.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: add/fix inet+exthdr tests

exthdr needs to be treated as if we'd test an ipv6 header field, i.e.
inet, bridge, netdev need to add a dependency on ipv6 protocol.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

exthdr: generate dependencies for inet/bridge/netdev family

We should treat this as if the user had asked to match an ipv6 header field.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

payload: move payload_gen_dependency generic part to helper

We should treat exthdr just as if the user asked for e.g. ip6 saddr
and inject the needed dependency statement.

payload_gen_dependency cannot be used since the *expr needs
to be a payload expression, but the actual dependency generation
doesn't depend on a particular expression type.

In order to reuse this part for future exthdr dependency injection
move it to a helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: add and use netlink_gen_exthdr_mask

rule ip6 filter input frag frag-off 33

before patch:
[ exthdr load 1b @ 44 + 2 => reg 1 ]
[ cmp eq reg 1 0x00002100 ]

We truncated the 13-bit field to 1 byte.

after patch:
[ exthdr load 2b @ 44 + 2 => reg 1 ]
[ bitwise reg 1 = (reg=1 & 0x0000f8ff ) ^ 0x00000000 ]
[ cmp eq reg 1 0x00000801 ]

- ask for 2 bytes
- mask out the 3 lower bits
- shift the value by 3 so equality test will pass for 33

This causes test failures, will be fixed up in a later patch
(the test suite expects the old, broken input).

It also misses the reverse translation to remove the binop,
find the right template and undo the shift of the value.
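
The numbers in the "after patch" dump can be checked by hand. A minimal
sketch, assuming a little-endian host for the register printout;
encode_frag_off() and as_le_register() are hypothetical helper names, not
nft functions:

```python
import struct

FRAG_OFF_SHIFT = 3       # the 3 low bits hold the reserved bits and the M flag
FRAG_OFF_MASK = 0xfff8   # upper 13 bits of the 16-bit word: fragment offset


def encode_frag_off(value):
    """Return the 2 bytes, in network order, that the cmp must match."""
    return struct.pack("!H", (value << FRAG_OFF_SHIFT) & FRAG_OFF_MASK)


def as_le_register(data):
    """Read 2 network-order bytes the way a little-endian host prints them."""
    return struct.unpack("<H", data)[0]


wire = encode_frag_off(33)
print(wire.hex())                                             # 0108
print(hex(as_le_register(wire)))                              # 0x801
print(hex(as_le_register(struct.pack("!H", FRAG_OFF_MASK))))  # 0xf8ff
```

This matches the 0x0000f8ff mask and the 0x00000801 compare value above.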

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: split generic part of netlink_gen_payload_mask into helper

netlink_gen_payload_mask assumes expr is a payload expression,
but most of this function would work fine with exthdr too.

So split the generic part into a helper; a follow-up patch will
add netlink_gen_exthdr_mask.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: enforce ip6 proto with exthdr expression

Don't allow use of exthdr with e.g. ip family.
Move frag.t to ip6 directory and don't use it with ipv4 anymore.

This change causes major test failures for all exthdr users
since they now fail with inet/bridge/netdev families.

Will be resolved in a later patch -- we need to add
an ipv6 dependency for them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: reject set references in set elements

Given

table filter {
  set local {
    type iface_index
    elements = { lo }
  }
  chain input {
    type filter hook input priority 0;
    iif { @lan, } accept;
  }
}

nft BUG()s.  I don't see how we could support sets-in-set; add a sanity
check and error out instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

examples: use current type names

Signed-off-by: Florian Westphal <fw@strlen.de>

meta: fix error checks in tc handle parser

'meta priority foobar' did not return an error -- instead
we used min/max values with undefined content.
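
To illustrate the kind of check that was missing, here is a sketch in
Python rather than the C parser in meta.c. parse_tc_handle() is a
hypothetical name, and the accepted grammar (hex "major:minor", up to four
digits each) is an assumption made for the example:

```python
import re

# Assumed grammar for the example: hex "major:minor", 1 to 4 digits each.
_TC_HANDLE_RE = re.compile(r"([0-9a-fA-F]{1,4}):([0-9a-fA-F]{1,4})")


def parse_tc_handle(text):
    """Return the 32-bit handle, or raise instead of using undefined values."""
    match = _TC_HANDLE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"could not parse tc handle {text!r}")
    major, minor = (int(part, 16) for part in match.groups())
    return (major << 16) | minor


print(hex(parse_tc_handle("1:10")))   # 0x10010
try:
    parse_tc_handle("foobar")
except ValueError as err:
    print(err)                        # could not parse tc handle 'foobar'
```

The point is only that a failed parse must surface as an error before any
major/minor value is used.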

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: use parameter-problem for icmpv6 type

To keep it consistent with icmpv4 naming.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=911
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink_delinearize: prune implicit binop before payload_match_postprocess()

payload_match_postprocess() expects a relational with a payload on its lhs
and a value on the rhs.

Moreover, payload_match_expand() releases the previous expression, so
valgrind reports a use-after-free when pruning the implicit binop.

Fix this by calling payload_match_postprocess() first.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/py: test vlan on ingress

This generates the same code as bridge does, but it includes this check
first.

[ meta load iiftype => reg 1 ]
[ cmp eq reg 1 0x00000001 ]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: annotate follow up dependency just after killing another

The inet and netdev families generate two implicit dependencies to check
for the interface type, so we have to check just after killing an implicit
dependency if there is another that we should annotate to kill it as well.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: proto_dev_type() returns interface type for base protocols too

The device protocol definition provides a mapping between the interface
type, ie. ARPHRD_*, and the overlying protocol base definition, eg.
proto_eth.

This patch updates proto_dev_type() so it also returns a mapping for
these overlying ethernet protocol definitions, ie. ip, ip6, vlan,
arp.

This patch is required to resolve problems with automatic dependency
generation for vlan in the netdev and inet families.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: generate ether type payload after meta iiftype

Once the meta iiftype is generated, we shouldn't return from
resolve_protocol_conflict() since we also need to generate the ether
type payload implicit match after it.

This gets rid of the manual proto-ctx update from
meta_iiftype_gen_dependency() that we don't need since stmt_evaluate()
already handles this for us.

Moreover, skip error reporting once we verify that the protocol conflict
has been resolved.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: wrap protocol context debunk into function

ether type vlan sets the network layer protocol context to vlan. This
function debunks the existing link layer protocol context by setting it
to vlan.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: assert on invalid base in resolve_protocol_conflict()

We already have similar code in the tree, we shouldn't see bases over
transport yet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: only try to replace dummy protocol from link-layer context

Add proto_is_dummy() that returns true for netdev and inet family, the
only two using a dummy link-layer protocol base definition.

Rename supersede_dep() to meta_iiftype_gen_dependency() since this is
generating the implicit meta iiftype check for netdev and inet.

This patch also gets rid of the have->length check. The tests pass fine
without this, so I suspect this is superfluous.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't adjust offset from resolve_protocol_conflict()

This is not itself a conflict, move this check out of this function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: check if we have to resolve a conflict in first place

So we enter resolve_protocol_conflict() only when we really have a
conflict that we want to try to resolve.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: move inet/netdev protocol context supersede logic to supersede_dep()

This is a cleanup to untangle this logic a bit.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>