git.ipfire.org Git - thirdparty/nftables.git/log
4 years agocache: add hashtable cache for table
Pablo Neira Ayuso [Thu, 29 Apr 2021 20:23:05 +0000 (22:23 +0200)] 
cache: add hashtable cache for table

Add a hashtable for fast table lookups.

Tables that reside in the cache use the table->cache_hlist and
table->cache_list heads.

Tables that are created from the command line / ruleset are also added
to the cache.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoevaluate: add object to the cache
Pablo Neira Ayuso [Thu, 15 Apr 2021 12:00:26 +0000 (14:00 +0200)] 
evaluate: add object to the cache

If the cache does not contain this object that is defined in this batch,
add it to the cache. This allows for references to this new object in
the same batch.

This patch also adds missing handle_merge() to set the object name,
otherwise object name is NULL and obj_cache_find() crashes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: missing table cache for several policy objects
Pablo Neira Ayuso [Thu, 15 Apr 2021 12:00:22 +0000 (14:00 +0200)] 
cache: missing table cache for several policy objects

Populate the cache with tables for several policy objects types.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoevaluate: add flowtable to the cache
Pablo Neira Ayuso [Thu, 15 Apr 2021 12:00:20 +0000 (14:00 +0200)] 
evaluate: add flowtable to the cache

If the cache does not contain this flowtable that is defined in this
batch, then add it to the cache. This allows for references to this new
flowtable in the same batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoevaluate: add set to the cache
Pablo Neira Ayuso [Thu, 15 Apr 2021 12:00:16 +0000 (14:00 +0200)] 
evaluate: add set to the cache

If the cache does not contain the set that is defined in this batch, add
it to the cache. This allows for references to this new set in the same
batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: add set_cache_del() and use it
Pablo Neira Ayuso [Thu, 15 Apr 2021 13:06:07 +0000 (15:06 +0200)] 
cache: add set_cache_del() and use it

Add set_cache_del() and use it from the monitor path to remove sets
from the cache.

Fixes: df48e56e987f ("cache: add hashtable cache for sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: add hashtable cache for flowtable
Pablo Neira Ayuso [Thu, 29 Apr 2021 20:19:07 +0000 (22:19 +0200)] 
cache: add hashtable cache for flowtable

Add flowtable hashtable cache.

I do not expect enough flowtables to be created for them to benefit
from the hashtable, but this streamlines the code with tables, chains,
sets and policy objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: add hashtable cache for object
Pablo Neira Ayuso [Thu, 29 Apr 2021 20:09:15 +0000 (22:09 +0200)] 
cache: add hashtable cache for object

This patch adds a hashtable for object lookups.

This patch also splits table->objs in two:

- Objects that reside in the cache are stored in the new
  table->cache_obj and table->cache_obj_ht.

- Objects that are defined via command line / ruleset file reside in
  table->objs.

Objects in the cache (already in the kernel) are not placed in the
table->objs list.

By keeping separate lists, objs defined via command line / ruleset file
can be added to cache.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agosrc: consolidate object cache infrastructure
Pablo Neira Ayuso [Thu, 29 Apr 2021 19:55:34 +0000 (21:55 +0200)] 
src: consolidate object cache infrastructure

This patch consolidates the object cache infrastructure. Update sets and
chains to use it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agosrc: consolidate nft_cache infrastructure
Pablo Neira Ayuso [Thu, 29 Apr 2021 18:29:09 +0000 (20:29 +0200)] 
src: consolidate nft_cache infrastructure

- prepend nft_ prefix to nft_cache API and internal functions
- move declarations to cache.h (and remove redundant declarations)
- move struct nft_cache definition to cache.h

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agosrc: pass chain name to chain_cache_find()
Pablo Neira Ayuso [Thu, 29 Apr 2021 18:04:55 +0000 (20:04 +0200)] 
src: pass chain name to chain_cache_find()

Chains can be identified through their unique handle in deletions. Update
this interface to take a string instead of the handle, to prepare for
the introduction of 64-bit handle chain lookups.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agorule: skip fuzzy lookup for unexisting 64-bit handle
Pablo Neira Ayuso [Thu, 29 Apr 2021 23:01:17 +0000 (01:01 +0200)] 
rule: skip fuzzy lookup for unexisting 64-bit handle

Deletion by handle, if incorrect, should not exercise the misspell
lookup functions.

Fixes: 3a0e07106f66 ("src: combine extended netlink error reporting with mispelling support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agosrc: unbreak deletion by table handle
Pablo Neira Ayuso [Thu, 29 Apr 2021 22:30:05 +0000 (00:30 +0200)] 
src: unbreak deletion by table handle

Use NFTA_TABLE_HANDLE instead of NFTA_TABLE_NAME to refer to the
table 64-bit unique handle.

Fixes: 7840b9224d5b ("evaluate: remove table from cache on delete table")
Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotests: shell: remove missing modules
Pablo Neira Ayuso [Thu, 29 Apr 2021 16:20:53 +0000 (18:20 +0200)] 
tests: shell: remove missing modules

Update run-tests.sh to remove the following modules:

- nft_reject_netdev
- nft_xfrm
- nft_synproxy

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoparser_bison: missing relational operation on flag list
Pablo Neira Ayuso [Mon, 19 Apr 2021 09:56:15 +0000 (11:56 +0200)] 
parser_bison: missing relational operation on flag list

Complete e6c32b2fa0b8 ("src: add negation match on singleton bitmask
value") which was missing comma-separated list of flags.

This patch provides a shortcut for:

    tcp flags and fin,rst == 0

which matches packets whose fin and rst bits are both unset:

    # nft add rule x y tcp flags not fin,rst counter
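
The equivalence between the shortcut and the explicit mask test can be
illustrated with a small Python sketch. It only models the bitmask
arithmetic and is not nft code; the helper name flags_not() is made up for
this example, and the bit values are the usual TCP header flag bits.

```python
# TCP header flag bits (FIN, SYN and RST positions in the TCP flags byte).
FIN, SYN, RST = 0x01, 0x02, 0x04


def flags_not(tcp_flags: int, mask: int) -> bool:
    """Model of `tcp flags not <list>`: true when no bit of the mask is set.

    This is the same test as `tcp flags and <list> == 0`.
    """
    return (tcp_flags & mask) == 0


mask = FIN | RST
print(flags_not(SYN, mask))        # only SYN set: neither fin nor rst is set
print(flags_not(SYN | RST, mask))  # RST set: the rule would not match
```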

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoparser: allow to load stateful ct connlimit elements in sets
Laura Garcia Liebana [Tue, 13 Apr 2021 09:03:41 +0000 (11:03 +0200)] 
parser: allow to load stateful ct connlimit elements in sets

This patch fixes a syntax error when loading an nft
dump with a set that includes stateful ct connlimit elements.

Given an nft dump such as:

table ip nftlb {
        set connlimit-set {
                type ipv4_addr
                size 65535
                flags dynamic
                elements = { 84.245.120.167 ct count over 20 , 86.111.207.45 ct count over 20 ,
                             173.212.220.26 ct count over 20 , 200.153.13.235 ct count over 20  }
        }
}

The syntax error is shown when loading the ruleset.

root# nft -f connlimit.nft
connlimit.nft:15997:31-32: Error: syntax error, unexpected ct, expecting comma or '}'
elements = { 84.245.120.167 ct count over 20 , 86.111.207.45 ct count over 20 ,
                            ^^
connlimit.nft:16000:9-22: Error: syntax error, unexpected string
     173.212.220.26 ct count over 20 , 200.153.13.235 ct count over 20  }
     ^^^^^^^^^^^^^^

After applying this patch, the kernel panics in nft_rhash_gc()
even though no packet reaches the set.

The following kernel patch [0] should be used as well:

4d8f9065830e5 ("netfilter: nftables: clone set element expression template")

Note that the kernel patch causes the connection tracking table to be
emptied, so restoring the conntrack states should be considered.

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=4d8f9065830e526c83199186c5f56a6514f457d2

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoevaluate: check if nat statement map specifies a transport header expr
Florian Westphal [Tue, 6 Apr 2021 16:34:19 +0000 (18:34 +0200)] 
evaluate: check if nat statement map specifies a transport header expr

Importing the systemd nat table fails:

table ip io.systemd.nat {
 map map_port_ipport {
   type inet_proto . inet_service : ipv4_addr . inet_service
   elements = { tcp . 8088 : 192.168.162.117 . 80 }
 }
 chain prerouting {
   type nat hook prerouting priority dstnat + 1; policy accept;
    fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
 }
}
ruleset:9:48-59: Error: transport protocol mapping is only valid after transport protocol match

To resolve this (no transport header base specified), check if the
map itself contains a network base protocol expression.

This allows nft to import the ruleset.
Import still fails with same error if 'inet_service' is removed
from the map, as it should.

Reported-by: Henning Reich <henning.reich@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agomnl: Increase BATCH_PAGE_SIZE to support huge rulesets
Phil Sutter [Wed, 14 Apr 2021 11:47:47 +0000 (13:47 +0200)] 
mnl: Increase BATCH_PAGE_SIZE to support huge rulesets

Apply the same change from iptables-nft to nftables to keep them in
sync with regard to max supported transaction sizes.

Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agoMakefile: missing owner.h file
Pablo Neira Ayuso [Sat, 3 Apr 2021 18:24:45 +0000 (20:24 +0200)] 
Makefile: missing owner.h file

Add it to include/Makefile.am, this fixes `make distcheck'.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agofiles: move example files away from /etc
Jan Engelhardt [Tue, 30 Mar 2021 14:46:53 +0000 (16:46 +0200)] 
files: move example files away from /etc

As per file-hierarchy(5), /etc is for "system-specific configuration", not
"vendor-supplied default configuration files".

Moreover, the comments in all-in-one.nft say it is an example, and so,
not a vendor config either.

Move it out of /etc.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: bail out if chain list cannot be fetched from kernel
Pablo Neira Ayuso [Fri, 2 Apr 2021 18:48:00 +0000 (20:48 +0200)] 
cache: bail out if chain list cannot be fetched from kernel

Do not report success if chain cache list cannot be built.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: add hashtable cache for sets
Pablo Neira Ayuso [Fri, 2 Apr 2021 18:26:15 +0000 (20:26 +0200)] 
cache: add hashtable cache for sets

This patch adds a hashtable for set lookups.

This patch also splits table->sets in two:

- Sets that reside in the cache are stored in the new
  table->cache_set and table->cache_set_ht.

- Sets that are defined via command line / ruleset file reside in
  table->sets.

Sets in the cache (already in the kernel) are not placed in the
table->sets list.

By keeping separate lists, sets defined via command line / ruleset file
can be added to cache.

Adding 10000 sets, before:

 # time nft -f x
 real    0m6,415s
 user    0m3,126s
 sys     0m3,284s

After:

 # time nft -f x
 real    0m3,949s
 user    0m0,743s
 sys     0m3,205s
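
The split described above can be modelled in a few lines of Python. This is
a simplified sketch and not the C implementation: the Table class and the
helpers set_cache_add(), set_cache_find() and set_lookup_linear() are
illustrative names, and a dict stands in for the hashtable.

```python
class Table:
    """Toy model of a table holding declared sets and cached sets."""

    def __init__(self):
        self.sets = []          # sets declared via command line / ruleset file
        self.cache_set = []     # cached sets (already in the kernel), in order
        self.cache_set_ht = {}  # name -> set, for constant-time average lookups


def set_cache_add(table, name, definition):
    """Add a set to the cache list and to the hashtable."""
    entry = {"name": name, "def": definition}
    table.cache_set.append(entry)
    table.cache_set_ht[name] = entry
    return entry


def set_cache_find(table, name):
    """Hashtable lookup: does not walk the list."""
    return table.cache_set_ht.get(name)


def set_lookup_linear(table, name):
    """Linear list lookup: cost grows with the number of sets."""
    for entry in table.cache_set:
        if entry["name"] == name:
            return entry
    return None


table = Table()
for i in range(10000):
    set_cache_add(table, f"s{i}", "type ipv4_addr")
# Both lookups return the same entry; only the cost differs.
print(set_cache_find(table, "s9999") is set_lookup_linear(table, "s9999"))
```

With a linear list, resolving each of N set references walks up to N entries,
which is consistent with the drop in user time shown in the timings above.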

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: check for NULL chain in cache_init()
Pablo Neira Ayuso [Thu, 1 Apr 2021 21:15:02 +0000 (23:15 +0200)] 
cache: check for NULL chain in cache_init()

Another process might race to add chains after chain_cache_init().
The generation check does not help since it comes after cache_init().
NLM_F_DUMP_INTR only guarantees consistency within one single netlink
dump operation, so it does not help either (cache population requires
several netlink dump commands).

Let's be safe and not assume that the chain exists in the cache when
populating the rule cache.
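
A minimal Python sketch of the defensive check, assuming the cache is a
mapping from chain name to a list of rules. The function name
populate_rule_cache() and the choice to report failure to the caller are
assumptions made for this illustration, not a description of cache.c.

```python
def populate_rule_cache(chain_cache, dumped_rules):
    """Attach dumped rules to cached chains.

    Returns False when a rule refers to a chain that is not in the cache,
    which can happen if another process added the chain between the chain
    dump and the rule dump.
    """
    for rule in dumped_rules:
        chain = chain_cache.get(rule["chain"])
        if chain is None:
            # Do not dereference a missing chain; let the caller decide
            # how to recover (for example by rebuilding the cache).
            return False
        chain.append(rule)
    return True


chains = {"input": []}
print(populate_rule_cache(chains, [{"chain": "input", "expr": "accept"}]))
print(populate_rule_cache(chains, [{"chain": "raced", "expr": "drop"}]))
```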

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: statify chain_cache_dump()
Pablo Neira Ayuso [Thu, 1 Apr 2021 20:25:28 +0000 (22:25 +0200)] 
cache: statify chain_cache_dump()

Only used internally in cache.c

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoevaluate: use chain hashtable for lookups
Pablo Neira Ayuso [Thu, 1 Apr 2021 20:25:24 +0000 (22:25 +0200)] 
evaluate: use chain hashtable for lookups

Instead of the linear list lookup.

Before this patch:

real    0m21,735s
user    0m20,329s
sys     0m1,384s

After:

real    0m10,910s
user    0m9,448s
sys     0m1,434s

chain_lookup() is removed since linear list lookups are only used by the
fuzzy chain name matching for error reporting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agosrc: split chain list in table
Pablo Neira Ayuso [Thu, 1 Apr 2021 20:19:30 +0000 (22:19 +0200)] 
src: split chain list in table

This patch splits table->chains in two:

- Chains that reside in the cache are stored in the new
  table->cache_chain and table->cache_chain_ht. The hashtable chain
  cache allows for fast chain lookups.

- Chains that are defined via command line / ruleset file reside in
  table->chains.

Note that chains in the cache (already in the kernel) are not placed in
the table->chains list.

By keeping separate lists, chains defined via command line / ruleset
file can be added to cache.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: rename chain_htable to cache_chain_ht
Pablo Neira Ayuso [Thu, 1 Apr 2021 20:18:29 +0000 (22:18 +0200)] 
cache: rename chain_htable to cache_chain_ht

Rename the chain hashtable that is used for fast cache lookups.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoproto: replace vlan ether type with 8021q
Florian Westphal [Fri, 2 Apr 2021 10:54:53 +0000 (12:54 +0200)] 
proto: replace vlan ether type with 8021q

Previous patches added "8021ad" mnemonic for IEEE 802.1AD frame type.
This adds the 8021q shorthand for the existing 'vlan' frame type.

nft will continue to recognize 'ether type vlan', but listing
will now print 8021q.

Adjust all test cases accordingly.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agotests: add 8021.AD vlan test cases
Florian Westphal [Thu, 1 Apr 2021 14:08:46 +0000 (16:08 +0200)] 
tests: add 8021.AD vlan test cases

Check nft doesn't remove the explicit '8021ad' type check and that
the expected dependency chains are generated.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agopayload: be careful on vlan dependency removal
Florian Westphal [Thu, 1 Apr 2021 14:08:45 +0000 (16:08 +0200)] 
payload: be careful on vlan dependency removal

'vlan ...' implies an 802.1Q frame.  In case the expression tests something else
(802.1AD for example) it's not an implicitly added one, so keep it.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoproto: add 8021ad as mnemonic for IEEE 802.1AD (0x88a8) ether type
Florian Westphal [Thu, 1 Apr 2021 14:08:44 +0000 (16:08 +0200)] 
proto: add 8021ad as mnemonic for IEEE 802.1AD (0x88a8) ether type

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agosrc: vlan: allow matching vlan id insider 802.1ad frame
Florian Westphal [Thu, 1 Apr 2021 14:08:43 +0000 (16:08 +0200)] 
src: vlan: allow matching vlan id insider 802.1ad frame

This makes "ether type 0x88a8 vlan id 342" work.

Before this change, nft would still insert a dependency on 802.1q so the
rule would never match.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agonetlink: don't crash when set elements are not evaluated as expected
Florian Westphal [Tue, 30 Mar 2021 23:26:19 +0000 (01:26 +0200)] 
netlink: don't crash when set elements are not evaluated as expected

define foo = 2001:db8:123::/48

table inet filter {
        set foo {
                typeof ip6 saddr
                elements = $foo
        }
}

gives a crash.  This now exits with:

stdin:1:14-30: Error: Unexpected initial set type prefix
define foo = 2001:db8:123::/48
             ^^^^^^^^^^^^^^^^^

For literals, bison parser protects us, as it enforces
'elements = { 2001:... '.

For 'elements = $foo' we can't detect it at parsing stage as the '$foo'
symbol might as well evaluate to "{ 2001, ...}" (i.e. we can't do a
set element allocation).

So at least detect this from set instantiation.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoparser_bison: simplify flowtable offload flag parser
Pablo Neira Ayuso [Wed, 31 Mar 2021 14:14:03 +0000 (16:14 +0200)] 
parser_bison: simplify flowtable offload flag parser

Remove ft_flags_spec rule.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agomnl: do not set flowtable flags twice
Pablo Neira Ayuso [Wed, 31 Mar 2021 14:07:13 +0000 (16:07 +0200)] 
mnl: do not set flowtable flags twice

Flags are already set from mnl_nft_flowtable_add(); remove the duplicated
code.

Fixes: e6cc9f37385 ("nftables: add flags offload to flowtable")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agorule: remove semicolon in flowtable offload
Pablo Neira Ayuso [Thu, 25 Mar 2021 12:06:02 +0000 (13:06 +0100)] 
rule: remove semicolon in flowtable offload

opts->stmt_separator already prints the semicolon when needed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoparser: fix scope closure of COUNTER token
Florian Westphal [Thu, 25 Mar 2021 09:34:40 +0000 (10:34 +0100)] 
parser: fix scope closure of COUNTER token

It is closed after allocation, which is too early: this
stopped 'packets' and 'bytes' from getting parsed correctly.

Also add a test case for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agosrc: add datatype->describe()
Pablo Neira Ayuso [Wed, 24 Mar 2021 16:19:32 +0000 (17:19 +0100)] 
src: add datatype->describe()

Add it as an alternative way to print the datatype values when no symbol
table is available. Use it to print the protocols available via
getprotobynumber(), which actually refers to /etc/protocols.

Not very efficient, getprotobynumber() causes a series of open()/close()
calls on /etc/protocols, but this is called from a non-critical path.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1503
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agonftables: add flags offload to flowtable
Frank Wunderlich [Sun, 21 Mar 2021 16:49:16 +0000 (17:49 +0100)] 
nftables: add flags offload to flowtable

Allow flags (currently only offload) in flowtables, as described
here: https://lwn.net/Articles/804384/

Tested on mt7622/Bananapi-R64

table ip filter {
        flowtable f {
                hook ingress priority filter + 1
                devices = { lan3, lan0, wan }
                flags offload;
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                ip protocol { tcp, udp } flow add @f
        }
}

table ip nat {
        chain post {
                type nat hook postrouting priority filter; policy accept;
                oifname "wan" masquerade
        }
}

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agodoc: no need to define a set in ct state
Pablo Neira Ayuso [Wed, 24 Mar 2021 16:54:33 +0000 (17:54 +0100)] 
doc: no need to define a set in ct state

ct state values are flags, so there is no need to define a set for this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agodoc: use symbolic names for chain priorities
Simon Ruderich [Tue, 9 Mar 2021 10:53:30 +0000 (11:53 +0100)] 
doc: use symbolic names for chain priorities

This replaces the numbers with the matching symbolic names, with one
exception: the NAT example used "priority 0" for the prerouting
priority. This is replaced by "dstnat", which has priority -100, the
new recommended priority.

Also use spaces instead of tabs for consistency in lines which require
updates.

Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotests: shell: fix 0025empty_dynset_0
Pablo Neira Ayuso [Wed, 24 Mar 2021 12:36:14 +0000 (13:36 +0100)] 
tests: shell: fix 0025empty_dynset_0

Use bash, otherwise it reports here:

testcases/nft-f/0025empty_dynset_0: 22: Syntax error: redirection unexpected

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotests: shell: flowtable add after delete in batch
Pablo Neira Ayuso [Wed, 17 Mar 2021 19:50:12 +0000 (20:50 +0100)] 
tests: shell: flowtable add after delete in batch

Check for bogus EEXIST and EBUSY errors.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agosegtree: release single element already contained in an interval
Pablo Neira Ayuso [Tue, 16 Mar 2021 23:44:09 +0000 (00:44 +0100)] 
segtree: release single element already contained in an interval

Before this patch:

 table ip x {
        chain y {
                ip saddr { 1.1.1.1-1.1.1.2, 1.1.1.1 }
        }
 }

results in:

 table ip x {
        chain y {
                ip saddr { 1.1.1.1 }
        }
 }

due to incorrect interval merge logic.

If the element 1.1.1.1 is already contained in an existing interval
1.1.1.1-1.1.1.2, release it.
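
The intended outcome can be sketched in Python as follows. This only models
the containment check on inclusive (low, high) integer pairs; it is not the
algorithm used in segtree.c, and release_contained_singletons() is a
hypothetical helper name.

```python
import ipaddress


def addr(text):
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(text))


def release_contained_singletons(elements):
    """Drop single elements that already fall inside a listed interval.

    Each element is an inclusive (low, high) pair; a single address has
    low == high. Intervals are always kept.
    """
    ranges = [(lo, hi) for lo, hi in elements if lo != hi]
    kept = []
    for lo, hi in elements:
        contained = lo == hi and any(r_lo <= lo <= r_hi for r_lo, r_hi in ranges)
        if not contained:
            kept.append((lo, hi))
    return kept


elements = [
    (addr("1.1.1.1"), addr("1.1.1.2")),  # interval 1.1.1.1-1.1.1.2
    (addr("1.1.1.1"), addr("1.1.1.1")),  # single element inside that interval
]
# The interval survives; the redundant single element is released.
print(release_contained_singletons(elements))
```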

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1512
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoparser: add missing scope_close annotation for RT keyword
Florian Westphal [Wed, 24 Mar 2021 11:07:05 +0000 (12:07 +0100)] 
parser: add missing scope_close annotation for RT keyword

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: log: move to own scope
Florian Westphal [Tue, 16 Mar 2021 23:40:36 +0000 (00:40 +0100)] 
scanner: log: move to own scope

GROUP and PREFIX are used by igmp and nat, so they can't be moved out of
INITIAL scope yet.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: counter: move to own scope
Florian Westphal [Tue, 16 Mar 2021 23:40:35 +0000 (00:40 +0100)] 
scanner: counter: move to own scope

move bytes/packets away from initial state.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: add support for scope nesting
Florian Westphal [Tue, 16 Mar 2021 23:40:34 +0000 (00:40 +0100)] 
scanner: add support for scope nesting

Adding a COUNTER scope introduces parsing errors.  Example:

add rule  ... counter ip saddr 1.2.3.4

This is supposed to be

    COUNTER IP SADDR SYMBOL

but it will be parsed as

    COUNTER IP STRING SYMBOL

... and rule fails with unknown saddr.
This is because IP state change gets popped right after it was pushed.

bison parser invokes scanner_pop_start_cond() helper via
'close_scope_counter' rule after it has processed the entire 'counter' rule.
But that happens *after* flex has executed the 'IP' rule.

IOW, the sequence of events is not the expected
"COUNTER close_scope_counter IP SADDR SYMBOL close_scope_ip", it is
"COUNTER IP close_scope_counter".

close_scope_counter pops the just-pushed SCANSTATE_IP and returns the
scanner to SCANSTATE_COUNTER, so the next input token (saddr) gets parsed
as a string, which is then rejected by bison.

To resolve this, defer the pop operation until the current state is done.
scanner_pop_start_cond() already receives the scope that has been
completed as an argument, so we can compare it to the active state.

If those are not the same, just defer the pop operation until
bison reports it is done with the active flex scope.

This leads to the following sequence of events:
  1. flex switches to SCANSTATE_COUNTER
  2. flex switches to SCANSTATE_IP
  3. bison calls scanner_pop_start_cond(SCANSTATE_COUNTER)
  4. flex remains in SCANSTATE_IP, bison continues
  5. bison calls scanner_pop_start_cond(SCANSTATE_IP) once the entire
     ip rule has completed: this pops both IP and COUNTER.
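
The five steps above can be modelled with a small Python sketch. It is a
simplified illustration of the deferred-pop idea, not the scanner code: the
class StartCondStack and its "deferred" counter are names invented for this
example, and scopes are plain strings.

```python
class StartCondStack:
    """Toy model of a start condition stack with deferred pops."""

    def __init__(self):
        self.stack = ["INITIAL"]
        self.deferred = 0  # pops requested for scopes that were not active

    @property
    def active(self):
        return self.stack[-1]

    def push(self, scope):
        """The scanner enters a new scope."""
        self.stack.append(scope)

    def pop(self, scope):
        """The parser reports that `scope` is complete."""
        if self.active != scope:
            # A nested scope is still active: remember the request.
            self.deferred += 1
            return
        self.stack.pop()
        # Now also leave the scopes whose pop had to be deferred.
        while self.deferred and len(self.stack) > 1:
            self.stack.pop()
            self.deferred -= 1


s = StartCondStack()
s.push("COUNTER")   # 1. switch to the counter scope
s.push("IP")        # 2. switch to the ip scope
s.pop("COUNTER")    # 3. parser is done with counter, but ip is active
print(s.active)     # 4. still in the ip scope
s.pop("IP")         # 5. pops both ip and counter
print(s.active)
```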

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: avoid -fasan heap overflow warnings
Florian Westphal [Thu, 18 Mar 2021 16:31:30 +0000 (17:31 +0100)] 
scanner: avoid -fasan heap overflow warnings

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: secmark: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:13 +0000 (14:23 +0100)] 
scanner: secmark: move to own scope

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: move until,over,used keywords away from init state
Florian Westphal [Thu, 11 Mar 2021 13:23:12 +0000 (14:23 +0100)] 
scanner: move until,over,used keywords away from init state

Only applicable for limit and quota. "ct count" also needs 'over'.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: quota: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:11 +0000 (14:23 +0100)] 
scanner: quota: move to own scope

... and move "used" keyword to it.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: limit: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:10 +0000 (14:23 +0100)] 
scanner: limit: move to own scope

Moves rate and burst out of INITIAL.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: vlan: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:09 +0000 (14:23 +0100)] 
scanner: vlan: move to own scope

ID needs to remain exposed as it's used by ct, icmp, icmp6 and so on.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: remove saddr/daddr from initial state
Florian Westphal [Thu, 11 Mar 2021 13:23:08 +0000 (14:23 +0100)] 
scanner: remove saddr/daddr from initial state

This can now be reduced to expressions that can expect saddr/daddr tokens.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: arp: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:07 +0000 (14:23 +0100)] 
scanner: arp: move to own scope

Allows moving the arp specific tokens out of the INITIAL scope.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: add ether scope
Florian Westphal [Thu, 11 Mar 2021 13:23:06 +0000 (14:23 +0100)] 
scanner: add ether scope

just like previous change: useless as-is, but prepares
for removal of saddr/daddr from INITIAL scope.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: add fib scope
Florian Westphal [Thu, 11 Mar 2021 13:23:05 +0000 (14:23 +0100)] 
scanner: add fib scope

makes no sense as-is because all keywords need to stay
in the INITIAL scope.

This can be changed after all saddr/daddr users have been scoped.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: ip6: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:04 +0000 (14:23 +0100)] 
scanner: ip6: move to own scope

move flowlabel and hoplimit.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: ip: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:03 +0000 (14:23 +0100)] 
scanner: ip: move to own scope

Move the ip option names (rr, lsrr, ...) out of INITIAL scope.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: ct: move to own scope
Florian Westphal [Thu, 11 Mar 2021 13:23:02 +0000 (14:23 +0100)] 
scanner: ct: move to own scope

This allows moving multiple ct specific keywords out of INITIAL scope.
Next few patches follow same pattern:
 1. add a scope_close_XXX rule
 2. add a SCANSTATE_XXX & make flex switch to it when
    encountering XXX keyword
 3. make bison leave SCANSTATE_XXX when it has seen the complete
    expression.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agosrc: move remaining cache functions in rule.c to cache.c
Pablo Neira Ayuso [Thu, 11 Mar 2021 12:34:10 +0000 (13:34 +0100)] 
src: move remaining cache functions in rule.c to cache.c

Move all the cache logic to src/cache.c

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoscanner: socket: move to own scope
Florian Westphal [Mon, 8 Mar 2021 17:18:37 +0000 (18:18 +0100)] 
scanner: socket: move to own scope

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: rt: move to own scope
Florian Westphal [Mon, 8 Mar 2021 17:18:36 +0000 (18:18 +0100)] 
scanner: rt: move to own scope

classid and nexthop can be moved out of INIT scope.
The rest are still needed because they are used by other expressions as
well.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: ipsec: move to own scope
Florian Westphal [Mon, 8 Mar 2021 17:18:35 +0000 (18:18 +0100)] 
scanner: ipsec: move to own scope

... and hide the ipsec specific tokens from the INITIAL scope.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: queue: move to own scope
Florian Westphal [Mon, 8 Mar 2021 17:18:34 +0000 (18:18 +0100)] 
scanner: queue: move to own scope

Allows removing 3 queue specific keywords from INITIAL scope.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: introduce start condition stack
Florian Westphal [Mon, 8 Mar 2021 17:18:33 +0000 (18:18 +0100)] 
scanner: introduce start condition stack

Add a small initial chunk of flex start conditions.

This starts with two low-hanging fruits, numgen and j/symhash.

NUMGEN and HASH start conditions are entered from flex when
the corresponding expression token is encountered.

Flex returns to the INIT condition when the bison parser
has seen a complete numgen/hash statement.

This intentionally uses a stack rather than BEGIN()
to eventually support nested states.

The scanner_pop_start_cond() function argument is not used yet, but
will need to be used later to deal with nesting.
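
As a rough illustration of why a start condition stack helps, the Python
sketch below keeps a per-scope keyword table and classifies each word
against the table of the active scope. The keyword sets are placeholders
chosen for the example (they are assumptions, not the real token lists), and
ScopedScanner is a hypothetical name.

```python
# Keywords recognised in each scope; contents are illustrative only.
SCOPE_KEYWORDS = {
    "INIT": {"numgen", "jhash", "symhash"},
    "NUMGEN": {"inc", "random", "mod"},
    "HASH": {"mod", "seed"},
}

# Expression tokens that make the scanner enter a scope.
ENTER = {"numgen": "NUMGEN", "jhash": "HASH", "symhash": "HASH"}


class ScopedScanner:
    """Toy scanner that classifies words using the active scope only."""

    def __init__(self):
        self.stack = ["INIT"]

    def scan(self, word):
        scope = self.stack[-1]
        kind = "KEYWORD" if word in SCOPE_KEYWORDS[scope] else "STRING"
        if kind == "KEYWORD" and word in ENTER:
            self.stack.append(ENTER[word])  # enter the expression's scope
        return kind

    def statement_done(self):
        """The parser has seen a complete statement: return to INIT."""
        if len(self.stack) > 1:
            self.stack.pop()


sc = ScopedScanner()
print(sc.scan("mod"))     # outside any scope, "mod" is an ordinary string
print(sc.scan("numgen"))  # enters the numgen scope
print(sc.scan("mod"))     # inside the scope, "mod" is a keyword
sc.statement_done()
print(sc.scan("mod"))     # back in INIT: a string again
```

Keeping scope-specific keywords out of the initial state means they remain
usable as plain strings (for example as names) everywhere else.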

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoscanner: remove unused tokens
Florian Westphal [Mon, 8 Mar 2021 17:18:32 +0000 (18:18 +0100)] 
scanner: remove unused tokens

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agonftables: xt: fix misprint in nft_xt_compatible_revision
Pavel Tikhomirov [Tue, 9 Mar 2021 15:09:15 +0000 (18:09 +0300)] 
nftables: xt: fix misprint in nft_xt_compatible_revision

The rev variable is used here instead of opt, obviously by mistake.
Please see iptables:nft_compatible_revision() for an example of how it
should be.

This breaks revision compatibility checks completely when reading
compat-target rules from the nft utility. That's why nftables can't work on
"old" kernels which don't support new revisions. That's a problem for
containers.

E.g. revisions 0 and 1 are supported but not 2:
https://git.sw.ru/projects/VZS/repos/vzkernel/browse/net/netfilter/xt_nat.c#111

Reproducer for the problem on Virtuozzo 7 kernel
3.10.0-1160.11.1.vz7.172.18 in a CentOS 8 container:

  iptables-nft -t nat -N TEST
  iptables-nft -t nat -A TEST -j DNAT --to-destination 172.19.0.2
  nft list ruleset > nft.ruleset
  nft -f - < nft.ruleset
  #/dev/stdin:19:67-81: Error: Range has zero or negative size
  # meta l4proto tcp tcp dport 81 counter packets 0 bytes 0 dnat to 3.0.0.0-0.0.0.0
  #                                                                 ^^^^^^^^^^^^^^^

  nft -v
  #nftables v0.9.3 (Topsy)
  iptables-nft -v
  #iptables v1.8.7 (nf_tables)

Kernel returns ip range in rev 0 format:

  crash> p *((struct nf_nat_ipv4_multi_range_compat *) 0xffff8ca2fabb3068)
  $5 = {
    rangesize = 1,
    range = {{
        flags = 3,
        min_ip = 33559468,
        max_ip = 33559468,

But nft reads this as rev 2 format (nf_nat_range2) which does not have
rangesize, and thus flags 3 is treated as ip 3.0.0.0, which is wrong and
can't be restored later.
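
The misread can be reproduced with a short Python sketch using the struct
module. The byte layouts are simplified assumptions for a little-endian
machine: the rev 0 blob is modelled as rangesize, flags, min_ip, max_ip and
two 16-bit proto fields, and the rev 2 view as a 32-bit flags word followed
by two 16-byte address unions. Memory past the rev 0 blob is assumed to be
zero here.

```python
import socket
import struct

ip = socket.inet_aton("172.19.0.2")  # the --to-destination address

# rev 0 layout: rangesize=1, flags=3, min_ip, max_ip, min/max proto.
rev0 = struct.pack("<II4s4sHH", 1, 3, ip, ip, 0, 0)

# The crash dump prints min_ip as a host-order integer.
print(int.from_bytes(ip, "little"))  # 33559468

# Read the same bytes with the rev 2 layout (zero-padded to its size).
buf = rev0.ljust(36, b"\0")
flags, min_addr, max_addr = struct.unpack_from("<I16s16s", buf)

misread_min = socket.inet_ntoa(min_addr[:4])
misread_max = socket.inet_ntoa(max_addr[:4])
# rangesize is taken as flags, and the rev 0 flags word as the min address.
print(flags, misread_min, misread_max)  # 1 3.0.0.0 0.0.0.0
```

Under these assumptions the result matches the "dnat to 3.0.0.0-0.0.0.0" in
the error output above.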

(Should probably be the same on Centos 7 kernel 3.10.0-1160.11.1)

Fixes: fbc0768cb696 ("nftables: xt: don't use hard-coded AF_INET")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotests/py: Fix for missing JSON equivalent in any/ct.t.json
Phil Sutter [Mon, 8 Mar 2021 14:43:23 +0000 (15:43 +0100)] 
tests/py: Fix for missing JSON equivalent in any/ct.t.json

JSON equivalent for recently added test of the '!' shortcut was missing.

Fixes: e6c32b2fa0b82 ("src: add negation match on singleton bitmask value")
Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agomnl: Set NFTNL_SET_DATA_TYPE before dumping set elements
Phil Sutter [Thu, 4 Feb 2021 01:20:23 +0000 (02:20 +0100)] 
mnl: Set NFTNL_SET_DATA_TYPE before dumping set elements

In combination with libnftnl's commit "set_elem: Fix printing of verdict
map elements", this adds the vmap target to netlink dumps. Adjust dumps
in tests/py accordingly.

Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agotests/py: Adjust payloads for fixed nat statement dumps
Phil Sutter [Tue, 29 Dec 2020 17:39:30 +0000 (18:39 +0100)] 
tests/py: Adjust payloads for fixed nat statement dumps

Libnftnl no longer dumps unused regs, so drop those.

Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agodoc: move drop rule on a separate line in blackhole example
Simon Ruderich [Sun, 7 Mar 2021 09:51:36 +0000 (10:51 +0100)] 
doc: move drop rule on a separate line in blackhole example

At first I overlooked the "drop". Putting it on a separate line makes it
more visible and also details the separate steps of this rule.

Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agodoc: remove duplicate tables in synproxy example
Simon Ruderich [Sun, 7 Mar 2021 09:51:35 +0000 (10:51 +0100)] 
doc: remove duplicate tables in synproxy example

The "outcome ruleset" is the same as the two tables in the example.
Don't duplicate this information which just wastes space in the
documentation and can confuse the reader (it took me a while to realize
the tables are the same).

In addition, use the same table name for both tables to make it clear
that they can be the same. They will be merged in the resulting ruleset.

Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agodoc: add * to include example to actually include files
Simon Ruderich [Sun, 7 Mar 2021 09:51:34 +0000 (10:51 +0100)] 
doc: add * to include example to actually include files

"/etc/firewall/rules/" causes no error but also doesn't include any
files contained in the directory.

Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoparser: compact ct obj list types
Florian Westphal [Thu, 4 Mar 2021 01:07:35 +0000 (02:07 +0100)] 
parser: compact ct obj list types

Add new ct_cmd_type and avoid copypaste of the ct cmd_list rules.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoparser: compact map RHS type
Florian Westphal [Thu, 4 Mar 2021 01:07:34 +0000 (02:07 +0100)] 
parser: compact map RHS type

Similar to previous patch, we can avoid duplication.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoparser: squash duplicated spec/specid rules
Florian Westphal [Thu, 4 Mar 2021 01:07:33 +0000 (02:07 +0100)] 
parser: squash duplicated spec/specid rules

No need to have duplicate CMD rules for spec and specid: add and use
a common rule for those cases.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoexpression: memleak in verdict_expr_parse_udata()
Pablo Neira Ayuso [Fri, 5 Mar 2021 19:36:31 +0000 (20:36 +0100)] 
expression: memleak in verdict_expr_parse_udata()

Remove unnecessary verdict_expr_alloc() invocation.

Fixes: 4ab1e5e60779 ("src: allow use of 'verdict' in typeof definitions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agocache: memleak list of chain
Pablo Neira Ayuso [Tue, 2 Mar 2021 11:40:27 +0000 (12:40 +0100)] 
cache: memleak list of chain

Release chain list from the error path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agomnl: remove nft_mnl_socket_reopen()
Pablo Neira Ayuso [Tue, 2 Mar 2021 11:35:20 +0000 (12:35 +0100)] 
mnl: remove nft_mnl_socket_reopen()

nft_mnl_socket_reopen() was introduced to deal with the EINTR case.
By reopening the netlink socket, pending netlink messages that are part of
a stale netlink dump are implicitly dropped. This patch replaces the
nft_mnl_socket_reopen() strategy by pulling out all of the remaining
netlink messages to restart in a clean state.
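
The idea of draining instead of reopening can be illustrated with a minimal Python sketch. It is not the libmnl code from this patch: an AF_UNIX datagram socket pair stands in for the netlink socket, and `drain` is a hypothetical helper that discards whatever is already queued while keeping the socket open.

```python
import socket


def drain(sock: socket.socket, bufsize: int = 65536) -> int:
    """Discard every message already queued on sock; return how many."""
    dropped = 0
    sock.setblocking(False)
    try:
        while True:
            try:
                sock.recv(bufsize)
            except BlockingIOError:
                # Nothing left to read: the queue is clean.
                return dropped
            dropped += 1
    finally:
        # Restore blocking mode for the caller.
        sock.setblocking(True)


# Stand-in for a netlink socket with leftovers from an interrupted dump.
tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
for _ in range(3):
    tx.send(b"stale dump message")

dropped = drain(rx)

# The same socket is still usable afterwards.
tx.send(b"fresh reply")
fresh = rx.recv(65536)
```

Because the socket is never closed, anything tied to its lifetime survives the restart, which is the property the table ownership support relies on.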

This is implicitly fixing up a bug in the table ownership support, which
assumes that the netlink socket remains open until nft_ctx_free() is
invoked.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotable: support for the table owner flag
Pablo Neira Ayuso [Sat, 20 Feb 2021 15:18:03 +0000 (16:18 +0100)] 
table: support for the table owner flag

Add a new flag that allows a userspace process to own tables: tables that
have an owner can only be updated/destroyed by the owner. The table is
destroyed either when the owner process calls nft_ctx_free() or when the
owner process is terminated (implicit table release).

The ruleset listing includes the program name that owns the table:

 nft> list ruleset
 table ip x { # progname nft
        flags owner

        chain y {
                type filter hook input priority filter; policy accept;
                counter packets 1 bytes 309
        }
 }

Original code to pretty print the netlink portID to program name has
been extracted from the conntrack userspace utility.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotable: rework flags printing
Pablo Neira Ayuso [Mon, 22 Feb 2021 14:44:35 +0000 (15:44 +0100)] 
table: rework flags printing

Simplify routine to print the table flags. Add table_flag_name() and use
it from json too.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoparser: re-enable support for concatenation on map RHS
Florian Westphal [Tue, 23 Feb 2021 11:12:40 +0000 (12:12 +0100)] 
parser: re-enable support for concatenation on map RHS

"typeof .... : ip saddr . tcp dport" is legal.

This makes 'testcases/maps/nat_addr_port' pass again.

Fixes: 4ab1e5e6077918 ("src: allow use of 'verdict' in typeof definitions")
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agosrc: allow use of 'verdict' in typeof definitions
Florian Westphal [Sat, 30 Jan 2021 18:58:42 +0000 (19:58 +0100)] 
src: allow use of 'verdict' in typeof definitions

'verdict' cannot be used as part of a map typeof-based key definition,
it's a datatype and not an expression, e.g.:

  typeof iifname . ip protocol . th dport : verdict

... will fail.

Make the parser convert a 'verdict' symbol to a verdict expression
and allow its presence to be stored as part of the typeof key definition.

Reported-by: Frank Myhr <fmyhr@fhmtech.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agodoc: nft: fix some typos and formatting issues
Štěpán Němec [Mon, 22 Feb 2021 12:03:20 +0000 (13:03 +0100)] 
doc: nft: fix some typos and formatting issues

Trying to escape asciidoc (9.1.0) * with \ preserves the backslash in
the formatted man page. Bare * works as expected.

Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agomain: fix nft --help output fallout from 719e4427
Štěpán Němec [Mon, 22 Feb 2021 12:03:19 +0000 (13:03 +0100)] 
main: fix nft --help output fallout from 719e4427

Long options were missing the double dash.

Fixes: 719e44277f8e ("main: use one data-structure to initialize getopt_long(3) arguments and help.")
Cc: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Štěpán Němec <snemec@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agojson: init parser state for every new buffer/file
Eric Garver [Fri, 19 Feb 2021 15:11:26 +0000 (10:11 -0500)] 
json: init parser state for every new buffer/file

Otherwise invalid error states cause subsequent json parsing to fail
when it should not.

Signed-off-by: Eric Garver <eric@garver.life>
Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agomonitor: Don't print newgen message with JSON output
Phil Sutter [Wed, 17 Feb 2021 11:38:42 +0000 (12:38 +0100)] 
monitor: Don't print newgen message with JSON output

Iff this should be printed, it must adhere to output format settings. In
its current form it breaks JSON syntax, so skip it for non-default
output formats.

Fixes: cb7e02f44d6a6 ("src: enable json echo output when reading native syntax")
Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agoevaluate: set evaluation context for set elements
Florian Westphal [Wed, 3 Feb 2021 16:57:07 +0000 (17:57 +0100)] 
evaluate: set evaluation context for set elements

This resolves the same issue as the previous patch when such an
expression is used as a set key:

        set z {
                typeof ct zone
-               elements = { 1, 512, 768, 1024, 1280, 1536 }
+               elements = { 1, 2, 3, 4, 5, 6 }
        }

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoevaluate: pick data element byte order, not dtype one
Florian Westphal [Wed, 3 Feb 2021 16:57:06 +0000 (17:57 +0100)] 
evaluate: pick data element byte order, not dtype one

Some expressions have integer base type, not a specific one, e.g. 'ct zone'.
In that case nft used the wrong byte order.

Without this, nft adds
elements = { "eth0" : 256, "eth1" : 512, "veth4" : 256 }
instead of 1, 2, 3.

This is not a 'display bug', the added elements have wrong byte order.
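
The wrong values follow directly from a byte swap. A minimal sketch, assuming a 16-bit value and a little-endian host (the helper name `misread` is hypothetical and not part of nft):

```python
def misread(value: int, width: int = 2) -> int:
    """Store value big-endian, then read the same bytes as little-endian."""
    return int.from_bytes(value.to_bytes(width, "big"), "little")


swapped = [misread(v) for v in (1, 2, 3)]
print(swapped)  # [256, 512, 768]
```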

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agotests: extend dtype test case to cover expression with integer type
Florian Westphal [Wed, 3 Feb 2021 16:57:05 +0000 (17:57 +0100)] 
tests: extend dtype test case to cover expression with integer type

... nft doesn't handle this correctly at the moment: they are added
as network byte order (invalid byte order).

ct zone has integer_type, the byte order has to be taken from the expression.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agoevaluate: incorrect usage of stmt_binary_error() in reject
Pablo Neira Ayuso [Tue, 9 Feb 2021 13:22:12 +0000 (14:22 +0100)] 
evaluate: incorrect usage of stmt_binary_error() in reject

Don't pass ctx->pctx.protocol[PROTO_BASE_LL_HDR] to stmt_binary_error(),
it's not useful for the error reporting as location is not available.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoerec: Sanitize erec location indesc
Phil Sutter [Tue, 26 Jan 2021 17:52:15 +0000 (18:52 +0100)] 
erec: Sanitize erec location indesc

erec_print() unconditionally dereferences erec->locations->indesc, so
make sure it is valid when either creating an erec or adding a location.

Signed-off-by: Phil Sutter <phil@nwl.cc>
4 years agotests: shell: extend 0025empty_dynset_0 to cover multi-statement support
Pablo Neira Ayuso [Tue, 9 Feb 2021 11:57:14 +0000 (12:57 +0100)] 
tests: shell: extend 0025empty_dynset_0 to cover multi-statement support

Add a test to cover multi-statement support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agotrace: do not remove icmp type from packet dump
Florian Westphal [Mon, 8 Feb 2021 14:54:44 +0000 (15:54 +0100)] 
trace: do not remove icmp type from packet dump

As of 0.9.8 the icmp type is marked as a protocol field, so it's
elided in 'nft monitor trace' output:

   icmp code 0 icmp id 44380 ..

Restore it.  Unlike tcp, where 'tcp sport' et al. in the dump
make 'ip protocol tcp' redundant, the redundancy isn't obvious
in the icmp case:

  icmp type 8 code 0 id ...

Reported-by: Martin Gignac <martin.gignac@gmail.com>
Fixes: 98b871512c4677 ("src: add auto-dependencies for ipv4 icmp")
Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agosrc: add negation match on singleton bitmask value
Pablo Neira Ayuso [Mon, 1 Feb 2021 21:21:41 +0000 (22:21 +0100)] 
src: add negation match on singleton bitmask value

This patch provides a shortcut for:

ct status and dnat == 0

which allows checking for packets whose dnat bit is unset:

  # nft add rule x y ct status ! dnat counter

This operation is only available for expressions with a bitmask basetype, e.g.

  # nft describe ct status
  ct expression, datatype ct_status (conntrack status) (basetype bitmask, integer), 32 bits
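
In plain bitmask terms, the shortcut tests that a single bit is clear. A minimal sketch follows; the bit positions are assumed for illustration, and `CtStatus` and `bit_unset` are hypothetical names rather than nft internals.

```python
from enum import IntFlag


class CtStatus(IntFlag):
    # Assumed bit positions, for illustration only.
    SNAT = 1 << 4
    DNAT = 1 << 5


def bit_unset(status: int, flag: int) -> bool:
    """Equivalent of 'ct status and <flag> == 0' for a single-bit flag."""
    return (status & flag) == 0


only_snat = bit_unset(CtStatus.SNAT, CtStatus.DNAT)                  # True
with_dnat = bit_unset(CtStatus.SNAT | CtStatus.DNAT, CtStatus.DNAT)  # False
```

The check only makes sense for a singleton value: with several bits in the mask, "all clear" and "not all set" would no longer be the same condition.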

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoevaluate: do not crash if dynamic set has no statements
Florian Westphal [Wed, 3 Feb 2021 18:42:27 +0000 (19:42 +0100)] 
evaluate: do not crash if dynamic set has no statements

list_first_entry() returns garbage when the list is empty.
There is no need to run the following loop if we have no statements,
so just return 0.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agotests: add empty dynamic set
Florian Westphal [Wed, 3 Feb 2021 18:42:26 +0000 (19:42 +0100)] 
tests: add empty dynamic set

nft crashes on restore.

Signed-off-by: Florian Westphal <fw@strlen.de>
4 years agotestcases: move two dump files to correct location
Florian Westphal [Wed, 3 Feb 2021 18:42:25 +0000 (19:42 +0100)] 
testcases: move two dump files to correct location

The test cases were moved but the dumps remained in the old location.

Fixes: eb14363d44cea5 ("tests: shell: move chain priority and policy to chain folder")
Signed-off-by: Florian Westphal <fw@strlen.de>