intervals: add support to automerge with kernel elements
Extend the interval codebase to support merging elements in the
kernel with userspace element updates.
Add a list of elements to be purged to cmd and set objects. These
elements, which represent outdated intervals, are deleted before the
updated ranges are added.
This routine splices the list of userspace and kernel elements, then it
mergesorts to identify overlapping and contiguous ranges. This splice
operation is undone so the set userspace cache remains consistent.
Incrementally update the elements in the cache; this allows removing
dd44081d91ce ("segtree: Fix add and delete of element in same batch").
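The splice, sort and merge steps described above can be sketched roughly as follows. This is a simplified illustration, not the code in this patch: struct range_sketch, range_cmp() and automerge_sketch() are hypothetical names, keys are plain 32-bit integers, and qsort() stands in for the mergesort mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element: a closed range [lo, hi] plus its origin. */
struct range_sketch {
	uint32_t lo;
	uint32_t hi;
	bool kernel;	/* true if this element came from the kernel set */
};

/* Order by lower bound, then upper bound; kernel elements first on ties. */
static int range_cmp(const void *a, const void *b)
{
	const struct range_sketch *x = a, *y = b;

	if (x->lo != y->lo)
		return x->lo < y->lo ? -1 : 1;
	if (x->hi != y->hi)
		return x->hi < y->hi ? -1 : 1;
	if (x->kernel != y->kernel)
		return x->kernel ? -1 : 1;
	return 0;
}

/*
 * Sort the spliced list, then merge overlapping and contiguous ranges in a
 * single linear pass. The merged ranges are compacted to the front of r[]
 * and their count is returned. *purged counts kernel ranges that were
 * absorbed into a wider range and would have to be deleted first.
 */
static size_t automerge_sketch(struct range_sketch *r, size_t n, size_t *purged)
{
	size_t out = 0, i;

	assert(r != NULL && purged != NULL);
	*purged = 0;
	if (n == 0)
		return 0;

	qsort(r, n, sizeof(*r), range_cmp);

	for (i = 1; i < n; i++) {
		struct range_sketch *cur = &r[out];
		bool overlap = r[i].lo <= cur->hi;
		bool contiguous = cur->hi != UINT32_MAX &&
				  r[i].lo == cur->hi + 1;

		if (!overlap && !contiguous) {
			r[++out] = r[i];	/* start a new range */
			continue;
		}
		/* A kernel range that already covers the new one stays as is. */
		if (cur->kernel && !r[i].kernel && r[i].hi <= cur->hi)
			continue;

		*purged += (size_t)cur->kernel + (size_t)r[i].kernel;
		if (r[i].hi > cur->hi)
			cur->hi = r[i].hi;
		cur->kernel = false;	/* the merged range is a new element */
	}
	return out + 1;
}
```

In this sketch a kernel range [10,20] followed by a userspace range [21,30] collapses into one new range [10,30], and the old kernel range is counted as purged.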
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: replace interval segment tree overlap and automerge
This is a rewrite of the segtree interval codebase.
This patch now splits the original set_to_interval() function into three
routines:
- add set_automerge() to merge overlapping and contiguous ranges.
The elements, expressed either as single values, prefixes or ranges,
are all first normalized to ranges. These elements expressed as ranges
are mergesorted. Then, there is a linear list inspection to check for
merge candidates. This code only merges elements in the same batch,
i.e. it does not merge elements in the kernel and the userspace batch.
- add set_overlap() to check for overlapping set elements. Linux
kernel >= 5.7 already checks for overlaps; older kernels still need
this code. This code checks for two conflict types:
1) between elements in this batch.
2) between elements in this batch and kernelspace.
The elements in the kernel are temporarily merged into the list of
elements in the batch to check for these overlaps. The EXPR_F_KERNEL
flag allows us to restore the set cache after the overlap check has
been performed.
- set_to_interval() now only transforms set elements expressed as ranges,
e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END
flag notation to represent e.g. [a,b+1), where b+1 has the
EXPR_F_INTERVAL_END flag set.
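The [a,b] to [a,b+1) notation used by set_to_interval() can be illustrated with a small sketch. The names below (struct elem_sketch, SKETCH_F_INTERVAL_END, range_to_elems_sketch()) are hypothetical stand-ins, keys are 32-bit integers, and the handling of a range that ends at the maximum value is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_INTERVAL_END 0x1	/* stand-in for EXPR_F_INTERVAL_END */

/* Hypothetical set element: a key plus flags. */
struct elem_sketch {
	uint32_t key;
	unsigned int flags;
};

/*
 * Expand the closed range [a, b] into a start element a and an end element
 * b + 1 that carries the interval-end flag. Returns the number of elements
 * written to out[] (1 or 2).
 */
static size_t range_to_elems_sketch(uint32_t a, uint32_t b,
				    struct elem_sketch out[2])
{
	assert(a <= b);

	out[0] = (struct elem_sketch){ .key = a, .flags = 0 };

	/* Assumption: b + 1 is not representable, leave the range open. */
	if (b == UINT32_MAX)
		return 1;

	out[1] = (struct elem_sketch){
		.key = b + 1,
		.flags = SKETCH_F_INTERVAL_END,
	};
	return 2;
}
```

For example, [10,20] becomes the element 10 followed by the element 21 with the end flag set.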
More relevant updates:
- The overlap and automerge routines are now performed in the evaluation
phase.
- The userspace set object representation now stores a reference to the
existing kernel set object (in case there is already a set with this
same name in the kernel). This is required by the new overlap and
automerge approach.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add initial test case, sets with names and interfaces,
anonymous and named ones.
Check match+no-match.
netns with ppp1 and ppq veth, send packets via both interfaces.
Rule counters should have incremented on the three rules
(those that match on sets that have "abcdef1" or "abcdef*" strings in them).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: make interval sets work with string datatypes
Allows to use interface names in interval sets:
table inet filter {
set s {
type ifname
flags interval
elements = { eth*, foo }
}
}
Concatenations are not yet supported. Listing is also broken:
those strings will not be printed back because the values will remain
in big-endian order. A followup patch will extend segtree to translate
this back to host byte order.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: string prefix expression must retain original length
To make something like "eth*" work for interval sets (match
eth0, eth1, and so on...) we must treat the string as a 128 bit
integer.
Without this, segtree will do the wrong thing when applying the prefix,
because we generate the prefix based on 'eth*' as input, with a length of 3.
The correct import needs to be done on "eth\0\0\0\0\0\0\0...", i.e., if
the input buffer were an ipv6 address, it should look like "eth\0::",
not "::eth".
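A rough sketch of the left-aligned layout described above: the prefix bytes go to the start of a 16-byte buffer, and the range covered by the prefix runs from zero padding to 0xff padding. SKETCH_IFNAMSIZ and string_prefix_bounds_sketch() are hypothetical names, not functions from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16	/* 128 bits, the key size discussed above */

/*
 * Build the [lo, hi] bounds matched by a string prefix such as "eth"
 * (taken from "eth*"). The prefix is left-aligned in the buffer: lo pads
 * with 0x00 and hi pads with 0xff. Returns the prefix length in bits, or
 * 0 if the prefix is empty or does not fit.
 */
static unsigned int string_prefix_bounds_sketch(const char *prefix,
						uint8_t lo[SKETCH_IFNAMSIZ],
						uint8_t hi[SKETCH_IFNAMSIZ])
{
	size_t len;

	assert(prefix != NULL && lo != NULL && hi != NULL);

	len = strlen(prefix);
	if (len == 0 || len >= SKETCH_IFNAMSIZ)
		return 0;

	memset(lo, 0x00, SKETCH_IFNAMSIZ);
	memset(hi, 0xff, SKETCH_IFNAMSIZ);
	memcpy(lo, prefix, len);
	memcpy(hi, prefix, len);

	return (unsigned int)len * 8;
}
```

With this layout a NUL-padded "eth0" compares bytewise between lo and hi, while "foo" falls outside the range.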
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: make byteorder conversion on string base type a no-op
Prerequisite for support of interface names in interval sets:
table inet filter {
set s {
type ifname
flags interval
elements = { "foo" }
}
chain input {
type filter hook input priority filter; policy accept;
iifname @s counter
}
}
Will yield: "Byteorder mismatch: meta expected big endian, got host endian".
This is because of:
/* Data for range lookups needs to be in big endian order */
if (right->set->flags & NFT_SET_INTERVAL &&
byteorder_conversion(ctx, &rel->left, BYTEORDER_BIG_ENDIAN) < 0)
It doesn't make sense to me to add checks to all callers of
byteorder_conversion(), so treat this similarly to EXPR_CONCAT and turn
the TYPE_STRING byteorder change into a no-op.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Martin Gignac [Sat, 9 Apr 2022 12:57:02 +0000 (08:57 -0400)]
tests: py: Add meta time tests without 'meta' keyword
v1.0.2 of 'nft' fails on 'time < "2022-07-01 11:00:00"' but succeeds
when 'meta' is specified ('meta time < "2022-07-01 11:00:00"'). This
extends coverage by testing 'time' without 'meta'.
Signed-off-by: Martin Gignac <martin.gignac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Thu, 7 Apr 2022 11:53:05 +0000 (13:53 +0200)]
tests: py: Don't colorize output if stderr is redirected
Cover for calls with '2>/tmp/log' and avoid printing escape sequences to
that file. One could still keep colored output on stdout, but that
would require a printing routine for non-errors.
Phil Sutter [Wed, 6 Apr 2022 13:41:03 +0000 (15:41 +0200)]
tests: monitor: Hide temporary file names from error output
Make error output deterministic by passing input to nft via stdin. This
way error messages will contain "/dev/stdin" instead of the temporary
file name.
After commit 0210097879 ("meta: time: use uint64_t instead of time_t")
there is a compiler warning due to comparison of the return value from
parse_iso_date with -1, which is now implicitly cast to uint64_t.
Fix this by making parse_iso_date take a pointer to the tstamp and
return bool instead.
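The pattern of reporting failure through a bool return value and passing the timestamp through a pointer can be sketched as follows. This is not the nftables parser: parse_iso_date_sketch() and days_from_civil() are hypothetical helpers, and the sketch only accepts the "YYYY-MM-DD HH:MM:SS" format, interpreted as UTC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Days since 1970-01-01 for a proleptic Gregorian calendar date. */
static int64_t days_from_civil(int64_t y, unsigned int m, unsigned int d)
{
	y -= m <= 2;
	const int64_t era = (y >= 0 ? y : y - 399) / 400;
	const unsigned int yoe = (unsigned int)(y - era * 400);
	const unsigned int doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	const unsigned int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

	return era * 146097 + (int64_t)doe - 719468;
}

/*
 * Parse "YYYY-MM-DD HH:MM:SS" into seconds since the epoch. The result is
 * stored in *tstamp and success is reported via the return value, so there
 * is no need to compare an unsigned timestamp against -1.
 */
static bool parse_iso_date_sketch(uint64_t *tstamp, const char *sym)
{
	int y, mo, d, h, mi, s, end = 0;

	assert(tstamp != NULL && sym != NULL);

	if (sscanf(sym, "%4d-%2d-%2d %2d:%2d:%2d%n",
		   &y, &mo, &d, &h, &mi, &s, &end) != 6 || sym[end] != '\0')
		return false;
	if (y < 1970 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
	    h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 59)
		return false;

	*tstamp = (uint64_t)days_from_civil(y, (unsigned int)mo,
					    (unsigned int)d) * 86400u +
		  (uint64_t)(h * 3600 + mi * 60 + s);
	return true;
}
```

A caller then writes `if (!parse_iso_date_sketch(&tstamp, sym))` and only uses tstamp on success.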
Fixes: 0210097879 ("meta: time: use uint64_t instead of time_t")
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
time_t may be 32 bit on some platforms and thus can't fit a timestamp
with nanoseconds resolution. This causes overflows and ultimately
breaks meta time expressions on such platforms.
Fix this by using uint64_t instead.
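The arithmetic behind the overflow is simple enough to check by hand. A minimal sketch, with hypothetical helper names, assuming a signed 32-bit time_t:

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert seconds to nanoseconds using 64-bit arithmetic. */
static uint64_t seconds_to_ns_sketch(uint64_t sec)
{
	return sec * 1000000000ULL;
}

/* Would this value survive being stored in a signed 32-bit time_t? */
static bool fits_in_32bit_time_sketch(uint64_t v)
{
	return v <= (uint64_t)INT32_MAX;
}
```

INT32_MAX is 2147483647, so a 32-bit time_t can hold little more than two seconds expressed in nanoseconds, whereas uint64_t has room for several centuries.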
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1567
Fixes: f8f32deda31df ("meta: Introduce new conditions 'time', 'day' and 'hour'")
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
optimize: Restore optimization for raw payload expressions
This patch reverts d0f14b5337e7 ("optimize: do not merge raw payload
expressions") after adding support for concatenation with variable
length TYPE_INTEGER.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: allow to use integer type header fields via typeof set declaration
Header fields such as udp length cannot be used in concatenations because
it is using the generic integer_type:
test.nft:3:10-19: Error: can not use variable sized data types (integer) in concat expressions
typeof udp length . @th,32,32
^^^^^^^^^^~~~~~~~~~~~~
This patch slightly extends ("src: allow to use typeof of raw expressions in
set declaration") to set NFTNL_UDATA_SET_KEY_PAYLOAD_LEN in userdata if
TYPE_INTEGER is used.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: allow to use typeof of raw expressions in set declaration
Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set
length and byteorder. Add missing information to the set userdata area
for raw payload expressions which allows to rebuild the set typeof from
the listing path.
A few examples:
- With anonymous sets:
nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e }
- With named sets:
table x {
set y {
typeof ip saddr . @ih,32,32
elements = { 1.1.1.1 . 0x14 }
}
}
Incremental updates are also supported, eg.
nft add element x y { 3.3.3.3 . 0x28 }
expr_evaluate_concat() is used to evaluate both set key definitions
and set key values; using two different functions might help to simplify
this code in the future.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
"When trying to add a rule which contains an anonymous chain to a
non-existent chain, string_misspell_update() is called with a NULL
string because the anonymous chain has no name. Avoid this by making the
function NULL-pointer tolerant."
Fixes: c330152b7f777 ("src: support for implicit chain bindings")
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sam James [Thu, 24 Feb 2022 19:45:43 +0000 (19:45 +0000)]
build: explicitly pass --version-script to linker
--version-script is a linker option, so let's use -Wl, so that
libtool handles it properly. It seems like the previous method gets silently
ignored with GNU libtool in some cases(?) and downstream in Gentoo,
we had to apply this change to make the build work with slibtool anyway.
But it's indeed correct in any case, so let's swap.
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sam James [Thu, 24 Feb 2022 19:45:42 +0000 (19:45 +0000)]
libnftables.map: export new nft_ctx_{get,set}_optimize API
[ Remove incorrect symbol names that were exported via .map file ]
Without this, we're not explicitly saying this is part of the
public API.
This new API was added in 1.0.2 and is used by e.g. the main
nft binary. Noticed when fixing the version-script option
(separate patch) which picked up this problem when .map
was missing symbols (related to when symbol visibility
options get set).
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 22 Feb 2022 12:51:09 +0000 (13:51 +0100)]
tests: add test case for flowtable with owner flag
BUG: KASAN: use-after-free in nf_hook_entries_grow+0x675/0x980
Read of size 4 at ... nft/19662
nf_hook_entries_grow+0x675/0x980
This is fixed by kernel commit 6069da443bf
("netfilter: nf_tables: unregister flowtable hooks on netns exit").
The test case here uses the owner flag. The netlink event handler doesn't
release the flowtable, so the next attempt to add one causes a
use-after-free because of the dangling ingress hook reference.
examples: compile with `make check' and add AM_CPPFLAGS
Compile examples via `make check' like libnftnl does. Use AM_CPPFLAGS to
specify local headers via -I.
Unfortunately, `make distcheck' did not catch this compile time error in
my system, since it was using the nftables/libnftables.h file of the
previous nftables release.
Fixes: 5b364657a35f ("build: missing SUBIRS update")
Fixes: caf2a6ad2d22 ("examples: add libnftables example program")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eugene Crosser [Thu, 9 Dec 2021 18:26:06 +0000 (19:26 +0100)]
netlink: Use abort() in case of netlink_abi_error
Library functions should not use exit(): an application that uses the
library may contain an error handling path that cannot be executed if a
library function calls exit(). For truly fatal errors, using abort() is
more acceptable than exit().
Signed-off-by: Eugene Crosser <crosser@average.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Sat, 15 Jan 2022 18:27:06 +0000 (18:27 +0000)]
src: add a helper that returns a payload dependency for a particular base
Currently, with only one base and dependency stored this is superfluous,
but it will become more useful when the next commit adds support for
storing a payload for every base.
Remove redundant `ctx->pbase` check.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Sat, 15 Jan 2022 19:00:49 +0000 (20:00 +0100)]
src: silence compiler warnings
cache.c:504:22: warning: ‘chain’ may be used uninitialized in this function [-Wmaybe-uninitialized]
cache.c:504:22: warning: ‘table’ may be used uninitialized in this function [-Wmaybe-uninitialized]
erec.c:128:16: warning: ‘line’ may be used uninitialized in this function [-Wmaybe-uninitialized]
optimize.c:524:9: warning: ‘line’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Fixes: 8ad4056e9182 ("erec: expose print_location() and line_location()")
Fixes: afbd102211dc ("src: do not use the nft_cache_filter object from mnl.c")
Fixes: fb298877ece2 ("src: add ruleset optimization infrastructure")
Signed-off-by: Florian Westphal <fw@strlen.de>
src: 'nft list chain' prints anonymous chains correctly
If the user is requesting a chain listing, e.g. nft list chain x y
and a rule refers to an anonymous chain that cannot be found in the cache,
then fetch such anonymous chain and its ruleset.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1577
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
optimize: merge rules with same selectors into a concatenation
This patch extends the ruleset optimization infrastructure to collapse
several rules with the same selectors into a concatenation.
Transform:
meta iifname eth1 ip saddr 1.1.1.1 ip daddr 2.2.2.3 accept
meta iifname eth1 ip saddr 1.1.1.2 ip daddr 2.2.2.5 accept
meta iifname eth2 ip saddr 1.1.1.3 ip daddr 2.2.2.6 accept
into:
meta iifname . ip saddr . ip daddr { eth1 . 1.1.1.1 . 2.2.2.3, eth1 . 1.1.1.2 . 2.2.2.5, eth2 . 1.1.1.3 . 2.2.2.6 } accept
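As an illustration only, the shape of this transformation can be sketched on plain strings. The optimizer itself works on parsed rules; struct rule3_sketch and concat_rules_sketch() below are hypothetical and hardcode the three selectors from the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: the values matched by the three shared selectors. */
struct rule3_sketch {
	const char *iifname;
	const char *saddr;
	const char *daddr;
};

/*
 * Collapse n rules that use the same selectors into one rule matching a
 * concatenation against an anonymous set. Returns 0 on success, -1 if buf
 * is too small.
 */
static int concat_rules_sketch(const struct rule3_sketch *r, size_t n,
			       char *buf, size_t len)
{
	size_t off, i;
	int ret;

	assert(r != NULL && buf != NULL);

	ret = snprintf(buf, len, "meta iifname . ip saddr . ip daddr {");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	off = (size_t)ret;

	for (i = 0; i < n; i++) {
		/* One set element per original rule. */
		ret = snprintf(buf + off, len - off, "%s %s . %s . %s",
			       i ? "," : "", r[i].iifname, r[i].saddr,
			       r[i].daddr);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += (size_t)ret;
	}

	ret = snprintf(buf + off, len - off, " } accept");
	if (ret < 0 || (size_t)ret >= len - off)
		return -1;
	return 0;
}
```

Each input rule contributes exactly one element, so the values of every element stay tied to the rule they came from.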
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a new -o/--optimize option to enable ruleset
optimization.
You can combine this option with the dry run mode (--check) to review
the proposed ruleset updates without actually loading the ruleset, e.g.
# nft -c -o -f ruleset.nft
Merging:
ruleset.nft:16:3-37: ip daddr 192.168.0.1 counter accept
ruleset.nft:17:3-37: ip daddr 192.168.0.2 counter accept
ruleset.nft:18:3-37: ip daddr 192.168.0.3 counter accept
into:
ip daddr { 192.168.0.1, 192.168.0.2, 192.168.0.3 } counter packets 0 bytes 0 accept
This infrastructure collects the common statements that are used in
rules, then it builds a matrix of rules vs. statements. Then, it looks
for common statements in consecutive rules which allows to merge rules.
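The matrix idea can be sketched in a few lines. This is a simplified model, not the optimizer code: each cell holds a hypothetical selector id (0 meaning the rule does not use that statement column), and SKETCH_MAX_STMTS, struct run_sketch and find_merge_runs_sketch() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_STMTS 4	/* columns: one per distinct statement */

/* A run of consecutive rules [from, to] that share the same selectors. */
struct run_sketch {
	size_t from;
	size_t to;
};

/*
 * Scan the rules vs. statements matrix row by row and report runs of at
 * least two consecutive rules whose rows are identical. Returns the number
 * of runs stored in runs[] (at most max_runs).
 */
static size_t find_merge_runs_sketch(const int matrix[][SKETCH_MAX_STMTS],
				     size_t nrules, struct run_sketch *runs,
				     size_t max_runs)
{
	size_t i = 0, n = 0;

	assert(matrix != NULL && runs != NULL);

	while (i < nrules) {
		size_t j = i + 1;

		/* Extend the run while the next row matches row i. */
		while (j < nrules &&
		       memcmp(matrix[i], matrix[j], sizeof(matrix[i])) == 0)
			j++;

		if (j - i > 1 && n < max_runs) {
			runs[n].from = i;
			runs[n].to = j - 1;
			n++;
		}
		i = j;
	}
	return n;
}
```

Only consecutive rules are considered, which matches the description above: rules separated by a rule with different statements are left alone.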
This ruleset optimization always performs an implicit dry run to
validate that the original ruleset is correct. Then, on a second pass,
it performs the ruleset optimization and adds the rules into the kernel
(unless --check has been specified by the user).
From the libnftables perspective, there is a new API to enable
this feature: nft_ctx_get_optimize() and nft_ctx_set_optimize().
This patch adds support for the first optimization: Collapse a linear
list of rules matching on a single selector into a set as exposed in the
example above.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>