evaluate: reject unsupported expressions in payload statement for bitfields
The payload statement evaluation pretends that it can handle any
expression for bitfields, but the existing evaluation code only knows
how to handle value expression.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: simplify payload statement evaluation for bitfields
Instead of allocating a lshift expression and relying on the binary
operation transfer propagate this to the mask value, lshift the mask
value immediately.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: release existing datatype when evaluating unary expression
Use __datatype_set() to release the existing datatype before assigning
the new one, otherwise ASAN reports the following memleak:
Direct leak of 104 byte(s) in 1 object(s) allocated from:
#0 0x7fbc8a2b89cf in __interceptor_malloc ../../../../src/libsa
#1 0x7fbc898c96c2 in xmalloc src/utils.c:31
#2 0x7fbc8971a182 in datatype_clone src/datatype.c:1406
#3 0x7fbc89737c35 in expr_evaluate_unary src/evaluate.c:1366
#4 0x7fbc89758ae9 in expr_evaluate src/evaluate.c:3057
#5 0x7fbc89726bd9 in byteorder_conversion src/evaluate.c:243
#6 0x7fbc89739ff0 in expr_evaluate_bitwise src/evaluate.c:1491
#7 0x7fbc8973b4f8 in expr_evaluate_binop src/evaluate.c:1600
#8 0x7fbc89758b01 in expr_evaluate src/evaluate.c:3059
#9 0x7fbc8975ae0e in stmt_evaluate_arg src/evaluate.c:3198
#10 0x7fbc8975c51d in stmt_evaluate_payload src/evaluate.c:330
Fixes: faa6908fad60 ("evaluate: clone unary expression datatype to deal with dynamic datatype") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This uses the wrong length. This must re-use the length of the datatype,
not the string length.
The added test cases will fail without the fix due to erroneous
overlap detection, which in itself is due to incorrect sorting of
the elements.
Example error:
netlink: Error: interval overlaps with an existing one
add element inet testifsets simple_wild { "2-1" } failed.
table inet testifsets {
... elements = { "1-1", "abcdef*", "othername", "ppp0" }
... but clearly "2-1" doesn't overlap with any existing members.
The false detection is because of the "acvdef*" wildcard getting sorted
at the beginning of the list which is because its erronously initialised
as a 64bit number instead of 128 bits (16 bytes / IFNAMSIZ).
Florian Westphal [Thu, 27 Feb 2025 14:52:10 +0000 (15:52 +0100)]
expression: expr_build_udata_recurse should recurse
If we see EXPR_BINOP, recurse: ->left can be another EXPR_BINOP.
This is irrelevant for 'typeof' named sets, but for anonymous sets, the
key is derived from the concat expression that builds the lookup key for
the anonymous set.
Florian Westphal [Thu, 27 Feb 2025 14:52:09 +0000 (15:52 +0100)]
netlink_delinearize: also consider exthdr type when trimming binops
This allows trimming the binop for exthdrs, this will make nft render
(tcp option mptcp unknown & 240) >> 4 . ip saddr @s1
as
tcp option mptcp subtype . ip saddr @s1
Also extend the typeof set tests with a set concatenating a
sub-byte-sized exthdr expression with a payload one.
The additional call to expr_postprocess() is needed, without this,
typeof_sets_0.nft fails because
frag frag-off @s4 accept
is shown as
meta nfproto ipv6 frag frag-off @s4 accept
Previouly, EXPR_EXTHDR would cause payload_binop_postprocess()
to return false which will then make the caller invoke
expr_postprocess(), but after handling EXPR_EXTHDR this doesn't happen
anymore.
Florian Westphal [Thu, 27 Feb 2025 14:52:07 +0000 (15:52 +0100)]
tcpopt: add symbol table for mptcp suboptions
nft can be used t match on specific multipath tcp subtypes:
tcp option mptcp subtype 0
However, depending on which subtype to match, users need to look up the
type/value to use in rfc8684. Add support for mnemonics and
"nft describe tcp option mptcp subtype" to get the subtype list.
Because the number of unique 'enum datatypes' is limited by ABI contraints
this adds a new mptcp suboption type as integer alias.
After this patch, nft supports all of the following:
add element t s { mp-capable }
add rule t c tcp option mptcp subtype mp-capable
add rule t c tcp option mptcp subtype { mp-capable, mp-fail }
For the 3rd case, listing will break because unlike for named sets, nft
lacks the type information needed to pretty-print the integer values,
i.e. nft will print the 3rd rule as 'subtype { 0, 6 }'.
This is resolved in a followup patch.
Other problematic constructs are:
set s1 {
typeof tcp option mptcp subtype . ip saddr
elements = { mp-fail . 1.2.3.4 }
}
Followed by:
tcp option mptcp subtype . ip saddr @s1
nft will print this as:
tcp option mptcp unknown & 240) >> 4 . ip saddr @s1
All of these issues are not related to this patch, however, they also occur
with other bit-sized extheader fields.
Add a test that replaces one base chain and check that no
filtered packets make it through, i.e. that the 'old chain'
doesn't disappear before new one is active.
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Tue, 25 Feb 2025 20:13:33 +0000 (21:13 +0100)]
payload: return early if dependency is not a payload expression
if (dep->left->payload.base != PROTO_BASE_TRANSPORT_HDR)
is legal only after checking that ->left points to an
EXPR_PAYLOAD expression. The dependency store can also contain
EXPR_META, in this case we access a bogus part of the union.
The payload_may_dependency_kill_icmp helper can't handle a META
dep either, so return early.
Fixes: 533565244d88 ("payload: check icmp dependency before removing previous icmp expression") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: add symbol range expression to further compact intervals
Update parser to use a new symbol range expression with smaller memory
footprint than range expression + two symbol expressions.
The evaluation step translates this into EXPR_RANGE_VALUE for interval
sets.
Note that maps or concatenations still use the less compact range
expressions representation, those require more work to use this new
symbol range expression. The parser also uses the classic range
expression if variables are used.
Testing with a 100k intervals, worst case scenario: no prefix or
singleton elements. This shows a reduction from 49.58 Mbytes to
35.47 Mbytes (-29.56% memory footprint for this case).
This follow up work to previous commits:
91dc281a82ea ("src: rework singleton interval transformation to reduce memory consumption") c9ee9032b0ee ("src: add EXPR_RANGE_VALUE expression and use it")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Moreover, unlike "list sets", "list maps" only supported "list maps" and
"list maps inet", without the ability to only list maps of a given table.
Compact this to unify the syntax so it becomes possible to omit the "table"
keyword for either reset or list mode.
flowtables, secmarks and synproxys keywords are updated too. "flow table"
and "meters" are NOT changed since both of these are deprecated in favor
of standard nft sets.
exthdr: incomplete type 2 routing header definition
Add missing type 2 routing header definition.
Listing is not correct because these IPv6 extension header are still
lacking context to properly delinearize the listing, but at least this
does not crash anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 30 Jan 2025 17:47:13 +0000 (18:47 +0100)]
src: add and use payload_expr_trim_force
Previous commit fixed erroneous handling of raw expressions when RHS sets
a zero value.
Input: @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
Output:@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set \
@ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000
After this patch, this will instead display:
@ih,58,6 set 0x0 @ih,86,6 set 0x0 @ih,170,22 set 0x0
payload_expr_trim_force() only works when the payload has no known
protocol (template) attached, i.e. will be printed as raw payload syntax.
It performs sanity checks on @mask and then adjusts the payload expression
length and offset according to the mask.
Also add this check in __binop_postprocess() so we can also discard masks
when matching, e.g.
binop_postprocess now returns if it performed an action or not; if this
returns true then arguments might have been freed so callers must no longer
refer to any of the expressions attached to the binop.
Next patch adds test cases for this.
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
@ih,58,6 set 0 <- Zero 6 bits, starting with bit 58
Changes to inner header mandate a checksum update, which only works for
even byte counts (except for last byte in the payload).
Thus, we load 2b at offet 6. (16bits, offset 48).
Because we want to zero 6 bits, we need a mask that retains 10 bits and
clears 6: b1111111111000000 (first 8 bit retains 48-57, last 6 bit clear
58-63). The '0xc0ff' is not correct, but thats because debug output comes
from libnftnl which prints values in host byte order, the value will be
interpreted as big endian on kernel side, so this will do the right thing.
Next, same problem:
@ih,86,6 set 0 <- Zero 6 bits, starting with bit 86.
nft needs to round down to even-sized byte offset, 10, then retain first
6 bits (80 + 6 == 86), then clear 6 bits (86-91), then keep 4 more as-is
(92-95).
So mask is 0xfc0f (in big endian) would be correct (b1111110000001111).
Last expression, @ih,170,22 set 0, asks to clear 22 bits starting with bit
170, nft correctly rounds this down to a 32 bit read at offset 160.
Required mask keeps first 10 bits, then clears 22
(b11111111110000000000000000000000). Required mask would be 0xffc00000,
which corresponds to the wrong-endian-printed value in line 8 above.
Now that we convinced ourselves that the input side is correct, fix up
netlink delinearize to undo the mask alterations if we can't find a
template to print a human-readable payload expression.
With this patch, we get this output:
@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set @ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000
... which isn't ideal. We should fixup the payload expression to display
the same output as the input, i.e. adjust payload->len and offset as per
mask and discard the mask instead.
Florian Westphal [Wed, 22 Jan 2025 09:18:04 +0000 (10:18 +0100)]
evaluate: allow to re-use existing metered set
Blamed commit translates old meter syntax (which used to allocate an
anonymous set) to dynamic sets.
A side effect of this is that re-adding a meter rule after chain was
flushed results in an error, unlike anonymous sets named sets are not
impacted by the flush.
Refine this: if a set of the same name exists and is compatible, then
re-use it instead of returning an error.
Also pick up the reproducer kindly provided by the reporter and place it
in the shell test directory.
Fixes: b8f8ddfff733 ("evaluate: translate meter into dynamic set") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
src: rework singleton interval transformation to reduce memory consumption
set_to_intervals() expands range expressions into a list of singleton
elements before building the netlink message that is sent to userspace.
This is because the kernel expects this list of singleton elements where
EXPR_F_INTERVAL_END denotes a closing interval. This expansion
significantly increases memory consumption in userspace.
This patch updates the logic to transform the range expression up to two
temporary singleton element expressions through setelem_to_interval().
Then, these two elements are used to allocate the nftnl_set_elem objects
through alloc_nftnl_setelem_interval() to build the netlink message,
finally all these temporary objects are released. For anonymous sets,
when adjacent ranges are found, the end element is not added to the set
to pack the set representation as in the original set_to_intervals()
routine.
After this update, set_to_intervals() only deals with adding the
non-matching all zero element to the interval set when it is not there
as the kernel expects.
In combination with the new EXPR_RANGE_VALUE expression, this shrinks
runtime userspace memory consumption from 70.50 Mbytes to 43.38 Mbytes
for a 100k intervals set sample.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
mnl: do not send set size when set is constant set
When turning element range into the interval representation based on
singleton elements for the rbtree tree set backend, userspace adjusts
the size to the internal kernel implementation.
For constant sets, this is leaking an internal kernel implementation
detail that is fixed by kernel patch ("netfilter: nf_tables: fix set
size with rbtree backend"). For non-constant sets, set size is just
broken.
This patch is required by the follow up patch ("src: rework singleton
interval transformation to reduce memory consumption").
On top of this, constant sets cannot be updated once they are bound, set
size is not useful in this case. Remove this implicit set size for
constant sets.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
set element with range takes 4 instances of struct expr:
EXPR_SET_ELEM -> EXPR_RANGE -> (2) EXPR_VALUE
where EXPR_RANGE represents two references to struct expr with constant
value.
This new EXPR_RANGE_VALUE trims it down to two expressions:
EXPR_SET_ELEM -> EXPR_RANGE_VALUE
with two direct low and high values that represent the range:
struct {
mpz_t low;
mpz_t high;
};
this two new direct values in struct expr do not modify its size.
setelem_expr_to_range() translates EXPR_RANGE to EXPR_RANGE_VALUE, this
conversion happens at a later stage.
constant_range_expr_print() translates this structure to constant values
to reuse the existing datatype_print() which relies in singleton values.
The automerge routine has been updated to use EXPR_RANGE_VALUE.
This requires a follow up patch to rework the conversion from range
expression to singleton element to provide a noticeable memory
consumption reduction.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a test case for nft_socket cgroupv2 matching, including
support for matching inside a cgroupv2 mount space added in kernel
commit 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces").
Test is thus run twice, once in the initial namespace and once with
a changed cgroupv2 root.
In case we can't create a cgroup or the 2nd half (unshared re-run)
fails, indicate SKIP.
Jeremy Sowden [Mon, 18 Nov 2024 23:18:28 +0000 (00:18 +0100)]
src: allow binop expressions with variable right-hand operands
Hitherto, the kernel has required constant values for the `xor` and
`mask` attributes of boolean bitwise expressions. This has meant that
the right-hand operand of a boolean binop must be constant. Now the
kernel has support for AND, OR and XOR operations with right-hand
operands passed via registers, we can relax this restriction. Allow
non-constant right-hand operands if the left-hand operand is not
constant, e.g.:
ct mark & 0xffff0000 | meta mark & 0xffff
The kernel now supports performing AND, OR and XOR operations directly,
on one register and an immediate value or on two registers, so we need
to be able to generate and parse bitwise boolean expressions of this
form.
If a boolean operation has a constant RHS, we continue to send a
mask-and-xor expression to the kernel.
Add tests for {ct,meta} mark with variable RHS operands.
JSON support is also included.
This requires Linux kernel >= 6.13-rc.
[ Originally posted as patch 1/8 and 6/8 which has been collapsed and
simplified to focus on initial {ct,meta} mark support. Tests have
been extracted from 8/8 including a tests/py fix to payload output
due to incorrect output in original patchset. JSON support has been
extracted from patch 7/8 --pablo]
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Donald Yandt [Fri, 22 Nov 2024 22:04:49 +0000 (17:04 -0500)]
mnl: fix basehook comparison
When comparing two hooks, if both device names are null,
the comparison should return true, as they are considered equal.
Fixes: b8872b83eb365 ("src: mnl: prepare for listing all device netdev device hooks") Signed-off-by: Donald Yandt <donald.yandt@gmail.com> Signed-off-by: Phil Sutter <phil@nwl.cc>
Florian Westphal [Fri, 25 Oct 2024 07:47:25 +0000 (09:47 +0200)]
src: allow to map key to nfqueue number
Allow to specify a numeric queue id as part of a map.
The parser side is easy, but the reverse direction (listing) is not.
'queue' is a statement, it doesn't have an expression.
Add a generic 'queue_type' datatype as a shim to the real basetype with
constant expressions, this is used only for udata build/parse, it stores
the "key" (the parser token, here "queue") as udata in kernel and can
then restore the original key.
Add a dumpfile to validate parser & output.
JSON support is missing because JSON allow typeof only since quite
recently.
Phil Sutter [Thu, 7 Nov 2024 13:39:51 +0000 (14:39 +0100)]
tests: monitor: Become $PWD agnostic
The call to 'cd' is problematic since later the script tries to 'exec
unshare -n $0'. This is not the only problem though: Individual test
cases specified on command line are expected to be relative to the
script's directory, too. Just get rid of these nonsensical restrictions.
Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Wed, 2 Oct 2024 17:55:49 +0000 (19:55 +0200)]
tests: py: Fix for storing payload into missing file
When running a test for which no corresponding *.payload file exists,
the *.payload.got file name was incorrectly constructed due to
'payload_path' variable not being set.
Fixes: 2cfab7a3e10fc ("tests/py: Write dissenting payload into the right file") Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 27 Sep 2024 22:55:34 +0000 (00:55 +0200)]
json: Support typeof in set and map types
Implement this as a special "type" property value which is an object
with sole property "typeof". The latter's value is the JSON
representation of the expression in set->key, so for concatenated
typeofs it is a concat expression.
All this is a bit clumsy right now but it works and it should be
possible to tear it down a bit for more user-friendliness in a
compatible way by either replacing the concat expression by the array it
contains or even the whole "typeof" object - the parser would just
assume any object (or objects in an array) in the "type" property value
are expressions to extract a type from.
Update json parser to collapse {add,create} element commands to reduce
memory consumption in the case of large sets defined by one element per
command:
Add CTX_F_COLLAPSED flag to report that command has been collapsed.
This patch reduces memory consumption by ~32% this case.
Fixes: 20f1c60ac8c8 ("src: collapse set element commands from parser") Reported-by: Eric Garver <eric@garver.life> Tested-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 29 Oct 2024 20:12:19 +0000 (21:12 +0100)]
tests: monitor: fix up test case breakage
Monitor test fails:
echo: running tests from file set-simple.t
echo output differs!
-add element ip t portrange { 1024-65535 }
add element ip t portrange { 100-200 }
+add element ip t portrange { 1024-65535 }
+# new generation 510 by process 129009 (nft)
I also noticed -j mode did not work correctly, add missing json annotations
in set-concat-interval.t while at it.
Florian Westphal [Tue, 22 Oct 2024 13:26:54 +0000 (15:26 +0200)]
tests: shell: don't rely on writable test directory
Running shell tests from a virtme-ng instance with ro mapped test dir
hangs due to runaway 'awk' reading from stdin instead of the intended
$tmpfile (variable is empty), so add quotes where needed.
0002relative_0 wants to check relative includes. It tries to create a
temporary file in the current directory, which fails as thats readonly
inside the virtme vm instance.
[ -w ! $foo ... did not catch this due to missing "".
Add quotes and return the skip retval so the test gets flagged as skipped.
0013input_descriptors_included_files_0 and 0020include_chain_0 are
switched to normal tmpfiles, there is nothing in the test that needs
relative includes.
Also, get rid of some error tests for subsequent mktemp calls for
scripts that already called 'set -e'.
src: fix extended netlink error reporting with large set elements
Large sets can expand into several netlink messages, use sequence number
and attribute offset to correlate the set element and the location.
When set element command expands into several netlink messages,
increment sequence number for each netlink message. Update struct cmd to
store the range of netlink messages that result from this command.
struct nlerr_loc remains in the same size in x86_64.
# nft -f set-65535.nft
set-65535.nft:65029:22-32: Error: Could not process rule: File exists
create element x y { 1.1.254.253 }
^^^^^^^^^^^
rule: netlink attribute offset is uint32_t for struct nlerr_loc
The maximum netlink message length (nlh->nlmsg_len) is uint32_t, struct
nlerr_loc stores the offset to the netlink attribute which must be
uint32_t, not uint16_t.
While at it, remove check for zero netlink attribute offset in
nft_cmd_error() which should not ever happen, likely this check was
there to prevent the uint16_t offset overflow.
498a5f0c219d ("rule: collapse set element commands") does not help to
reduce memory consumption in the case of large sets defined by one
element per line:
add element ip x y { 1.1.1.1 }
add element ip x y { 1.1.1.2 }
...
This patch reduces memory consumption by ~75%, set elements are
collapsed into an existing cmd object wherever possible to reduce the
number of cmd objects.
This patch also adds a special case for variables for sets similar to:
be055af5c58d ("cmd: skip variable set elements when collapsing commands")
netfilter: nf_tables: set element extended ACK reporting support
which is already included in recent -stable kernels:
# cat ruleset.nft
add table ip x
add chain ip x y
add set ip x y { type ipv4_addr; }
create element ip x y { 1.1.1.1 }
create element ip x y { 1.1.1.1 }
# nft -f ruleset.nft
ruleset.nft:5:25-31: Error: Could not process rule: File exists
create element ip x y { 1.1.1.1 }
^^^^^^^
since there is no need to relate commands via sequence number anymore,
this allows also removes the uncollapse step.
Fixes: 498a5f0c219d ("rule: collapse set element commands") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow to specify elements that never expire in sets with global
timeout.
set x {
typeof ip saddr
timeout 1m
elements = { 1.1.1.1 timeout never,
2.2.2.2,
3.3.3.3 timeout 2m }
}
in this example above:
- 1.1.1.1 is a permanent element
- 2.2.2.2 expires after 1 minute (uses default set timeout)
- 3.3.3.3 expires after 2 minutes (uses specified timeout override)
Use internal NFT_NEVER_TIMEOUT marker as UINT64_MAX to differenciate
between use default set timeout and timeout never if "timeout N" is used
in set declaration. Maximum supported timeout in milliseconds which is
conveyed within a netlink attribute is 0x10c6f7a0b5ec.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: shell: more randomization for timeout parameter
Either pass no timeout argument, pass timeout+expires or omit
timeout (uses default timeout, if any).
This should not expose further kernel code to run at this time, but unlike
the existing (deterministic) element-update test case this script does
have live traffic and different set types, including rhashtable which has
async gc.
proto: use NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag to mangle UDP checksum
There are two mechanisms to update the UDP checksum field:
1) _CSUM_TYPE and _CSUM_OFFSET which specify the type of checksum
(e.g. inet) and offset where it is located.
2) use NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag to use layer 4 kernel
protocol parser.
The problem with 1) is that it is inconditional, that is, csum_type and
csum_offset cannot deal with zero UDP checksum.
Use NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag instead since it relies on the
layer 4 kernel parser which skips updating zero UDP checksum.
Extend test coverage for the UDP mangling with and without zero
checksum.
Fixes: e6c9174e13b2 ("proto: add checksum key information to struct proto_desc") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Add sleep calls after setting up container topology.
- Extend TCP connect timeout to 4 seconds. Test has no listener, this is
just sending SYN packets that are rejected but it works to test the
payload mangling ruleset.
- fix incorrect logic to check for 0 matching packets through grep.
Fixes: 84da729e067a ("tests: shell: add test to cover payload transport match and mangle") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Needs a feature check file, so add one:
Add element with 1m timeout, then update expiry to 1ms.
If element still exists after 1ms, update request was ignored.
Test case checks timeouts can both be incremented and decremented,
checks error recovery (update request but transaction fails) and
that expiry is restored in addion to timeout.
Phil Sutter [Tue, 3 Sep 2024 15:43:19 +0000 (17:43 +0200)]
libnftables: Zero ctx->vars after freeing it
Leaving the invalid pointer value in place will cause a double-free when
users call nft_ctx_clear_vars() first, then nft_ctx_free(). Moreover,
nft_ctx_add_var() passes the pointer to mrealloc() and thus assumes it
to be either NULL or valid.
position refers to the rule handle, it has similar cache requirements as
replace rule command, relax cache requirements.
Commit e5382c0d08e3 ("src: Support intra-transaction rule references")
uses position.id for index support which requires a full cache, but
only in such case.
Fixes: 01e5c6f0ed03 ("src: add cache level flags") Tested-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
No need for full cache, this command relies on the rule handle which is
not validated from userspace. Cache requirements are similar to those
of add/create/delete rule commands.
This speeds up incremental updates with large rulesets.
Extend tests/coverage for rule replacement.
Fixes: 01e5c6f0ed03 ("src: add cache level flags") Tested-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
cache: assert filter when calling nft_cache_evaluate()
nft_cache_evaluate() always takes a non-null filter, remove superfluous
checks when calculating cache requirements via flags.
Note that filter is still option from netlink dump path, since this can
be called from error path to provide hints.
Fixes: 08725a9dc14c ("cache: filter out rules by chain") Fixes: b3ed8fd8c9f3 ("cache: missing family in cache filtering") Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested") Fixes: 3f1d3912c3a6 ("cache: filter out tables that are not requested") Tested-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>