Phil Sutter [Tue, 1 Aug 2023 11:35:27 +0000 (13:35 +0200)]
tests: shell: Review test-cases for destroy command
Having separate files for successful destroy of existing and
non-existing objects is a bit too much, just combine them into one.
While being at it:
* No bashisms, using /bin/sh is fine
* Append '-e' to shebang itself instead of calling 'set'
* Use 'nft -a -e' instead of assuming the created rule's handle value
* Shellcheck warned about curly braces, quote them
Jeremy Sowden [Mon, 31 Jul 2023 11:40:23 +0000 (12:40 +0100)]
py: use setup.cfg to configure setuptools
Setuptools has had support for declarative configuration for several
years. To quote their documentation:
Setuptools allows using configuration files (usually setup.cfg) to
define a package’s metadata and other options that are normally
supplied to the setup() function (declarative config).
This approach not only allows automation scenarios but also reduces
boilerplate code in some cases.
Additionally, this allows us to introduce support for PEP-517-compatible
build-systems.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Mon, 31 Jul 2023 11:40:22 +0000 (12:40 +0100)]
py: move package source into src directory
Separate the actual package source from the build files. In addition
to being a bit tidier, this will prevent setup.py being erroneously
installed when we introduce PEP-517 support in a later commit.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Its legal to DNAT in output and SNAT in input chain, so don't test
for that being illegal.
Fixes: 8beafab74c39 ("rule: allow src/dstnat prios in input and output") Fixes: 34ce4e4a7bb6 ("test: shell: Test cases for standard chain prios") Signed-off-by: Florian Westphal <fw@strlen.de>
Extend e0aace943412 ("libnftables: Drop cache in error case") to also
drop the cache with -c/--check, this is a dry run mode and kernel does
not get any update.
This fixes a bug with -o/--optimize, which first runs in an implicit
-c/--check mode to validate that the ruleset is correct, then it
provides the proposed optimization. In this case, if the cache is not
emptied, old objects in the cache refer to scanner data that was
already released, which triggers BUG like this:
ct expectation: fix 'list object x' vs. 'list objects in table' confusion
Just like "ct timeout", "ct expectation" is in need of the same fix,
we get segfault on "nft list ct expectation table t", if table t exists.
This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion").
The "dnat" command is usable from either "prerouting" or "output", but the
"dstnat" priority is only usable from "prerouting". (Likewise, "snat" is usable
from either "postrouting" or "input", but "srcnat" is only usable from
"postrouting".)
No need to restrict those priorities to pre/postrouting.
On my test vm a full scan takes almost 5s. As this would slowdown
the test runs too much, only run them every couple of tests.
This allows to detect when there is a leak reported at the
end of the script, and it allows to narrow down the test
case/group that triggers the issue.
Add new -K flag to force kmemleak runs after each test if its
available, this can then be used to find the exact test case.
meta: stash context statement length when generating payload/meta dependency
... meta mark set ip dscp
generates an implicit dependency from the inet family to match on meta
nfproto ip.
The length of this implicit expression is incorrectly adjusted to the
statement length, ie. relational to compare meta nfproto takes 4 bytes
instead of 1 byte. The evaluation of 'ip dscp' under the meta mark
statement triggers this implicit dependency which should not consider
the context statement length since it is added before the statement
itself.
This problem shows when listing the ruleset, since netlink_parse_cmp()
where left->len < right->len, hence handling the implicit dependency as
a concatenation, but it is actually a bug in the evaluation step that
leads to incorrect bytecode.
Fixes: 3c64ea7995cb ("evaluate: honor statement length in integer evaluation") Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand") Tested-by: Brian Davidson <davidson.brian@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Thomas Haller [Tue, 18 Jul 2023 10:33:09 +0000 (12:33 +0200)]
py: return boolean value from Nftables.__[gs]et_output_flag()
The callers of __get_output_flag() and __set_output_flag(), for example
get_reversedns_output(), are all documented to return a "boolean" value.
Instead, they returned the underlying, non-zero flags value. That number
is not obviously useful to the caller, because there is no API so that
the caller could do anything with it (except evaluating it in a boolean
context). Adjust that, to match the documentation.
The alternative would be to update the documentation, to indicate that
the functions return a non-zero integer when the flag is set. That would
preserve the previous behavior and maybe the number could be useful
somehow(?).
Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Thu, 15 Jun 2023 13:24:28 +0000 (15:24 +0200)]
Implement 'reset {set,map,element}' commands
All these are used to reset state in set/map elements, i.e. reset the
timeout or zero quota and counter values.
While 'reset element' expects a (list of) elements to be specified which
should be reset, 'reset set/map' will reset all elements in the given
set/map.
Phil Sutter [Wed, 14 Jun 2023 15:40:02 +0000 (17:40 +0200)]
evaluate: Cache looked up set for list commands
Evaluation phase checks the given table and set exist in cache. Relieve
execution phase from having to perform the lookup again by storing the
set reference in cmd->set. Just have to increase the ref counter so
cmd_free() does the right thing (which lacked handling of MAP and METER
objects for some reason).
Phil Sutter [Wed, 14 Jun 2023 13:32:04 +0000 (15:32 +0200)]
evaluate: Merge some cases in cmd_evaluate_list()
The code for set, map and meter were almost identical apart from the
specific last check. Fold them together and make the distinction in that
spot only.
Add a test to cover 423abaa40ec4 ("scanner: don't rely on fseek for
input stream repositioning") that fixes the bug described in
https://bugs.gentoo.org/675188.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Thomas Haller [Mon, 10 Jul 2023 08:45:18 +0000 (10:45 +0200)]
libnftables: inline creation of nf_sock in nft_ctx_new()
The function only has one caller. It's not clear how to extend this in a
useful way, so that it makes sense to keep the initialization in a
separate function.
Simplify the code, by inlining and dropping the static function
nft_ctx_netlink_init(). There was only one caller.
Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Thomas Haller [Mon, 10 Jul 2023 08:45:16 +0000 (10:45 +0200)]
libnftables: always initialize netlink socket in nft_ctx_new()
nft_ctx_new() has a flags argument, but currently no flags are
supported. The documentation suggests to pass 0 (NFT_CTX_DEFAULT).
Initializing the netlink socket happens by default already, we should do
it for all flags. Also because nft_ctx_netlink_init() is not public
API so it's not clear how the user gets a functioning context instance
otherwise.
If we ever want to not initialize the netlink socket for a context
instance, then there should be a dedicated flag for doing that (and
additional API for making that mode of operation usable).
Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: place byteorder conversion before rshift in payload statement
For bitfield that spans more than one byte, such as ip6 dscp, byteorder
conversion needs to be done before rshift. Add unary expression for this
conversion only in the case of meta and ct statements.
Before this patch:
# nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp'
ip6 x y
[ payload load 2b @ network header + 0 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
[ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
[ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <--------- incorrect
[ meta set mark with reg 1 ]
After this patch:
# nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp'
ip6 x y
[ payload load 2b @ network header + 0 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
[ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <-------- correct
[ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
[ meta set mark with reg 1 ]
For the matching case, binary transfer already deals with the rshift to
adjust left and right hand side of the expression, the unary conversion
is not needed in such case.
Fixes: 8221d86e616b ("tests: py: add test-cases for ct and packet mark payload expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 14 Jun 2023 18:01:46 +0000 (20:01 +0200)]
tests: shell: Introduce valgrind mode
Pass flag '-V' to run-tests.sh to run all 'nft' invocations in valgrind
leak checking environment. Code copied from iptables' shell-testsuite
where it proved to be useful already.
Phil Sutter [Wed, 21 Jun 2023 23:10:59 +0000 (01:10 +0200)]
cli: Make cli_init() return to caller
Avoid direct exit() calls as that leaves the caller-allocated nft_ctx
object in place. Making sure it is freed helps with valgrind-analyses at
least.
To signal desired exit from CLI, introduce global cli_quit boolean and
make all cli_exit() implementations also set cli_rc variable to the
appropriate return code.
The logic is to finish CLI only if cli_quit is true which asserts proper
cleanup as it is set only by the respective cli_exit() function.
Phil Sutter [Wed, 21 Jun 2023 22:46:53 +0000 (00:46 +0200)]
main: Make 'buf' variable branch-local
It is used only to linearize non-option argv for passing to
nft_run_cmd_from_buffer(), reduce its scope. Allows to safely move the
free() call there, too.
Florian Westphal [Tue, 20 Jun 2023 19:52:13 +0000 (21:52 +0200)]
src: avoid IPPROTO_MAX for array definitions
ip header can only accomodate 8but value, but IPPROTO_MAX has been bumped
due to uapi reasons to support MPTCP (262, which is used to toggle on
multipath support in tcp).
This results in:
exthdr.c:349:11: warning: result of comparison of constant 263 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
if (type < array_size(exthdr_protocols))
~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
redude array sizes back to what can be used on-wire.
Florian Westphal [Mon, 19 Jun 2023 20:43:06 +0000 (22:43 +0200)]
ct timeout: fix 'list object x' vs. 'list objects in table' confusion
<empty ruleset>
$ nft list ct timeout table t
Error: No such file or directory
list ct timeout table t
^
This is expected to list all 'ct timeout' objects.
The failure is correct, the table 't' does not exist.
But now lets add one:
$ nft add table t
$ nft list ct timeout table t
Segmentation fault (core dumped)
... and thats not expected, nothing should be shown
and nft should exit normally.
Because of missing TIMEOUTS command enum, the backend thinks
it should do an object lookup, but as frontend asked for
'list of objects' rather than 'show this object',
handle.obj.name is NULL, which then results in this crash.
Update the command enums so that backend knows what the
frontend asked for.
Florian Westphal [Mon, 19 Jun 2023 20:43:01 +0000 (22:43 +0200)]
json: dccp: remove erroneous const qualifier
This causes a clang warning:
parser_json.c:767:6: warning: variable 'opt_type' is uninitialized when used here [-Wuninitialized]
if (opt_type < DCCPOPT_TYPE_MIN || opt_type > DCCPOPT_TYPE_MAX) {
^~~~~~~~
... because it deduces the object is readonly.
Florian Westphal [Sun, 18 Jun 2023 16:39:45 +0000 (18:39 +0200)]
cache: include set elements in "nft set list"
Make "nft list sets" include set elements in listing by default.
In nftables 1.0.0, "nft list sets" did not include the set elements,
but with "--json" they were included.
1.0.1 and newer never include them.
This causes a problem for people updating from 1.0.0 and relying
on the presence of the set elements.
Change nftables to always include the set elements.
The "--terse" option is honored to get the "no elements" behaviour.
Make sure reference tracking during transaction update is correct by
checking for bogus EBUSY error. For example, when deleting map with
chain reference X, followed by a delete chain X command.
Jeremy Sowden [Tue, 11 Apr 2023 20:45:34 +0000 (21:45 +0100)]
exthdr: add boolean DCCP option matching
Iptables supports the matching of DCCP packets based on the presence
or absence of DCCP options. Extend exthdr expressions to add this
functionality to nftables.
Extend tests to cover destroy command for chains, flowtables, sets,
maps. In addition rename a destroy command test for rules with a
duplicated number.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This fails because the relational expression first evaluates
the left hand side, so when concat evaluation sees '1.2.3.4'
no key context is available.
Check if the RHS is a set reference, and, if so, evaluate
the right hand side.
This sets a pointer to the set key in the evaluation context
structure which then makes the concat evaluation step parse
1.2.3.4 and 80 as ipv4 address and 16bit port number.
On delinearization, extend relop postprocessing to
copy the datatype from the rhs (set reference, has
proper datatype according to set->key) to the lhs (concat
expression).
evaluate: set NFT_SET_EVAL flag if dynamic set already exists
nft reports EEXIST when reading an existing set whose NFT_SET_EVAL has
been previously inferred from the ruleset.
# cat test.nft
table ip test {
set dlist {
type ipv4_addr
size 65535
}
chain output {
type filter hook output priority filter; policy accept;
udp dport 1234 update @dlist { ip daddr } counter packets 0 bytes 0
}
}
# nft -f test.nft
# nft -f test.nft
test.nft:2:6-10: Error: Could not process rule: File exists
set dlist {
^^^^^
Phil Sutter says:
In the first call, the set lacking 'dynamic' flag does not exist
and is therefore added to the cache. Consequently, both the 'add set'
command and the set statement point at the same set object. In the
second call, a set with same name exists already, so the object created
for 'add set' command is not added to cache and consequently not updated
with the missing flag. The kernel thus rejects the NEWSET request as the
existing set differs from the new one.
Set on the NFT_SET_EVAL flag if the existing set sets it on.
Fixes: 8d443adfcc8c1 ("evaluate: attempt to set_eval flag if dynamic updates requested") Tested-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If user provides a symbol that cannot be parsed and the datatype provides
an error handler, provide a hint through the misspell infrastructure.
For instance:
# cat test.nft
table ip x {
map y {
typeof ip saddr : verdict
elements = { 1.2.3.4 : filter_server1 }
}
}
# nft -f test.nft
test.nft:4:26-39: Error: Could not parse netfilter verdict; did you mean `jump filter_server1'?
elements = { 1.2.3.4 : filter_server1 }
^^^^^^^^^^^^^^
While at it, normalize error to "Could not parse symbolic %s expression".
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
datatype: misspell support with symbol table parser for error reporting
Some datatypes provide a symbol table that is parsed as an integer.
Improve error reporting by using the misspell infrastructure, to provide
a hint to the user, whenever possible.
If base datatype, usually the integer datatype, fails to parse the
symbol, then try a fuzzy match on the symbol table to provide a hint
in case the user has mistype it.
For instance:
test.nft:3:11-14: Error: Could not parse Differentiated Services Code Point expression; did you you mean `cs0`?
ip dscp ccs0
^^^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: skip optimization if anonymous set uses stateful statement
fee6bda06403 ("evaluate: remove anon sets with exactly one element")
introduces an optimization to remove use of sets with single element.
Skip this optimization if set element contains stateful statements.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: allow for updating devices on existing netdev chain
This patch allows you to add/remove devices to an existing chain:
# cat ruleset.nft
table netdev x {
chain y {
type filter hook ingress devices = { eth0 } priority 0; policy accept;
}
}
# nft -f ruleset.nft
# nft add chain netdev x y '{ devices = { eth1 }; }'
# nft list ruleset
table netdev x {
chain y {
type filter hook ingress devices = { eth0, eth1 } priority 0; policy accept;
}
}
# nft delete chain netdev x y '{ devices = { eth0 }; }'
# nft list ruleset
table netdev x {
chain y {
type filter hook ingress devices = { eth1 } priority 0; policy accept;
}
}
This feature allows for creating an empty netdev chain, with no devices.
In such case, no packets are seen until a device is registered.
This patch includes extended netlink error reporting:
# nft add chain netdev x y '{ devices = { x } ; }'
Error: Could not process rule: No such file or directory
add chain netdev x y { devices = { x } ; }
^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
mnl: flowtable support for extended netlink error reporting
This patch extends existing flowtable support to improve error
reporting:
# nft add flowtable inet x y '{ devices = { x } ; }'
Error: Could not process rule: No such file or directory
add flowtable inet x y { devices = { x } ; }
^
# nft delete flowtable inet x y '{ devices = { x } ; }'
Error: Could not process rule: No such file or directory
delete flowtable inet x y { devices = { x } ; }
^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Set SO_SNDBUF before SO_SNDBUFFORCE: Unpriviledged user namespace does
not have CAP_NET_ADMIN on the host (user_init_ns) namespace.
SO_SNDBUF always succeeds in Linux, always try SO_SNDBUFFORCE after it.
Moreover, suggest the user to bump socket limits if EMSGSIZE after
having see EPERM previously, when calling SO_SNDBUFFORCE.
Provide a hint to the user too:
# nft -f test.nft
netlink: Error: Could not process rule: Message too long
Please, rise /proc/sys/net/core/wmem_max on the host namespace. Hint: 4194304 bytes
Dave Pfike says:
Prior to this patch, nft inside a systemd-nspawn container was failing
to install my ruleset (which includes a large-ish map), with the error
netlink: Error: Could not process rule: Message too long
Phil Sutter [Thu, 20 Apr 2023 15:39:27 +0000 (17:39 +0200)]
tests: shell: Fix for unstable sets/0043concatenated_ranges_0
On my (slow?) testing VM, The test tends to fail when doing a full run
(i.e., calling run-test.sh without arguments) and tends to pass when run
individually.
The problem seems to be the 1s element timeout which in some cases may
pass before element deletion occurs. Simply fix this by doubling the
timeout. It has to pass just once, so shouldn't hurt too much.
Fixes: 618393c6b3f25 ("tests: Introduce test for set with concatenated ranges") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Phil Sutter <phil@nwl.cc>
The redirect and masquerade statements can be handled as verdicts:
- if redirect statement specifies no ports.
- masquerade statement, in any case.
Exceptions to the rule: If redirect statement specifies ports, then nat
map transformation can be used iif both statements specify ports.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1668 Fixes: 0a6dbfce6dc3 ("optimize: merge nat rules with same selectors into map") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink_delinearize: do not reset protocol context for nat protocol expression
This patch reverts 403b46ada490 ("netlink_delinearize: kill dependency
before eval of 'redirect' stmt"). Since ("evaluate: bogus missing
transport protocol"), this workaround is not required anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Users have to specify a transport protocol match such as
meta l4proto tcp
before the redirect statement, even if the redirect statement already
implicitly refers to the transport protocol, for instance:
test.nft:3:16-53: Error: transport protocol mapping is only valid after transport protocol match
redirect to :tcp dport map { 83 : 8083, 84 : 8084 }
~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Evaluate the redirect expression before the mandatory check for the
transport protocol match, so protocol context already provides a
transport protocol.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 28 Mar 2023 11:46:10 +0000 (13:46 +0200)]
xt: Fix translation error path
If xtables support was compiled in but the required libxtables DSO is
not found, nft prints an error message and leaks memory:
| counter packets 0 bytes 0 XT target MASQUERADE not found
This is not as bad as it seems, the output combines stdout and stderr.
Dropping stderr produces an incomplete ruleset listing, though. While
this seemingly inline output can't easily be avoided, fix a few things:
* Respect octx->error_fp, libnftables might have been configured to
redirect stderr somewhere else.
* Align error message formatting with others.
* Don't return immediately, but free allocated memory and fall back to
printing the expression in "untranslated" form.
Fixes: 5c30feeee5cfe ("xt: Delay libxtables access until translation") Signed-off-by: Phil Sutter <phil@nwl.cc>
netlink_delinerize: incorrect byteorder in mark statement listing
When using ip dscp in combination with bitwise operation:
# nft --debug=netlink add rule ip x y 'ct mark set ip dscp | 0x4'
ip x y
[ payload load 1b @ network header + 1 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
[ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
[ bitwise reg 1 = ( reg 1 & 0xfffffffb ) ^ 0x00000004 ]
[ ct set mark with reg 1 ]
the listing is showing in the incorrect byteorder:
# nft list ruleset
table ip x {
chain y {
ct mark set ip dscp | 0x4000000
}
}
handle and and or operations in host byteorder.
The following command:
# nft --debug=netlink add rule ip6 x y 'ct mark set ip6 dscp | 0x4'
ip6 x y
[ payload load 2b @ network header + 0 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
[ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
[ byteorder reg 1 = ntoh(reg 1, 2, 1) ]
[ bitwise reg 1 = ( reg 1 & 0xfffffffb ) ^ 0x00000004 ]
[ ct set mark with reg 1 ]
works fine (without requiring this patch) because there is an explicit
byteorder expression.
However, ip dscp takes only 1-byte, so it does not require the byteorder
expression. Use host byteorder if the rhs of bitwise AND OR is larger
than lhs payload expression and such expression is equal or less than
1-byte.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>