Jeremy Sowden [Tue, 3 Mar 2020 09:48:31 +0000 (09:48 +0000)]
evaluate: no need to swap byte-order for values of fewer than 16 bits.
Endianness is not meaningful for objects smaller than 2 bytes and the
byte-order conversions are no-ops in the kernel, so just update the
expression as if it were constant.
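As a quick illustration of why the swap is unnecessary, here is a small Python sketch (not nft code; `swap_bytes` is a hypothetical helper): reversing the byte order of a value stored in a single byte returns the same value, while a two-byte value does change.

```python
def swap_bytes(value: int, nbytes: int) -> int:
    """Reverse the byte order of an unsigned integer stored in nbytes bytes."""
    return int.from_bytes(value.to_bytes(nbytes, "big"), "little")

# A value held in one byte is unchanged by the swap.
one_byte = swap_bytes(0x2A, 1)
# A two-byte value is not.
two_bytes = swap_bytes(0x1234, 2)
```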
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
This patch extends the basechain definition to allow users to specify
the offload flag. This flag enables hardware offload if your driver
supports it.
# cat file.nft
table netdev x {
    chain y {
        type filter hook ingress device eth0 priority 10; flags offload;
    }
}
# nft -f file.nft
Note: You have to enable offload via ethtool:
# ethtool -K eth0 hw-tc-offload on
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 24 Feb 2020 00:03:23 +0000 (01:03 +0100)]
src: allow nat maps containing both ip(6) address and port
nft will now be able to handle
map destinations {
    type ipv4_addr . inet_service : ipv4_addr . inet_service
}
chain f {
    dnat to ip daddr . tcp dport map @destinations
}
Something like this won't work though:
meta l4proto tcp dnat ip6 to numgen inc mod 4 map { 0 : dead::f001 . 8080, ..
as we lack the type info to properly dissect "dead::f001" as an ipv6
address.
For the named map case, this info is available in the map
definition, but for the anon case we'd need to resort to guesswork.
Support is added by peeking into the map definition when evaluating
a nat statement with a map.
Right now, when a map is provided as address, we will only check that
the mapped-to data type matches the expected size (of an ipv4 or ipv6
address).
After this patch, if the mapped-to type is a concatenation, nft will
take a peek at the individual concat expressions. If it is a combination
of address and service, nft will translate this so that the kernel nat
expression looks at the returned register that would store the
inet_service part of the octet soup returned from the lookup expression.
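The register arithmetic can be sketched as follows. This is an illustrative Python model, not nft's implementation: it assumes 32-bit registers and that each field of a concatenation is padded to a whole number of registers. `REG_BYTES`, `pad_to_register`, `pack_concat` and `port_register_offset` are hypothetical names.

```python
import ipaddress
import struct

REG_BYTES = 4  # assumption: one register holds 32 bits

def pad_to_register(data: bytes) -> bytes:
    """Pad a field with zero bytes up to a multiple of the register width."""
    return data + b"\x00" * (-len(data) % REG_BYTES)

def pack_concat(addr: str, port: int) -> bytes:
    """Lay out 'address . port' as consecutive, padded, network-order fields."""
    addr_bytes = ipaddress.ip_address(addr).packed
    port_bytes = struct.pack("!H", port)
    return pad_to_register(addr_bytes) + pad_to_register(port_bytes)

def port_register_offset(addr: str) -> int:
    """Registers occupied by the address, i.e. where the port field starts."""
    addr_bytes = ipaddress.ip_address(addr).packed
    return len(pad_to_register(addr_bytes)) // REG_BYTES

# Read the port back from the register that follows the address.
blob = pack_concat("10.0.0.1", 8080)
off = port_register_offset("10.0.0.1")
start = off * REG_BYTES
port = struct.unpack("!H", blob[start:start + 2])[0]
```

Under these assumptions the port sits one register after an IPv4 address and four registers after an IPv6 address.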
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 24 Feb 2020 00:03:22 +0000 (01:03 +0100)]
evaluate: add two new helpers
In order to support 'dnat to ip saddr map @foo', where @foo returns
both an address and an inet_service, we will need to peek into the map
and process the concatenation's sub-expressions.
Add two helpers for this; they will be used in follow-up patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 20 Feb 2020 11:58:40 +0000 (12:58 +0100)]
evaluate: print correct statement name on family mismatch
nft add rule inet filter c ip daddr 1.2.3.4 dnat ip6 to f00::1
Error: conflicting protocols specified: ip vs. unknown. You must specify ip or ip6 family in tproxy statement
Should be: ... "in nat statement".
Fixes: fbe27464dee4588d90 ("src: add nat support for the inet family")
Signed-off-by: Florian Westphal <fw@strlen.de>
mnl: do not use expr->identifier to fetch device name
This string might not be nul-terminated, resulting in spurious errors
when adding netdev chains.
Fixes: 3fdc7541fba0 ("src: add multidevice support for netdev chain")
Fixes: 92911b362e90 ("src: add support to add flowtables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
==1135425== 9 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1135425== at 0x483577F: malloc (vg_replace_malloc.c:309)
==1135425== by 0x4BE846A: strdup (strdup.c:42)
==1135425== by 0x48A5EDD: xstrdup (utils.c:75)
==1135425== by 0x48C9A20: nft_lex (scanner.l:640)
==1135425== by 0x48BC1A4: nft_parse (parser_bison.c:5682)
==1135425== by 0x48AC336: nft_parse_bison_buffer (libnftables.c:375)
==1135425== by 0x48AC336: nft_run_cmd_from_buffer (libnftables.c:443)
==1135425== by 0x10A707: main (main.c:384)
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stefano Brivio [Fri, 14 Feb 2020 15:27:25 +0000 (16:27 +0100)]
tests: Introduce test for set with concatenated ranges
This test checks that set elements can be added, deleted, that
addition and deletion are refused when appropriate, that entries
time out properly, and that they can be fetched by matching values
in the given ranges.
v5:
- speed this up by performing the timeout test for one single
permutation (Phil Sutter), by decreasing the number of
permutations from 96 to 12 if this is invoked by run-tests.sh
(Pablo Neira Ayuso) and by combining some commands into single
nft calls where possible: with dash 0.5.8 on AMD Epyc 7351 the
test now takes 1.8s instead of 82.5s
- renumber test to 0043, 0042 was added meanwhile
v4: No changes
v3:
- renumber test to 0042, 0041 was added meanwhile
v2:
- actually check an IPv6 prefix, instead of specifying everything
as explicit ranges in ELEMS_ipv6_addr
- renumber test to 0041, 0038 already exists
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
# nft delete rule ip y z handle 7
Error: Could not process rule: No such file or directory
delete rule ip y z handle 7
^
# nft delete rule ip x z handle 7
Error: Could not process rule: No such file or directory
delete rule ip x z handle 7
^
# nft delete rule ip x x handle 7
Error: Could not process rule: No such file or directory
delete rule ip x x handle 7
^
# nft replace rule x y handle 10 ip saddr 1.1.1.2 counter
Error: Could not process rule: No such file or directory
replace rule x y handle 10 ip saddr 1.1.1.2 counter
^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 13 Feb 2020 11:45:55 +0000 (12:45 +0100)]
src: maps: update data expression dtype based on set
What we want:
- update @sticky-set-svc-M53CN2XYVUHRQ7UB { ip saddr : 0x00000002 }
What we got:
+ update @sticky-set-svc-M53CN2XYVUHRQ7UB { ip saddr : 0x2000000 [invalid type] }
Phil Sutter [Thu, 6 Feb 2020 11:31:56 +0000 (12:31 +0100)]
scanner: Extend asteriskstring definition
Accept escaped asterisks also mid-string and as only character.
Especially the latter will help when translating from iptables where
asterisk has no special meaning.
Jeremy Sowden [Mon, 3 Feb 2020 11:20:20 +0000 (11:20 +0000)]
evaluate: change shift byte-order to host-endian.
The byte-order of the right-hand operands of the right-shifts generated
for payload and exthdr expressions is big-endian. However, all right
operands should be host-endian. Since evaluation of the shift binop
will insert a byte-order conversion to enforce this, change the
endianness in order to avoid the extra operation.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Mon, 3 Feb 2020 11:20:18 +0000 (11:20 +0000)]
parser: add parenthesized statement expressions.
Primary and primary RHS expressions support parenthesized basic and
basic RHS expressions. However, primary statement expressions do not
support parenthesized basic statement expressions. Add them.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stefano Brivio [Thu, 30 Jan 2020 00:16:57 +0000 (01:16 +0100)]
src: Add support for concatenated set ranges
After exporting field lengths via NFTNL_SET_DESC_CONCAT attributes,
we now need to adjust parsing of user input and generation of
netlink key data to complete support for concatenation of set
ranges.
Instead of using separate elements for start and end of a range,
denoting the end element by the NFT_SET_ELEM_INTERVAL_END flag,
as it's currently done for ranges without concatenation, we'll use
the new attribute NFTNL_SET_ELEM_KEY_END as suggested by Pablo. It
behaves in the same way as NFTNL_SET_ELEM_KEY, but it indicates
that the included key represents the upper bound of a range.
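For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", needs to be
expressed as a single element with two keys:
  NFTNL_SET_ELEM_KEY:      192.0.2.0 . 22
  NFTNL_SET_ELEM_KEY_END:  192.0.2.42 . 25
To achieve this, we need to: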
- adjust the lexer rules to allow multiton expressions as elements
of a concatenation. As wildcards are not allowed (semantics would
be ambiguous), exclude wildcard expressions from the set of
possible multiton expressions, and allow them directly where
needed. Concatenations now admit prefixes and ranges
- generate, for each element in a range concatenation, a second key
attribute, that includes the upper bound for the range
- also expand prefixes and non-ranged values in the concatenation
to ranges: given a set with interval and concatenation support,
the kernel has no way to tell which elements are ranged, so they
all need to be. For example, 192.0.2.0 . 192.0.2.9 : 1024 is
sent as:
  NFTNL_SET_ELEM_KEY:      192.0.2.0 . 192.0.2.9 . 1024
  NFTNL_SET_ELEM_KEY_END:  192.0.2.0 . 192.0.2.9 . 1024
- aggregate ranges when elements received by the kernel represent
concatenated ranges, see concat_range_aggregate()
- perform a few minor adjustments where interval expressions
are already handled: we have intervals in these sets, but
the set specification isn't just an interval, so we can't
just aggregate and deaggregate interval ranges linearly
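The two-key representation described above can be modelled with a short Python sketch. It is only an illustration under simplifying assumptions (fields are plain integers, prefixes are IPv4 or IPv6 strings), and `to_range`, `make_keys`, `matches` and `ip` are hypothetical helpers, not functions from nft or libnftnl.

```python
import ipaddress

def ip(text):
    """Integer value of an IP address, for easy comparison."""
    return int(ipaddress.ip_address(text))

def to_range(field):
    """Expand a prefix, a (low, high) pair, or a single value into (low, high)."""
    if isinstance(field, tuple):
        return field
    if isinstance(field, str) and "/" in field:
        net = ipaddress.ip_network(field)
        return (int(net.network_address), int(net.broadcast_address))
    return (field, field)  # non-ranged value: lower bound equals upper bound

def make_keys(fields):
    """Return (key, key_end): per-field lower bounds and per-field upper bounds."""
    ranges = [to_range(f) for f in fields]
    return [lo for lo, _ in ranges], [hi for _, hi in ranges]

def matches(key, key_end, packet):
    """A packet matches if every field lies within its [low, high] bounds."""
    return all(lo <= v <= hi for lo, hi, v in zip(key, key_end, packet))

# The example from the text: 192.0.2.0-192.0.2.42 . 22-25
key, key_end = make_keys([(ip("192.0.2.0"), ip("192.0.2.42")), (22, 25)])
```

Here `key` corresponds to the lower-bound key and `key_end` to the upper-bound key of a single element.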
v4: No changes
v3:
- rework to use a separate key for closing element of range instead of
a separate element with EXPR_F_INTERVAL_END set (Pablo Neira Ayuso)
v2:
- reworked netlink_gen_concat_data(), moved loop body to a new function,
netlink_gen_concat_data_expr() (Phil Sutter)
- dropped repeated pattern in bison file, replaced by a new helper,
compound_expr_alloc_or_add() (Phil Sutter)
- added set_is_nonconcat_range() helper (Phil Sutter)
- in expr_evaluate_set(), we need to set NFT_SET_SUBKEY also on empty
sets where the set in the context already has the flag
- dropped additional 'end' parameter from netlink_gen_data(),
temporarily set EXPR_F_INTERVAL_END on expressions and use that from
netlink_gen_concat_data() to figure out we need to add the 'end'
element (Phil Sutter)
- replace range_mask_len() by a simplified version, as we don't need
to actually store the composing masks of a range (Phil Sutter)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stefano Brivio [Thu, 30 Jan 2020 00:16:56 +0000 (01:16 +0100)]
src: Add support for NFTNL_SET_DESC_CONCAT
To support arbitrary range concatenations, the kernel needs to know
how long each field in the concatenation is. The new libnftnl
NFTNL_SET_DESC_CONCAT set attribute describes this as an array of
lengths, in bytes, of concatenated fields.
While evaluating concatenated expressions, export the datatype size
into the new field_len array, and hand the data over via libnftnl.
Similarly, when data is passed back from libnftnl, parse it into
the set description.
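As a rough picture of what such a description carries, the following Python sketch derives the per-field byte lengths from a concatenated type string. `TYPE_SIZE` and `field_lengths` are hypothetical names used for illustration only; the real code exports the sizes while evaluating the concatenation.

```python
# Sizes in bytes of a few datatypes, listed here for illustration.
TYPE_SIZE = {
    "ipv4_addr": 4,
    "ipv6_addr": 16,
    "inet_service": 2,
    "ether_addr": 6,
}

def field_lengths(concat_type):
    """Return the byte length of each field in a 'a . b . c' type string."""
    return [TYPE_SIZE[name.strip()] for name in concat_type.split(".")]

lengths = field_lengths("ipv4_addr . inet_service")
```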
When set data is cloned, we now need to copy the additional fields
in set_clone(), too.
This change depends on the libnftnl patch with title:
set: Add support for NFTA_SET_DESC_CONCAT attributes
v4: No changes
v3: Rework to use set description data instead of a stand-alone
attribute
v2: No changes
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Sun, 19 Jan 2020 22:57:09 +0000 (22:57 +0000)]
netlink: add support for handling shift expressions.
The kernel supports bitwise shift operations, so add support to the
netlink linearization and delinearization code. The number of bits (the
right-hand operand) is expected to be a 32-bit value in host endianness.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The comment documenting how bitwise expressions work includes a table
which summarizes the mask and xor arguments combined to express the
supported boolean operations. However, the row for OR:
mask xor
0 x
is incorrect.
dreg = (sreg & 0) ^ x
is not equivalent to:
dreg = sreg | x
What the code actually does is:
dreg = (sreg & ~x) ^ x
Update the documentation to match.
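The identity can be checked exhaustively for small widths. The Python sketch below (an illustration, with the hypothetical helper `bitwise` modelling the mask-and-xor operation) compares both rows against a plain OR for every pair of 8-bit values.

```python
WIDTH = 8
ALL_ONES = (1 << WIDTH) - 1

def bitwise(sreg, mask, xor):
    """Model of the operation: dreg = (sreg & mask) ^ xor."""
    return (sreg & mask) ^ xor

values = range(1 << WIDTH)

# Corrected row: mask = ~x, xor = x
new_row_is_or = all(
    bitwise(s, ~x & ALL_ONES, x) == (s | x) for s in values for x in values
)

# Old row: mask = 0, xor = x (this just yields x, discarding sreg)
old_row_is_or = all(
    bitwise(s, 0, x) == (s | x) for s in values for x in values
)
```

The corrected row agrees with OR everywhere, while the old row fails as soon as `sreg` has a bit set that `x` does not.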
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 20 Jan 2020 15:32:40 +0000 (16:32 +0100)]
netlink: Avoid potential NULL-pointer deref in netlink_gen_payload_stmt()
payload_needs_l4csum_update_pseudohdr() unconditionally dereferences
its 'desc' parameter, yet an earlier check for 'desc' being non-NULL
implies that it may be NULL. Call the function only if 'desc' is set.
Fixes: 68de70f2b3fc6 ("netlink_linearize: fix IPv6 layer 4 checksum mangling")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 20 Jan 2020 13:48:26 +0000 (14:48 +0100)]
netlink: Fix leaks in netlink_parse_cmp()
This fixes several problems at once:
* The error path would leak expr 'right' in two places and 'left' in one.
* The concat case would leak 'right' by overwriting the pointer. Introduce
a temporary variable to hold the new pointer.
Fixes: 6377380bc265f ("netlink_delinearize: handle relational and lookup concat expressions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: better error notice when interval flag is not set on
Users get confused by the existing error notice; let's try a different one:
# nft add element x y { 1.1.1.0/24 }
Error: You must add 'flags interval' to your set declaration if you want to add prefix elements
add element x y { 1.1.1.0/24 }
                  ^^^^^^^^^^
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1380
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Tue, 14 Jan 2020 16:25:35 +0000 (17:25 +0100)]
cache: Fix for doubled output after reset command
The reset command causes a dump of the objects to reset and adds those
to the cache. Yet it ignored whether the object in question was already
there, and up to now CMD_RESET was flagged as NFT_CACHE_FULL.
Tackle this from two angles: First, reduce the cache requirements of the
reset command to the necessary bits, which is the table cache. This alone
would suffice if there weren't interactive mode (and other libnftables
users): a cache containing the objects to reset might be in place already,
so add dumped objects to the cache only if they don't exist already.
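The second angle amounts to an insert-if-absent operation. A minimal Python sketch, assuming a cache modelled as a dictionary keyed by object type and name (`cache_add_if_missing` is a hypothetical helper, not the function used in nft):

```python
def cache_add_if_missing(cache, obj_type, name, obj):
    """Add a dumped object to the cache unless an entry already exists.

    Returns True if the object was added, False if it was already cached.
    """
    key = (obj_type, name)
    if key in cache:
        return False
    cache[key] = obj
    return True

# Interactive mode: the counter is already cached when reset dumps it again.
cache = {("counter", "c1"): {"packets": 5}}
added_again = cache_add_if_missing(cache, "counter", "c1", {"packets": 5})
added_new = cache_add_if_missing(cache, "counter", "c2", {"packets": 0})
```

With this check the object is listed once, whether or not a populated cache existed before the reset.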
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 14 Jan 2020 15:50:35 +0000 (16:50 +0100)]
tests: shell: Search diff tool once and for all
Instead of calling 'which diff' over and over again, just detect the
tool's presence in run-tests.sh and pass $DIFF to each testcase just
like with nft binary.
Fall back to using 'true' command to avoid the need for any conditional
calling in test cases.
While at it, unify potential diff calls so that a string
comparison in shell happens irrespective of diff presence.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 13 Jan 2020 13:53:24 +0000 (14:53 +0100)]
monitor: Fix output for ranges in anonymous sets
Previous fix for named interval sets was simply wrong: Instead of
limiting decomposing to anonymous interval sets, it effectively disabled
it entirely.
Since code needs to check for both interval and anonymous bits
separately, introduce set_is_interval() helper to keep the code
readable.
Also extend test case to assert ranges in anonymous sets are correctly
printed by echo or monitor modes. Without this fix, range boundaries are
printed as individual set elements.
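The intended condition can be sketched in Python as two independent bit tests. The flag values below are those of the kernel's nf_tables set flags; `needs_decomposition` is a hypothetical name for the combined check and the helpers only model the C ones.

```python
NFT_SET_ANONYMOUS = 0x1
NFT_SET_INTERVAL = 0x4

def set_is_anonymous(flags):
    return bool(flags & NFT_SET_ANONYMOUS)

def set_is_interval(flags):
    return bool(flags & NFT_SET_INTERVAL)

def needs_decomposition(flags):
    """Decompose only sets that are both anonymous and interval sets."""
    return set_is_anonymous(flags) and set_is_interval(flags)

anonymous_interval = needs_decomposition(NFT_SET_ANONYMOUS | NFT_SET_INTERVAL)
named_interval = needs_decomposition(NFT_SET_INTERVAL)
anonymous_plain = needs_decomposition(NFT_SET_ANONYMOUS)
```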
Fixes: 5d57fa3e99bb9 ("monitor: Do not decompose non-anonymous sets")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Thu, 9 Jan 2020 16:43:11 +0000 (17:43 +0100)]
monitor: Fix for use after free when printing map elements
When populating the dummy set, 'data' field must be cloned just like
'key' field.
Fixes: 343a51702656a ("src: store expr, not dtype to track data in sets")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Thu, 9 Jan 2020 12:34:20 +0000 (13:34 +0100)]
monitor: Do not decompose non-anonymous sets
They have been decomposed already; trying to do that again causes a
segfault. This fix is similar to the one in commit 8ecb885589591 ("src:
restore --echo with anonymous sets").
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Mon, 6 Jan 2020 22:35:10 +0000 (22:35 +0000)]
evaluate: fix expr_set_context call for shift binops.
expr_evaluate_binop calls expr_set_context for shift expressions to set
the context data-type to `integer`. This clobbers the byte-order of the
context, resulting in unexpected conversions to NBO. For example:
$ sudo nft flush ruleset
$ sudo nft add table t
$ sudo nft add chain t c '{ type filter hook output priority mangle; }'
$ sudo nft add rule t c oif lo tcp dport ssh ct mark set '0x10 | 0xe'
$ sudo nft add rule t c oif lo tcp dport ssh ct mark set '0xf << 1'
$ sudo nft list table t
table ip t {
    chain c {
        type filter hook output priority mangle; policy accept;
        oif "lo" tcp dport 22 ct mark set 0x0000001e
        oif "lo" tcp dport 22 ct mark set 0x1e000000
    }
}
Replace it with a call to __expr_set_context and set the byteorder to
that of the left operand since this is the value being shifted.
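The wrong value in the second rule of the listing is what a 32-bit byte swap of the correct result looks like. A small Python sketch (illustration only; `byteswap32` is a hypothetical helper) reproduces it:

```python
def byteswap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")

shifted = 0xF << 1             # the intended mark value
swapped = byteswap32(shifted)  # the value after an unwanted conversion
```

`shifted` is 0x1e, matching what the OR rule produces, while `swapped` is 0x1e000000, the value listed for the shift rule before this fix.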
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
scanner: incorrect error reporting after file inclusion
scanner_pop_buffer() incorrectly sets the current input descriptor. The
state->indesc_idx field actually stores the number of input descriptors
in the stack, so decrement it and then update the current input
descriptor accordingly.
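The indexing rule can be modelled with a short Python sketch. It is a simplified illustration, not the scanner code: `ScannerState` and its attributes only mirror the names mentioned above, and the current descriptor is taken to be the entry at position count minus one.

```python
class ScannerState:
    """Toy model of an input descriptor stack with a count field."""

    def __init__(self):
        self.indescs = []     # stack of input descriptors
        self.indesc_idx = 0   # number of descriptors on the stack
        self.indesc = None    # current input descriptor

    def push_buffer(self, indesc):
        self.indescs.append(indesc)
        self.indesc_idx += 1
        self.indesc = indesc

    def pop_buffer(self):
        self.indescs.pop()
        # Decrement the count first, then pick the new top of the stack.
        self.indesc_idx -= 1
        if self.indesc_idx > 0:
            self.indesc = self.indescs[self.indesc_idx - 1]
        else:
            self.indesc = None

state = ScannerState()
state.push_buffer("main.nft")
state.push_buffer("included.nft")
state.pop_buffer()
```

After the included file is popped, errors are attributed to `main.nft` again.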
Florian Westphal [Tue, 10 Dec 2019 14:23:35 +0000 (15:23 +0100)]
evaluate: print a hint about 'typeof' syntax on 0 keylen
If the user says
'type integer; ...' in a set definition, don't just throw an error --
provide a hint that the typeof keyword can be used to provide
the needed size information.
Florian Westphal [Fri, 16 Aug 2019 12:22:01 +0000 (14:22 +0200)]
tests: add typeof test cases
Add sets using unspecific string/integer types, one with
osf name, the other with vlan id. Neither type can be used directly,
as they lack the type size information.