Phil Sutter [Tue, 10 Apr 2018 17:00:20 +0000 (19:00 +0200)]
libnftables: Fix for input without trailing newline
Input parser implementation requires a newline at end of input,
otherwise the last pattern may not be recognized correctly.
If input comes from a file, the culprit was the YY_INPUT macro, which did
not expect the last line to lack a trailing newline, so the last word
wasn't accepted. This is easily fixed by checking for feof(yyin) in there.
A simple test case for that is:
| echo -en "table ip t {\nchain c {\n}\n}" >/tmp/foo
| nft -f /tmp/foo
Input from a string buffer is a bit more tricky: The culprit here is
that detection of the classid pattern is done by checking the character
following it, which makes it impossible for the pattern to sit right at
the end of input, and I haven't found an alternative to that. After
dropping the manual
newline appending when combining argv into a single buffer in main(),
a rule like this won't be recognized anymore:
| nft add rule ip t c meta priority feed:babe
Since a direct call to run_cmd_from_buffer() via libnftables bypasses
the sanitizing done in main() entirely, it has to happen in libnftables
instead which means creating a newline-terminated duplicate of the input
buffer.
Note that main() created a buffer one byte longer than needed since it
accounts for whitespace at end of each argv but doesn't add it to the
buffer for the last one, so buffer length is reduced by two bytes
instead of just one although only one less character is printed into it.
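The length arithmetic in the note above can be checked with a small model. This is a Python sketch and not the C code in main(); the function names are hypothetical, and the old allocation formula is an assumption reconstructed from the description (one whitespace byte counted per argv, plus the newline and the terminating NUL).

```python
def needed_old(args):
    """Bytes really needed while a newline was appended: the words, the
    separating spaces, the newline and the terminating NUL."""
    return sum(len(a) for a in args) + (len(args) - 1) + 1 + 1


def allocated_old(args):
    """Assumed old allocation: one whitespace byte per argv (including the
    last one, which never gets a trailing space), plus newline and NUL."""
    return sum(len(a) + 1 for a in args) + 2


def allocated_new(args):
    """Without the newline: the words, the separating spaces and the NUL."""
    return sum(len(a) for a in args) + (len(args) - 1) + 1


def combine(args):
    """Join argv into a single command string without a trailing newline."""
    return " ".join(args)


args = ["add", "rule", "ip", "t", "c", "meta", "priority", "feed:babe"]
print(allocated_old(args) - needed_old(args))     # surplus in the old buffer
print(allocated_old(args) - allocated_new(args))  # how much the buffer shrinks
print(len(combine(args)) + 1 == allocated_new(args))
```

Under these assumptions the old buffer was one byte too large, so dropping a single printed character shrinks the allocation by two bytes.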
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Needed by followup patch. EXPR_SET_REF handling is bonkers, it
"works" when using { key : value } because ->key and ->left are aliased
in struct expr to the same location.
evaluate: propagate binop_transfer() adjustment to set key size
The right shift transfer may result in adjusting the set key size,
e.g. ip6 dscp results in fetching 6 bits that are split between two
bytes, hence the set element ends up being 16 bits long.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When explicitly filtering icmp-in-ipv6 and icmp6-in-ip, don't remove the
required l3 protocol dependency, else the output of "nft list ruleset"
can't be read back via nft -f anymore.
Florian Westphal [Tue, 27 Mar 2018 07:29:54 +0000 (09:29 +0200)]
src: avoid erroneous assert with map+concat
Phil reported the following assert:
add rule ip6 f o mark set ip6 saddr . ip6 daddr . tcp dport \
map { dead::beef . f00:: . 22 : 1 }
nft: netlink_linearize.c:655: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.
This happens because "mark set" will allocate one register (the dreg),
but netlink_gen_concat_expr will populate a lot more register space if
the concat expression strings a lot of expressions together.
As the assert is useful, pseudo-reserve the register space as per
concat->len and undo the reservation after generating the expressions.
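The reserve-and-undo idea can be illustrated with a toy model. This is a Python sketch, not the code in netlink_linearize.c: the class, the function names and the "one register per concatenated part" accounting are all assumptions made for illustration.

```python
class LinearizeCtx:
    """Toy stand-in for the linearization context; only tracks reg_low."""

    def __init__(self):
        self.reg_low = 1  # assumed: lowest register not handed out yet

    def get_register(self):
        reg = self.reg_low
        self.reg_low += 1
        return reg


def gen_expr(ctx, dreg):
    # Model of the check from the report: the destination register must
    # lie below the allocation mark.
    assert dreg < ctx.reg_low, "dreg < ctx->reg_low"
    return dreg


def gen_concat(ctx, dreg, parts):
    """Place each part of a concatenation into consecutive registers."""
    ctx.reg_low += len(parts)      # pseudo-reserve space for the concat
    try:
        used = [gen_expr(ctx, dreg + i) for i in range(len(parts))]
    finally:
        ctx.reg_low -= len(parts)  # undo once the parts are generated
    return used


ctx = LinearizeCtx()
dreg = ctx.get_register()          # what "mark set" would allocate
print(gen_concat(ctx, dreg, ["ip6 saddr", "ip6 daddr", "tcp dport"]))
print(ctx.reg_low)                 # back to its value before the concat
```

Without the temporary bump of reg_low, the second part would already trip the assertion in this model, which mirrors the reported failure.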
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Duncan Roe [Tue, 27 Mar 2018 04:17:01 +0000 (15:17 +1100)]
doc: nft.8 more spelling fixes
I ran the following command:
ispell -p ./ispell_nft -H nft.xml
to create the local dictionary ispell_nft.
ispell_nft contains almost every special word in nft.xml.
The idea is that anyone can run ispell the same way and only have to accept:
- alpha strings in hexadecimal numbers
- "FIXME" : that has to be fixed eventually
- "differv" : I don't know what that is or whether it's correct
You need to use the English (i.e. American) dictionary, and you want the screen
to be about 100 chars wide (at least).
The patch enforces consistent capitalisation of words, e.g. IPv4 is always that
way but ipv4_addr stays as before. The existing dictionary suggested capital
Ethernet so that is in there too.
Current libnftables API should be stable enough to release it into the
public, and after 4aba100e593f ("rule: reset cache iff there is an
existing cache") we have a simple way to batch commands through this
API.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 19 Mar 2018 17:02:07 +0000 (18:02 +0100)]
tests/shell: Allow to specify multiple testcases
Extend run-tests.sh a bit so that all remaining arguments after option
parsing are treated as filenames of test cases to run, and complain if
one of them doesn't look like a test case. This allows for doing stuff like:
| ./run-tests.sh testcases/include/000*
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 19 Mar 2018 17:02:06 +0000 (18:02 +0100)]
tests/shell: Fix sporadic fail of include/0007glob_double_0
Since ruleset listing shows tables sorted by handle (which in turn
depends on table creation ordering), using random filenames here
is guaranteed to make the test fail randomly.
Since the include files reside in a temporary directory anyway, there is
no need to randomize their names so simplify the whole test a bit.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 19 Mar 2018 17:02:05 +0000 (18:02 +0100)]
flowtable: Make parsing a little more robust
It was surprisingly easy to crash nft with invalid syntax in 'add
flowtable' command. Catch at least three possible ways (illustrated in
provided test case) by making evaluation phase survive so that bison
gets a chance to complain.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 19 Mar 2018 17:02:04 +0000 (18:02 +0100)]
tests/shell: Fix flowtable test cases
The major problem here was that existence of network interfaces 'eth0'
and 'wlan0' was assumed. Overcome this by just using 'lo' instead, which
exists even in newly created netns by default.
Another minor issue was the incorrect naming of 0004delete_after_add0:
the expected return code is supposed to be separated by '_' from the
remaining filename.
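The naming convention mentioned above (expected return code after the final '_') can be expressed as a small parser. This is a Python sketch with a hypothetical helper name; it is not how run-tests.sh, a shell script, actually extracts the code.

```python
def expected_rc(test_name):
    """Return the expected exit code encoded after the last '_'."""
    stem, sep, suffix = test_name.rpartition("_")
    if not sep or not stem or not suffix.isdigit():
        raise ValueError(f"no '_<rc>' suffix in {test_name!r}")
    return int(suffix)


print(expected_rc("0016delete_handle_0"))

try:
    expected_rc("0004delete_after_add0")
except ValueError as err:
    # The old name has no separate return code component.
    print(err)
```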
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 19 Mar 2018 17:02:03 +0000 (18:02 +0100)]
tests/shell: Fix dump of chains/0016delete_handle_0
The purpose of this test is to delete some chains by their handle and
that is supposed to succeed. So the respective dump should not contain
them anymore.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 19 Mar 2018 17:02:02 +0000 (18:02 +0100)]
Support 'nft -f -' to read from stdin
In libnftables, detect if given filename is '-' and treat it as the
common way of requesting to read from stdin, then open /dev/stdin
instead. (Calling 'nft -f /dev/stdin' worked before as well, but this
makes it official.)
With this in place and bash's support for here strings, review all tests
in tests/shell for needless use of temp files. Note that two categories
of test cases were intentionally left unchanged:
- Tests creating potentially large rulesets to avoid running into shell
parameter length limits.
- Tests for 'include' directive for obvious reasons.
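The '-' handling described at the start of this commit message amounts to a filename substitution before opening the input. A minimal Python sketch follows; the function name is hypothetical and the real change lives in the C code of libnftables.

```python
def resolve_input(filename):
    """Map the conventional '-' to /dev/stdin, leave other names alone."""
    if filename == "-":
        return "/dev/stdin"
    return filename


print(resolve_input("-"))
print(resolve_input("/tmp/foo"))
```

Combined with a here string, this lets a test feed a ruleset to 'nft -f -' without creating a temporary file.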
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Sat, 17 Mar 2018 09:39:27 +0000 (10:39 +0100)]
Combine redir and masq statements into nat
All these statements are very similar, so handling them with the same
code is the obvious choice. The only thing required here is a custom
extension of enum nft_nat_types, which is already used in nat_stmt to
distinguish between snat and dnat. Since enum nft_nat_types is part of
kernel uAPI, though, create a local extended version containing the
additional values.
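The idea of a local extension of a uAPI enum can be sketched as follows. This is an illustrative Python model: the class and member names are hypothetical, and the numeric values chosen for the two new members are assumptions. Only the two kernel values (snat = 0, dnat = 1) come from enum nft_nat_types.

```python
from enum import IntEnum


class KernelNatType(IntEnum):
    """Mirrors the two values of the kernel's enum nft_nat_types."""
    SNAT = 0
    DNAT = 1


class LocalNatType(IntEnum):
    """Hypothetical local extension: kernel values are kept as they are,
    the additional statement types are appended after them."""
    SNAT = KernelNatType.SNAT.value
    DNAT = KernelNatType.DNAT.value
    MASQ = 2    # assumed value
    REDIR = 3   # assumed value


def is_kernel_type(nat_type):
    """True if the value may be passed to the kernel unchanged."""
    return nat_type in (LocalNatType.SNAT, LocalNatType.DNAT)


print([t.name for t in LocalNatType if not is_kernel_type(t)])
```

Keeping the kernel values unchanged means existing snat/dnat handling needs no translation, while the extra members never leave userspace.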
Note that nat statement printing got a bit more complicated to get the
number of spaces right for every possible combination of attributes.
Note also that there wasn't a case for STMT_MASQ in
rule_parse_postprocess(), which seems like a bug. Since STMT_MASQ became
just a variant of STMT_NAT, postprocessing will take place for it now
anyway.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Unlike plain "ip dscp { 4, 63 }", we don't have a relational operation in
case of vmap. Binop fixups need to be done when evaluating map statements.
This patch is incomplete. 'ip dscp' works, but this won't:
nft add rule --debug=netlink ip6 test-ip6 input ip6 dscp vmap { 0x04 : accept, 0x3f : continue }
The generated expressions look sane, however there is disagreement on
the set's key size vs. the sizes of the individual elements in the set.
This is because ip6 dscp spans a byte boundary.
The set key size is still set to one byte (the dscp type is 6 bits).
However, binop expansion requirements result in 2 byte loads, i.e.
set members will be 2 bytes in size as well.
This can hopefully get addressed in an incremental patch.
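Why ip6 dscp needs a two-byte load while ip dscp fits in one byte follows from the header layouts: in IPv4 the DSCP is the top six bits of the second header byte, whereas in IPv6 it directly follows the 4-bit version field. The Python sketch below only illustrates that arithmetic; the helper names are hypothetical and unrelated to the nft sources.

```python
def bytes_spanned(bit_offset, bit_len):
    """Number of bytes a bitfield touches, given its offset from the
    start of the header and its length, both in bits."""
    first = bit_offset // 8
    last = (bit_offset + bit_len - 1) // 8
    return last - first + 1


IP_DSCP = (8, 6)    # IPv4: bits 8..13, all inside the second byte
IP6_DSCP = (4, 6)   # IPv6: bits 4..9, crossing from byte 0 into byte 1


def ip6_dscp(header):
    """Extract DSCP from the first two bytes of an IPv6 header."""
    word = (header[0] << 8) | header[1]   # 2-byte load
    return (word >> 6) & 0x3F             # shift out ECN + flow label bits


print(bytes_spanned(*IP_DSCP), bytes_spanned(*IP6_DSCP))
print(hex(ip6_dscp(bytes([0x6F, 0xC0]))))
```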
Florian Westphal [Thu, 11 Jan 2018 15:30:22 +0000 (16:30 +0100)]
evaluate: handle binop adjustment recursively
Currently this is fine, but a followup commit will add
EXPR_SET_ELEM handling.
And unlike RANGE we cannot assume the key is a value.
Therefore make binop_can_transfer and binop_transfer_one handle the
right-hand side recursively if needed. For RANGE, call it again with
from/to.
For future SET_ELEM, we can then just call the function recursively
again with right->key as new RHS.
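The recursion over the right-hand side can be sketched with a toy expression tree. This is a Python model under assumptions: the classes are hypothetical, and the left shift merely stands in for whatever adjustment the real binop_transfer_one() applies to a constant.

```python
from dataclasses import dataclass


@dataclass
class Value:
    value: int


@dataclass
class Range:
    left: object
    right: object


@dataclass
class SetElem:
    key: object


def transfer_one(expr, shift):
    """Apply the adjustment to every constant reachable from expr."""
    if isinstance(expr, Value):
        return Value(expr.value << shift)
    if isinstance(expr, Range):
        # Handle both ends of the range with the same function.
        return Range(transfer_one(expr.left, shift),
                     transfer_one(expr.right, shift))
    if isinstance(expr, SetElem):
        # The key may itself be a value or a range, so recurse again.
        return SetElem(transfer_one(expr.key, shift))
    raise TypeError(f"unsupported expression: {expr!r}")


print(transfer_one(SetElem(Range(Value(1), Value(3))), 6))
```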
Florian Westphal [Thu, 11 Jan 2018 15:30:20 +0000 (16:30 +0100)]
src: segtree: use value expression length
In case of EXPR_MAPPING, expr->len is 0, we need to use
the length of the key instead.
Without this we can get an assertion failure later on:
nft: netlink_delinearize.c:1484: binop_adjust_one: Assertion `value->len >= binop->right->len' failed.
Phil Sutter [Thu, 15 Mar 2018 23:03:20 +0000 (00:03 +0100)]
netlink: Fold netlink_gen_cmp() into netlink_gen_relational()
Since netlink_gen_relational() didn't do much anymore after meta OP
handling had been removed, it makes sense to merge it with the only
function it dispatched to.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Thu, 15 Mar 2018 23:03:19 +0000 (00:03 +0100)]
relational: Eliminate meta OPs
With a bit of code reorganization, relational meta OPs OP_RANGE,
OP_FLAGCMP and OP_LOOKUP become unused and can be removed. The only meta
OP left is OP_IMPLICIT, which is usually treated as an alias for OP_EQ.
It needs to stay in place for one reason, though: when matching against a
bitmask (e.g. TCP flags or conntrack states), it has a different
meaning:
| nft --debug=netlink add rule ip t c tcp flags syn
| ip t c
| [ meta load l4proto => reg 1 ]
| [ cmp eq reg 1 0x00000006 ]
| [ payload load 1b @ transport header + 13 => reg 1 ]
| [ bitwise reg 1 = (reg=1 & 0x00000002 ) ^ 0x00000000 ]
| [ cmp neq reg 1 0x00000000 ]
| nft --debug=netlink add rule ip t c tcp flags == syn
| ip t c
| [ meta load l4proto => reg 1 ]
| [ cmp eq reg 1 0x00000006 ]
| [ payload load 1b @ transport header + 13 => reg 1 ]
| [ cmp eq reg 1 0x00000002 ]
OP_IMPLICIT creates a match which just checks the given flag is present,
while OP_EQ creates a match which ensures the given flag and no other is
present.
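The two debug listings above differ only in how the loaded flags byte is tested. A Python sketch of both semantics follows; the function names are hypothetical, while the flag values are the standard TCP header bits.

```python
TCP_SYN = 0x02
TCP_ACK = 0x10


def match_implicit(flags, mask):
    """'tcp flags syn': bitwise AND with the mask, then compare with zero,
    as in the bitwise + 'cmp neq' pair of the first listing."""
    return ((flags & mask) ^ 0x00) != 0


def match_eq(flags, value):
    """'tcp flags == syn': plain equality, as in the single 'cmp eq'."""
    return flags == value


syn_ack = TCP_SYN | TCP_ACK
print(match_implicit(syn_ack, TCP_SYN))  # SYN is among the set flags
print(match_eq(syn_ack, TCP_SYN))        # but it is not the only one
```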
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
You need a Linux kernel >= 4.15 to use this feature.
This patch allows us to dump the content of an existing set.
# nft list ruleset
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 3.3.3.3,
5.5.5.5-6.6.6.6 }
}
}
You can check if a single element exists in the set:
# nft get element x x { 1.1.1.5 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
Output means '1.1.1.5' belongs to the '1.1.1.1-2.2.2.2' interval.
You can also check for intervals:
# nft get element x x { 1.1.1.1-2.2.2.2 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
If you try to check for an element that doesn't exist, an error is
displayed.
# nft get element x x { 1.1.1.0 }
Error: Could not receive set elements: No such file or directory
get element x x { 1.1.1.0 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can also check for multiple elements in one go:
# nft get element x x { 1.1.1.5, 5.5.5.10 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 5.5.5.5-6.6.6.6 }
}
}
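The lookups shown so far behave like a containment query on a list of intervals. The Python sketch below models that behaviour for the example set only; it says nothing about how the kernel or nft implement it, and the function names are hypothetical.

```python
from ipaddress import IPv4Address as A

# The elements of set 'x' from the listing above; the single address
# 3.3.3.3 is modelled as an interval of length one.
INTERVALS = [
    (A("1.1.1.1"), A("2.2.2.2")),
    (A("3.3.3.3"), A("3.3.3.3")),
    (A("5.5.5.5"), A("6.6.6.6")),
]


def get_element(addr, intervals=INTERVALS):
    """Return the interval containing addr, or raise LookupError."""
    addr = A(addr)
    for start, end in intervals:
        if start <= addr <= end:
            return (start, end)
    raise LookupError(f"{addr} is not in the set")


def get_elements(addrs, intervals=INTERVALS):
    """Look up several addresses in one go; fails if any one is missing."""
    return [get_element(a, intervals) for a in addrs]


for start, end in get_elements(["1.1.1.5", "5.5.5.10"]):
    print(f"{start}-{end}")
```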
You can also use this to fetch the existing timeout for specific
elements, in case you have a set with timeouts in place:
# nft get element w z { 2.2.2.2 }
table ip w {
set z {
type ipv4_addr
timeout 30s
elements = { 2.2.2.2 expires 17s }
}
}
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Harsha Sharma [Mon, 8 Jan 2018 17:57:07 +0000 (23:27 +0530)]
parser_bison: delete table via table handle
This patch allows deletion of a table via its unique handle and table
family. Handles can be listed with the '-a' option.
For example:
nft delete table [<family>] [handle <handle>]
Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows us to refer to existing flowtables:
# nft add rule x x flow offload @m
Packets matching this rule create an entry in the flowtable 'm'; hence,
follow-up packets that reach the flowtable at ingress bypass the
classic forwarding path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
# nft add table x
# nft add flowtable x m { hook ingress priority 10\; devices = { eth0, wlan0 }\; }
You have to specify hook and priority. So far, only the ingress hook is
supported. The priority represents where this flowtable is placed in the
ingress hook, which is registered to the devices that the user
specifies.
You can also use the 'create' command instead to bail out in case
there is an existing flowtable with this name.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: add variable expression and use it to allow redefinitions
Add a new variable expression that we can use to attach symbols at
runtime. This allows us to redefine variables via a new keyword, e.g.
table ip x {
chain y {
define address = { 1.1.1.1, 2.2.2.2 }
ip saddr $address
redefine address = { 3.3.3.3 }
ip saddr $address
}
}
# nft list ruleset
table ip x {
chain y {
ip saddr { 1.1.1.1, 2.2.2.2 }
ip saddr { 3.3.3.3 }
}
}
Note that redefinition just places a new symbol version before the
existing one, so symbol lookups always find the latest version. The
undefine keyword decrements the reference counter and removes the symbol
from the list, so it cannot be used anymore. Previous references
to this symbol via variable expression remain valid, though.
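The symbol handling described in the note can be modelled with a list that is searched from the front. This Python sketch uses hypothetical names and leaves out reference counting; treating undefine as removing only the newest version of a symbol is an assumption.

```python
class Scope:
    """Toy symbol list: newest definitions are kept at the front."""

    def __init__(self):
        self.symbols = []

    def bind(self, name, value):
        # define and redefine both place the new version before older ones
        self.symbols.insert(0, (name, value))

    def lookup(self, name):
        for sym_name, value in self.symbols:
            if sym_name == name:
                return value          # the first hit is the latest version
        raise KeyError(name)

    def unbind(self, name):
        for i, (sym_name, _) in enumerate(self.symbols):
            if sym_name == name:
                del self.symbols[i]
                return
        raise KeyError(name)


scope = Scope()
scope.bind("address", ["1.1.1.1", "2.2.2.2"])
first_ref = scope.lookup("address")   # what the first rule refers to
scope.bind("address", ["3.3.3.3"])    # redefine
print(first_ref, scope.lookup("address"))
```

The earlier reference keeps pointing at the value it resolved to, which matches the two different rules in the listing above.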
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* AC_PREREQ checks for 2.61, which is not supported by any contemporary
distribution.
* AC_COPYRIGHT, autoconf documentation states "in addition to the Free
Software Foundation's copyright on the Autoconf macros, parts of your
configure are covered by the copyright-notice.".
This only refers to the autoconf infrastructure: we are doing simple
and standard usage of autoconf infrastructure, we also don't use this
macro in other existing userspace software available at netfilter.org.
The comment above at the beginning of this file shows text that is
available in many configure.ac templates on the Internet.
* AC_CANONICAL_HOST, we don't need the canonical host-system type to
build this software.
* AC_CONFIG_SRCDIR is not used in other userspace software in the tree.
* AC_DEFINE _GNU_SOURCE, define this where it's needed instead.
* AC_DEFINE _STDC_FORMAT_MACROS is not used in this codebase.
* AC_HEADER_STDC checks for ANSI C89 headers, however, we need more than
just this C standard, so this doesn't guarantee anything at all.
* Remove "Checks for libraries" comment, it's obvious.
* AC_HEADER_ASSERT allows us to disable assertions, this is bad because
this is helping us to diagnose bugs and incomplete features.
* AC_CHECK_HEADERS is checking for an arbitrary list of headers,
this still doesn't even guarantee that we can actually do a successful
compilation in a broken system.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Thu, 1 Mar 2018 14:00:32 +0000 (15:00 +0100)]
netlink_delinearize: Fix resource leaks
Most of the cases are basically the same: Error path fails to free the
previously allocated statement or expression. A few cases received
special treatment though:
- In netlink_parse_payload_stmt(), the leak is easily avoided by code
reordering.
- In netlink_parse_exthdr(), there's no point in introducing a goto
label since there is but a single affected error check.
- In netlink_parse_hash() non-error path leaked as well if sreg
contained a concatenated expression.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Thu, 1 Mar 2018 14:00:31 +0000 (15:00 +0100)]
netlink: Complain if setting O_NONBLOCK fails
Assuming that the code is not aware that reads from the netlink socket
may block, treat the inability to set the O_NONBLOCK flag as a fatal
initialization error that aborts program execution.
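The policy of treating a failed O_NONBLOCK setup as fatal can be sketched as below. This is a Python illustration on a Unix system, not the C code from the patch: the function name is hypothetical, and a socketpair stands in for the netlink socket.

```python
import fcntl
import os
import socket


def set_nonblocking_or_die(fd):
    """Set O_NONBLOCK on fd; abort instead of continuing with a socket
    whose reads might block unexpectedly."""
    try:
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    except OSError as err:
        raise SystemExit(f"cannot set O_NONBLOCK: {err}")


left, right = socket.socketpair()     # stand-in for the netlink socket
set_nonblocking_or_die(left.fileno())
print(bool(fcntl.fcntl(left.fileno(), fcntl.F_GETFL) & os.O_NONBLOCK))
left.close()
right.close()
```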
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
will call expr_cmp() in case e1->hash.expr is NULL, causing null-pointer
dereference. This is probably a typo, the intention when introducing
this was to avoid the call to expr_cmp() for symmetric hash expressions
which don't use expr->hash.expr. Inverting the existence check should
fix this.
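The effect of inverting the existence check can be shown with a toy comparison function. This is a Python model under assumptions: the field names, the extra 'mod' field and the exact shape of the condition are hypothetical and only follow the description above, not the actual C code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashExpr:
    expr: Optional[str]   # None models a symmetric hash (no inner expression)
    mod: int


def expr_cmp(a, b):
    """Stand-in for expr_cmp(); a missing first operand models the
    null-pointer dereference."""
    if a is None:
        raise ValueError("null-pointer dereference")
    return a == b


def hash_cmp(e1, e2):
    # Only compare inner expressions if e1 actually has one.
    if e1.expr is not None and not expr_cmp(e1.expr, e2.expr):
        return False
    return e1.mod == e2.mod


print(hash_cmp(HashExpr(None, 16), HashExpr(None, 16)))
print(hash_cmp(HashExpr("ip saddr", 16), HashExpr("ip daddr", 16)))
```

With the check the other way round, the first call would reach expr_cmp() with a missing expression, which is the crash described above.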
Fixes: 3a86406729782 ("src: hash: support of symmetric hash")
Cc: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>