mnl: don't send empty set elements netlink message to kernel
The following command:
# nft --debug=mnl add rule x y flow table xyz { ip saddr timeout 30s counter }
breaks with EINVAL. The following netlink message is causing the
problem:
...
---------------- ------------------
| 0000000044 | | message length |
| 02572 | R--- | | type | flags |
| 0000000004 | | sequence number|
| 0000000000 | | port ID |
---------------- ------------------
| 02 00 00 00 | | extra header |
|00008|--|00002| |len |flags| type|
| 78 79 7a 00 | | data | x y z
|00008|--|00004| |len |flags| type|
| 00 00 00 01 | | data |
|00006|--|00001| |len |flags| type|
| 78 00 00 00 | | data | x
---------------- ------------------
...
This is incorrect since this describes no elements at all, so it is
useless. Add upfront check before iterating over the list of set
elements so the netlink message is not placed in the batch.
This patch also adds a set so flow tables are minimally covered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
So adding the same element doesn't trigger any error:
# nft add element filter bogons { 3.3.3.123/24 }
# nft add element filter bogons { 3.3.3.123/24 }
Still kernel reports an error if we use create instead:
# nft create element filter bogons { 3.3.3.123/24 }
<cmdline>:1:1-46: Error: Could not process rule: File exists
create element filter bogons { 3.3.3.123/24 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira [Thu, 24 Nov 2016 11:12:33 +0000 (12:12 +0100)]
src: trigger layer 4 checksum when pseudoheader fields are modified
This patch sets the NFT_PAYLOAD_L4CSUM_PSEUDOHDR when any of the
pseudoheader fields are modified. This implicitly enables stateless NAT,
that can be useful under some circuntances.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Tue, 8 Nov 2016 15:06:37 +0000 (23:06 +0800)]
tests: shell: add test case for inserting element into verdict map
"dalegaard@gmail.com" reports that when inserting an element into a
verdict map, kernel crash will happen. Now add this test case so we
can avoid future regressions fail.
evaluate: return ctx->table from table_lookup_global()
Instead of returning ctx->cmd->table. Note that ctx->cmd->table and
ctx->table points to the same object when all commands are embedded into
the table definition. But this is not true if we mix table definitions
with linear list commands in one file that we load via nft -f.
Reported-by: Martin Bednar <martin@serafean.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anatole Denis [Thu, 1 Dec 2016 10:50:17 +0000 (11:50 +0100)]
evaluate: Update cache on flush ruleset
After a flush, the cache should be empty, otherwise the cache and the expected
state are desynced, causing unwarranted errors. See
tests/shell/testcases/cache/0002_interval_0.
`flush table` and `flush chain` don't empty sets or destroy chains, so the cache
does not need an update in those cases, since only chain names and set contents
are held in cache for commands other than "list"
Reported-by: Leon Merten Lohse <leon@green-side.de> Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anatole Denis [Thu, 1 Dec 2016 10:50:16 +0000 (11:50 +0100)]
rule: Introduce helper function cache_flush
cache_release empties the cache, and marks it as uninitialized. Add cache_flush,
which does the same, except it keeps the cache initialized, eg. after a "nft
flush ruleset" when empty is the correct state of the cache.
Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Elise Lennion [Wed, 30 Nov 2016 01:12:37 +0000 (23:12 -0200)]
datatype: Replace getnameinfo() by internal lookup table
To avoid exceeding the inputs number limit of the flex scanner used,
when calling getnameinfo() in inet_service_type_print().
The new symbol_table was associated with inet_service_type, to enable
listing all pre-defined services using nft command line tool.
The listed services are all well-known and registered ports of my
local /etc/services file, from Ubuntu 16.04. Service numbers are
converted to respect network byte order.
Signed-off-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Mon, 28 Nov 2016 17:51:43 +0000 (18:51 +0100)]
parser_bison: Allow parens on RHS of relational_expr
This is useful to allow a construct such as:
| tcp flags & (syn|fin) == (syn|fin)
Before, only the parentheses on the left side were allowed, but via a
quite funny path through the parser:
* expr might be a concat_expr
* concat_expr might be a basic_expr
* basic_expr is an inclusive_or_expr
* inclusive_or_expr might be an exclusive_or_expr
* exclusive_or_expr might be an and_expr
* and_expr might be 'and_expr AMPERSAND shift_expr'
-> here we eliminate 'flags &' in above statement
* shift_expr might be a primary_expr
* primary_expr might be '( basic_expr )'
Commit a3e60492a684b ("parser: restrict relational rhs expression
recursion") introduced rhs_expr to disallow recursion on RHS, so just
reverting that change for relational_expr is a no go. Allowing rhs_expr
to be '( rhs_expr )' though seems way too intrusive to me since it's
being used in all kinds of places, so this patch is the safest way to
allow the above I could come up with.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: shell: add testcase for different defines usage
This testcase add some defines in a nft -f run and then uses
them in different spots (which are not covered in previous testcases).
* defines used to define another one
* different datatypes (numbers, strings, bits, ranges)
* usage in sets, maps, contatenatios
* single rules with single statements, multiple statements
* reuse define in same rule
Perhaps this isn't testing many different code path, but I find this
interesting to have given it will probably be one of the most common
use cases of nftables.
Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anatole Denis [Mon, 28 Nov 2016 16:43:10 +0000 (17:43 +0100)]
Revert "evaluate: check for NULL datatype in rhs in lookup expr"
This reverts commit 5afa5a164ff1c066af1ec56d875b91562882bd50.
This commit is obsoleted by removing the possibility for a NULL right->dtype in
the first place, at set declaration.
Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anatole Denis [Mon, 28 Nov 2016 16:43:09 +0000 (17:43 +0100)]
tests: Add regression test for malformed sets
see: 5afa5a164ff1c066af1ec56d875b91562882bd50
When a malformed set is added, it was added before erroring out, causing a
segfault further down when used. This tests for this case, ensuring that
nftables doesn't segfault but errors correctly
Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anatole Denis [Mon, 28 Nov 2016 16:43:08 +0000 (17:43 +0100)]
evaluate: Add set to cache only when well-formed
When creating a set (in set_evaluate), it is added to the table cache before
being checked for correctness. When the set is ill-formed, the function returns
without removing the (non-existent, since the function returned) set. Further
references to this set will not result in an error (since the set is in the
lookup table), but the malformed set will probably cause a segfault.
The symptom (the segfault) was fixed by checking for NULL when evaluating a
reference to the set (commit 5afa5a164ff1c066af1ec56d875b91562882bd50), this
should fix the root cause.
Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Sat, 19 Nov 2016 11:31:15 +0000 (19:31 +0800)]
src: add log flags syntax support
Now NF_LOG_XXX is exposed to the userspace, we can set it explicitly.
Like iptables LOG target, we can log TCP sequence numbers, TCP options,
IP options, UID owning local socket and decode MAC header. Note the
log flags are mutually exclusive with group.
Some examples are listed below:
# nft add rule t c log flags tcp sequence,options
# nft add rule t c log flags ip options
# nft add rule t c log flags skuid
# nft add rule t c log flags ether
# nft add rule t c log flags all
# nft add rule t c log flags all group 1
<cmdline>:1:14-16: Error: flags and group are mutually exclusive
add rule t c log flags all group 1
^^^
Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: shell: add a new testcase for ruleset loading bug
There seems to be a bug that prevent loading a ruleset twice in a row
if the ruleset contains sets with intervals. This seems related to the
nft cache.
By the time of this commit, the bug is not fixed yet.
Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink_gen_flow_stmt() relies on the flow table key, that is expressed
as a set element. Use the set element key instead to skip the set
element wrap, otherwise get_register() abort execution:
Concatenations of rt nexthop or ct {orignal | reply} {saddr | daddr} fail
due to
# nft add rule ip filter postrouting flow table acct \{ ip saddr . rt nexthop counter \}
<cmdline>:1:61-70: Error: can not use variable sized data types (invalid) in concat expressions
add rule ip filter postrouting flow table acct { ip saddr . rt nexthop counter }
~~~~~~~~~~~^^^^^^^^^^
Fix this by reordering the check for variable size data types in
expr_evaluate_concat() to happen after expr_evaluate() has been called (via
list_member_evaluate()) for the sub expression. This allows
expr_evaluate_[cr]t() to call [cr]t_expr_update_type() and set the data type
before the check.
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Sun, 30 Oct 2016 12:36:14 +0000 (20:36 +0800)]
ct: fix "ct l3proto/protocol" syntax broken
"l3proto" and "protocol" are still keywords in our grammer, they are not
STRING, so if the user input the following rule, nft will complain that
the syntax is error:
# nft add t c ct l3proto ipv4
<cmdline>:1:12-18: Error: syntax error, unexpected l3proto, expecting
string or mark or packets or bytes
add t c ct l3proto ipv4
^^^^^^^
Fixes: c992153402c7 ("ct: allow resolving ct keys at run time") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
This adds the 'fib' expression which can be used to
obtain the output interface from the route table based on either
source or destination address of a packet.
This can be used to e.g. add reverse path filtering:
# drop if not coming from the same interface packet
# arrived on
# nft add rule x prerouting fib saddr . iif oif eq 0 drop
# accept only if from eth0
# nft add rule x prerouting fib saddr . iif oif eq "eth0" accept
# accept if from any valid interface
# nft add rule x prerouting fib saddr oif accept
Querying of address type is also supported. This can be used
to e.g. only accept packets to addresses configured in the same
interface:
# fib daddr . iif type local
Its also possible to use mark and verdict map, e.g.:
# nft add rule x prerouting meta mark set 0xdead fib daddr . mark type vmap {
blackhole : drop,
prohibit : drop,
unicast : accept
}
Introduce rt expression for routing related data with support for nexthop
(i.e. the directly connected IP address that an outgoing packet is sent
to), which can be used either for matching or accounting, eg.
# nft add rule filter postrouting \
ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
This will drop any traffic to 192.168.1.0/24 that is not routed via
192.168.0.1.
use the meta template to translate the textual token to the enum value.
This allows to remove two keywords from the scanner and also means we do
not need to introduce new keywords when more meta keys get added.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: support ct l3proto/protocol without direction syntax
Acctually, ct l3proto and ct protocol are unrelated to direction, so
it's unnecessary that we must specify dir if we want to use them.
Now add support that we can match ct l3proto/protocol without direction:
# nft add rule filter input ct l3proto ipv4
# nft add rule filter output ct protocol 17
Note: existing syntax is still preserved, so "ct reply l3proto ipv6"
is still fine.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: py: fix numgen case failed due to changes in libnftnl
In nftnl_expr_ng_snprintf_default, format "(%u)" was changed to
"mod %u", so numgen test case failed:
...
'[ numgen reg 1 = inc(2) ]' mismatches '[ numgen reg 1 = inc mod 2 ]'
...
ip/numgen.t: 3 unit tests, 3 error, 0 warning
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: fix compile error due to _UNTIL renamed to _MODULUS in libnftnl
In the latest libnftnl, NFTNL_EXPR_NG_UNTIL was renamed to
NFTNL_EXPR_NG_MODULUS, so compile error happened:
netlink_linearize.c: In function ‘netlink_gen_numgen’:
netlink_linearize.c:184:26: error: ‘NFTNL_EXPR_NG_UNTIL’ undeclared
(first use in this function)
Also update NFTA_NG_UNTIL to NFTA_NG_MODULUS.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: py: replace "eth0" with "lo" in dup expr tests
This patch follow up on Manuel's commit a8871ba6daa0 ("tests: py: any:
Make tests more generic by using other interfaces"). The ifindex of
"eth0" is not always 1, furthermore, "eth0" maybe not exist on some
systems. So replace it with "lo" will make tests more rubost.
In other test cases, "eth0" is used by iifname or oifname, so there's no
need to convert it to "lo". Even if "eth0" is not exist, test will never
fail.
This is what made ether addresses get formatted correctly with
plain payload expression (ether saddr 00:11 ...) when listing
rules. Not needed anymore since etheraddr_type is now BIG_ENDIAN.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
As netlink_get_register() may return NULL, we must not pass the returned
data unchecked to expr_set_type() as that will dereference it. Since the
parser has failed at that point anyway, by returning early we can skip
the useless statement allocation that follows in
netlink_parse_ct_stmt().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Phil Sutter <phil@nwl.cc>
tests: py: any: Make tests more generic by using other interfaces
Some tests use hard coded interface names and interface indexes.
This commit removes these cases by exchanging "eth0" with "dummy0" and
"lo" (depending on the test) in all ifname tests and by using "lo"
instead of "eth0" in all interface index tests (because we can assume
"lo" ifindex is 1).
Signed-off-by: Manuel Johannes Messner <manuel.johannes.messner@hs-furtwangen.de> Signed-off-by: Florian Westphal <fw@strlen.de>
evaluate: display expression, statement and command name on debug
Extend debugging knob for evaluation to display the command, the
expression and statement names.
# nft --debug=eval add rule x y ip saddr 1.1.1.1 counter
<cmdline>:1:1-37: Evaluate add
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
<cmdline>:1:14-29: Evaluate expression
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^^^^^^^^^^
ip saddr $1.1.1.1
<cmdline>:1:14-29: Evaluate relational
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^^^^^^^^^^
ip saddr $1.1.1.1
<cmdline>:1:14-21: Evaluate payload
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^^
ip saddr
<cmdline>:1:23-29: Evaluate symbol
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^
<cmdline>:1:23-29: Evaluate value
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^
1.1.1.1
<cmdline>:1:31-37: Evaluate counter
add rule x y ip saddr 1.1.1.1 counter
^^^^^^^
counter packets 0 bytes 0
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 30 Aug 2016 17:39:52 +0000 (19:39 +0200)]
evaluate: Avoid undefined behaviour in concat_subtype_id()
For the left side of a concat expression, dtype is NULL and therefore
off is 0. In that case the code expects to get a datatype of
TYPE_INVALID, but this is fragile as the output of concat_subtype_id()
is undefined for n > 32 / TYPE_BITS.
To fix this, call datatype_lookup() directly passing the expected
TYPE_INVALID as argument if off is 0.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 30 Aug 2016 17:39:51 +0000 (19:39 +0200)]
evaluate: reject: Have a generic fix for missing network context
Commit 17b495957b29e ("evaluate: reject: fix crash if we have transport
protocol conflict from inet") took care of a crash when using inet or
bridge families, but since then netdev family has been added which also
does not implicitly define the network context. Therefore the crash can
be reproduced again using the following example:
nft add rule netdev filter e1000-ingress \
meta l4proto udp reject with tcp reset
In order to fix this in a more generic way, have stmt_evaluate_reset()
fall back to the generic proto_inet_service irrespective of the actual
proto context.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 30 Aug 2016 17:39:49 +0000 (19:39 +0200)]
evaluate: Fix datalen checks in expr_evaluate_string()
I have been told that the flex scanner won't return empty strings, so
strlen(data) should always be greater 0. To avoid a hard to debug issue
though, add an assert() to make sure this is always the case before
risking an unsigned variable underrun.
A real issue though is the check for 'datalen - 1 >= 0', which will
never fail due to datalen being unsigned. Fix this by incrementing both
sides by one, hence checking 'datalen >= 1'.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
set s-ext-2-int {
type ipv4_addr . inet_service
elements = { $s-ext-2-int }
}
This syntax is not correct though, since the curly braces should be
placed in the variable definition itself, so we have context to handle
this variable as a list of set elements.
The correct syntax that works after this patch is:
We can validate that values don't get over the maximum datatype
length, this is expressed in number of bits, so the maximum value
is always power of 2.
However, since we got the hash and numgen expressions, the user should
not set a value higher that what the specified modulus option, which
may not be power of 2. This patch extends the expression context with
a new optional field to store the maximum value.
After this patch, nft bails out if the user specifies non-sense rules
like those below:
# nft add rule x y jhash ip saddr mod 10 seed 0xa 10
<cmdline>:1:45-46: Error: Value 10 exceeds valid range 0-9
add rule x y jhash ip saddr mod 10 seed 0xa 10
^^
The modulus sets a valid value range of [0, n), so n is out of the valid
value range.
# nft add rule x y numgen inc mod 10 eq 12
<cmdline>:1:35-36: Error: Value 12 exceeds valid range 0-9
add rule x y numgen inc mod 10 eq 12
^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is special expression that transforms an input expression into a
32-bit unsigned integer. This expression takes a modulus parameter to
scale the result and the random seed so the hash result becomes harder
to predict.
You can use it to set the packet mark, eg.
# nft add rule x y meta mark set jhash ip saddr . ip daddr mod 2 seed 0xdeadbeef
You can combine this with maps too, eg.
# nft add rule x y dnat to jhash ip saddr mod 2 seed 0xdeadbeef map { \
0 : 192.168.20.100, \
1 : 192.168.30.100 \
}
Currently, this expression implements the jenkins hash implementation
available in the Linux kernel:
This new expression allows us to generate incremental and random numbers
bound to a specified modulus value.
The following rule sets the conntrack mark of 0 to the first packet seen,
then 1 to second packet, then 0 again to the third packet and so on:
# nft add rule x y ct mark set numgen inc mod 2
A more useful example is a simple load balancing scenario, where you can
also use maps to set the destination NAT address based on this new numgen
expression:
So this is distributing new connections in a round-robin fashion between
192.168.10.100 and 192.168.20.200. Don't forget the special NAT chain
semantics: Only the first packet evaluates the rule, follow up packets
rely on conntrack to apply the NAT information.
You can also emulate flow distribution with different backend weights
using intervals:
This new statement is stateful, so it can be used from flow tables, eg.
# nft add rule filter input \
flow table http { ip saddr timeout 60s quota over 50 mbytes } drop
This basically sets a quota per source IP address of 50 mbytes after
which packets are dropped. Note that the timeout releases the entry if
no traffic is seen from this IP after 60 seconds.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: py: adapt it to new add element command semantics
Since fd33d96 ("src: create element command"), add element doesn't
fail anymore if the element exists, you have to use create instead in
case you want to check if the element already exists.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the create command, that send the NLM_F_EXCL flag so
nf_tables bails out if the element already exists, eg.
# nft add element x y { 1.1.1.1 }
# nft create element x y { 1.1.1.1 }
<cmdline>:1:1-31: Error: Could not process rule: File exists
create element x y { 1.1.1.1 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This update requires nf_tables kernel patches to honor the NLM_F_EXCL.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add support for the 'create' command, we already support this in other
existing objects, so support this for sets too, eg.
# nft add set x y { type ipv4_addr\; }
# nft create set x y { type ipv4_addr\; }
<cmdline>:1:1-35: Error: Could not process rule: File exists
create set x y { type ipv4_addr; }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# nft add set x y { type ipv4_addr\; }
#
This command sets the NLM_F_EXCL netlink flag, so if the object already
exists, nf_tables returns -EEXIST.
This is changing the existing behaviour of 'nft add set' which was
setting this flag, this is inconsistent with regards to the way other
objects behave.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>