segtree: perform stricter expression type validation from expr_value()
This helper function returns a expression value type that represents the
set element key. This functions currently expects two kind of
expressions: set elements and mappings.
Bail out otherwise, if we see anything else, we have to fix our code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 26 Mar 2016 10:04:48 +0000 (10:04 +0000)]
nft monitor [ trace ]
... can now display nftables nftrace debug information.
$ nft filter input tcp dport 10000 nftrace set 1
$ nft filter input icmp type echo-request nftrace set 1
$ nft -nn monitor trace
trace id e1f5055f ip filter input packet: iif eth0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 ip saddr 192.168.122.1 ip daddr 192.168.122.83 ip tos 0 ip ttl 64 ip id 32315 ip length 84 icmp type echo-request icmp code 0 icmp id 10087 icmp sequence 1
trace id e1f5055f ip filter input rule icmp type echo-request nftrace set 1 (verdict continue)
trace id e1f5055f ip filter input verdict continue
trace id e1f5055f ip filter input
trace id 74e47ad2 ip filter input packet: iif vlan0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 1 vlan id 1000 ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 0 ip ttl 64 ip id 49030 ip length 84 icmp type echo-request icmp code 0 icmp id 10095 icmp sequence 1
trace id 74e47ad2 ip filter input rule icmp type echo-request nftrace set 1 (verdict continue)
trace id 74e47ad2 ip filter input verdict continue
trace id 74e47ad2 ip filter input
trace id 3030de23 ip filter input packet: iif vlan0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 1 vlan id 1000 ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 16 ip ttl 64 ip id 59062 ip length 60 tcp sport 55438 tcp dport 10000 tcp flags == syn tcp window 29200
trace id 3030de23 ip filter input rule tcp dport 10000 nftrace set 1 (verdict continue)
trace id 3030de23 ip filter input verdict continue
trace id 3030de23 ip filter input
Based on a patch from Florian Westphal, which again was based on a patch
from Markus Kötter.
Patrick McHardy [Sat, 26 Mar 2016 10:04:48 +0000 (10:04 +0000)]
proto: add protocol header fields filter and ordering for packet decoding
The next patch introduces packet decoding for tracing messages based on
the proto definitions. In order to provide a readable output, add a filter
to surpress uninteresting header fields and allow to specify and explicit
output order.
The code contains multiple scattered around fragments to fiddle with the
protocol contexts to work around the fact that stacked headers update the
context for the incorrect layer.
Fix this by updating the correct layer in payload_expr_pctx_update() and
also take care of offset adjustments there and only there. Remove all
manual protocol context fiddling and change protocol context debugging to
also print the offset for stacked headers.
tests/shell: delete tempfile failover in testcases
It seems both Debian/Fedora (and derivates) contains mktemp (from the coreutils
package) so it makes no sense to have this failover, which looks buggy also.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Testscases for Netfilter bug #965:
* add rule at position
* insert rule at position
* replace rule with given handle
* delete rule with given handle
* don't allow to delete rules with position keyword
Now it is possible to store multiple variable length user data into rule.
Modify the parser in order to fill the nftnl_udata with the comment, and
the print function for extract these commentary and print it to user.
Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: evaluate: Show error for fanout without balance
The idea of fanout option is to improve the performance by indexing CPU
ID to map packets to the queues. This is used for load balancing.
Fanout option is not required when there is a single queue specified.
According to iptables, queue balance should be specified in order to use
fanout. Following that, throw an error in nftables if the range of
queues for load balancing is not specified with the fanout option.
After this patch,
$ sudo nft add rule ip filter forward counter queue num 0 fanout
<cmdline>:1:46-46: Error: fanout requires a range to be specified
add rule ip filter forward counter queue num 0 fanout
^^^^^
Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: store parser location for handle and position specifiers
Store the parser location structure for handle and position IDs so we
can use this information from the evaluation step, to provide better
error reporting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Arturo Borrero [Fri, 18 Mar 2016 08:41:31 +0000 (09:41 +0100)]
tests/shell/run-tests.sh: tune kernel cleanup
The modprobe call can return != 0 if, for example, a module was builtin and
we are triying to remove it, so force return code of 0 at the end of the
script.
This patch also adds the '-a' switch to modprobe so it doesn't stop unloading
modules if one of them fails (for example, it was builtin).
While at it, fix several module names, for example: 'nft_bridge_reject' vs
'nft_reject_bridge', delete bogus module names.
Arturo Borrero [Thu, 17 Mar 2016 08:34:47 +0000 (09:34 +0100)]
tests/shell: unload modules between tests
This patch adjusts the main test script so it unload all nftables
kernel modules between tests.
This way we achieve two interesting things:
* avoid false errors in some testcases due to module loading order
* test the module loading/unloading path itself
The false positives is for example, listing ruleset per families, which depends
on the loading order of nf_tables_xx modules.
We can later add more modules to unload incrementally (for
example nf_tables_switchdev).
This patch assumes we are working with a kernel which is compiled with
nf_tables =m, the case using =y is not supported and can still produce false
positives in some testcases due to module ordering.
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_bison: release parsed type and hook name strings
The scanner allocates memory for this, so release them given that we
don't attach them to any object.
==6277== 42 bytes in 6 blocks are definitely lost in loss record 2 of 4
==6277== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==6277== by 0x57AC9D9: strdup (strdup.c:42)
==6277== by 0x41B82D: xstrdup (utils.c:64)
==6277== by 0x41F510: nft_lex (scanner.l:511)
==6277== by 0x427FD1: nft_parse (parser_bison.c:3690)
==6277== by 0x4063AC: nft_run (main.c:231)
==6277== by 0x40600C: main (main.c:361)
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Piyush Pangtey [Tue, 15 Mar 2016 03:07:41 +0000 (08:37 +0530)]
rule: Remove memory leak
Added matching xfree calls in chain_free(), for the chain members 'type' and
'dev'.
It can be reproduced by :
nft add chain x y { type filter hook input priority 0; }
Then:
$ sudo valgrind --leak-check=full nft list tables
==2899== HEAP SUMMARY:
==2899== in use at exit: 327 bytes in 10 blocks
==2899== total heap usage: 145 allocs, 135 frees, 211,462 bytes allocated
==2899==
==2899== 63 bytes in 9 blocks are definitely lost in loss record 1 of 2
==2899== at 0x4C2AB80: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2899== by 0x57A3839: strdup (strdup.c:42)
==2899== by 0x41C05D: xstrdup (utils.c:64)
==2899== by 0x411E9B: netlink_delinearize_chain.isra.3 (netlink.c:717)
==2899== by 0x411F70: list_chain_cb (netlink.c:748)
==2899== by 0x504A943: nft_chain_list_foreach (chain.c:1015)
==2899== by 0x4145AE: netlink_list_chains (netlink.c:771)
==2899== by 0x40793F: cache_init_objects (rule.c:90)
==2899== by 0x40793F: cache_init (rule.c:130)
==2899== by 0x40793F: cache_update (rule.c:147)
==2899== by 0x40FB59: cmd_evaluate (evaluate.c:2475)
==2899== by 0x429A1C: nft_parse (parser_bison.y:655)
==2899== by 0x40651C: nft_run (main.c:231)
==2899== by 0x40618C: main (main.c:357)
==2899==
==2899== LEAK SUMMARY:
==2899== definitely lost: 63 bytes in 9 blocks
==2899== indirectly lost: 0 bytes in 0 blocks
==2899== possibly lost: 0 bytes in 0 blocks
==2899== still reachable: 264 bytes in 1 blocks
==2899== suppressed: 0 bytes in 0 blocks
==2899== Reachable blocks (those to which a pointer was found) are not shown.
==2899== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2899==
==2899== For counts of detected and suppressed errors, rerun with: -v
==2899== Use --track-origins=yes to see where uninitialised values come from
==2899== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 0 from 0)
Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: use table_lookup_global() from expr_evaluate_symbol()
If there's already a table 'test' defined in the kernel and you load
another table 'test' via `nft -f', table_lookup() returns the table
that already exists in the kernel, so if you look up for objects that
are defined in the file, nft bails out with 'Set does not exist'.
Use table_lookup_global() function returns the existing table that is
defined in the file and that it is set as context via
ctx->handle->table.
This is not a complete fix, we should splice the existing kernel objects
into the userspace declaration. We just need some way to identify what
objects are already in the kernel so we don't send them again (otherwise
we will hit EEXIST errors). I'll follow up with this full fix asap.
Anyway, this patch fixes this shell test:
I: [OK] ./testcases/sets/cache_handling_0
So at least by now we have all shell test returning OK. I'll add more
tests to catch the case I describe above once it is fixed too.
Cc: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We get a partial cache (tables, chains and sets) when:
* We see a set reference from a rule, since this set object may be
already defined in kernelspace and we need to fetch the datatype
for evaluation.
* We add/delete a set element, we need this to evaluate if the
element datatype is correct.
* We rename a chain, since we need to know the chain handle.
* We add a chain/set. This isn't needed for simple command line
invocations. However, since the existing codepath is also exercised
from `nft -f' context, we need to know if the object exists in the
kernel. Thus, if this a newly declared object (not yet in the kernel) we
add it to the cache, otherwise, we will not find follow up references to
this object in our cache.
We get a full cache when:
* We list the ruleset. We can provide finer grain listing though,
via partial cache, later.
* We monitor updates, since this displays incremental updates based on
the existing objects.
* We export the ruleset, since this dumps all of the existing objects.
* We push updates via `nft -f'. We need to know what objects are
already in the kernel for incremental updates. Otherwise,
cache_update() hits a bogus 'set doesn't exist' error message for
just declared set in this batch. To avoid this problem, we need a
way to differentiate between what objects in the lists that are
already defined in the kernel and what are just declared in this
batch (hint: the location structure information is set for just
declared objects).
We don't get a cache at all when:
* We flush the ruleset, this is important in case of delinearize
bugs, so you don't need to reboot or manually flush the ruleset via
libnftnl examples/nft-table-flush.
* We delete any object, except for set elements (as we describe above).
* We add a rule, so you can generate via --debug=netlink the expression
without requiring a table and chain in place.
* We describe a expression.
This patch also includes some intentional adjustments to the shell tests
to we don't get bogus errors due to changes in the list printing.
BTW, this patch also includes a revert for 97493717e738 ("evaluate: check
if table and chain exists when adding rules") since that check is not
possible anymore with this logic.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
... else rule like vlan pcp 1-3 won't work and will be displayed
as 0-0 (reverse direction already works since range is represented
as two lte/gte compare expressions).
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_bison: allow 'snat' and 'dnat' keywords from the right-hand side
Parse 'snat' and 'dnat' reserved keywords from the right-hand side as
symbols. Thus, we can use them as values from ct status.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=950 Reported-by: Ana Rey <anarey@gmail.com> Reported-by: Karol Babioch <karol@babioch.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Piyush Pangtey [Tue, 8 Mar 2016 14:27:15 +0000 (19:57 +0530)]
doc: nft: Fixed a typo and added/changed punctuation
In nft's man page , instead of using '/' between shortopt and longopt in the
"SYNOPSIS" and "OPTIONS" section , use '|' and ',' respectively.
(just like the man pages of iptables, etc.)
Fixed a typo and added missing ',' .
Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
rule: simplify ("rule: delete extra space in sets printing")
This simplifies bd23f7628570 ("rule: delete extra space in sets printing")
by passing the whitespace from set_print_plain() called from the monitoring
path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
netlink_delinearize: handle extension header templates with odd sizes
This enables nft to display
frag frag-off 33
... by considering a mask during binop postprocess in case
the initial template lookup done when the exthdr expression was
created did not yield a match.
In the above example, kernel netlink data specifies 16bits,
but the frag field is only 13bits wide.
We use the implicit binop mask to re-do the template lookup with
corrected offset and size information.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
payload: move payload_gen_dependency generic part to helper
We should treat exthdr just as if user asked for e.g. ip6 saddr
and inject the needed dependency statement.
payload_gen_dependency cannot be used since the *expr needs
to be a payload expression, but the actual dependency generation
doesn't depend on a particular expression type.
In order to reuse this part for future exthdr dependency injection
move it to a helper.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: annotate follow up dependency just after killing another
The inet and netdev families generate two implicit dependencies to check
for the interface type, so we have to check just after killing an implicit
dependency if there is another that we should annotate to kill it as well.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: generate ether type payload after meta iiftype
Once the meta iiftype is generated, we shouldn't return from
resolve_protocol_conflict() since we also need to generate the ether
type payload implicit match after it.
This gets rid of the manual proto-ctx update from
meta_iiftype_gen_dependency() that we don't need since stmt_evaluate()
already handles this for us.
Moreover, skip error reporting once we verify that the protocol conflict
has been resolved.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Shivani Bhardwaj [Thu, 28 Jan 2016 19:35:37 +0000 (01:05 +0530)]
src: netlink_linearize: Fix bug for redirect target
Before this patch,
$ sudo nft --debug=netlink add rule ip nat post ip protocol tcp redirect to 100-200
ip nat post
[ payload load 1b @ network header + 9 => reg 1 ]
[ cmp eq reg 1 0x00000006 ]
[ immediate reg 1 0x00006400 ]
[ immediate reg 2 0x0000c800 ]
[ redir proto_min reg 1 proto_max reg 5 ]
<cmdline>:1:1-56: Error: Could not process rule: Invalid argument
add rule ip nat post ip protocol tcp redirect to 100-200
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After this patch,
$ sudo nft --debug=netlink add rule ip nat post ip protocol tcp redirect to 100-200
ip nat post
[ payload load 1b @ network header + 9 => reg 1 ]
[ cmp eq reg 1 0x00000006 ]
[ immediate reg 1 0x00006400 ]
[ immediate reg 2 0x0000c800 ]
[ redir proto_min reg 1 proto_max reg 2 ]
Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch contains the missing chunk to add support for the netdev
family. Part of the support slipped through in the original patch to
add the dup statement for IPv4 and IPv6.
Florian Westphal [Sun, 24 Jan 2016 12:01:04 +0000 (13:01 +0100)]
netlink: move binop postprocess to extra function
Just move the payload trim part to a separate function.
Next patch will add a second call site to deal with map ops
that use a lookup based on a binop result.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
So far it was only possible to match packet under a rate limit, this
patch allows you to explicitly indicate if you want to match packets
that goes over or until the rate limit, eg.
... limit rate over 3/second counter log prefix "OVERLIMIT: " drop
... limit rate over 3 mbytes/second counter log prefix "OVERLIMIT: " drop
... ct state invalid limit rate until 1/second counter log prefix "INVALID: "
Florian Westphal [Sat, 12 Dec 2015 00:10:04 +0000 (01:10 +0100)]
ct: add packet/byte counter support
packets and bytes need special treatment -- we want to be able to get
packet/byte counter in either direction, but also express
'fetch in *BOTH* directions', i.e.
ct packets original + ct packets reply > 1000
This either requires a '+' expression, a new 'both' direction, or
keys where direction is optional, i.e.
ct packets > 12345 ; original + reply
ct original packets > 12345 ; original
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests/py: don't test log statement from protocol match
I think this unit tests should be self-contained at some level. The
shell/ directory should be used to catch regressions at ruleset level,
ie. these kind of combinations.
Another motivation is that I want that netdev/ingress gets tested
(coming in a follow up patch), and we don't support log there yet, so I
would need to skip this test for that case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use rhs_expr and list_rhs_expr as possible occurrences of
initializer_expr since we may only find constant expressions on the
right hand side of the assignment.
Fixes: 2a5d44d8b3c (parser: get rid of multiton_expr from lhs relational expression) Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Florian Westphal <fw@strlen.de> Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Now the chains should be included before tables. Also, lines defining
tables have a new third part (delimited by semicolon) where the chains
needed by the table are declared. If table needs to include more than
one chain, those must be separated by commas: