The code contains multiple fragments scattered around that fiddle with the
protocol contexts to work around the fact that stacked headers update the
context for the incorrect layer.
Fix this by updating the correct layer in payload_expr_pctx_update() and
by taking care of offset adjustments there, and only there. Remove all
manual protocol context fiddling and change the protocol context debugging
to also print the offset for stacked headers.
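For reference, a rule matching on stacked headers that exercises this code path might look like the following (illustrative example, bridge family and address assumed, not taken from the patch):
$ nft add rule bridge filter input vlan id 1 ip saddr 10.0.0.1 counter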
tests/shell: delete tempfile failover in testcases
It seems both Debian and Fedora (and their derivatives) ship mktemp (from the
coreutils package), so it makes no sense to keep this failover, which also looks buggy.
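A minimal sketch of the mktemp-only approach the testcases can rely on (variable name illustrative):
tmpfile=$(mktemp)             # provided by coreutils on both distributions
trap 'rm -f "$tmpfile"' EXIT  # clean up the temporary file on exit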
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Testcases for Netfilter bug #965:
* add rule at position
* insert rule at position
* replace rule with given handle
* delete rule with given handle
* don't allow deleting rules with the position keyword
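Illustrative commands for the operations covered above (table, chain, position and handle values are hypothetical):
$ nft add rule ip filter output position 5 counter
$ nft insert rule ip filter output position 5 counter
$ nft replace rule ip filter output handle 8 counter
$ nft delete rule ip filter output handle 8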
Now it is possible to store multiple variable-length user data entries in a rule.
Modify the parser to fill the nftnl_udata with the comment, and the print
function to extract this comment and print it to the user.
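For example, a comment can be attached to a rule like this (illustrative table and chain names; the quotes are escaped so they reach the nft parser):
$ nft add rule ip filter input counter comment \"my rule comment\"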
Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: evaluate: Show error for fanout without balance
The idea of the fanout option is to improve performance by using the CPU
ID as an index to map packets to queues; this is used for load balancing.
The fanout option is not needed when only a single queue is specified.
According to iptables, a queue balance should be specified in order to use
fanout. Following that, throw an error in nftables if the range of
queues for load balancing is not specified together with the fanout option.
After this patch,
$ sudo nft add rule ip filter forward counter queue num 0 fanout
<cmdline>:1:46-46: Error: fanout requires a range to be specified
add rule ip filter forward counter queue num 0 fanout
^^^^^
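For comparison, specifying a queue range together with fanout is expected to be accepted (illustrative command, not part of the original patch):
$ sudo nft add rule ip filter forward counter queue num 0-3 fanout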
Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: store parser location for handle and position specifiers
Store the parser location structure for handle and position IDs so we
can use this information from the evaluation step, to provide better
error reporting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Arturo Borrero [Fri, 18 Mar 2016 08:41:31 +0000 (09:41 +0100)]
tests/shell/run-tests.sh: tune kernel cleanup
The modprobe call can return != 0 if, for example, a module is builtin and
we are trying to remove it, so force a return code of 0 at the end of the
script.
This patch also adds the '-a' switch to modprobe so it doesn't stop unloading
modules if one of them fails (for example, because it is builtin).
While at it, fix several module names (for example, 'nft_bridge_reject' vs
'nft_reject_bridge') and delete bogus module names.
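A minimal sketch of the cleanup pattern described above (module list abbreviated and illustrative):
# -a keeps unloading the remaining modules even if one of them fails,
# e.g. because it is builtin; -q silences the resulting error messages
modprobe -qa -r nft_reject_bridge nf_tables 2>/dev/null
# force a zero exit code even if some modules could not be removed
exit 0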
Arturo Borrero [Thu, 17 Mar 2016 08:34:47 +0000 (09:34 +0100)]
tests/shell: unload modules between tests
This patch adjusts the main test script so it unloads all nftables
kernel modules between tests.
This way we achieve two interesting things:
* avoid false errors in some testcases due to module loading order
* test the module loading/unloading path itself
An example of such a false positive is listing the ruleset per family, which
depends on the loading order of the nf_tables_xx modules.
We can later add more modules to unload incrementally (for
example nf_tables_switchdev).
This patch assumes we are working with a kernel compiled with nf_tables=m;
the =y case is not supported and can still produce false positives in some
testcases due to module ordering.
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_bison: release parsed type and hook name strings
The scanner allocates memory for these strings, so release them given that we
don't attach them to any object.
==6277== 42 bytes in 6 blocks are definitely lost in loss record 2 of 4
==6277== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==6277== by 0x57AC9D9: strdup (strdup.c:42)
==6277== by 0x41B82D: xstrdup (utils.c:64)
==6277== by 0x41F510: nft_lex (scanner.l:511)
==6277== by 0x427FD1: nft_parse (parser_bison.c:3690)
==6277== by 0x4063AC: nft_run (main.c:231)
==6277== by 0x40600C: main (main.c:361)
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Piyush Pangtey [Tue, 15 Mar 2016 03:07:41 +0000 (08:37 +0530)]
rule: Remove memory leak
Added matching xfree() calls in chain_free() for the chain members 'type' and
'dev'.
It can be reproduced by:
nft add chain x y { type filter hook input priority 0; }
Then:
$ sudo valgrind --leak-check=full nft list tables
==2899== HEAP SUMMARY:
==2899== in use at exit: 327 bytes in 10 blocks
==2899== total heap usage: 145 allocs, 135 frees, 211,462 bytes allocated
==2899==
==2899== 63 bytes in 9 blocks are definitely lost in loss record 1 of 2
==2899== at 0x4C2AB80: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2899== by 0x57A3839: strdup (strdup.c:42)
==2899== by 0x41C05D: xstrdup (utils.c:64)
==2899== by 0x411E9B: netlink_delinearize_chain.isra.3 (netlink.c:717)
==2899== by 0x411F70: list_chain_cb (netlink.c:748)
==2899== by 0x504A943: nft_chain_list_foreach (chain.c:1015)
==2899== by 0x4145AE: netlink_list_chains (netlink.c:771)
==2899== by 0x40793F: cache_init_objects (rule.c:90)
==2899== by 0x40793F: cache_init (rule.c:130)
==2899== by 0x40793F: cache_update (rule.c:147)
==2899== by 0x40FB59: cmd_evaluate (evaluate.c:2475)
==2899== by 0x429A1C: nft_parse (parser_bison.y:655)
==2899== by 0x40651C: nft_run (main.c:231)
==2899== by 0x40618C: main (main.c:357)
==2899==
==2899== LEAK SUMMARY:
==2899== definitely lost: 63 bytes in 9 blocks
==2899== indirectly lost: 0 bytes in 0 blocks
==2899== possibly lost: 0 bytes in 0 blocks
==2899== still reachable: 264 bytes in 1 blocks
==2899== suppressed: 0 bytes in 0 blocks
==2899== Reachable blocks (those to which a pointer was found) are not shown.
==2899== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2899==
==2899== For counts of detected and suppressed errors, rerun with: -v
==2899== Use --track-origins=yes to see where uninitialised values come from
==2899== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 0 from 0)
Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: use table_lookup_global() from expr_evaluate_symbol()
If there's already a table 'test' defined in the kernel and you load
another table 'test' via `nft -f', table_lookup() returns the table
that already exists in the kernel, so if you look up objects that
are defined in the file, nft bails out with 'Set does not exist'.
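An illustrative ruleset that triggers this via `nft -f' when a table 'test' already exists in the kernel (set and chain names are hypothetical):
table ip test {
	set blackhole {
		type ipv4_addr
	}
	chain input {
		type filter hook input priority 0;
		ip saddr @blackhole drop
	}
}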
Use table_lookup_global() instead: it returns the existing table that is
defined in the file and set as the context via ctx->handle->table.
This is not a complete fix, we should splice the existing kernel objects
into the userspace declaration. We just need some way to identify what
objects are already in the kernel so we don't send them again (otherwise
we will hit EEXIST errors). I'll follow up with this full fix asap.
Anyway, this patch fixes this shell test:
I: [OK] ./testcases/sets/cache_handling_0
So at least by now we have all shell tests returning OK. I'll add more
tests to catch the case I describe above once it is fixed too.
Cc: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We get a partial cache (tables, chains and sets) when:
* We see a set reference from a rule, since this set object may be
already defined in kernelspace and we need to fetch the datatype
for evaluation.
* We add/delete a set element, we need this to evaluate if the
element datatype is correct.
* We rename a chain, since we need to know the chain handle.
* We add a chain/set. This isn't needed for simple command line
invocations. However, since the existing codepath is also exercised
from `nft -f' context, we need to know if the object exists in the
kernel. Thus, if this is a newly declared object (not yet in the kernel) we
add it to the cache; otherwise, we will not find follow-up references to
this object in our cache.
We get a full cache when:
* We list the ruleset. We can provide finer grain listing though,
via partial cache, later.
* We monitor updates, since this displays incremental updates based on
the existing objects.
* We export the ruleset, since this dumps all of the existing objects.
* We push updates via `nft -f'. We need to know what objects are
already in the kernel for incremental updates. Otherwise,
cache_update() hits a bogus 'set doesn't exist' error message for a set
that is just declared in this batch. To avoid this problem, we need a
way to differentiate between the objects in the lists that are
already defined in the kernel and those that are just declared in this
batch (hint: the location structure information is set for just-declared
objects).
We don't get a cache at all when:
* We flush the ruleset, this is important in case of delinearize
bugs, so you don't need to reboot or manually flush the ruleset via
libnftnl examples/nft-table-flush.
* We delete any object, except for set elements (as we describe above).
* We add a rule, so you can generate via --debug=netlink the expression
without requiring a table and chain in place.
* We describe an expression.
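For illustration, some example invocations and the cache behaviour they trigger under the scheme above (table, chain and set names are hypothetical):
$ nft list ruleset                                    # full cache
$ nft add element ip filter blackhole { 10.0.0.1 }    # partial cache
$ nft add rule ip filter input counter                # no cache
$ nft flush ruleset                                   # no cache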
This patch also includes some intentional adjustments to the shell tests
so we don't get bogus errors due to changes in the list printing.
BTW, this patch also includes a revert for 97493717e738 ("evaluate: check
if table and chain exists when adding rules") since that check is not
possible anymore with this logic.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
... else a rule like vlan pcp 1-3 won't work and will be displayed
as 0-0 (the reverse direction already works since a range is represented
as two lte/gte compare expressions).
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
parser_bison: allow 'snat' and 'dnat' keywords from the right-hand side
Parse the 'snat' and 'dnat' reserved keywords from the right-hand side as
symbols. Thus, we can use them as ct status values.
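For example (illustrative table and chain names):
$ nft add rule ip filter input ct status snat counter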
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=950
Reported-by: Ana Rey <anarey@gmail.com>
Reported-by: Karol Babioch <karol@babioch.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Piyush Pangtey [Tue, 8 Mar 2016 14:27:15 +0000 (19:57 +0530)]
doc: nft: Fixed a typo and added/changed punctuation
In nft's man page, instead of using '/' between the short and long options in
the "SYNOPSIS" and "OPTIONS" sections, use '|' and ',' respectively
(just like the man pages of iptables, etc.).
Also fixed a typo and added a missing ','.
Signed-off-by: Piyush Pangtey <gokuvsvegita@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
rule: simplify ("rule: delete extra space in sets printing")
This simplifies bd23f7628570 ("rule: delete extra space in sets printing")
by passing the whitespace from set_print_plain() called from the monitoring
path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
netlink_delinearize: handle extension header templates with odd sizes
This enables nft to display
frag frag-off 33
... by considering a mask during binop postprocess in case
the initial template lookup done when the exthdr expression was
created did not yield a match.
In the above example, the kernel netlink data specifies 16 bits,
but the frag field is only 13 bits wide.
We use the implicit binop mask to re-do the template lookup with
corrected offset and size information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
payload: move payload_gen_dependency generic part to helper
We should treat exthdr just as if the user asked for e.g. ip6 saddr
and inject the needed dependency statement.
payload_gen_dependency cannot be used since the *expr needs
to be a payload expression, but the actual dependency generation
doesn't depend on a particular expression type.
In order to reuse this part for future exthdr dependency injection
move it to a helper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: annotate follow up dependency just after killing another
The inet and netdev families generate two implicit dependencies to check
for the interface type, so we have to check just after killing an implicit
dependency if there is another that we should annotate to kill it as well.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: generate ether type payload after meta iiftype
Once the meta iiftype is generated, we shouldn't return from
resolve_protocol_conflict() since we also need to generate the ether
type payload implicit match after it.
This gets rid of the manual proto-ctx update from
meta_iiftype_gen_dependency() that we don't need since stmt_evaluate()
already handles this for us.
Moreover, skip error reporting once we verify that the protocol conflict
has been resolved.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Shivani Bhardwaj [Thu, 28 Jan 2016 19:35:37 +0000 (01:05 +0530)]
src: netlink_linearize: Fix bug for redirect target
Before this patch,
$ sudo nft --debug=netlink add rule ip nat post ip protocol tcp redirect to 100-200
ip nat post
[ payload load 1b @ network header + 9 => reg 1 ]
[ cmp eq reg 1 0x00000006 ]
[ immediate reg 1 0x00006400 ]
[ immediate reg 2 0x0000c800 ]
[ redir proto_min reg 1 proto_max reg 5 ]
<cmdline>:1:1-56: Error: Could not process rule: Invalid argument
add rule ip nat post ip protocol tcp redirect to 100-200
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After this patch,
$ sudo nft --debug=netlink add rule ip nat post ip protocol tcp redirect to 100-200
ip nat post
[ payload load 1b @ network header + 9 => reg 1 ]
[ cmp eq reg 1 0x00000006 ]
[ immediate reg 1 0x00006400 ]
[ immediate reg 2 0x0000c800 ]
[ redir proto_min reg 1 proto_max reg 2 ]
Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch contains the missing chunk to add support for the netdev
family. Part of the support slipped through in the original patch to
add the dup statement for IPv4 and IPv6.
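An example rule exercising the netdev dup statement (device, table and chain names are illustrative, not from the patch):
$ nft add rule netdev filter ingress dup to eth1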
Florian Westphal [Sun, 24 Jan 2016 12:01:04 +0000 (13:01 +0100)]
netlink: move binop postprocess to extra function
Just move the payload trim part to a separate function.
Next patch will add a second call site to deal with map ops
that use a lookup based on a binop result.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
So far it was only possible to match packets under a rate limit; this
patch allows you to explicitly indicate whether you want to match packets
that go over the rate limit or remain under it (the 'over' and 'until'
keywords), e.g.
... limit rate over 3/second counter log prefix "OVERLIMIT: " drop
... limit rate over 3 mbytes/second counter log prefix "OVERLIMIT: " drop
... ct state invalid limit rate until 1/second counter log prefix "INVALID: "
Florian Westphal [Sat, 12 Dec 2015 00:10:04 +0000 (01:10 +0100)]
ct: add packet/byte counter support
packets and bytes need special treatment -- we want to be able to get
packet/byte counter in either direction, but also express
'fetch in *BOTH* directions', i.e.
ct packets original + ct packets reply > 1000
This either requires a '+' expression, a new 'both' direction, or
keys where direction is optional, i.e.
ct packets > 12345 ; original + reply
ct original packets > 12345 ; original
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests/py: don't test log statement from protocol match
I think these unit tests should be self-contained at some level. The
shell/ directory should be used to catch regressions at the ruleset level,
i.e. these kinds of combinations.
Another motivation is that I want netdev/ingress to get tested
(coming in a follow-up patch), and we don't support log there yet, so I
would need to skip this test for that case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use rhs_expr and list_rhs_expr as possible occurrences of
initializer_expr since we may only find constant expressions on the
right hand side of the assignment.
Fixes: 2a5d44d8b3c (parser: get rid of multiton_expr from lhs relational expression)
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Florian Westphal <fw@strlen.de>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Now the chains should be included before the tables. Also, lines defining
tables have a new third part (delimited by a semicolon) where the chains
needed by the table are declared. If a table needs to include more than
one chain, those must be separated by commas, as shown in the example below.
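An illustrative header in the new format (chain and table names are hypothetical, not taken from the patch):
:input;type filter hook input priority 0
:output;type filter hook output priority 0
*ip;test-ip4;input,output
*inet;test-inet;input,output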
- Adjust lines to 80 columns style
- Add two lines of separation between functions
- Remove redundant parentheses and semicolons
- Apply other minor style fixes
Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink: don't handle lhs zero-length expression as concat type
expr->len 0 can appear for some data types whose size can be different
based on some external state, e.g. the conntrack src/dst addresses.
The nft type is 'invalid/0-length' in the template definition; the
size is set (on linearization) based on the network base family,
i.e. the type is changed to an ip or ipv6 address at a later stage.
For delinearization, skip zero-length expressions as the concat type
and give expr_postprocess a chance to fix the types.
Without this change the previous patch will result in nft consuming all
available memory when trying to display e.g. a 'ct saddr' rule.
A few keys in the ct expression are directional, i.e.
we need to tell the kernel whether it should fetch the REPLY or ORIGINAL direction.
Split ct_keys into ct_keys and ct_keys_dir; the latter are those keys
that the kernel rejects unless also given a direction.
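For instance (illustrative address), keys in ct_keys_dir must be given a direction explicitly:
... ct original saddr 192.168.1.1 counter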
During postprocessing we also need to invoke ct_expr_update_type; the
problem is that e.g. ct saddr can be of any family (ip, ipv6), so we need
to update the expected data type based on the network base.