This patch adds the `flush ruleset' operation to nft.
The syntax is:
% nft flush ruleset [family]
To flush the entire ruleset (all families):
% nft flush ruleset
To flush the ruleset of a given family:
% nft flush ruleset ip
% nft flush ruleset inet
This flush is a shortcut operation which deletes all rules, sets, tables
and chains.
This is possible thanks to the kernel changes to the NFT_MSG_DELTABLE
API call.
Users can benefit from this operation when doing an atomic replacement of the
entire ruleset, loading a file like this:
Ana Rey [Tue, 2 Sep 2014 18:37:17 +0000 (20:37 +0200)]
src: Add devgroup support in meta expresion
This adds device group support to the meta expression.
The new meta attributes are "iifgroup" and "oifgroup":
- iifgroup: Match the device group of the incoming device.
- oifgroup: Match the device group of the outgoing device.
Example of use:
nft add rule ip test input meta iifgroup 2 counter
nft add rule ip test output meta oifgroup 2 counter
The kernel and libnftnl support were added in these commits:
netfilter: nf_tables: add devgroup support in meta expresion
src: meta: Add devgroup support to meta expresion
Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ana Rey [Tue, 5 Aug 2014 18:33:39 +0000 (20:33 +0200)]
src: Add support for pkttype in meta expresion
If you want to match the pkttype field of the skbuff, you have to
use the following syntax:
nft add rule ip filter input meta pkttype PACKET_TYPE
where PACKET_TYPE can be: unicast, broadcast and multicast.
Joint work with Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: don't return error in netlink_linearize_rule()
This function converts the rule from the list of statements to the
netlink message format. The only two possible errors that can make
this function fail are memory exhaustion and malformed statements,
which immediately stop the execution of nft.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Mon, 18 Aug 2014 23:21:59 +0000 (00:21 +0100)]
datatype: take endianess into account in symbolic_constant_print()
symbolic_constant_print() uses mpz_cmp_ui() to find the matching symbol.
Since GMP internally treats all values as being in host byte order, this
doesn't work when the constant value is in non-host byte order, such as
the ethernet protocol type.
Export the expression's value in its original byteorder for comparison
to fix this.
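The mismatch can be illustrated with a minimal sketch. The table and the two lookup functions below are hypothetical stand-ins, not the nft code: one reads the raw bytes in host order, the other reads them in the byte order of the expression itself.

```python
import sys

# Hypothetical symbol table; keys are the numeric ethertype values.
ETHERTYPES = {0x0800: "ip", 0x0806: "arp", 0x86DD: "ip6"}

def lookup_naive(raw: bytes):
    # Reads the bytes in host byte order, whatever the expression's order is.
    return ETHERTYPES.get(int.from_bytes(raw, sys.byteorder))

def lookup_fixed(raw: bytes, byteorder: str = "big"):
    # Reads the bytes in the expression's own byte order before comparing.
    return ETHERTYPES.get(int.from_bytes(raw, byteorder))

# The ethertype for IPv4 as it is laid out in network byte order.
raw = (0x0800).to_bytes(2, "big")
```

On a little endian host the naive lookup reads the bytes as 0x0008 and finds no symbol, while the fixed lookup resolves to "ip" on any host.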
Patrick McHardy [Mon, 18 Aug 2014 23:21:59 +0000 (00:21 +0100)]
payload: take endianess into account when updating the payload context
payload_expr_pctx_update() uses the numeric protocol value in host byte
order to find the upper layer protocol. This obviously doesn't work for
protocol expressions in other byte orders, such as the ethernet protocol
on little endian.
Export the protocol value in the correct byte order and use that value
to look up the upper layer protocol.
Yanchuan Nian [Mon, 11 Aug 2014 02:24:24 +0000 (10:24 +0800)]
Fix memory leak in nft get operation
Some allocations are not released on the error path of the get operation.
Release them. Also, in netlink_get_chain, it's better to return
immediately when an error is detected.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
You can also specify the snaplen and qthreshold for the nfnetlink_log.
But you cannot mix level and group at the same time, they are mutually
exclusive.
Default values for both snaplen and qthreshold are 0 (just like in
iptables).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
mnl: fix crashes when using sets with many elements
nft crashes when adding many elements into a set for two reasons:
1) The overflow of the nla_len field for the NFTA_SET_ELEM_LIST_ELEMENTS
attribute.
2) Out-of-bound memory writes to the reserved area for the netlink
message, which is solved by the patch entitled ("mnl: introduce
NFT_NLMSG_MAXSIZE").
This patch adds the corresponding nla_len overflow check for
NFTA_SET_ELEM_LIST_ELEMENTS and it splits the elements in several
netlink messages. This should be enough when set updates are handled
by the transaction infrastructure.
With this patch, nft should now be capable of adding an unlimited
number of elements to a given set.
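A rough sketch of the splitting idea, assuming each element is tracked only by its encoded size and that the nest carries a 4-byte attribute header. The function name and the sizes are illustrative and not taken from the patch.

```python
NLA_LEN_MAX = 0xFFFF  # nla_len is an unsigned 16-bit field

def split_elements(elem_sizes, nest_header=4):
    """Group element sizes so that every nested attribute fits in nla_len."""
    chunks, current, used = [], [], nest_header
    for size in elem_sizes:
        if nest_header + size > NLA_LEN_MAX:
            raise ValueError("a single element does not fit in one attribute")
        if used + size > NLA_LEN_MAX:
            # The nest would overflow: close this message and start a new one.
            chunks.append(current)
            current, used = [], nest_header
        current.append(size)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

With 10000 elements of 20 bytes each, this yields four messages instead of one oversized attribute.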
Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=898 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The NFT_NLMSG_MAXSIZE constant defines the maximum size of an nf_tables netlink
message. Currently, the largest is the set element message, which
contains the NFTA_SET_ELEM_LIST_ELEMENTS attribute. This attribute is
a nest that describes the set elements. Given that the netlink attribute
length (nla_len) is 16 bits, the largest message is a bit larger than
64 KBytes. Thus, the proposed value of NFT_NLMSG_MAXSIZE is set to
(1 << 16) + getpagesize().
This new constant is used to calculate the length of:
1) the batch page length, when the batching mode is used.
2) the buffer that stores the netlink message in the send (when no
batching is used) and receive paths.
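As a worked example of the arithmetic, the sketch below computes the same expression. The function name is made up for illustration, and the page size is taken from the running system unless one is passed in.

```python
import mmap

NLA_LEN_BITS = 16  # width of the nla_len field

def nft_nlmsg_maxsize(page_size: int = mmap.PAGESIZE) -> int:
    # (1 << 16) + getpagesize(): the 64 KByte nest plus one page of headroom.
    return (1 << NLA_LEN_BITS) + page_size
```

With a 4096-byte page this gives 69632 bytes.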
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
# nft add rule ip test input ip hdrlength 3
<cmdline>:1:1-37: Error: Could not process rule: Invalid argument
add rule ip test input ip hdrlength 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# echo $?
0
After:
# nft add rule ip test input ip hdrlength 3
<cmdline>:1:1-37: Error: Could not process rule: Invalid argument
add rule ip test input ip hdrlength 3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# echo $?
1
Reported-by: Ana Rey Botello <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
src: rework batching logic to fix possible use of uninitialized pages
This patch reworks the batching logic in several aspects:
1) New batch pages are now always added to the batch page list
first. Then, in the send path, if the last batch page is
empty, it is removed from the batch list.
2) nft_batch_page_add() is only called if the current batch page is
full. Therefore, it is guaranteed to find a valid netlink message
in the batch page when moving the tail that didn't fit into a new
batch page.
3) The batch paging is initialized and released from the nft_netlink()
path.
4) No more global struct mnl_nlmsg_batch *batch that points to the
current batch page. Instead, it is retrieved from the tail of the
batch list, which indicates the current batch page.
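The points above can be sketched as follows. This is a simplified model, not the nft code: the BatchPages class is hypothetical, messages are represented only by their length, and the step of moving the message that did not fit is reduced to starting a new page.

```python
class BatchPages:
    def __init__(self, page_size):
        self.page_size = page_size
        self.pages = [[]]  # a fresh page is linked into the list right away

    def current(self):
        # No global pointer: the current page is the tail of the list.
        return self.pages[-1]

    def add(self, msg_len):
        page = self.current()
        # Only open a new page if the current one holds messages and is full.
        if page and sum(page) + msg_len > self.page_size:
            self.pages.append([])
            page = self.current()
        page.append(msg_len)

    def pages_to_send(self):
        # Send path: a trailing empty page is dropped from the list.
        if not self.pages[-1]:
            return self.pages[:-1]
        return self.pages
```

Since a new page is opened only when the current one already holds a message, no empty page ever sits in the middle of the list.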
This patch fixes a crash caused by access to an uninitialized memory area
when batch_page_add() is called with an empty batch in the send path,
and a memleak of the batch page contents. Reported in:
mnl: check for NLM_F_DUMP_INTR when dumping object lists
This flag makes it possible to detect that an update has occurred while
dumping any of the object lists. In case of interference, nft cancels the
netlink socket to skip processing the remaining stale entries and
it retries to obtain a fresh list of objects.
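The retry behaviour can be sketched like this. The dump_once callable and the max_tries limit are assumptions made for the example; only the NLM_F_DUMP_INTR value comes from linux/netlink.h.

```python
NLM_F_DUMP_INTR = 0x10  # value from linux/netlink.h

def dump_with_retry(dump_once, max_tries=5):
    """dump_once() returns (flags, obj) pairs; a stand-in for a netlink dump."""
    for _ in range(max_tries):
        objs = []
        for flags, obj in dump_once():
            if flags & NLM_F_DUMP_INTR:
                break  # stale dump: discard the partial list and start over
            objs.append(obj)
        else:
            return objs  # the dump completed without interference
    raise RuntimeError("dump kept being interrupted")
```

A dump that is interrupted once is simply repeated, and only the objects of the clean pass are returned.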
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch reverts Alvaro's 34040b1 ("reject: add ICMP code parameter
for indicating the type of error") and 11b2bb2 ("reject: Use protocol
context for indicating the reject type").
These patches have two flaws:
1) IPv6 support is broken, only ICMP codes are considered.
2) If you don't specify any transport context, the utility exits without
adding the rule, e.g. nft add rule ip filter input reject.
The kernel is also flawed when it comes to the inet table. Let's revert
this until we can provide decent reject reason support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
reject: add ICMP code parameter for indicating the type of error
This patch allows specifying the ICMP code to use when rejecting a
packet. Previously, the network unreachable error was always sent
as the ICMP code; now the ICMP code can be indicated
explicitly. Examples:
reject: Use protocol context for indicating the reject type
This patch uses the protocol context to initialize the reject type
considering if the transport protocol is tcp, udp, etc. Before this
patch, this was left unset.
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch introduces a new, more compact syntax that breaks
the current one. The new syntax is closer to the usual nftables
syntax, and ranges can be used as elsewhere in nftables.
Some examples:
Before, a queue was declared with a syntax like this:
nft add rule test input queue num 1 total 3 options bypass,fanout
If we want to use the queue number 1 and the two next (total 3),
we use a range in the new syntax, for example:
nft add rule test input queue num 1-3 bypass fanout
Also if we want to use only one queue, the new rules are like:
nft add rule test input queue num 1 # queue 1
or
nft add rule test input queue # queue 0
To add specific flags, just list the flags to use:
nft add rule test input queue bypass
There is no need for the options keyword or for commas to separate
the flags.
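The mapping from the old parameters (num, total, options) to the new form can be sketched with a small helper. The function is hypothetical and only formats the statement text, it does not talk to nft.

```python
def new_queue_syntax(num=0, total=1, flags=()):
    """Render a queue statement in the new compact syntax."""
    parts = ["queue"]
    if total > 1:
        # "num N total T" becomes the range N-(N+T-1).
        parts.append(f"num {num}-{num + total - 1}")
    elif num != 0:
        parts.append(f"num {num}")
    parts.extend(flags)  # flags are listed bare, without "options" or commas
    return " ".join(parts)
```

For instance, num 1 with total 3 and the bypass and fanout flags renders as "queue num 1-3 bypass fanout".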
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
cli: fix nft -i command crashes when try to input multi line command
When entering a multiline command in "nft -i", it crashes.
The issue is that cli_append_multiline() returns NULL in the case of a
multiline command, but the calling function cli_complete() then calls
cli_exit(), which in turn calls rl_callback_handler_remove(), so the
handler is removed.
netlink: fix crash if kernel doesn't support nfnetlink / nf_tables
The crash happens when trying to close a descriptor that failed to open.
==6231== Process terminating with default action of signal 11 (SIGSEGV)
==6231== Access not within mapped region at address 0x0
==6231== at 0x5503E21: mnl_socket_close (socket.c:248)
==6231== by 0x40517F: netlink_close_sock (netlink.c:68)
==6231== by 0x400EFEE: _dl_fini (dl-fini.c:253)
==6231== by 0x5740AA0: __run_exit_handlers (exit.c:77)
==6231== by 0x5740B24: exit (exit.c:99)
==6231== by 0x40F16F: netlink_open_error (netlink.c:105)
==6231== by 0x405642: netlink_open_sock (netlink.c:54)
==6231== by 0x424E6C: __libc_csu_init (in /usr/sbin/nft)
==6231== by 0x5728924: (below main) (libc-start.c:219)
==6231== If you believe this happened as a result of a stack
==6231== overflow in your program's main thread (unlikely but
==6231== possible), you can try to increase the size of the
==6231== main thread stack using the --main-stacksize= flag.
==6231== The main thread stack size used in this run was 8388608.
Closes: http://bugzilla.netfilter.org/show_bug.cgi?id=881 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink: don't add table/chain/set to ctx->list in the event path
The delinearize functions for tables, chains and sets add these objects
to the ctx->list. In the chain case, this is not required. Regarding
tables and sets, those are added to the cache.
This patch implicitly fixes a use-after-free of the chain object that results
in random crashes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink_delinearize: fix double free in relational_binop_postprocess()
expr->right and value point to the same object, so a single
free() is enough.
This manifests in valgrind with:
==4020== Invalid read of size 4
==4020== at 0x40A429: expr_free (expression.c:65)
==4020== by 0x414032: expr_postprocess (netlink_delinearize.c:747)
==4020== by 0x414C33: netlink_delinearize_rule (netlink_delinearize.c:883)
==4020== by 0x411305: netlink_events_cb (netlink.c:1692)
==4020== by 0x55040AD: mnl_cb_run (callback.c:77)
==4020== by 0x4171E4: nft_mnl_recv (mnl.c:45)
==4020== by 0x407B44: do_command (rule.c:895)
==4020== by 0x405C6C: nft_run (main.c:183)
==4020== by 0x405849: main (main.c:334)
==4020== Address 0x5d126f8 is 56 bytes inside a block of size 120 free'd
==4020== at 0x4C2AF5C: free (vg_replace_malloc.c:446)
==4020== by 0x41402A: expr_postprocess (netlink_delinearize.c:746)
==4020== by 0x414C33: netlink_delinearize_rule (netlink_delinearize.c:883)
==4020== by 0x411305: netlink_events_cb (netlink.c:1692)
==4020== by 0x55040AD: mnl_cb_run (callback.c:77)
==4020== by 0x4171E4: nft_mnl_recv (mnl.c:45)
==4020== by 0x407B44: do_command (rule.c:895)
==4020== by 0x405C6C: nft_run (main.c:183)
==4020== by 0x405849: main (main.c:334)
==4020==
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch moves the netlink set messages to the batch that contains
the rules. This speeds up ruleset restoration by changing the mode of
operation. To achieve this, an internal set ID which
is unique to the batch is allocated as suggested by Patrick.
To retain backward compatibility, nft initially guesses if the
kernel supports sets in batches. Otherwise, it falls back to the
previous (slower) mode of operation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Mon, 14 Apr 2014 06:18:47 +0000 (08:18 +0200)]
build: fix documentation build
Handle the docbook2x-man mess that is called differently on different distributions.
Also switch to dblatex since db2pdf is unable to handle XML on Fedora (and probably
other distributions).
expression: fix constant expression allocation on big endian
When allocating a constant expression, a pointer to the data is passed
to the allocation function. When the variable used to store the data
is larger than the size of the data type, this fails on big endian since
the most significant bytes (being zero) come first.
Add a helper function to calculate the proper address for the cases
where this is needed.
This currently affects symbolic tables for values < u64 and payload
dependency generation for protocol values < u32.
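A small sketch of the address calculation, assuming a 64-bit variable holding a 16-bit value. The data_offset helper is hypothetical and only mirrors the idea of the helper described above.

```python
import struct

def data_offset(var_size, data_len, byteorder):
    """Offset of the data_len significant bytes inside a var_size-byte integer."""
    # On big endian the significant bytes sit at the end of the variable.
    return var_size - data_len if byteorder == "big" else 0

def read_constant(stored, data_len, byteorder):
    off = data_offset(len(stored), data_len, byteorder)
    return int.from_bytes(stored[off:off + data_len], byteorder)

big = struct.pack(">Q", 0x0800)     # the variable as laid out on big endian
little = struct.pack("<Q", 0x0800)  # the same variable on little endian
```

Reading the first two bytes of the big endian layout directly gives zero, which is the failure described above; with the offset applied both layouts yield 0x0800.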
Two issues with these:
1. They compile & run a test program, which won't work when cross-compiling
2. When libnftnl has just been installed and is not (yet) in linker path, the
test fails since loader won't find libnftnl.
In that case configure will succeed without obvious errors, but config.h
re-defines malloc/realloc with rpl_ prefix, which then results in a
linker error ("undefined reference to `rpl_realloc'") on 'make'.
These macros are only useful to check that malloc(0) returns non-NULL
and that realloc(NULL, ... works.
For nftables the former is irrelevant and the latter a safe assumption,
so let's just remove them.
Ana Rey [Tue, 8 Apr 2014 08:19:41 +0000 (10:19 +0200)]
rule: fix crash in set listing
It fixes an invalid read that is shown by valgrind.
==3962== Invalid read of size 4
==3962== at 0x407040: do_command (rule.c:692)
==3962== by 0x40588C: nft_run (main.c:183)
==3962== by 0x405469: main (main.c:334)
==3962== Address 0x10 is not stack'd, malloc'd or (recently) free'd
Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ana Rey [Fri, 28 Mar 2014 12:30:27 +0000 (12:30 +0000)]
nftables: Fix list of sets by family
Fix the output of the command 'nft list sets FAMILY'. It used to show the
following error message:
"Error: syntax error, unexpected end of file, expecting string"
Now the information is shown correctly:
$ sudo nft -nna list sets ip
set set_test {
type ipv4_address
elements = { 192.168.3.45, 192.168.3.43, 192.168.3.42, 192.168.3.4}
}
set set_test2 {
type ipv4_address
elements = { 192.168.3.43, 192.168.3.42, 192.168.3.4}
}
set set0 {
type ipv4_address
flags constant
elements = { 127.0.0.12, 12.11.11.11}
}
Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Arturo Borrero [Wed, 12 Mar 2014 18:03:19 +0000 (19:03 +0100)]
ct: add support for setting ct mark
This patch adds the possibility to set ct keys using nft. Currently, the
connection mark is supported. This functionality enables creating rules
performing the same action as iptables -j CONNMARK --save-mark. For example:
table ip filter {
chain postrouting {
type filter hook postrouting priority 0;
ip protocol icmp ip daddr 8.8.8.8 ct mark set meta mark
}
}
My patch is based on the original http://patchwork.ozlabs.org/patch/307677/
by Kristian Evensen <kristian.evensen@gmail.com>.
I simply did a rebase and some testing. To test, I added rules like these:
counter meta mark set 1 counter
counter ct mark set mark counter
counter ct mark 1 counter
The last matching worked as expected, which means the second rule is also
working as expected.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Acked-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit e7b43ec0 [expr: make expr_binary_error() usable outside of evaluation]
seems to have changed the expr_binary_error() interface.
As a result, several compilation warnings appear.
The expr_binary_error() function and expr_error() macro both expect
`struct list_head *', so I simply changed callers to send `ctx->msgs'.
[...]
src/evaluate.c: In function ‘byteorder_conversion’:
src/evaluate.c:166:3: warning: passing argument 1 of ‘expr_binary_error’ from incompatible pointer type [enabled by default]
In file included from src/evaluate.c:21:0:
include/expression.h:275:12: note: expected ‘struct list_head *’ but argument is of type ‘struct eval_ctx *’
src/evaluate.c: In function ‘expr_evaluate_symbol’:
src/evaluate.c:204:4: warning: passing argument 1 of ‘expr_binary_error’ from incompatible pointer type [enabled by default]
In file included from src/evaluate.c:21:0:
include/expression.h:275:12: note: expected ‘struct list_head *’ but argument is of type ‘struct eval_ctx *’
[...]
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Fri, 7 Mar 2014 10:30:10 +0000 (11:30 +0100)]
segtree: sort set elements before decomposition
The decomposition phase currently depends on the kernel returning elements
in sorted order. This is a fragile assumption, change the code to sort the
elements itself.
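Why the order matters can be seen in a simplified sketch. It assumes elements arrive as (key, interval_end) pairs, where an end marker closes the range opened by the previous key; the function is illustrative and ignores an unterminated last interval.

```python
def decompose(elements):
    """Turn (key, interval_end) pairs into closed (start, end) ranges."""
    ranges, start = [], None
    for key, interval_end in sorted(elements):  # do not trust incoming order
        if start is not None:
            # Any following key terminates the currently open range.
            ranges.append((start, key - 1))
            start = None
        if not interval_end:
            start = key  # a regular element opens a new range
    return ranges
```

Without the sorted() call, the pairing of start and end markers would depend on the order in which the elements happen to be returned.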
Patrick McHardy [Mon, 17 Feb 2014 19:43:36 +0000 (19:43 +0000)]
parser: add grammatical distinction for verdict maps
Currently the parser accepts verdicts in regular maps and non-verdicts
in verdict maps and we have to check matching types during evaluation.
Add grammar rules for verdict maps and separate them from regular maps.
This has a couple of advantages:
- We recognize verdict maps completely in the parser and any attempt to
mix verdicts and other expressions will result in a syntax error.
So far this hasn't actually been checked.
- Using verdicts in non-verdict mappings will also result in a syntax
error instead of a datatype mismatch.
- There's a grammatical distinction between dictionaries and verdict
maps, which are actually statements.
This is needed as preparation for a following patch to turn verdicts
into pure statements, which in turn is needed to reinstate support for
using the queue verdict in maps, which was broken by the introduction
of the queue statement.
Patrick McHardy [Fri, 7 Mar 2014 09:57:08 +0000 (10:57 +0100)]
netlink: use set location for IO errors
We currently crash when reporting a permission denied error for set additions.
This is due to using the wrong location, fix by passing in the set location.
Patrick McHardy [Thu, 6 Mar 2014 15:26:09 +0000 (16:26 +0100)]
set: abort on interval conflicts
We currently print a debug message (with debugging) and continue. Output
a proper error message and abort.
While at it, make sure we only report a conflict if there actually is one.
This is not the case for similar actions; in other words, never for sets,
and for maps only if the mapping differs.
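The rule can be written down as a short predicate. This is a sketch under the assumption that intervals are (start, end, data) tuples with inclusive bounds; it is not the segtree code.

```python
def conflicts(a, b, is_map):
    """a, b: (start, end, data) intervals; data is None for plain sets."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    if not overlap:
        return False
    if not is_map:
        return False  # sets: overlapping elements never carry differing actions
    return a[2] != b[2]  # maps: only a differing mapping is a real conflict
```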
Florian Westphal [Tue, 22 Oct 2013 13:03:52 +0000 (15:03 +0200)]
ct: connlabel matching support
Takes advantage of the fact that the current maximum label storage area
is 128 bits, i.e. the dynamically allocated extension area in the
kernel will always fit into a nft register.
Currently this re-uses rt_symbol_table_init() to read connlabel.conf.
This works since the format is pretty much the same.
Patrick McHardy [Mon, 17 Feb 2014 14:06:44 +0000 (14:06 +0000)]
netlink: fix prefix expression handling
The prefix expression handling is full of bugs:
- netlink_gen_data() is used to construct the prefix mask from the full
prefix expression. This is both conceptually wrong, the prefix expression
is *not* data, and buggy, it only assumes network masks and thus only
handles big endian types.
- Prefix expression reconstruction doesn't check whether the mask is a
valid prefix and reconstructs crap otherwise. It doesn't reconstruct
prefixes for anything but network addresses. On top of that it's
needlessly complicated; using the mpz values directly, it's a simple
matter of finding the sequence of 1's that extends up to the full width.
- Unnecessary cloning of expressions where a simple refcount increase would
suffice.
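The prefix check mentioned in the second item can be sketched with plain integers in place of mpz values. The function name is made up; it returns None when the mask is not a run of leading ones.

```python
def prefix_len(mask: int, width: int):
    """Return the prefix length of mask, or None if it is not a valid prefix."""
    if not 0 <= mask < (1 << width):
        return None
    inverted = ~mask & ((1 << width) - 1)  # the zero tail of mask becomes ones
    if inverted & (inverted + 1):          # must look like 0...01...1
        return None
    return width - inverted.bit_length()
```

For a 32-bit value, 0xFFFFFF00 gives 24, while 0xFF00FF00 is rejected instead of being reconstructed as a bogus prefix.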
Patrick McHardy [Sun, 16 Feb 2014 22:45:19 +0000 (22:45 +0000)]
netlink_delinarize: convert *all* bitmask values into individual bit values
We're currently only converting bitmask types as direct argument to a
relational expression in the form of a flagcmp (expr & mask neq 0) back
into a list of bit values. This means expressions like:
tcp flags & (syn | ack) == syn | ack
won't be shown symbolically. Convert *all* bitmask values back to a sequence
of inclusive or expressions of the individual bits. In case of a flagcmp,
this sequence is further converted to a list (tcp flags syn,ack).
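As an illustration, the conversion can be sketched with the TCP flag bits. The table holds the standard TCP flag values; the two functions are hypothetical and only produce the textual form.

```python
TCP_FLAGS = {0x01: "fin", 0x02: "syn", 0x04: "rst",
             0x08: "psh", 0x10: "ack", 0x20: "urg"}

def bits_to_symbols(value, table=TCP_FLAGS):
    """Split a bitmask into the names of its individual bits."""
    names, bit = [], 1
    while bit <= value:
        if value & bit:
            names.append(table.get(bit, hex(bit)))  # unknown bits stay numeric
        bit <<= 1
    return names

def render(value, flagcmp=False):
    names = bits_to_symbols(value)
    # A flagcmp is shown as a list, anything else as inclusive-or of the bits.
    return ",".join(names) if flagcmp else " | ".join(names)
```

The value 0x12 is rendered as "syn | ack" in the general case and as "syn,ack" for a flagcmp.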