netlink: cmp: shift rhs constant if lhs offset doesn't start on byte boundary
if we have payload(someoffset) == 42, then shift 42 in case someoffset
doesn't start on a byte boundary.
We already insert a mask instruction to only load those bits into
the register that we were interested in, but the cmp will fail without
also adjusting rhs accordingly.
Needs additional patch in reverse direction to undo the shift again
when dumping ruleset.
payload: disable payload merge if offsets are not on byte boundary.
... because it doesn't work, we attempt to merge it into wrong
place, we would have to merge the second value at a specific location.
F.e. vlan hdr 4094 gives us
0xfe0f
Merging in the CFI should yield 0xfe1f, but the constant merging
doesn't know how to achive that; at the moment 'vlan id 4094'
and 'vlan id 4094 vlan cfi 1' give same result -- 0xfe0f.
For now just turn off the optimization step unless everything is
byte divisible (the common case).
nft: allow stacking vlan header on top of ethernet
currently 'vlan id 42' or even 'vlan type ip' doesn't work since
we expect ethernet header but get vlan.
So if we want to add another protocol header to the same base, we
attempt to figure out if the new header can fit on top of the existing
one (i.e. proto_find_num gives a protocol number when asking to find
link between the two).
We also annotate protocol description for eth and vlan with the full
header size and track the offset from the current base.
Otherwise, 'vlan type ip' fetches the protocol field from mac header
offset 0, which is some mac address.
Instead, we must consider full size of ethernet header.
Pablo reported test failures because the order of returned set entries
is not deterministic.
This sorts set elements before comparision.
Patrick suggested to move ordering into libnftnl (since we could f.e.
also get duplicate entries due to how netlink dumps work), but thats a bit
more work. Hence this quick workaround.
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Adapt the nftables code to use the new symbols in libnftnl. This patch contains
quite some renaming to reserve the nft_ prefix for our high level library.
Explicitly request libnftnl 1.0.5 at configure stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: use existing table object from evaluation context
Skip table object lookup if we are in the context of table declaration already,
ctx->table already points to the right table we have to use during the
evalution. Otherwise, a list corruption occurs when using the wrong table
object when it already exists in the kernel.
mnl: rework netlink socket receive path for events
This patch reworks two aspects of the netlink socket event receive path:
1) In case of ENOBUFS, stay in the loop to keep receiving messages. The tool
displays a message so the user knows that we got lost event messages.
2) Rise the default size of the receive socket buffer up to 16 MBytes to reduce
chances of hitting ENOBUFS. Asumming that the netlink event message size is
~150 bytes, we can bear with ~111848 rules without message loss.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: display error when trying to run tests out of the root directory
Since 357d8cfcceb2 ("tests: use the src/nft binary instead of $PATH one"), the
tests fail if you try to run them if you are not under the root directory of
the nftables repository.
Display an error so I don't forget I have to do it like this.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tests: sets: don't include listing in payload tests
Since e715f6d1241c ("netlink: don't call netlink_dump_*() from listing
functions with --debug=netlink"), there is no debugging from the listing path.
Thus, we can remove the set line from the test files.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netlink: don't call netlink_dump_*() from listing functions with --debug=netlink
Now that we always retrieve the object list to build a cache before executing
the command, this results in fully listing of existing objects in the kernel.
This is confusing when adding a simple rule, so better not to call
netlink_dump_*() from listing functions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
evaluate: display error on unexisting chain when listing
nft list chain ip test output
<cmdline>:1:1-25: Error: Could not process rule: Chain 'output' does not exist
list chain ip test output
^^^^^^^^^^^^^^^^^^^^^^^^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The only remaining caller that needs this is netlink_dump_ruleset(), that is
used to export the ruleset using markup representation. We can remove it and
handle this from do_command_export() now that we have a centralized point to
build up the object cache.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds set objects to the cache if they don't exist in the kernel, so
they can be referenced from this batch. This occurs from the evaluation step.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch populates the cache only once through netlink_list_sets() during
evaluation. As a result, there is a single call to netlink_list_sets().
After this change, we can rid of get_set(). This function was fine by the time
we had no transaction support, but this doesn't work for set objects that are
declared in this batch, so inquiring the kernel doesn't help since they are not
yet available.
As a result from this update, the monitor code gets simplified quite a lot
since it can rely of the set cache. Moreover, we can now validate that the
table and set exists from evaluation path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add declared table objects to the cache, thus we can refer to objects that
come in this batch but that are not yet available in the kernel. This happens
from the evaluation step.
Get rid of code that is doing this from the later do_command_*() stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
.. to make sure that later support to match header elements that have odd
(non-byte aligned) lengths/offsets doesn't erronously eliminate explicitly
added binops while searching expressions for implicit binops.
netlink_delinearize: meta l4proto range printing broken on 32bit
Florian Westphal says:
09565a4b1ed4863d44c4509a93c50f44efd12771 ("netlink_delinearize: consolidate
range printing") causes nft to segfault on 32bit machine when printing l4proto
ranges.
The problem is that meta_expr_pctx_update() assumes that right is a value, but
after this change it can also be a range.
Thus, expr->value contents are undefined (its union). On x86_64 this is also
broken but by virtue of struct layout and pointer sizes, value->_mp_size will
almost always be 0 so mpz_get_uint8() returns 0.
But on x86-32 _mp_size will be huge value (contains expr->right pointer of
range), so we crash in libgmp.
Pablo says:
We shouldn't call pctx_update(), before the transformation we had
there a expr->op == { OP_GT, OP_GTE, OP_LT, OP_LTE }. So we never
entered that path as the assert in payload_expr_pctx_update()
indicates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Florian Westphal <fw@strlen.de>
Note that nft_parse() returns 1 on parsing errors and 0 + state->errs on
evaluation problems, so return -1 as other functions do here to pass up the
error to the main routine.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 26 Jun 2015 08:50:44 +0000 (10:50 +0200)]
datatype: avoid crash in debug mode when printing integers
nft -i --debug=all
nft> add rule ip filter foo mark 42
dies with sigfpe; seems mpz doesn't like len 0:
#1 0x0805f2ee in mpz_export_data (data=0xbfeda588, op=0x9d9fb08, byteorder=BYTEORDER_HOST_ENDIAN, len=0) at gmputil.c:115
After patch this prints 0x0000002a.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Leblond [Wed, 24 Jun 2015 07:51:49 +0000 (09:51 +0200)]
erec: fix buffer overflow
A static array was used to read data and to write information in
it without checking the limit of the array. The result was a buffer
overflow when the line was longer than 1024.
This patch now uses a allocated buffer to avoid the problem.
Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a routine to the postprocess stage to check if the previous
expression statement and the current actually represent a range, so we can
provide a more compact listing, eg.
# nft -nn list table test
table ip test {
chain test {
tcp dport 22
tcp dport 22-23
tcp dport != 22-23
ct mark != 0x00000016-0x00000017
ct mark 0x00000016-0x00000017
mark 0x00000016-0x00000017
mark != 0x00000016-0x00000017
}
}
To do so, the context state stores a pointer to the current statement. This
pointer needs to be invalidated in case the current statement is replaced.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sun, 12 Apr 2015 20:10:42 +0000 (21:10 +0100)]
netlink_delinearize: handle relational and lookup concat expressions
When the RHS length differs from the LHS length (which is only the
first expression), both expressions are assumed to be concat expressions.
The LHS concat expression is reconstructed from the available register
values, advancing by the number of registers required by the subexpressions'
register space, until the RHS length has been reached.
The RHS concat expression is reconstructed by splitting the data value
into multiple subexpressions based on the LHS concat expressions types.
Patrick McHardy [Sun, 12 Apr 2015 20:10:41 +0000 (21:10 +0100)]
netlink_linearize: use NFT_REG32 values internally
Prepare netlink_linearize for 32 bit register usage:
Switch to use 16 data registers of 32 bit each. A helper function takes
care of mapping the registers to the NFT_REG32 values and, if the
register refers to the beginning of an 128 bit area, the old NFT_REG_1-4
values for compatibility.
New register reservation and release helper function take the size into
account and reserve the required amount of registers.
The reservation and release functions will so far still always allocate
128 bit. If no other expression in a rule uses a 32 bit register directly,
these will be mapped to the old register values, meaning everything
continues to work with old kernel versions.
Patrick McHardy [Tue, 2 Jun 2015 10:16:42 +0000 (12:16 +0200)]
eval: prohibit variable sized types in concat expressions
Since we need to calculate the length of the entire concat type, we can
not support variable sized types where the length can't be determined
by the type.
This only affects base types since all higher types include a length.
Patrick McHardy [Tue, 2 Jun 2015 10:53:11 +0000 (12:53 +0200)]
netlink_delinearize: remove obsolete fixme
The FIXME was related to exclusion of string types from cmp length checks.
Since with fixed sized helper names the last case where this could happen
is gone, remove the FIXME and perform length checks on strings as well.
Patrick McHardy [Tue, 2 Jun 2015 10:53:10 +0000 (12:53 +0200)]
ct: add maximum helper length value
The current kernel restricts ct helper names to 16 bytes length. Specify
this limit in the ct expression table to catch oversized strings in userspace.
Since older versions of nft didn't support larger values, this does not
negatively affect interaction with old kernel versions.
Each batch page has a size of 320 Kbytes, and the limit has been set to 256
KBytes, so the overrun area is 64 KBytes long to accomodate the largest netlink
message (sets).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Thu, 19 Mar 2015 13:34:18 +0000 (13:34 +0000)]
setelem: add timeout support for set elements
Support specifying per element timeout values and displaying the expiration
time.
If an element should not use the default timeout value of the set, an
element specific value can be specified as follows:
# nft add element filter test { 192.168.0.1, 192.168.0.2 timeout 10m}
For listing of elements that use the default timeout value, just the
expiration time is shown, otherwise the element specific timeout value
is also displayed:
set test {
type ipv4_addr
timeout 1h
elements = { 192.168.0.2 timeout 10m expires 9m59s, 192.168.0.1 expires 59m59s}
}
Patrick McHardy [Sat, 11 Apr 2015 16:02:13 +0000 (17:02 +0100)]
expr: add set_elem_expr as container for set element attributes
Add a new expression type "set_elem_expr" that is used as container for
the key in order to attach different attributes, such as timeout values,
to the key.
Patrick McHardy [Sun, 12 Apr 2015 09:41:56 +0000 (10:41 +0100)]
parser: fix inconsistencies in set expression rules
Set keys are currently defined as a regular expr for pure sets and
map_lhs_expr for maps. map_lhs_expr is what can actually be used for
a single member, namely a concat_expr or a multiton_expr. The reason
why pure sets use expr for the key is to allow recursive set specifications,
which doesn't make sense for maps since every element needs a mapping.
However, the rule is too wide and also allows map expressions as a key,
which obviously doesn't make sense.
Rearrange the rules so we have:
set_lhs_expr: concat or multiton
set_rhs_expr: concat or verdict
and special case the recursive set specifications, as they deserve.
Besides making it a lot easier to understand what is actually supported,
this will be used by the following patch to support timeouts and comments
for keys in a uniform way.
Patrick McHardy [Sat, 11 Apr 2015 13:56:16 +0000 (14:56 +0100)]
datatype: less strict time parsing
Don't require hours to be in range 0-23 and minutes/seconds in range 0-59.
The time_type is used for relative times where it is entirely reasonable
to specify 180s instead of 3m.
nftables used to have a cache to speed up interface name <-> index lookup,
restore it using libmnl.
This reduces netlink traffic since if_nametoindex() and if_indextoname() open,
send a request, receive the list of interface and close a netlink socket for
each call. I think this is also good for consistency since nft -f will operate
with the same index number when reloading the ruleset.
The cache is populated by when nft_if_nametoindex() and nft_if_indextoname()
are used for first time. Then, it it released in the output path. In the
interactive mode, it is invalidated after each command.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Based on the existing netlink_open_error(), but indicate file and line
where the error happens. This will help us to diagnose what is going
wrong when users can back to us to report problems.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>