git.ipfire.org Git - thirdparty/nftables.git/log

tests/shell: check test names before start and support directories

Check for valid test names early. That's useful because we treat any
unrecognized options as test names. We should detect a mistake early.

While at it, also support specifying directory names instead of
executable files.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: rework finding tests and add "--list-tests" option

Cleanup finding the test files. Also add a "--list-tests" option to see
which tests are found and would run.

Also get rid of the FIND="$(which find)" detection. Which system doesn't
have a working find? Also, we can just fail when we try to use find, and
don't need a check first.

This is still after "unshare", which will be addressed next.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests/shell: rework command line parsing in "run-tests.sh"

Parse the arguments in a loop, so that their order does not matter.
Also, soon more command line arguments will be added, and this way of
parsing seems more maintainable and flexible.

Currently this is still after the is-root check and after unshare. That
will be addressed later.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: 0043concatenated_ranges_0: re-enable all tests

This script suppressed a few tests when ran via run-tests.sh,
don't do that, it would have caught the previous 'get' bug
years ago.

Signed-off-by: Florian Westphal <fw@strlen.de>

evaluate: fix get element for concatenated set

given:
table ip filter {
set test {
type ipv4_addr . ether_addr . mark
flags interval
elements = { 198.51.100.0/25 . 00:0b:0c:ca:cc:10-c1:a0:c1:cc:10:00 . 0x0000006f, }
}
}

We get lookup failure:

nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f }
Error: Could not process rule: No such file or directory

Its possible to work around this via dummy range somewhere in the key, e.g.
nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f-0x6f }

but that shouldn't be needed, so make sure the INTERVAL flag is enabled
for the queried element if the set is of interval type.

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: describe behaviour of {ip,ip6} length

This field exposes internal kernel GRO/GSO packet aggregation
implementation details to userspace, provide a hint to the user to
understand better when matching on this field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: revisit anonymous set with single element optimization

This patch reworks it to perform this optimization from the evaluation
step of the relational expression. Hence, when optimizing for protocol
flags, use OP_EQ instead of OP_IMPLICIT, that is:

tcp flags { syn }

becomes (to represent an exact match):

tcp flags == syn

given OP_IMPLICIT and OP_EQ are not equivalent for flags.

01167c393a12 ("evaluate: do not remove anonymous set with protocol flags
and single element") disabled this optimization, which is enabled again
after this patch.

Fixes: 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: place byteorder conversion after numgen for IP address datatypes

The numgen extension generates numbers in little-endian.
This can be very tricky when trying to combine it with IP addresses, which use big endian.
This change adds a new byteorder operation to convert data type endianness.

Before this patch:
$ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001
ip nat snat_chain
  [ numgen reg 1 = inc mod 7 offset 167772161 ]
  [ nat snat ip addr_min reg 1 ]

After this patch:
$ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001
ip nat snat_chain
  [ numgen reg 1 = inc mod 7 offset 167772161 ]
  [ byteorder reg 1 = hton(reg 1, 4, 4) ]
  [ nat snat ip addr_min reg 1 ]

Regression tests have been modified to include these new cases.

Signed-off-by: Jorge Ortiz Escribano <jorge.ortiz.escribano@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: set internal_location for table and chain

JSON parser does not seem to set on this, better provide a default
location.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: simplify chain_alloc()

Remove parameter to set the chain name which is only used from netlink
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: remove check for NULL before calling expr_free()

expr_free() already handles NULL pointer, remove redundant check.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: use internal_location for unspecified location at allocation time

Set location to internal_location instead of NULL to ensure this is
always set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

xt: avoid "-Wmissing-field-initializers" for "original_opts"

Avoid this warning with clang:

      CC       src/xt.lo
    src/xt.c:353:9: error: missing field 'has_arg' initializer [-Werror,-Wmissing-field-initializers]
            { NULL },
                   ^

The warning seems not very useful, because it's well understood that
specifying only some initializers leaves the remaining fields
initialized with the default. However, as this warning is only hit once
in the code base, it doesn't seem that we violate this style frequently.
Hence, fix it instead of disabling the warning.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: silence "implicit-fallthrough" warnings

Gcc with "-Wextra" warns:

    CC       segtree.lo
  segtree.c: In function 'get_set_interval_find':
  segtree.c:129:28: error: this statement may fall through [-Werror=implicit-fallthrough=]
    129 |                         if (expr_basetype(i->key)->type != TYPE_STRING)
        |                            ^
  segtree.c:134:17: note: here
    134 |                 case EXPR_PREFIX:
        |                 ^~~~

    CC       optimize.lo
  optimize.c: In function 'rule_collect_stmts':
  optimize.c:396:28: error: this statement may fall through [-Werror=implicit-fallthrough=]
    396 |                         if (stmt->expr->left->etype == EXPR_CONCAT) {
        |                            ^
  optimize.c:400:17: note: here
    400 |                 case STMT_VERDICT:
        |                 ^~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

utils: call abort() after BUG() macro

Otherwise, we get spurious warnings. The compiler should be aware that there is
no return from BUG(). Call abort() there, which is marked as __attribute__
((__noreturn__)).

    In file included from ./include/nftables.h:6,
                     from ./include/rule.h:4,
                     from src/payload.c:26:
    src/payload.c: In function 'icmp_dep_to_type':
    ./include/utils.h:39:34: error: this statement may fall through [-Werror=implicit-fallthrough=]
       39 | #define BUG(fmt, arg...)        ({ fprintf(stderr, "BUG: " fmt, ##arg); assert(0); })
          |                                 ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    src/payload.c:791:17: note: in expansion of macro 'BUG'
      791 |                 BUG("Invalid map for simple dependency");
          |                 ^~~
    src/payload.c:792:9: note: here
      792 |         case PROTO_ICMP_ECHO: return ICMP_ECHO;
          |         ^~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

rule: fix "const static" declaration

Gcc warns against this with "-Wextra":

    src/rule.c:869:1: error: static is not at beginning of declaration [-Werror=old-style-declaration]
      869 | const static struct prio_tag std_prios[] = {
          | ^~~~~
    src/rule.c:878:1: error: static is not at beginning of declaration [-Werror=old-style-declaration]
      878 | const static struct prio_tag bridge_std_prios[] = {
          | ^~~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: debloat frag.t.payload.netdev

This bytecode output file contains many duplicated entries, remove them.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: py: extend ip frag-off coverage

Cover matching on DF and MF bits and fragments.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

proto: use hexadecimal to display ip frag-off field

The ip frag-off field in the protocol definition is 16-bits long
and it contains the DF (0x2000) and MF (0x4000) bits too.

iptables-translate also suggests:

ip frag-off & 0x1ffff != 0

to match on fragments. Use hexadecimal for listing this header field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: do not remove anonymous set with protocol flags and single element

Set lookups with flags search for an exact match, however:

tcp flags { syn }

gets transformed into:

tcp flags syn

which is matching on the syn flag only (non-exact match).

This optimization is safe for ct state though, because only one bit is
ever set on in the ct state bitmask.

Since protocol flags allow for combining flags, skip this optimization
to retain exact match semantics.

Another possible solution is to turn OP_IMPLICIT into OP_EQ for exact
flag match to re-introduce this optimization and deal with this corner
case.

Fixes: fee6bda06403 ("evaluate: remove anon sets with exactly one element")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: drop "format" attribute from nft_gmp_print()

nft_gmp_print() passes the format string and arguments to
gmp_vfprintf(). Note that the format string is then interpreted
by gmp, which also understand special specifiers like "%Zx".

Note that with clang we get various compiler warnings:

  datatype.c:299:26: error: invalid conversion specifier 'Z' [-Werror,-Wformat-invalid-specifier]
          nft_gmp_print(octx, "0x%Zx [invalid type]", expr->value);
                                 ~^

gcc doesn't warn, because to gcc 'Z' is a deprecated alias for 'z' and
because the 3rd argument of the attribute((format())) is zero (so gcc
doesn't validate the arguments). But Z specifier in gmp expects a
"mpz_t" value and not a size_t. It's really not the same thing.

The correct solution is not to mark the function to accept a printf format
string.

Fixes: 2535ba7006f2 ('src: get rid of printf')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: suppress "-Wunused-but-set-variable" warning with "parser_bison.c"

Clang warns:

    parser_bison.c:7606:9: error: variable 'nft_nerrs' set but not used [-Werror,-Wunused-but-set-variable]
        int yynerrs = 0;
            ^
    parser_bison.c:72:25: note: expanded from macro 'yynerrs'
    #define yynerrs         nft_nerrs
                            ^

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: don't needlessly clear full string buffer in stmt_evaluate_log_prefix()

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: rework SNPRINTF_BUFFER_SIZE() and handle truncation

Before, the macro asserts against truncation. This is despite the
callers still checked for truncation and tried to handle it. Probably
for good reason. With stmt_evaluate_log_prefix() it's not clear that the
code ensures that truncation cannot happen, so we must not assert
against it, but handle it.

Also,

- wrap the macro in "do { ... } while(0)" to make it more
  function-like.

- evaluate macro arguments exactly once, to make it more function-like.

- take pointers to the arguments that are being modified.

- use assert() instead of abort().

- use size_t type for arguments related to the buffer size.

- drop "size". It was mostly redundant to "offset". We can know
  everything we want based on "len" and "offset" alone.

- "offset" previously was incremented before checking for truncation.
  So it would point somewhere past the buffer. This behavior does not
  seem useful. Instead, on truncation "len" will be zero (as before) and
  "offset" will point one past the buffer (one past the terminating
  NUL).

Thereby, also fix a warning from clang:

    evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable]
            size_t size = 0;
                   ^

    meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable]
            size_t size;
                   ^

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: fix check for truncation in stmt_evaluate_log_prefix()

Otherwise, nft crashes with prefix longer than 127 bytes:

# nft add rule x y log prefix \"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\"

==159385==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffed5bf4a10 at pc 0x7f3134839269 bp 0x7ffed5bf48b0 sp 0x7ffed5bf4060
WRITE of size 129 at 0x7ffed5bf4a10 thread T0
    #0 0x7f3134839268 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:778
    #1 0x7f3133e3074e in __mpz_export_data /tmp/nftables/src/gmputil.c:110
    #2 0x7f3133d21d3c in expr_to_string /tmp/nftables/src/expression.c:192
    #3 0x7f3133ded103 in netlink_gen_log_stmt /tmp/nftables/src/netlink_linearize.c:1148
    #4 0x7f3133df33a1 in netlink_gen_stmt /tmp/nftables/src/netlink_linearize.c:1682
    [...]

Fixes: e76bb3794018 ('src: allow for variables in the log prefix string')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

datatype: avoid cast-align warning with struct sockaddr result from getaddrinfo()

With CC=clang we get

    datatype.c:625:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
                    addr = ((struct sockaddr_in *)ai->ai_addr)->sin_addr;
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    datatype.c:690:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in6 *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
                    addr = ((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    datatype.c:826:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
                    port = ((struct sockaddr_in *)ai->ai_addr)->sin_port;
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix that by casting to (void*) first. Also, add an assertion that the
type is as expected.

For inet_service_type_parse(), differentiate between AF_INET and
AF_INET6. It might not have been a problem in practice, because the
struct offsets of sin_port/sin6_port are identical.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: avoid "-Wenum-conversion" warning in parser_bison.y

Clang warns:

    parser_bison.y:3658:83: error: implicit conversion from enumeration type 'enum nft_nat_types' to different enumeration type 'enum nft_nat_etypes' [-Werror,-Wenum-conversion]
                                            { (yyval.stmt) = nat_stmt_alloc(&(yyloc), NFT_NAT_SNAT); }
                                                             ~~~~~~~~~~~~~~           ^~~~~~~~~~~~
    parser_bison.y:3659:83: error: implicit conversion from enumeration type 'enum nft_nat_types' to different enumeration type 'enum nft_nat_etypes' [-Werror,-Wenum-conversion]
                                            { (yyval.stmt) = nat_stmt_alloc(&(yyloc), NFT_NAT_DNAT); }
                                                             ~~~~~~~~~~~~~~           ^~~~~~~~~~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: avoid "-Wenum-conversion" warning in dtype_map_from_kernel()

Clang warns:

    netlink.c:806:26: error: implicit conversion from enumeration type 'enum nft_data_types' to different enumeration type 'enum datatypes' [-Werror,-Wenum-conversion]
                    return datatype_lookup(type);
                           ~~~~~~~~~~~~~~~ ^~~~

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests/shell: expand vmap test case to also cause batch abort

Let the last few batches also push an update that contains
elements twice.

This is expected to cause the batch to be aborted,
which increases code coverage on kernel side.

Signed-off-by: Florian Westphal <fw@strlen.de>

parser: permit gc-interval in map declarations

Maps support timeouts, so allow to set the gc interval as well.

Fixes: 949cc39eb93f ("parser: support of maps with timeout")
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: monitor: Fix for wrong ordering in expected JSON output

Adjust JSON for delete before add for replace after respective kernel
fix, too.

Fixes: ba786ac758fba ("tests: monitor: update insert and replace commands")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix for wrong syntax in set-interval.t

Expected JSON output must be prefixed 'J'.

Fixes: 7ab453a033c9a ("monitor: do not call interval_map_decompose() for concat intervals")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix time format in ct timeout test

The old "plain" numbers are still accepted (and assumed to be in
seconds), but output will use units which is unexpected due to 'O -'.
Adjust input instead of adding this subtly different output line.

Fixes: 5c25c5a35cbd2 ("parser: allow ct timeouts to use time_spec values")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: monitor: Fix monitor JSON output for insert command

Looks like commit ba786ac758fba ("tests: monitor: update insert and
replace commands") missed to also fix expected JSON output.

Fixes: 48d20b8cf162e ("monitor: honor NLM_F_APPEND flag for rules")
Signed-off-by: Phil Sutter <phil@nwl.cc>

evaluate: Drop dead code from expr_evaluate_mapping()

Since commit 343a51702656a ("src: store expr, not dtype to track data in
sets"), set->data is allocated for object maps in set_evaluate(), all
other map types have set->data initialized by the parser already,
set_evaluate() also checks that.

Drop the confusing check, later in the function set->data is
dereferenced unconditionally.

Fixes: 343a51702656a ("src: store expr, not dtype to track data in sets")
Signed-off-by: Phil Sutter <phil@nwl.cc>

configure: drop AM_PROG_CC_C_O autoconf check

This macro is obsolete since automake 1.14 (2013). It might have been
unnecessary even before, in practice only gcc/clang are supported
compilers.

[1] https://www.gnu.org/software/automake/manual/html_node/Public-Macros.html#index-AM_005fPROG_005fCC_005fC_005fO
[2] https://lists.gnu.org/archive/html/info-gnu/2013-06/msg00009.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: include <std{bool,int}.h> via <nft.h>

There is a minimum base that all our sources will end up needing. This
is what <nft.h> provides.

Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to
implement anything, without having "bool" and "uint32_t" types
available.

Yes, this means the internal headers are not self-contained, with
respect to what <nft.h> provides. This is the exception to the rule, and
our internal headers should rely to have <nft.h> included for them.
They should not include <nft.h> themselves, because <nft.h> needs always
be included as first. So when an internal header would include <nft.h>
it would be unnecessary, because the header is *always* included
already.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

configure: use AC_USE_SYSTEM_EXTENSIONS to get _GNU_SOURCE

Let "configure" detect which features are available. Also, nftables is a
Linux project, so portability beyond gcc/clang and glibc/musl is less
relevant. And even if it were, then feature detection by "configure"
would still be preferable.

Use AC_USE_SYSTEM_EXTENSIONS ([1]).

Available since autoconf 2.60, from 2006 ([2]).

[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Posix-Variants.html#index-AC_005fUSE_005fSYSTEM_005fEXTENSIONS-1046
[2] https://lists.gnu.org/archive/html/autoconf/2006-06/msg00111.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: don't define _GNU_SOURCE in public header

_GNU_SOURCE is supposed to be defined as first thing, before including any
libc headers. Defining it in the public header of nftables is wrong, because
it would only (somewhat) work if the user includes the nftables header as first
thing too. But that is not what users commonly would do, in particular with
autotools projects, where users would include <config.h> first.

It's also unnecessary. Nothing in "nftables/libnftables.h" itself
requires _GNU_SOURCE. Drop it.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add <nft.h> header and include it as first

<config.h> is generated by the configure script. As it contains our
feature detection, it want to use it everywhere.

Likewise, in some of our sources, we define _GNU_SOURCE. This defines
the C variant we want to use. Such a define need to come before anything
else, and it would be confusing if different source files adhere to a
different C variant. It would be good to use autoconf's
AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure
that <config.h> is always included as first.

Instead of going through all source files and include <config.h> as
first, add a new header "include/nft.h", which is supposed to be
included in all our sources (and as first).

This will also allow us later to prepare some common base, like include
<stdbool.h> everywhere.

We aim that headers are self-contained, so that they can be included in
any order. Which, by the way, already didn't work because some headers
define _GNU_SOURCE, which would only work if the header gets included as
first. <nft.h> is however an exception to the rule: everything we compile
shall rely on having <nft.h> header included as first. This applies to
source files (which explicitly include <nft.h>) and to internal header
files (which are only compiled indirectly, by being included from a source
file).

Note that <config.h> has no include guards, which is at least ugly to
include multiple times. It doesn't cause problems in practice, because
it only contains defines and the compiler doesn't warn about redefining
a macro with the same value. Still, <nft.h> also ensures to include
<config.h> exactly once.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

meta: define _GNU_SOURCE to get strptime() from <time.h>

To use `strptime()`, the documentation indicates

  #define _XOPEN_SOURCE
  #include <time.h>

However, previously this was done wrongly.

For example, when building with musl we got a warning:

      CC       meta.lo
    meta.c:40: warning: "_XOPEN_SOURCE" redefined
       40 | #define _XOPEN_SOURCE
          |
    In file included from /usr/include/errno.h:8,
                     from meta.c:13:
    /usr/include/features.h:16: note: this is the location of the previous definition
       16 | #define _XOPEN_SOURCE 700
          |

Defining "__USE_XOPEN" is wrong. This is a glibc internal define not for
the user.

Note that if we just set _XOPEN_SOURCE (or _XOPEN_SOURCE=700), we won't
get other things like "struct tm.tm_gmtoff".

Instead, we already define _GNU_SOURCE at other places. Do that here
too, it will give us strptime() and all is good.

Also, those directives should be defined as first thing (or via "-D"
command line). See [1].

This is also important, because to use "time_t" in a header, we would
need to include <time.h>. That only works, if we get the feature test
macros right. That is, define the _?_SOURCE macro as first thing.

[1] https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: 30s-stress: add failslab and abort phase tests

Pablo suggested to also cover abort phase by intentionally deleting
non-existent or adding clashing keys.

While at it:
add rules with anon sets and jumps to anonymous chains and a few
constant sets.

Pick different key sizes so there is a higher chance kernel picks
different backend storages such as bitmap or hash_fast.

add failslab support, this also covers unlikely or "impossible" cases like
failing GFP_KERNEL allocations.

randomly spawn 'nft monitor' in the background for a random duration
to cover notification path.

Try to randomly delete a set or chain from control plane.

Randomly set a table as dormant (and back to normal).

Allow to pass the test runtime as argument, so one can now do

./30s-stress 3600

to have the test run for one hour.

For such long test durations, make sure the ruleset
gets regenerated periodically.

Signed-off-by: Florian Westphal <fw@strlen.de>

py: add Nftables.{get,set}_input_flags() API

Similar to the existing Nftables.{get,set}_debug() API.

Only notable (internal) difference is that nft_ctx_input_set_flags()
returns the old value already, so we don't need to call
Nftables.get_input_flags() first.

The benefit of this API, is that it follows the existing API for debug
flags. Also, when future flags are added it requires few changes to the
python code.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: extract flags helper functions for set_debug()/get_debug()

Will be re-used for nft_ctx_input_set_flags() and
nft_ctx_input_get_flags().

There are changes in behavior here.

- when passing an unrecognized string (e.g. `ctx.set_debug('foo')` or
  `ctx.set_debug(['foo'])`), a ValueError is now raised instead of a
  KeyError.

- when passing an out-of-range integer, now a ValueError is no raised.
  Previously the integer was truncated to 32bit.

Changing the exception is an API change, but most likely nobody will
care or try to catch a KeyError to find out whether a flag is supported.
Especially, since such a check would be better performed via `'foo' in
ctx.debug_flags`.

In other cases, a TypeError is raised as before.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: fix exception during cleanup of half-initialized Nftables

When we create a Nftables instance against an older library version,
we might not find a symbol and fail with an exception when initializing
the context object.

Then, __del__() is still called, but resulting in a second exception
because self.__ctx is not set. Avoid that second exception.

    $ python -c 'import nftables; nftables.Nftables()'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/data/src/nftables/py/nftables.py", line 90, in __init__
        self.nft_ctx_input_get_flags = lib.nft_ctx_input_get_flags
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/ctypes/__init__.py", line 389, in __getattr__
        func = self.__getitem__(name)
               ^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/ctypes/__init__.py", line 394, in __getitem__
        func = self._FuncPtr((name_or_ordinal, self))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: /lib64/libnftables.so.1: undefined symbol: nft_ctx_input_get_flags
    Exception ignored in: <function Nftables.__del__ at 0x7f6315a2c540>
    Traceback (most recent call last):
      File "/data/src/nftables/py/nftables.py", line 166, in __del__
        self.nft_ctx_free(self.__ctx)
        ^^^^^^^^^^^^^^^^^
    AttributeError: 'Nftables' object has no attribute 'nft_ctx_free'

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add input flag NFT_CTX_INPUT_JSON to enable JSON parsing

By default, the input is parsed using the nftables grammar. When setting
NFT_CTX_OUTPUT_JSON flag, nftables will first try to parse the input as
JSON before falling back to the nftables grammar.

But NFT_CTX_OUTPUT_JSON flag also turns on JSON for the output. Add a
flag NFT_CTX_INPUT_JSON which allows to treat only the input as JSON,
but keep the output mode unchanged.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking

getaddrinfo() blocks while trying to resolve the name. Blocking the
caller of the library is in many cases undesirable. Also, while
reconfiguring the firewall, it's not clear that resolving names via
the network will work or makes sense.

Add a new input flag NFT_CTX_INPUT_NO_DNS to opt-out from getaddrinfo()
and only accept plain IP addresses.

We could also use AI_NUMERICHOST with getaddrinfo() instead of
inet_pton(). By parsing via inet_pton(), we are better aware of
what we expect and can generate a better error message in case of
failure.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

src: add input flags for nft_ctx

Similar to the existing output flags, add input flags. No flags are yet
implemented, that will follow.

One difference to nft_ctx_output_set_flags(), is that the setter for
input flags returns the previously set flags.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

gitignore: ignore cscope files

Cscope is useful. Ignore the files it creates.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: use minutes granularity in sets/0036add_set_element_expiration_0

Use minute granularity to fix bogus failures of this test on slow testbed.

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: error out on meter overlap with an existing set/map declaration

One of the problems with meters is that they use the set/map
infrastructure behind the scenes which might be confusing to users.

This patch errors out in case user declares a meter whose name overlaps
with an existing set/map:

meter.nft:15:18-91: Error: File exists; meter ‘syn4-meter’ overlaps an existing set ‘syn4-meter’ in family inet
tcp dport 22 meter syn4-meter { ip saddr . tcp dport timeout 5m limit rate 20/minute } counter accept
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An old 5.10 kernel bails out simply with EEXIST, with this patch a
better hint is provided.

Dynamic sets are preferred over meters these days.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cache: chain listing implicitly sets on terse option

If user specifies a chain to be listed (which is internally handled via
filtering options), then toggle NFT_CACHE_TERSE to skip fetching set
content from kernel for non-anonymous sets.

With a large IPv6 set with bogons, before this patch:

# time nft list chain inet raw x
table inet raw {
        chain x {
                ip6 saddr @bogons6
                ip6 saddr { aaaa::, bbbb:: }
        }
}

real    0m2,913s
user    0m1,345s
sys     0m1,568s

After this patch:

# time nft list chain inet raw prerouting
table inet raw {
        chain x {
                ip6 saddr @bogons6
                ip6 saddr { aaaa::, bbbb:: }
        }
}

real    0m0,056s
user    0m0,018s
sys     0m0,039s

This speeds up chain listing in the presence of a large set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: Stabilize sets/0043concatenated_ranges_0 test

On a slow system, one of the 'delete element' commands would
occasionally fail. Assuming it can only happen if the 2s timeout passes
"too quickly", work around it by adding elements with a 2m timeout
instead and when wanting to test the element expiry just drop and add
the element again with a short timeout.

Fixes: 6231d3fa4af1e ("tests: shell: Fix for unstable sets/0043concatenated_ranges_0")
Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: Stabilize sets/reset_command_0 test

Timeout/expiry value testing based on seconds is way too fragile,
especially with slow debug kernels. Rewrite the unit to test
minute-based values. This means it is no longer feasible to wait for
values to sufficiently change, so instead specify an 'expires' value
when creating the ruleset and drop the 'sleep' call.

While being at it:

- Combine 'get element' and 'reset element' calls into one, assert the
relevant (sanitized) line appears twice in output instead of comparing
with 'diff'.
- Turn comments into 'echo' calls to help debugging if the test fails.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>

meta: use reentrant localtime_r()/gmtime_r() functions

These functions are POSIX.1-2001. We should have them in all
environments we care about.

Use them as they are thread-safe.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

meta: don't assume time_t is 64 bit in date_type_print()

time_t on 32bit arch is not uint64_t. Even if it always were, it would
be ugly to make such an assumption (without a static assert). Copy the
value to a time_t variable first.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

INSTALL: provide examples to install python bindings

Provide a bit more details on how to install python bindings with legacy
setup.py and pip with .toml file.

Original text from Jeremy Sowden <jeremy@azazel.net>.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

nftutils: add and use wrappers for getprotoby{name,number}_r(), getservbyport_r()

We should aim to use the thread-safe variants of getprotoby{name,number}
and getservbyport(). However, they may not be available with other libc,
so it requires a configure check. As that is cumbersome, add wrappers
that do that at one place.

These wrappers are thread-safe, if libc provides the reentrant versions.
Use them.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

json: use strtok_r() instead of strtok()

strtok_r() is probably(?) everywhere available where we care.
Use it. It is thread-safe, and libnftables shouldn't make
assumptions about what other threads of the process are doing.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: update bad_expression test case

Check that the ruleset also fails to validate if there is
another table that passes validation checks.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add table validation check

Pablo noticed problems with commit validation, investigation
shows nfnetlink can retry forever in infinite -EAGAIN cycle,
test for this.

The process is killable, this only hogs cpu. Add a test for this.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add one more chain jump in vmap test

This triggers a splat on kernels that lack
314c82841602 ("netfilter: nf_tables: can't schedule in nft_chain_validate").

There is another test case that triggers this splat
(optimize/ruleset), but that test uses some more advanced
features that don't exist on older kernels, so the splat is never seen.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add transaction stress test with parallel delete/add/flush and netns deletion

Based on nft_trans_stress.sh from kernel selftests, changed to run from
run-tests.sh, plus additional ideas from Pablo Neira, such as del+readd
of the netns.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add test with concatenation, vmap and timeout

Add 4k elements to map, with timeouts in range 1..3s, also add a
catchall element with timeout.

Check that all elements are no longer included in set list after 4s.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: add test case for double-deactivation

Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

doc: move man-pages to `MAINTAINERCLEANFILES`

Since the man-pages are built and included in the distribution
tar-balls, the appropriate clean target is `maintainer-clean`.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

doc: move man-pages to `dist_man_MANS`

Removes the need to add them to `EXTRA_DIST`.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: Review test-cases for destroy command

Having separate files for successful destroy of existing and
non-existing objects is a bit too much, just combine them into one.

While being at it:

* No bashisms, using /bin/sh is fine
* Append '-e' to shebang itself instead of calling 'set'
* Use 'nft -a -e' instead of assuming the created rule's handle value
* Shellcheck warned about curly braces, quote them

Signed-off-by: Phil Sutter <phil@nwl.cc>

parser: deduplicate map with data interval

Its copypasted, the copy is same as original
except that it specifies a map key that maps to an interval.

Add an exra rule that returns 0 or EXPR_F_INTERVAL, then
use that in a single rule.

Signed-off-by: Florian Westphal <fw@strlen.de>

parser: allow ct timeouts to use time_spec values

For some reason the parser only allows raw numbers (seconds)
for ct timeouts, e.g.

ct timeout ttcp {
protocol tcp;
policy = { syn_sent : 3, ...

Also permit time_spec, e.g. "established : 5d".
Print the nicer time formats on output, but retain
raw numbers support on input for compatibility.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: add dynmap datapath add/delete test case

Signed-off-by: Florian Westphal <fw@strlen.de>

doc: document add chain device parameter

nft add chain lacked documentation of its optional device parameter,
specifically what values the parameter accepted, what it did and
when to use it.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1093
Suggested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Brennan Paciorek <bpaciore@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: add pyproject.toml to support PEP-517-compatible build-systems

This makes it possible to build and install the module without directly
invoking setup.py which has been deprecated.

Retain the setup.py script for backwards-compatibility.

Update INSTALL to mention the new config-file.

Link: https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: use setup.cfg to configure setuptools

Setuptools has had support for declarative configuration for several
years.  To quote their documentation:

  Setuptools allows using configuration files (usually setup.cfg) to
  define a package’s metadata and other options that are normally
  supplied to the setup() function (declarative config).

  This approach not only allows automation scenarios but also reduces
  boilerplate code in some cases.

Additionally, this allows us to introduce support for PEP-517-compatible
build-systems.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: move package source into src directory

Separate the actual package source from the build files. In addition
to being a bit tidier, this will prevent setup.py being erroneously
installed when we introduce PEP-517 support in a later commit.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: fix inet nat prio tests

Its legal to DNAT in output and SNAT in input chain, so don't test
for that being illegal.

Fixes: 8beafab74c39 ("rule: allow src/dstnat prios in input and output")
Fixes: 34ce4e4a7bb6 ("test: shell: Test cases for standard chain prios")
Signed-off-by: Florian Westphal <fw@strlen.de>

libnftables: Drop cache in -c/--check mode

Extend e0aace943412 ("libnftables: Drop cache in error case") to also
drop the cache with -c/--check, this is a dry run mode and kernel does
not get any update.

This fixes a bug with -o/--optimize, which first runs in an implicit
-c/--check mode to validate that the ruleset is correct, then it
provides the proposed optimization. In this case, if the cache is not
emptied, old objects in the cache refer to scanner data that was
already released, which triggers BUG like this:

BUG: invalid input descriptor type 151665524
nft: erec.c:161: erec_print: Assertion `0' failed.
Aborted

This bug was triggered in a ruleset that contains a set for geoip
filtering. This patch also extends tests/shell to cover this case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ct expectation: fix 'list object x' vs. 'list objects in table' confusion

Just like "ct timeout", "ct expectation" is in need of the same fix,
we get segfault on "nft list ct expectation table t", if table t exists.

This is the exact same pattern as resolved for "ct timeout" in commit
1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion").

Signed-off-by: Florian Westphal <fw@strlen.de>

rule: allow src/dstnat prios in input and output

Dan Winship says:

The "dnat" command is usable from either "prerouting" or "output", but the
"dstnat" priority is only usable from "prerouting". (Likewise, "snat" is usable
from either "postrouting" or "input", but "srcnat" is only usable from
"postrouting".)

No need to restrict those priorities to pre/postrouting.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1694
Signed-off-by: Florian Westphal <fw@strlen.de>

py: remove setup.py integration with autotools

With Python distutils and setuptools going deprecated, remove
integration with autotools. This integration is causing issues
in modern environments.

Note that setup.py is still left in place under the py/ folder.

Update INSTALL file to refer to Python support and setup.py.

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netlink: delinearize: copy set keytype if needed

Output before:
add @dynmark { 0xa020304 [invalid type] timeout 1s : 0x00000002 } comment "also check timeout-gc"

after:
add @dynmark { 10.2.3.4 timeout 1s : 0x00000002 } comment "also check timeout-gc"

This is a followup to 76c358ccfea0 ("src: maps: update data expression dtype based on set"),
which did fix the map expression, but not the key.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: shell: extend implicit chain map with flush command

Add a rule flush command.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

update INSTALL file

Update it to current library dependencies and existing options.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tests: shell: auto-run kmemleak if its available

On my test vm a full scan takes almost 5s. As this would slowdown
the test runs too much, only run them every couple of tests.

This allows to detect when there is a leak reported at the
end of the script, and it allows to narrow down the test
case/group that triggers the issue.

Add new -K flag to force kmemleak runs after each test if its
available, this can then be used to find the exact test case.

Signed-off-by: Florian Westphal <fw@strlen.de>

tests: monitor: Summarize failures per test case

Explicitly print when tests from a file fail in addition to the diff +
"output differs" message.

Signed-off-by: Phil Sutter <phil@nwl.cc>

meta: stash context statement length when generating payload/meta dependency

... meta mark set ip dscp

generates an implicit dependency from the inet family to match on meta
nfproto ip.

The length of this implicit expression is incorrectly adjusted to the
statement length, ie. relational to compare meta nfproto takes 4 bytes
instead of 1 byte. The evaluation of 'ip dscp' under the meta mark
statement triggers this implicit dependency which should not consider
the context statement length since it is added before the statement
itself.

This problem shows when listing the ruleset, since netlink_parse_cmp()
where left->len < right->len, hence handling the implicit dependency as
a concatenation, but it is actually a bug in the evaluation step that
leads to incorrect bytecode.

Fixes: 3c64ea7995cb ("evaluate: honor statement length in integer evaluation")
Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
Tested-by: Brian Davidson <davidson.brian@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

py: return boolean value from Nftables.__[gs]et_output_flag()

The callers of __get_output_flag() and __set_output_flag(), for example
get_reversedns_output(), are all documented to return a "boolean" value.

Instead, they returned the underlying, non-zero flags value. That number
is not obviously useful to the caller, because there is no API so that
the caller could do anything with it (except evaluating it in a boolean
context). Adjust that, to match the documentation.

The alternative would be to update the documentation, to indicate that
the functions return a non-zero integer when the flag is set. That would
preserve the previous behavior and maybe the number could be useful
somehow(?).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>

exthdr: prefer raw_type instead of desc->type

On ancient kernels desc can be NULL, because such kernels do not
understand NFTA_EXTHDR_TYPE.

Thus they don't include it in the reverse dump, so the tcp/ip
option gets treated like an ipv6 exthdr, but no matching
description will be found.

This then gives a crash due to the null deref.

Just use the raw value here, this avoid a crash and at least
print *something*, e.g.:

unknown-exthdr unknown & 0xf0 [invalid type] == 0x0 [invalid type]

Signed-off-by: Florian Westphal <fw@strlen.de>

tests/build/run-tests.sh: fix issues reported by shellcheck

Improve a bit the script as reported by shellcheck, also including
information about the log file.

The log file, by the way, is added to the gitignore to reduce noise
in the git tree.

Signed-off-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

build: Bump version to 1.0.8

Update dependency on libnftnl >= 1.2.6 which contains support for
meta broute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include: missing dccpopt.h breaks make distcheck

Add it to Makefile.am.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Implement 'reset {set,map,element}' commands

All these are used to reset state in set/map elements, i.e. reset the
timeout or zero quota and counter values.

While 'reset element' expects a (list of) elements to be specified which
should be reset, 'reset set/map' will reset all elements in the given
set/map.

Signed-off-by: Phil Sutter <phil@nwl.cc>

evaluate: Cache looked up set for list commands

Evaluation phase checks the given table and set exist in cache. Relieve
execution phase from having to perform the lookup again by storing the
set reference in cmd->set. Just have to increase the ref counter so
cmd_free() does the right thing (which lacked handling of MAP and METER
objects for some reason).

Signed-off-by: Phil Sutter <phil@nwl.cc>

evaluate: Merge some cases in cmd_evaluate_list()

The code for set, map and meter were almost identical apart from the
specific last check. Fold them together and make the distinction in that
spot only.

Signed-off-by: Phil Sutter <phil@nwl.cc>

tests: shell: cover old scanner bug

Add a test to cover 423abaa40ec4 ("scanner: don't rely on fseek for
input stream repositioning") that fixes the bug described in
https://bugs.gentoo.org/675188.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: drop check for nf_sock in nft_ctx_free()

The "nft_ctx" API does not provide a way to change or reconnect the
netlink socket. And none of the users would rely on that.

Also note that nft_ctx_new() initializes nf_sock via
nft_mnl_socket_open(), which panics of the socket could not be
initialized.

This means, the check is unnecessary and needlessly confusing. Drop it.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: inline creation of nf_sock in nft_ctx_new()

The function only has one caller. It's not clear how to extend this in a
useful way, so that it makes sense to keep the initialization in a
separate function.

Simplify the code, by inlining and dropping the static function
nft_ctx_netlink_init(). There was only one caller.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: drop unused argument nf_sock from nft_netlink()

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

libnftables: always initialize netlink socket in nft_ctx_new()

nft_ctx_new() has a flags argument, but currently no flags are
supported. The documentation suggests to pass 0 (NFT_CTX_DEFAULT).

Initializing the netlink socket happens by default already, we should do
it for all flags. Also because nft_ctx_netlink_init() is not public
API so it's not clear how the user gets a functioning context instance
otherwise.

If we ever want to not initialize the netlink socket for a context
instance, then there should be a dedicated flag for doing that (and
additional API for making that mode of operation usable).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

evaluate: place byteorder conversion before rshift in payload statement

For bitfield that spans more than one byte, such as ip6 dscp, byteorder
conversion needs to be done before rshift. Add unary expression for this
conversion only in the case of meta and ct statements.

Before this patch:

# nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp'
ip6 x y
  [ payload load 2b @ network header + 0 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
  [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
  [ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <--------- incorrect
  [ meta set mark with reg 1 ]

After this patch:

# nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp'
ip6 x y
  [ payload load 2b @ network header + 0 => reg 1 ]
  [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
  [ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <-------- correct
  [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
  [ meta set mark with reg 1 ]

For the matching case, binary transfer already deals with the rshift to
adjust left and right hand side of the expression, the unary conversion
is not needed in such case.

Fixes: 8221d86e616b ("tests: py: add test-cases for ct and packet mark payload expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>