git.ipfire.org Git - thirdparty/iproute2.git/log
19 months ago ip route: add support for TCP usec TS
Eric Dumazet [Mon, 4 Dec 2023 09:19:07 +0000 (09:19 +0000)] 
ip route: add support for TCP usec TS

Linux 6.7 gained support for TCP timestamps with microsecond resolution,
using one bit in the features mask: RTAX_FEATURE_TCP_USEC_TS.

ip route add 10/8 ... features tcp_usec_ts
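
As a rough illustration of how one bit in the features mask carries this
option, the sketch below encodes and decodes such a mask. The bit positions
are assumed from the kernel uapi header linux/rtnetlink.h, and the names
other than tcp_usec_ts are illustrative labels only; verify both against
the headers in use.

```python
# Assumed bit positions (check linux/rtnetlink.h); names other than
# "tcp_usec_ts" are illustrative labels, not necessarily ip keywords.
RTAX_FEATURES = {
    "ecn": 1 << 0,
    "sack": 1 << 1,
    "timestamp": 1 << 2,
    "allfrag": 1 << 3,
    "tcp_usec_ts": 1 << 4,
}


def encode_features(names):
    """OR together the bits for the given feature names."""
    mask = 0
    for name in names:
        mask |= RTAX_FEATURES[name]
    return mask


def decode_features(mask):
    """Return the names of the feature bits set in mask, lowest bit first."""
    return [name for name, bit in RTAX_FEATURES.items() if mask & bit]
```

Under these assumptions, a route created with "features tcp_usec_ts" would
carry a mask with only the fifth bit set.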

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago Merge branch 'main' into next
David Ahern [Wed, 22 Nov 2023 19:38:34 +0000 (19:38 +0000)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago Merge branch 'parsing-cleanup' into next
David Ahern [Wed, 22 Nov 2023 19:34:01 +0000 (19:34 +0000)] 
Merge branch 'parsing-cleanup' into next

Petr Machata says:

====================

Library functions parse_one_of() and parse_on_off() were added about three
years ago to unify all the disparate reimplementations of the same basic
idea. They used the matches() function to determine whether a string under
consideration corresponds to one of the keywords. This reflected many,
though not all cases of on/off parsing at the time.

This decision has some odd consequences. In particular, "o" can be used as
a shorthand for "off", which is not obvious, because "o" is the prefix of
both. By sheer luck, the end result actually makes some sense: "on" means
on, anything else either means off or errors out. Similar issues are in
principle also possible for parse_one_of() uses, though currently this does
not come up.

Ideally parse_on_off() would accept the strings "on" and "off" and no
others.

Patch #1 is a cleanup. Patch #2 is shaping the code for the next patches.

Patch #3 converts parse_on_off() to strcmp(). See the commit message for
the rationale of why the change should be considered acceptable.

We'd ideally do parse_one_of() likewise. But the strings this function
parses tend to be longer, which means more opportunities for typos and more
of a reason to abbreviate things.

So instead, patch #4 adds a function parse_one_of_deprecated() for ip
macsec to use in one place, where these typos are to be expected, and
converts that site to the new function.

Then patch #5 changes the behavior of parse_one_of() to accept prefixes
like it has so far, but to warn that they are deprecated:

    # dcb ets set dev swp1 tc-tsa 0:s
    WARNING: 's' matches 'strict' by prefix.
    Matching by prefix is deprecated in this context, please use the full string.

The idea is that several releases down the line, we might consider
switching over to strcmp(), as presumably enough advance warning will have
been given.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago lib: utils: Have parse_one_of() warn about prefix matches
Petr Machata [Wed, 22 Nov 2023 15:23:32 +0000 (16:23 +0100)] 
lib: utils: Have parse_one_of() warn about prefix matches

The function parse_one_of() currently uses matches() for string comparison
under the hood. Extending matches()-based parsers is tricky, because newly
added matches might change the way strings are parsed, if the newly-added
string shares a prefix with a string that is matched later in the code.

Therefore in this patch, add a twist to parse_one_of() that partial prefix
matches yield a warning. This will not disturb standard output or the
overall behavior, but will make it obvious that the usage is discouraged
and prompt users to update their scripts and habits.

An example of output:

    # dcb ets set dev swp1 tc-tsa 0:s
    WARNING: 's' matches 'strict' by prefix.
    Matching by prefix is deprecated in this context, please use the full string.
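
The behavior described above can be modelled in a few lines. This is a
Python sketch of the idea, not the C code from lib/utils.c; the function
name and the list of choices passed to it are hypothetical.

```python
import sys


def parse_one_of_warn(arg, choices, out=None):
    """Return the index of the first choice that arg matches.

    An exact match is silent. A match by proper prefix still succeeds,
    but a deprecation warning is written to `out` (stderr by default),
    so standard output is left untouched.
    """
    if out is None:
        out = sys.stderr
    for index, choice in enumerate(choices):
        if arg == choice:
            return index
        if arg and choice.startswith(arg):
            print(f"WARNING: '{arg}' matches '{choice}' by prefix.", file=out)
            print("Matching by prefix is deprecated in this context, "
                  "please use the full string.", file=out)
            return index
    raise ValueError(f"{arg!r} is not one of {choices}")
```

Because the warning goes to a separate stream, scripts that parse standard
output keep working while their authors are nudged to spell the keyword out.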

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago lib: utils: Introduce parse_one_of_deprecated()
Petr Machata [Wed, 22 Nov 2023 15:23:31 +0000 (16:23 +0100)] 
lib: utils: Introduce parse_one_of_deprecated()

The function parse_one_of() currently uses matches() for string comparison
under the hood. Extending matches()-based parsers is tricky, because newly
added matches might change the way strings are parsed, if the newly-added
string shares a prefix with a string that is matched later in the code.

In this patch, introduce a new function, parse_one_of_deprecated(). For
now it is synonymous with parse_one_of(); however, the latter will change
behavior in the next patch.

Use the new function for parsing of the macsec "validate" option. The
reason is that the valid strings for that option are "disabled", "check"
and "strict". It is not hard to see how "disabled" could be misspelled as
"disable", and be baked in some script in this form.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago lib: utils: Convert parse_on_off() to strcmp()
Petr Machata [Wed, 22 Nov 2023 15:23:30 +0000 (16:23 +0100)] 
lib: utils: Convert parse_on_off() to strcmp()

The function parse_on_off() currently uses matches() for string comparison
under the hood. This has some odd consequences. In particular, "o" can be
used as a shorthand for "off", which is not obvious, because "o" is the
prefix of both. In this patch, change parsing to strcmp(). This is a
breaking change. The following paragraphs give arguments for why it should
be considered acceptable.

First and foremost: on/off are very short strings that it makes practically
no sense to shorten. Since "o" is the universal prefix, the only
unambiguous shortening is "of" for "off". It is doubtful that anyone would
intentionally decide to save typing of the second "f" when they already
typed the first. It also seems unlikely that the typo of "of" for "off"
would not be caught immediately, as missing a third of the word length
would likely be noticed. In other words, it seems improbable that the
abbreviated variants are used, intentionally or by mistake.
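
To make the difference concrete, here is a small Python model of the two
behaviors. It is a sketch only: the table order ("off" before "on") is an
assumption chosen to be consistent with "o" parsing as off, and both
function names are hypothetical.

```python
ON_OFF = ["off", "on"]  # assumed order; "o" then resolves to "off"


def parse_on_off_prefix(arg):
    """Old behavior: accept any non-empty prefix, first table entry wins."""
    for index, word in enumerate(ON_OFF):
        if arg and word.startswith(arg):
            return index == 1
    raise ValueError(f"expected on or off, got {arg!r}")


def parse_on_off_exact(arg):
    """New behavior: accept only the exact strings "on" and "off"."""
    if arg not in ON_OFF:
        raise ValueError(f"expected on or off, got {arg!r}")
    return arg == "on"
```

With the prefix variant, "o" and "of" both parse as off; with the exact
variant they are rejected, which is the breaking change being argued for.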

Commit 9262ccc3ed32 ("bridge: link: Port over to parse_on_off()") and
commit 3e0d2a73ba06 ("ip: iplink_bridge_slave: Port over to
parse_on_off()") converted several sites from open-coding strcmp()-based
on/off parsing to parse_on_off(), which is itself based on matches(). This
made the list of permissible strings more generic, but the behavior was
exact match to begin with, and this patch restores it.

Commit 5f685d064b03 ("ip: iplink: Convert to use parse_on_off()") has
changed from matches()-based parsing, which however had branches in the
other order, and "o" would parse to mean on. This indicates that at least
in this context, people were not using the shorthand of "o" or the commit
would have broken their use case. This supports the thesis that the
abbreviations are not really used for on/off parsing.

For completeness, commit 82604d28525a ("lib: Add parse_one_of(),
parse_on_off()") introduced parse_on_off(), converting several users in the
ip link macsec code in the process. Those users have always used matches(),
and had branches in the same order as the newly-introduced parse_on_off().

A survey of the selftests and documentation of the Linux kernel (by way of
git grep) has not discovered any cases of the involved options getting
arguments other than the exact strings on and off.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago lib: utils: Generalize parse_one_of()
Petr Machata [Wed, 22 Nov 2023 15:23:29 +0000 (16:23 +0100)] 
lib: utils: Generalize parse_one_of()

The following patch will change the way parse_one_of() and parse_on_off()
parse the strings they are given. To prepare for this change, extract from
parse_one_of() the functional core, and express it in terms of a
configurable matcher, a pointer to a function that does the string
comparison. Then rewrite parse_one_of() and parse_on_off() as wrappers that
just pass matches() as the matcher, thereby maintaining the same behavior
as they currently have.
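
The shape of this refactoring can be sketched as follows. This is a Python
model, not the C implementation: the names are hypothetical, and the
matchers follow the strcmp()-like convention of returning 0 on a match.
The choice list reuses the macsec "validate" strings.

```python
def prefix_matcher(arg, word):
    """Return 0 when arg is a non-empty prefix of word, 1 otherwise."""
    return 0 if arg and word.startswith(arg) else 1


def exact_matcher(arg, word):
    """Return 0 only when arg equals word, 1 otherwise."""
    return 0 if arg == word else 1


def parse_one_of_core(arg, choices, matcher):
    """Return the index of the first choice the matcher accepts, or -1."""
    for index, word in enumerate(choices):
        if matcher(arg, word) == 0:
            return index
    return -1


VALIDATE = ["disabled", "check", "strict"]


def parse_validate_prefix(arg):
    # Thin wrapper that keeps the prefix-matching behavior.
    return parse_one_of_core(arg, VALIDATE, prefix_matcher)
```

Swapping the matcher is then the only change needed to move a call site
from prefix matching to exact matching.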

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago lib: utils: Switch matches() to returning int again
Petr Machata [Wed, 22 Nov 2023 15:23:28 +0000 (16:23 +0100)] 
lib: utils: Switch matches() to returning int again

Since commit 1f420318bda3 ("utils: don't match empty strings as prefixes")
the function has pretended to return a boolean. But every user expects it
to return zero on success and a non-zero value on failure, like strcmp().
Even the function itself actually returns "true" to mean "no match". This
only makes sense if one considers a boolean to be a one-bit unsigned
integer with no inherent meaning, which I do not think is reasonable.

Switch the prototype back to int, and return 1 instead of true.

Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago ip, link: Add support for netkit
Daniel Borkmann [Mon, 20 Nov 2023 23:33:41 +0000 (00:33 +0100)] 
ip, link: Add support for netkit

Add base support for creating/dumping netkit devices.

Minimal example usage:

  # ip link add type netkit
  # ip -d a
  [...]
  7: nk0@nk1: <BROADCAST,MULTICAST,NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    netkit mode l3 type peer policy forward numtxqueues 1 numrxqueues 1 [...]
  8: nk1@nk0: <BROADCAST,MULTICAST,NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    netkit mode l3 type primary policy forward numtxqueues 1 numrxqueues 1 [...]

Example usage with netns (for BPF examples, see BPF selftests linked below):

  # ip netns add blue
  # ip link add nk0 type netkit peer nk1 netns blue
  # ip link set up nk0
  # ip addr add 10.0.0.1/24 dev nk0
  # ip -n blue link set up nk1
  # ip -n blue addr add 10.0.0.2/24 dev nk1
  # ping -c1 10.0.0.2
  PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
  64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.021 ms

Example usage with L2 mode and peer blackholing when no BPF is attached:

  # ip link add foo type netkit mode l2 forward peer blackhole bar
  # ip -d a
  [...]
  13: bar@foo: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
     link/ether 5e:5b:81:17:02:27 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
     netkit mode l2 type peer policy blackhole numtxqueues 1 numrxqueues 1 [...]
  14: foo@bar: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
     link/ether de:01:a5:88:9e:99 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
     netkit mode l2 type primary policy forward numtxqueues 1 numrxqueues 1 [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://git.kernel.org/torvalds/c/35dfaad7188c
Link: https://git.kernel.org/torvalds/c/05c31b4ab205
Link: https://git.kernel.org/torvalds/c/ace15f91e569
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago man: allow up to 100 character lines
Stephen Hemminger [Sun, 19 Nov 2023 16:56:43 +0000 (08:56 -0800)] 
man: allow up to 100 character lines

There are some long URLs that cause warnings from the
man page checker. Go ahead and allow these even though Debian
lintian may still complain.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago man: fix man page errors
Stephen Hemminger [Fri, 17 Nov 2023 17:22:19 +0000 (09:22 -0800)] 
man: fix man page errors

Debian is now more picky about man pages.
The man command now needs to be told that tbl is being used on a man page.
Also, font macros need to specify a proper font.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago ip: move get_failed blocks
Stephen Hemminger [Fri, 17 Nov 2023 17:16:14 +0000 (09:16 -0800)] 
ip: move get_failed blocks

Rather than doing a goto back into the middle of an earlier
if() statement, move the error returns to the end of the functions
to follow kernel coding practice.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago iproute2: prevent memory leak
heminhong [Thu, 16 Nov 2023 03:13:08 +0000 (11:13 +0800)] 
iproute2: prevent memory leak

When the return value of rtnl_talk() is not less than 0,
'answer' will be allocated. 'answer' should be freed
after use, otherwise it will cause a memory leak.

Fixes: a066cc6623e1 ("gre/gre6: Unify local/remote endpoint address parsing")
Signed-off-by: heminhong <heminhong@kylinos.cn>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago Makefile: use /usr/share/iproute2 for config files
Andrea Claudi [Wed, 15 Nov 2023 17:25:35 +0000 (18:25 +0100)] 
Makefile: use /usr/share/iproute2 for config files

According to FHS:

"/usr/lib includes object files and libraries. On some systems, it may
also include internal binaries that are not intended to be executed
directly by users or shell scripts."

A better directory to store config files is /usr/share:

"The /usr/share hierarchy is for all read-only architecture independent
data files.

This hierarchy is intended to be shareable among all architecture
platforms of a given OS; thus, for example, a site with i386, Alpha, and
PPC platforms might maintain a single /usr/share directory that is
centrally-mounted."

Accordingly, move configuration files to $(DATADIR)/iproute2.

Fixes: 946753a4459b ("Makefile: ensure CONF_USR_DIR honours the libdir config")
Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago uapi: update headers from 6.7-rc1
Stephen Hemminger [Mon, 13 Nov 2023 16:38:58 +0000 (08:38 -0800)] 
uapi: update headers from 6.7-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago Merge branch 'devlink-instances' into next
David Ahern [Sat, 11 Nov 2023 17:33:34 +0000 (17:33 +0000)] 
Merge branch 'devlink-instances' into next

Jiri Pirko says:

====================

Print out recently added attributes that expose relationships between
devlink instances. This patchset extends the outputs by
"nested_devlink" attributes.

Examples:
$ devlink dev
pci/0000:08:00.0:
  nested_devlink:
    auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0
pci/0000:08:00.1:
  nested_devlink:
    auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1

$ devlink dev -j -p
{
    "dev": {
        "pci/0000:08:00.0": {
            "nested_devlink": {
                "auxiliary/mlx5_core.eth.0": {}
            }
        },
        "auxiliary/mlx5_core.eth.0": {},
        "pci/0000:08:00.1": {
            "nested_devlink": {
                "auxiliary/mlx5_core.eth.1": {}
            }
        },
        "auxiliary/mlx5_core.eth.1": {}
    }
}

$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 106
pci/0000:08:00.0/32768: type eth netdev eth2 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
$ devlink port function set pci/0000:08:00.0/32768 state active
$ devlink port show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768: type eth netdev eth2 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state active opstate attached roce enable
      nested_devlink:
        auxiliary/mlx5_core.sf.2
$ devlink port show pci/0000:08:00.0/32768 -j -p
{
    "port": {
        "pci/0000:08:00.0/32768": {
            "type": "eth",
            "netdev": "eth2",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 106,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00",
                "state": "active",
                "opstate": "attached",
                "roce": "enable",
                "nested_devlink": {
                    "auxiliary/mlx5_core.sf.2": {}
                }
            }
        }
    }
}

$ devlink dev reload auxiliary/mlx5_core.sf.2 netns ns1
$ devlink port show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768: type eth netdev eth2 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state active opstate attached roce enable
      nested_devlink:
        auxiliary/mlx5_core.sf.2: netns ns1
$ devlink port show pci/0000:08:00.0/32768 -j -p
{
    "port": {
        "pci/0000:08:00.0/32768": {
            "type": "eth",
            "netdev": "eth2",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 106,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00",
                "state": "active",
                "opstate": "attached",
                "roce": "enable",
                "nested_devlink": {
                    "auxiliary/mlx5_core.sf.2": {
                        "netns": "ns1"
                    }
                }
            }
        }
    }
}
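
For scripts that consume this JSON, a minimal sketch of walking the
"nested_devlink" objects might look like the following. The sample is a
trimmed copy of the output above; the function name is hypothetical and
the key layout is assumed to stay as shown.

```python
import json

# Trimmed copy of the "devlink port show ... -j -p" output shown above.
SAMPLE = """
{
    "port": {
        "pci/0000:08:00.0/32768": {
            "type": "eth",
            "netdev": "eth2",
            "function": {
                "state": "active",
                "opstate": "attached",
                "nested_devlink": {
                    "auxiliary/mlx5_core.sf.2": {
                        "netns": "ns1"
                    }
                }
            }
        }
    }
}
"""


def nested_devlink_handles(port_show):
    """Return (port, nested handle, netns or None) for every nested instance."""
    found = []
    for port, attrs in port_show.get("port", {}).items():
        nested = attrs.get("function", {}).get("nested_devlink", {})
        for handle, info in nested.items():
            found.append((port, handle, info.get("netns")))
    return found
```

The netns entry is simply None when the nested instance is printed as an
empty object, as in the output taken before the reload.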

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago devlink: print nested devlink handle for devlink dev
Jiri Pirko [Tue, 7 Nov 2023 08:06:07 +0000 (09:06 +0100)] 
devlink: print nested devlink handle for devlink dev

Devlink dev may contain one or more nested devlink instances.
Print them using previously introduced pr_out_nested_handle_obj()
helper.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago devlink: print nested handle for port function
Jiri Pirko [Tue, 7 Nov 2023 08:06:06 +0000 (09:06 +0100)] 
devlink: print nested handle for port function

If port function contains nested handle attribute, print it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago devlink: introduce support for netns id for nested handle
Jiri Pirko [Tue, 7 Nov 2023 08:06:05 +0000 (09:06 +0100)] 
devlink: introduce support for netns id for nested handle

A nested handle may contain the DEVLINK_ATTR_NETNS_ID attribute, which indicates
the network namespace where the nested devlink instance resides. Process
this attribute, converting it to a netns name if possible, and print it to the user.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago devlink: extend pr_out_nested_handle() to print object
Jiri Pirko [Tue, 7 Nov 2023 08:06:04 +0000 (09:06 +0100)] 
devlink: extend pr_out_nested_handle() to print object

For the existing pr_out_nested_handle() user (line card), the output stays
the same. For the new users, introduce __pr_out_nested_handle(),
which prints the devlink instance as an object so that it can carry
attributes (like netns).

Note that as __pr_out_handle_start() and pr_out_handle_end() are newly
used, the function is moved below the definitions.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago devlink: do conditional new line print in pr_out_port_handle_end()
Jiri Pirko [Tue, 7 Nov 2023 08:06:03 +0000 (09:06 +0100)] 
devlink: do conditional new line print in pr_out_port_handle_end()

Instead of printing out new line unconditionally, use __pr_out_newline()
to print it only when needed avoiding double prints.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago devlink: use snprintf instead of sprintf
Jiri Pirko [Tue, 7 Nov 2023 08:06:02 +0000 (09:06 +0100)] 
devlink: use snprintf instead of sprintf

Use snprintf instead of sprintf to ensure only valid memory is printed
to and the output string is properly terminated.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago ip/ipnetns: move internals of get_netnsid_from_name() into namespace.c
Jiri Pirko [Tue, 7 Nov 2023 08:06:01 +0000 (09:06 +0100)] 
ip/ipnetns: move internals of get_netnsid_from_name() into namespace.c

In order to be able to reuse get_netnsid_from_name() function outside of
ip code, move the internals to lib/namespace.c to a new function called
netns_id_from_name().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago libnetlink: validate nlmsg header length first
Max Kunzelmann [Tue, 7 Nov 2023 01:20:55 +0000 (01:20 +0000)] 
libnetlink: validate nlmsg header length first

Validate the nlmsg header length before accessing the nlmsg payload
length.
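
The ordering of the checks can be illustrated with a small Python model of
a netlink receive buffer. This is not libnetlink code: the function name is
hypothetical, and the 16-byte header layout (length, type, flags, sequence,
port id) with 4-byte alignment is assumed from struct nlmsghdr.

```python
import struct

# Assumed struct nlmsghdr layout: u32 len, u16 type, u16 flags, u32 seq, u32 pid.
NLMSG_HDR = struct.Struct("=IHHII")


def split_messages(buf):
    """Split buf into (type, payload) pairs, checking header room first."""
    messages = []
    offset = 0
    while offset < len(buf):
        remaining = len(buf) - offset
        # Step 1: make sure a full header is present before reading it.
        if remaining < NLMSG_HDR.size:
            raise ValueError("truncated netlink header")
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        # Step 2: only now is it safe to trust and validate the length field.
        if msg_len < NLMSG_HDR.size or msg_len > remaining:
            raise ValueError("malformed netlink message length")
        messages.append((msg_type, buf[offset + NLMSG_HDR.size:offset + msg_len]))
        offset += (msg_len + 3) & ~3  # advance to the next 4-byte boundary
    return messages
```

Reading the length field before confirming that a whole header fits would
touch bytes past the end of the received data, which is the class of bug
the reordering avoids.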

Fixes: 892a25e286fb ("libnetlink: break up dump function")
Signed-off-by: Max Kunzelmann <maxdev@posteo.de>
Reviewed-by: Benny Baumann <BenBE@geshi.org>
Reviewed-by: Robert Geislinger <github@crpykng.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago Revert "Makefile: ensure CONF_USR_DIR honours the libdir config"
Luca Boccassi [Mon, 6 Nov 2023 00:14:10 +0000 (00:14 +0000)] 
Revert "Makefile: ensure CONF_USR_DIR honours the libdir config"

LIBDIR in Debian and derivatives is not /usr/lib/, it's
/usr/lib/<architecture triplet>/, which is different, and it's the
wrong location for installing architecture-independent default
configuration files, which should always go to /usr/lib/ instead.
Installing these files to the per-architecture directory is not
the right thing, hence revert the change.

This reverts commit 946753a4459bd035132a27bb2eb87529c1979b90.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
20 months ago Merge branch 'main' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Stephen Hemminger [Mon, 6 Nov 2023 20:40:38 +0000 (12:40 -0800)] 
Merge branch 'main' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next

20 months ago bridge: mdb: Add get support
Ido Schimmel [Wed, 1 Nov 2023 07:45:10 +0000 (09:45 +0200)] 
bridge: mdb: Add get support

Implement MDB get functionality, allowing user space to query a single
MDB entry from the kernel instead of dumping all the entries. Example
usage:

 # bridge mdb add dev br0 port swp1 grp 239.1.1.1 vid 10
 # bridge mdb add dev br0 port swp2 grp 239.1.1.1 vid 10
 # bridge mdb add dev br0 port swp2 grp 239.1.1.5 vid 10
 # bridge mdb get dev br0 grp 239.1.1.1 vid 10
 dev br0 port swp1 grp 239.1.1.1 temp vid 10
 dev br0 port swp2 grp 239.1.1.1 temp vid 10
 # bridge -j -p mdb get dev br0 grp 239.1.1.1 vid 10
 [ {
         "index": 10,
         "dev": "br0",
         "port": "swp1",
         "grp": "239.1.1.1",
         "state": "temp",
         "flags": [ ],
         "vid": 10
     },{
         "index": 10,
         "dev": "br0",
         "port": "swp2",
         "grp": "239.1.1.1",
         "state": "temp",
         "flags": [ ],
         "vid": 10
     } ]
 # bridge mdb get dev br0 grp 239.1.1.1 vid 20
 Error: bridge: MDB entry not found.
 # bridge mdb get dev br0 grp 239.1.1.2 vid 10
 Error: bridge: MDB entry not found.
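
A script consuming the JSON form of this output could pick out the member
ports of a group along these lines. The sample is copied from the output
above; the function name is hypothetical and the field names are assumed
to stay as shown.

```python
import json

# Copied from the "bridge -j -p mdb get" output shown above.
SAMPLE = """
[ {
        "index": 10,
        "dev": "br0",
        "port": "swp1",
        "grp": "239.1.1.1",
        "state": "temp",
        "flags": [ ],
        "vid": 10
    },{
        "index": 10,
        "dev": "br0",
        "port": "swp2",
        "grp": "239.1.1.1",
        "state": "temp",
        "flags": [ ],
        "vid": 10
    } ]
"""


def ports_for_group(entries, grp, vid):
    """Return the ports that are members of the given group and VLAN."""
    return [e["port"] for e in entries if e["grp"] == grp and e["vid"] == vid]
```

Note that when the entry does not exist the command prints an error rather
than an empty list, so a caller would also need to check the exit status.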

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
20 months ago Update kernel headers
David Ahern [Mon, 6 Nov 2023 17:08:23 +0000 (10:08 -0700)] 
Update kernel headers

Update kernel headers to commit:
    ff269e2cd5ad ("Merge tag 'net-next-6.7-followup' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

Import mptcp_pm.h due to a new dependency.

Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago v6.6.0 v6.6.0
Stephen Hemminger [Sat, 4 Nov 2023 16:22:25 +0000 (09:22 -0700)] 
v6.6.0

21 months ago vv6.6.0
Stephen Hemminger [Sat, 4 Nov 2023 01:04:49 +0000 (18:04 -0700)] 
vv6.6.0

21 months ago ssfilter: fix clang warning about conversion
Stephen Hemminger [Tue, 31 Oct 2023 23:03:58 +0000 (16:03 -0700)] 
ssfilter: fix clang warning about conversion

Clang warns:
ssfilter_check.c:100:13: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
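
The arithmetic behind the warning can be reproduced with a short Python
model. It assumes a signed bit-field with two's-complement representation,
which is what the "from 1 to -1" wording suggests; the function name is
hypothetical.

```python
def as_signed_bitfield(value, width):
    """Interpret the low `width` bits of value as a signed integer."""
    mask = (1 << width) - 1
    value &= mask
    sign_bit = 1 << (width - 1)
    # When the top bit of the field is set, the stored value is negative.
    return value - (1 << width) if value & sign_bit else value
```

A one-bit signed field can therefore only hold 0 and -1, so storing 1 in it
reads back as -1. Making the field unsigned is one common way to avoid the
surprise.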

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago ss: add support for rcv_wnd and rehash
Eric Dumazet [Tue, 31 Oct 2023 11:17:20 +0000 (11:17 +0000)] 
ss: add support for rcv_wnd and rehash

tcpi_rcv_wnd and tcpi_rehash were added in linux-6.2.

$ ss -ti
...
 cubic wscale:7,7 ... minrtt:0.01 snd_wnd:65536 rcv_wnd:458496
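
As a hedged example of reading the new field from a script, the sketch
below collects the key:value tokens of one info line into a dictionary.
The text output of ss is not a stable interface and some fields are not of
this form, so this is only an illustration; the function name is
hypothetical.

```python
def parse_ss_info(line):
    """Collect key:value tokens from one `ss -ti` info line as strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        # Skip bare words such as the congestion control name.
        if sep and key and value:
            fields[key] = value
    return fields


LINE = " cubic wscale:7,7 ... minrtt:0.01 snd_wnd:65536 rcv_wnd:458496"
```

Values are kept as strings because fields such as wscale hold more than one
number; the caller converts the ones it needs, for example rcv_wnd.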

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago tc: drop support for ATM qdisc
Stephen Hemminger [Mon, 30 Oct 2023 21:15:36 +0000 (14:15 -0700)] 
tc: drop support for ATM qdisc

The upstream kernel dropped support for ATM qdisc in
fb38306ceb9e (net/sched: Retire ATM qdisc, 2023-02-14)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago tc: remove dsmark qdisc
Stephen Hemminger [Mon, 30 Oct 2023 18:35:32 +0000 (11:35 -0700)] 
tc: remove dsmark qdisc

The kernel has removed support for dsmark qdisc in commit
bbe77c14ee61 (net/sched: Retire dsmark qdisc, 2023-02-14)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago tc: remove tcindex classifier
Stephen Hemminger [Mon, 30 Oct 2023 18:26:33 +0000 (11:26 -0700)] 
tc: remove tcindex classifier

Support for tcindex classifier was removed by upstream commit
8c710f75256b (net/sched: Retire tcindex classifier, 2023-02-14)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago tc: remove support for RSVP classifier
Stephen Hemminger [Mon, 30 Oct 2023 18:23:12 +0000 (11:23 -0700)] 
tc: remove support for RSVP classifier

The RSVP classifier was removed in 6.3 kernel by upstream commit
265b4da82dbf (net/sched: Retire rsvp classifier, 2023-02-14)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago tc: remove support for CBQ
Stephen Hemminger [Mon, 30 Oct 2023 18:10:18 +0000 (11:10 -0700)] 
tc: remove support for CBQ

The CBQ qdisc was removed in 6.3 kernel by upstream
051d44209842 (net/sched: Retire CBQ qdisc, 2023-02-14)

Remove associated support from iproute2 including dropping
tests, man pages and fixing other references.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago bpf: increase verifier verbosity when in verbose mode
Shung-Hsi Yu [Fri, 27 Oct 2023 08:57:06 +0000 (16:57 +0800)] 
bpf: increase verifier verbosity when in verbose mode

The BPF verifier allows setting a higher verbosity level, which is
helpful when debugging verifier issues, especially when used
on a BPF program that loads successfully (but should not have passed the
verifier in the first place). Increase the BPF verifier log level when
in verbose mode to help with such cases.

Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago libbpf: set kernel_log_level when available
Shung-Hsi Yu [Fri, 27 Oct 2023 08:57:05 +0000 (16:57 +0800)] 
libbpf: set kernel_log_level when available

libbpf allows setting the log_level in struct bpf_object_open_opts
through the kernel_log_level field since v0.7, use it to set log level
to align with bpf_prog_load_dev() and bpf_btf_load().

Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago rdma: Adjust man page for rdma system set privileged-qkey command
Patrisious Haddad [Wed, 25 Oct 2023 12:31:02 +0000 (15:31 +0300)] 
rdma: Adjust man page for rdma system set privileged-qkey command

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago rdma: Add an option to set privileged QKEY parameter
Patrisious Haddad [Wed, 25 Oct 2023 12:31:01 +0000 (15:31 +0300)] 
rdma: Add an option to set privileged QKEY parameter

Enrich rdmatool with an option to enable or disable privileged QKEY.
When enabled, non-privileged users will be allowed to specify a
controlled QKEY.

By default this parameter is disabled in order to comply with IB spec.
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."
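
The rule quoted above amounts to a single bit test. A minimal sketch,
assuming a 32-bit QKEY (the function and constant names are hypothetical):

```python
CONTROLLED_QKEY_BIT = 0x80000000  # most significant bit of a 32-bit QKEY


def is_controlled_qkey(qkey):
    """Return True when the most significant bit of the QKEY is set."""
    if not 0 <= qkey <= 0xFFFFFFFF:
        raise ValueError("QKEY must fit in 32 bits")
    return bool(qkey & CONTROLLED_QKEY_BIT)
```

For instance, 0x80010000 is a controlled QKEY under this rule, while
0x11111111 is not.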

This allows old applications that existed before kernel commit
0cadb4db79e1 ("RDMA/uverbs: Restrict usage of privileged QKEYs"),
and that could use privileged QKEYs without being a privileged user, to
work again without privileges, provided they turn on this
parameter.

rdma tool command examples and output.

$ rdma system show
netns shared privileged-qkey off copy-on-fork on

$ rdma system set privileged-qkey on

$ rdma system show
netns shared privileged-qkey on copy-on-fork on

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago rdma: update uapi headers
Patrisious Haddad [Wed, 25 Oct 2023 12:31:00 +0000 (15:31 +0300)] 
rdma: update uapi headers

Update rdma_netlink.h file up to kernel commit 36ce80759f8c
("RDMA/core: Add support to set privileged qkey parameter")

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago ss: fix directory leak when -T option is used
Maxim Petrov [Sat, 21 Oct 2023 08:44:08 +0000 (10:44 +0200)] 
ss: fix directory leak when -T option is used

To get information about threads used in a process, the /proc/$PID/task
directory content is analyzed by ss code. However, the opened directory
stream is not closed after use, leading to memory leaks. Add the missing
closedir() call in user_ent_hash_build() to avoid it.

Detected by valgrind: "valgrind ./misc/ss -T"

Fixes: e2267e68b9b5 ("ss: Introduce -T, --threads option")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months ago Merge branch 'bridge-flush-vxlan-attr' into next
David Ahern [Fri, 20 Oct 2023 15:43:39 +0000 (09:43 -0600)] 
Merge branch 'bridge-flush-vxlan-attr' into next

Amit Cohen says:

====================

The merge commit f84e3f8cced9 ("Merge branch 'bridge-fdb-flush' into next")
added support for fdb flushing.

The kernel was extended to support flush for VXLAN device, so the
"bridge fdb flush" command should support new attributes.

Add support for flushing FDB entries based on the following:
* Source VNI
* Nexthop ID
* Destination VNI
* Destination Port
* Destination IP
* 'router' flag

With this set, flush works with attributes which are relevant for VXLAN
FDBs, for example:

$ bridge fdb flush dev vx10 vni 5000 dst 192.2.2.1
< flush all vx10 entries with VNI 5000 and destination IP 192.2.2.1 >

There are examples for each attribute in the respective commit messages.

Patch set overview:
Patch #1 prepares the code for adding support for 'port' keyword
Patches #2-#7 add support for new keywords in flush command
Patch #8 adds a note in man page

v2:
* Print 'nhid' instead of 'id' in the error in patch #3
* Use capital letters for 'ECMP' in man page in patch #3

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
21 months ago man: bridge: add a note about using 'master' and 'self' with flush
Amit Cohen [Tue, 17 Oct 2023 10:55:32 +0000 (13:55 +0300)] 
man: bridge: add a note about using 'master' and 'self' with flush

When 'master' and 'self' keywords are used, the command will be handled
by the driver of the device itself and by the driver that the device is
master on. For VXLAN, such command will be handled by VXLAN driver and by
bridge driver in case that the VXLAN is master on a bridge.

The bridge driver and VXLAN driver do not support the same arguments for
flush command, for example - "vlan" is supported by bridge and not by
VXLAN and "vni" is supported by VXLAN and not by bridge.

The following command returns an error:
$ bridge fdb flush dev vx10 vlan 1 self master
Error: Unsupported attribute.

This error comes from the VXLAN driver, which does not support flush by
VLAN, but this command is handled by bridge driver, so entries in bridge
are flushed even though user gets an error.

Note in the man page that such a command is not recommended. Instead, the
user should run the flush command twice, once with 'self' and once with
'master', each with its supported attributes.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: support match on [no]router flag in flush command
Amit Cohen [Tue, 17 Oct 2023 10:55:31 +0000 (13:55 +0300)] 
bridge: fdb: support match on [no]router flag in flush command

Extend "fdb flush" command to match entries with or without (if "no" is
prepended) router flag.

Examples:
$ bridge fdb flush dev vx10 router
This will delete all fdb entries pointing to vx10 with router flag.

$ bridge fdb flush dev vx10 norouter
This will delete all fdb entries pointing to vx10, except the ones with
router flag.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: support match on destination IP in flush command
Amit Cohen [Tue, 17 Oct 2023 10:55:30 +0000 (13:55 +0300)] 
bridge: fdb: support match on destination IP in flush command

Extend "fdb flush" command to match fdb entries with a specific destination
IP.

Example:
$ bridge fdb flush dev vx10 dst 192.1.1.1
This will flush all fdb entries pointing to vx10 with destination IP
192.1.1.1

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: support match on destination port in flush command
Amit Cohen [Tue, 17 Oct 2023 10:55:29 +0000 (13:55 +0300)] 
bridge: fdb: support match on destination port in flush command

Extend "fdb flush" command to match fdb entries with a specific destination
port.

Example:
$ bridge fdb flush dev vx10 port 1111
This will flush all fdb entries pointing to vx10 with destination port
1111.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: support match on destination VNI in flush command
Amit Cohen [Tue, 17 Oct 2023 10:55:28 +0000 (13:55 +0300)] 
bridge: fdb: support match on destination VNI in flush command

Extend "fdb flush" command to match fdb entries with a specific destination
VNI.

Example:
$ bridge fdb flush dev vx10 vni 1000
This will flush all fdb entries pointing to vx10 with destination VNI 1000.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: support match on nexthop ID in flush command
Amit Cohen [Tue, 17 Oct 2023 10:55:27 +0000 (13:55 +0300)] 
bridge: fdb: support match on nexthop ID in flush command

Extend "fdb flush" command to match fdb entries with a specific nexthop ID.

Example:
$ bridge fdb flush dev vx10 nhid 2
This will flush all fdb entries pointing to vx10 with nexthop ID 2.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: support match on source VNI in flush command
Amit Cohen [Tue, 17 Oct 2023 10:55:26 +0000 (13:55 +0300)] 
bridge: fdb: support match on source VNI in flush command

Extend "fdb flush" command to match fdb entries with a specific source VNI.

Example:
$ bridge fdb flush dev vx10 src_vni 1000
This will flush all fdb entries pointing to vx10 with source VNI 1000.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agobridge: fdb: rename some variables to contain 'brport'
Amit Cohen [Tue, 17 Oct 2023 10:55:25 +0000 (13:55 +0300)] 
bridge: fdb: rename some variables to contain 'brport'

Currently, the flush command supports the keyword 'brport'. To handle
this argument the variables 'port_ifidx' and 'port' are used.

A following patch will add support for the 'port' keyword in the flush
command. Rename the existing variables to include the 'brport' prefix, so
that it is clear that they are used to parse the 'brport' argument.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agoiplink: bridge: Add support for bridge FDB learning limits
Johannes Nixdorf [Wed, 18 Oct 2023 07:04:43 +0000 (09:04 +0200)] 
iplink: bridge: Add support for bridge FDB learning limits

Support setting the FDB limit through ip link. The argument is:
 - fdb_max_learned: A 32-bit unsigned integer specifying the maximum
                    number of learned FDB entries, with 0 disabling
                    the limit.

Also support reading back the number of learned FDB entries in the bridge
that currently count towards this limit. The returned value's name is:
 - fdb_n_learned: A 32-bit unsigned integer specifying the current number
                  of learned FDB entries.

Example:

 # ip -d -j -p link show br0
[ {
...
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {
...
                "fdb_n_learned": 2,
                "fdb_max_learned": 0,
...
            }
        },
...
    } ]
 # ip link set br0 type bridge fdb_max_learned 1024
 # ip -d -j -p link show br0
[ {
...
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {
...
                "fdb_n_learned": 2,
                "fdb_max_learned": 1024,
...
            }
        },
...
    } ]

Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agoUpdate kernel headers
David Ahern [Thu, 19 Oct 2023 15:34:46 +0000 (15:34 +0000)] 
Update kernel headers

Update kernel headers to commit
    dcf02bac377e ("Merge branch 'net-stmmac-improve-tx-timer-logic'")

Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agoMerge remote-tracking branch 'main/main' into next
David Ahern [Mon, 16 Oct 2023 16:18:32 +0000 (10:18 -0600)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agordma: Add support to dump SRQ resource in raw format
wenglianfa [Tue, 10 Oct 2023 07:55:26 +0000 (15:55 +0800)] 
rdma: Add support to dump SRQ resource in raw format

Add support to dump SRQ resource in raw format.

This patch relies on the corresponding kernel commit aebf8145e11a
("RDMA/core: Add support to dump SRQ resource in RAW format")

Example:
$ rdma res show srq -r
dev hns3 149000...

$ rdma res show srq -j -r
[{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}]

Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agordma: Update uapi headers
Junxian Huang [Tue, 10 Oct 2023 07:55:25 +0000 (15:55 +0800)] 
rdma: Update uapi headers

Update rdma_netlink.h file up to kernel commit aebf8145e11a
("RDMA/core: Add support to dump SRQ resource in RAW format")

Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
21 months agoip: fix memory leak in 'ip maddr show'
Maxim Petrov [Sun, 15 Oct 2023 14:32:12 +0000 (16:32 +0200)] 
ip: fix memory leak in 'ip maddr show'

In `read_dev_mcast`, the list of ma_info is allocated, but not freed
after use. Free the list at the end to make valgrind happy.

Detected by valgrind: "valgrind ./ip/ip maddr show"

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
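
The leak fix above amounts to walking the allocated list once more and freeing each node. A minimal sketch of that idea, assuming a simplified node type (struct ma_node and both helper names are illustrative, not the real ma_info code):

```c
#include <stdlib.h>

/* Simplified stand-in for the list node; the real ma_info structure
 * carries more fields.
 */
struct ma_node {
	struct ma_node *next;
	int index;
};

/* Push a new node on the front of the list; returns 0, or -1 on OOM. */
int ma_push(struct ma_node **head, int index)
{
	struct ma_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->index = index;
	n->next = *head;
	*head = n;
	return 0;
}

/* Free every node and reset the head; returns how many were freed. */
int ma_free_all(struct ma_node **head)
{
	int freed = 0;

	while (*head) {
		/* Save the successor before the current node is released. */
		struct ma_node *next = (*head)->next;

		free(*head);
		*head = next;
		freed++;
	}
	return freed;
}
```
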
21 months agobridge: fdb: add an error print for unknown command
Amit Cohen [Tue, 10 Oct 2023 09:57:50 +0000 (12:57 +0300)] 
bridge: fdb: add an error print for unknown command

Commit 6e1ca489c5a2 ("bridge: fdb: add new flush command") added support
for "bridge fdb flush" command. This commit did not handle unsupported
keywords; they are just ignored.

Add an error print to notify the user when a keyword which is not supported
is used. The kernel will be extended to support flush with VXLAN device,
so new attributes will be supported (e.g., vni, port). When iproute2 does
not warn about an unsupported keyword, the user might think that the flush
command works, although the iproute2 version is too old and does not send
VXLAN attributes to the kernel.

Fixes: 6e1ca489c5a2 ("bridge: fdb: add new flush command")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
21 months agouapi: update from 6.6-rc5
Stephen Hemminger [Fri, 13 Oct 2023 02:33:46 +0000 (19:33 -0700)] 
uapi: update from 6.6-rc5

Update to if_packet.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agoila: fix array overflow warning
Stephen Hemminger [Wed, 4 Oct 2023 17:00:19 +0000 (10:00 -0700)] 
ila: fix array overflow warning

Aliasing a 64 bit value seems to confuse GCC 12.2.
ipila.c:57:32: warning: ‘addr’ may be used uninitialized [-Wmaybe-uninitialized]

Use a union instead.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
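
The "use a union instead" approach above can be sketched as follows. This is an illustration only: the function name addr64_to_str() and the output format are assumptions, not the actual ipila.c code. The union gives the compiler a well-defined view of the 64-bit value as four 16-bit words, instead of reading it through a cast pointer.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print a 64-bit identifier, stored in network byte
 * order, as four colon-separated 16-bit hex groups.
 * Returns 0 on success, -1 on error or truncation.
 */
int addr64_to_str(uint64_t addr, char *buf, size_t len)
{
	/* Reading the 16-bit words through a union is well defined in C,
	 * unlike dereferencing &addr cast to uint16_t *.
	 */
	union {
		uint64_t id;
		uint16_t w[4];
	} u = { .id = addr };
	int n;

	n = snprintf(buf, len, "%x:%x:%x:%x",
		     ntohs(u.w[0]), ntohs(u.w[1]),
		     ntohs(u.w[2]), ntohs(u.w[3]));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```
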
22 months agodevlink: Support setting port function ipsec_packet cap
Dima Chumak [Mon, 2 Oct 2023 10:43:49 +0000 (13:43 +0300)] 
devlink: Support setting port function ipsec_packet cap

Support port function commands to enable / disable IPsec packet
offloads. This is used to control the port IPsec device capabilities.

When IPsec packet capability is disabled for a function of the port
(default), function cannot offload IPsec operation. When enabled, IPsec
operation can be offloaded by the function of the port.

Enabling IPsec packet offloads lets the kernel delegate
encrypt/decrypt operations, as well as encapsulation and SA/policy and
state to the device hardware.

Example of a PCI VF port which supports IPsec packet offloads:

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable ipsec_packet disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable ipsec_packet enable

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agodevlink: Support setting port function ipsec_crypto cap
Dima Chumak [Mon, 2 Oct 2023 10:43:48 +0000 (13:43 +0300)] 
devlink: Support setting port function ipsec_crypto cap

Support port function commands to enable / disable IPsec crypto
offloads. This is used to control the port IPsec device capabilities.

When IPsec crypto capability is disabled for a function of the port
(default), function cannot offload IPsec operation. When enabled, IPsec
operation can be offloaded by the function of the port.

Enabling IPsec crypto offloads lets the kernel delegate XFRM state
processing and encrypt/decrypt operation to the device hardware.

Example of a PCI VF port which supports IPsec crypto offloads:

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable

$ devlink port show pci/0000:06:00.0/1
    pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agoMerge remote-tracking branch 'main/main' into next
David Ahern [Wed, 4 Oct 2023 15:22:23 +0000 (09:22 -0600)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agouapi: update headers from 6.6-rc4
Stephen Hemminger [Mon, 2 Oct 2023 21:29:10 +0000 (14:29 -0700)] 
uapi: update headers from 6.6-rc4

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agoAdd security policy
Stephen Hemminger [Fri, 29 Sep 2023 23:03:07 +0000 (16:03 -0700)] 
Add security policy

Iproute2 security policy is minimal since the security
domain is controlled by the kernel. But it should be documented
before some new security related bug arises at some future time.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agoila: fix potential snprintf buffer overflow
Stephen Hemminger [Mon, 18 Sep 2023 18:36:32 +0000 (11:36 -0700)] 
ila: fix potential snprintf buffer overflow

The code to print 64 bit address has a theoretical overflow
of snprintf buffer found by CodeQL scan.
Address by checking result.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
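
Checking the snprintf() result, as the fix above describes, usually means comparing the return value against the buffer size. A minimal sketch, assuming a hypothetical helper fmt_hex64() that is not part of iproute2:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit value as hex and report truncation.
 * Returns 0 on success, -1 if the output did not fit or an error occurred.
 */
int fmt_hex64(char *buf, size_t len, uint64_t val)
{
	int n = snprintf(buf, len, "%" PRIx64, val);

	/* snprintf() returns the length the full string would have had, so
	 * a result >= len means the output was cut short.
	 */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```
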
22 months agobridge: fix potential snprintf overflow
Stephen Hemminger [Mon, 18 Sep 2023 18:34:42 +0000 (11:34 -0700)] 
bridge: fix potential snprintf overflow

There is a theoretical snprintf overflow in bridge slave bitmask
print code found by CodeQL scan.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agoMakefile: ensure CONF_USR_DIR honours the libdir config
Andrea Claudi [Fri, 15 Sep 2023 19:59:06 +0000 (21:59 +0200)] 
Makefile: ensure CONF_USR_DIR honours the libdir config

Following commit cee0cf84bd32 ("configure: add the --libdir option"),
iproute2 lib directory is configurable using the --libdir option on the
configure script. However, CONF_USR_DIR does not honour the configured
lib path in its default value.

This fixes the issue by simply using $(LIBDIR) instead of $(PREFIX)/lib.
Please note that the default value for $(LIBDIR) is exactly
$(PREFIX)/lib, so this does not change the default value for
CONF_USR_DIR.

Fixes: 0a0a8f12fa1b ("Read configuration files from /etc and /usr")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agofix set-not-used warnings
Stephen Hemminger [Sun, 17 Sep 2023 17:04:55 +0000 (10:04 -0700)] 
fix set-not-used warnings

Building with clang and warnings enabled finds several
places where a variable was set but not used.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agouapi: headers update from 6.6-rc2
Stephen Hemminger [Fri, 15 Sep 2023 17:23:02 +0000 (10:23 -0700)] 
uapi: headers update from 6.6-rc2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agotc: add missing space before else
Stephen Hemminger [Fri, 15 Sep 2023 16:46:21 +0000 (09:46 -0700)] 
tc: add missing space before else

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agoMerge branch 'configurable-color' into next
David Ahern [Thu, 14 Sep 2023 15:21:45 +0000 (09:21 -0600)] 
Merge branch 'configurable-color' into next

Andrea Claudi  says:

====================

This series adds support for the color parameter in iproute2 configure
script. The idea is to make it possible for iproute2 users and packagers
to set a default value for the color option different from the current
one, COLOR_OPT_NEVER, while maintaining the current default behaviour.

Patch 1 adds the color option to the configure script. Users can set
three different values, never, auto and always, with the same meanings
they have for the -c / -color ip option. Default value is 'never', which
results in ip, tc and bridge maintaining their current output behaviour
(i.e. colorless output).

Patch 2 makes it possible for ip, tc and bridge to use the configured
value for color as their default color output.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agotreewide: use configured value as the default color output
Andrea Claudi [Wed, 13 Sep 2023 17:58:26 +0000 (19:58 +0200)] 
treewide: use configured value as the default color output

With Makefile providing -DCONF_COLOR, we can use its value as the
default color output.

This effectively allows users and packagers to define a default for the
color output feature without using shell aliases, and with minimum code
impact.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
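
A build-time default such as the one described above is typically consumed through a macro with a fallback. A minimal sketch, assuming the enum values mirror the never/auto/always settings and that the build passes -DCONF_COLOR=...; the function name default_color() is hypothetical:

```c
/* Mirrors the never / auto / always values described in the series; the
 * enum layout and function name here are illustrative.
 */
enum color_opt {
	COLOR_OPT_NEVER,
	COLOR_OPT_AUTO,
	COLOR_OPT_ALWAYS,
};

/* Normally supplied by the build as -DCONF_COLOR=...; fall back to the
 * historical colorless behaviour when it is absent.
 */
#ifndef CONF_COLOR
#define CONF_COLOR COLOR_OPT_NEVER
#endif

/* Value a tool would start from before parsing its own -c / -color flag. */
int default_color(void)
{
	return CONF_COLOR;
}
```
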
22 months agoconfigure: add the --color option
Andrea Claudi [Wed, 13 Sep 2023 17:58:25 +0000 (19:58 +0200)] 
configure: add the --color option

This commit allows users/packagers to choose a default for the color
output feature provided by some iproute2 tools.

The configure script option is documented in the script itself and it is
pretty much self-explanatory. The default value is set to "never" to
avoid changes to the current ip, tc, and bridge behaviour.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agovdpa: consume device_features parameter
Allen Hubbe [Mon, 11 Sep 2023 18:08:15 +0000 (11:08 -0700)] 
vdpa: consume device_features parameter

Consume the parameter to device_features when parsing command line
options.  Otherwise the parameter may be used again as an option name.

 # vdpa dev add ... device_features 0xdeadbeef mac 00:11:22:33:44:55
 Unknown option "0xdeadbeef"

Fixes: a4442ce58ebb ("vdpa: allow provisioning device features")
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
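
The bug class fixed above is a parser that reads an option's value but does not advance past it, so the value is examined again as if it were an option name. A minimal sketch of a loop that consumes both tokens, assuming a hypothetical struct dev_opts and parse_dev_opts() (not the real vdpa parser):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; not the real vdpa data structures. */
struct dev_opts {
	uint64_t device_features;
	const char *mac;
};

/* Parse "keyword value" pairs. Returns 0 on success, -1 on error. */
int parse_dev_opts(int argc, char **argv, struct dev_opts *o)
{
	while (argc > 0) {
		const char *key = argv[0];

		if (argc < 2)
			return -1;	/* keyword without a value */

		if (!strcmp(key, "device_features"))
			o->device_features = strtoull(argv[1], NULL, 16);
		else if (!strcmp(key, "mac"))
			o->mac = argv[1];
		else
			return -1;	/* unknown option */

		/* Consume both the keyword and its value. Advancing by one
		 * would make the next iteration treat the value as a keyword.
		 */
		argc -= 2;
		argv += 2;
	}
	return 0;
}
```
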
22 months agovdpa: consume device_features parameter
Allen Hubbe [Mon, 11 Sep 2023 18:08:15 +0000 (11:08 -0700)] 
vdpa: consume device_features parameter

Consume the parameter to device_features when parsing command line
options.  Otherwise the parameter may be used again as an option name.

 # vdpa dev add ... device_features 0xdeadbeef mac 00:11:22:33:44:55
 Unknown option "0xdeadbeef"

Fixes: a4442ce58ebb ("vdpa: allow provisioning device features")
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
22 months agoMerge branch 'devlink-dump-selector' into next
David Ahern [Mon, 11 Sep 2023 15:19:48 +0000 (09:19 -0600)] 
Merge branch 'devlink-dump-selector' into next

Jiri Pirko  says:

====================

From: Jiri Pirko <jiri@nvidia.com>

First 5 patches are preparations for the last one.

Motivation:

For SFs, one devlink instance per SF is created. There might be
thousands of these on a single host. When a user needs to know the port
handle for a specific SF, they need to dump all devlink ports on the host,
which does not scale well.

Solution:

Allow user to pass devlink handle (and possibly other attributes)
alongside the dump command and dump only objects which are matching
the selection.

Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false

$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false

$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agodevlink: implement dump selector for devlink objects show commands
Jiri Pirko [Wed, 6 Sep 2023 11:11:13 +0000 (13:11 +0200)] 
devlink: implement dump selector for devlink objects show commands

Introduce a new helper dl_argv_parse_with_selector() to be used
by show() functions instead of dl_argv().

Implement it to check if all needed options for get commands are
specified. In case they are not, ask kernel for dump passing only
the options (attributes) that are present, creating sort of partial
key to instruct kernel to do partial dump.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agomnl_utils: introduce a helper to check if dump policy exists for command
Jiri Pirko [Wed, 6 Sep 2023 11:11:12 +0000 (13:11 +0200)] 
mnl_utils: introduce a helper to check if dump policy exists for command

Benefit from GET_POLICY command of ctrl netlink and introduce a helper
that dumps policies and finds out, if there is a separate policy
specified for dump op of specified command.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agodevlink: return -ENOENT if argument is missing
Jiri Pirko [Wed, 6 Sep 2023 11:11:11 +0000 (13:11 +0200)] 
devlink: return -ENOENT if argument is missing

In preparation for the follow-up dump selector patch, make sure that the
command line arguments parsing function returns -ENOENT in case the
option is missing so the caller can distinguish.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
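
Returning a distinct error code for a missing argument, as described above, lets the caller tell "not given" apart from "given but invalid". A minimal sketch, assuming a hypothetical helper arg_get_u32() that is not the devlink parsing code:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: read an unsigned 32-bit value from argv[idx].
 * Returns 0 on success, -ENOENT if the argument is absent, and -EINVAL
 * if it is present but malformed.
 */
int arg_get_u32(int argc, char **argv, int idx, unsigned int *val)
{
	unsigned long v;
	char *end;

	if (idx >= argc)
		return -ENOENT;		/* argument not given at all */

	errno = 0;
	v = strtoul(argv[idx], &end, 0);
	if (errno || end == argv[idx] || *end != '\0' || v > 0xffffffffUL)
		return -EINVAL;		/* given, but not a valid u32 */

	*val = (unsigned int)v;
	return 0;
}
```

A caller can then fall back to a partial dump on -ENOENT while still reporting -EINVAL to the user.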
22 months agodevlink: implement command line args dry parsing
Jiri Pirko [Wed, 6 Sep 2023 11:11:10 +0000 (13:11 +0200)] 
devlink: implement command line args dry parsing

In preparation for the follow-up dump selector patch, introduce function
dl_argv_dry_parse() which allows dry parsing of command line
arguments without printing out any error messages to the user.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agodevlink: make parsing of handle non-destructive to argv
Jiri Pirko [Wed, 6 Sep 2023 11:11:09 +0000 (13:11 +0200)] 
devlink: make parsing of handle non-destructive to argv

Currently, handle parsing is destructive as the "\0" string ends are
being put in certain positions during parsing. That prevents it from
being used repeatedly. This is problematic with the follow-up patch
implementing dry-parsing. Fix by making a copy of handle argv during
parsing.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
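
Making the parse non-destructive, as described above, comes down to splitting a private copy rather than writing terminators into the caller's string. A minimal sketch, assuming a "bus/dev" handle layout and a hypothetical helper handle_split() (not the real devlink function):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "bus/dev" into two strings without touching
 * the caller's buffer. On success *bus owns the allocation (free it when
 * done) and *dev points into the same allocation.
 * Returns 0 on success, -1 on allocation failure or a missing '/'.
 */
int handle_split(const char *handle, char **bus, char **dev)
{
	size_t len = strlen(handle);
	char *copy = malloc(len + 1);
	char *slash;

	if (!copy)
		return -1;
	memcpy(copy, handle, len + 1);

	slash = strchr(copy, '/');
	if (!slash) {
		free(copy);
		return -1;
	}

	/* The terminator goes into the copy, so the original argv string
	 * can be parsed again later.
	 */
	*slash = '\0';
	*bus = copy;
	*dev = slash + 1;
	return 0;
}
```
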
22 months agodevlink: move DL_OPT_SB into required options
Jiri Pirko [Wed, 6 Sep 2023 11:11:08 +0000 (13:11 +0200)] 
devlink: move DL_OPT_SB into required options

This is basically a cosmetic change. The SB index is not required to be
passed by the user, and index 0 is used implicitly. This is ensured by
special treatment at the end of dl_argv_parse(). Move this option from
optional to required options.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agotc: fix several typos in netem's usage string
François Michel [Thu, 31 Aug 2023 14:01:32 +0000 (16:01 +0200)] 
tc: fix several typos in netem's usage string

Add missing brackets and surround brackets by single spaces
in the netem usage string.
Also state the P14 argument as optional.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agoMerge remote-tracking branch 'main' into next
David Ahern [Mon, 11 Sep 2023 15:14:18 +0000 (09:14 -0600)] 
Merge remote-tracking branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
22 months agov6.5.0 v6.5.0
Stephen Hemminger [Wed, 6 Sep 2023 16:26:52 +0000 (09:26 -0700)] 
v6.5.0

22 months agoiplink_bridge: fix incorrect root id dump
Hangbin Liu [Fri, 1 Sep 2023 08:02:26 +0000 (16:02 +0800)] 
iplink_bridge: fix incorrect root id dump

Fix the typo when dumping root_id.

Fixes: 70dfb0b8836d ("iplink: bridge: export bridge_id and designated_root")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
23 months agotc: fix typo in netem's usage string
François Michel [Wed, 30 Aug 2023 15:05:21 +0000 (17:05 +0200)] 
tc: fix typo in netem's usage string

Fixes a misplaced newline in netem's usage string.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoMerge remote-tracking branch 'main' into next
David Ahern [Tue, 29 Aug 2023 02:54:04 +0000 (20:54 -0600)] 
Merge remote-tracking branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoman: tc-netem: add section for specifying the netem seed
François Michel [Wed, 23 Aug 2023 10:01:10 +0000 (12:01 +0200)] 
man: tc-netem: add section for specifying the netem seed

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agotc: support the netem seed parameter for loss and corruption events
François Michel [Wed, 23 Aug 2023 10:01:09 +0000 (12:01 +0200)] 
tc: support the netem seed parameter for loss and corruption events

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoUpdate kernel headers
David Ahern [Tue, 29 Aug 2023 02:51:44 +0000 (20:51 -0600)] 
Update kernel headers

Update kernel headers to commit:
    6c9cfb853063 ("net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show")

Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoMerge branch 'vrf-exec-selinux' into next
David Ahern [Fri, 25 Aug 2023 00:38:58 +0000 (17:38 -0700)] 
Merge branch 'vrf-exec-selinux' into next

Andrea Claudi  says:

====================

In order to execute a service with VRF, a user should start it using
"ip vrf exec". For example, using systemd, the user can encapsulate the
ExecStart command in ip vrf exec as shown below:

ExecStart=/usr/sbin/ip vrf exec vrf1 /usr/sbin/httpd $OPTIONS -DFOREGROUND

Assuming SELinux is in permissive mode, starting the service with the
current ip vrf implementation results in:

 # systemctl start httpd
 # ps -eafZ | grep httpd
system_u:system_r:ifconfig_t:s0 root      597448       1  1 19:22 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
system_u:system_r:ifconfig_t:s0 apache    597452  597448  0 19:22 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
[snip]

This is incorrect, as the context for httpd should be httpd_t, not
ifconfig_t.

This happens because ipvrf_exec invokes cmd_exec without setting the
correct SELinux context before. Without the correct setting, the process
is executed using ip's SELinux context.

This patch series makes "ip vrf exec" SELinux-aware using the
setexecfilecon function, which retrieves the correct context to be used
on the next execvp() call.

After this series:
 # systemctl start httpd
 # ps -eafZ | grep httpd
system_u:system_r:httpd_t:s0    root      595805       1  0 19:01 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
system_u:system_r:httpd_t:s0    apache    595809  595805  0 19:01 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoip vrf: make ipvrf_exec SELinux-aware
Andrea Claudi [Wed, 23 Aug 2023 17:30:02 +0000 (19:30 +0200)] 
ip vrf: make ipvrf_exec SELinux-aware

When using ip vrf and SELinux is enabled, make sure to set the exec file
context before calling cmd_exec.

This ensures that the command is executed with the right context,
falling back to the ifconfig_t context when needed.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agolib: add SELinux include and stub functions
Andrea Claudi [Wed, 23 Aug 2023 17:30:01 +0000 (19:30 +0200)] 
lib: add SELinux include and stub functions

ss provides some selinux stub functions, useful when iproute2 is
compiled without selinux support.

Move them to lib/ so we can use them in other iproute2 tools.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoss: make SELinux stub functions conformant to API definitions
Andrea Claudi [Wed, 23 Aug 2023 17:30:00 +0000 (19:30 +0200)] 
ss: make SELinux stub functions conformant to API definitions

getfilecon() and security_get_initial_context() use the const qualifier
for their first parameter in SELinux APIs.

This commit adds the const qualifier to these functions, making them
conformant to API definitions.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
23 months agoss: make is_selinux_enabled stub work like in SELinux
Andrea Claudi [Wed, 23 Aug 2023 17:29:59 +0000 (19:29 +0200)] 
ss: make is_selinux_enabled stub work like in SELinux

From the is_selinux_enabled() manpage:

is_selinux_enabled() returns 1 if SELinux is running or 0 if it is not.

This makes the is_selinux_enabled() stub function work exactly like
the SELinux function it is supposed to replace.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
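
A stub that follows the documented return values lets callers use one code path whether or not SELinux support is compiled in. A minimal sketch, assuming the build defines HAVE_SELINUX when libselinux is available; the caller selinux_state() is hypothetical:

```c
#ifdef HAVE_SELINUX
#include <selinux/selinux.h>
#else
/* Stub used when built without libselinux: report "not running" with 0,
 * matching the documented return values of the real function.
 */
static inline int is_selinux_enabled(void)
{
	return 0;
}
#endif

/* Hypothetical caller: behaves the same with the real library or the stub. */
const char *selinux_state(void)
{
	return is_selinux_enabled() > 0 ? "enabled" : "disabled";
}
```
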
23 months agoss: mptcp: print missing info counters
Matthieu Baerts [Wed, 23 Aug 2023 07:24:08 +0000 (09:24 +0200)] 
ss: mptcp: print missing info counters

These new counters have been added in different kernel versions:

- v5.12: local_addr_used, local_addr_max

- v5.13: csum_enabled

- v6.5: retransmits, bytes_retrans, bytes_sent, bytes_received,
  bytes_acked

It is interesting to display them if they are available.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/415
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
23 months agoss: mptcp: display seq related counters as decimal
Matthieu Baerts [Wed, 23 Aug 2023 07:24:07 +0000 (09:24 +0200)] 
ss: mptcp: display seq related counters as decimal

This is aligned with what is printed for TCP sockets.

The main difference here is that these counters can be larger (u32 vs
u64) but Wireshark and tcpdump also print these MPTCP counters as
decimal and they look fine.

So it sounds better to do the same here with ss for those who want to
easily count how many bytes have been exchanged between two runs without
having to think in hexadecimal.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>