git.ipfire.org Git - thirdparty/iproute2.git/log
2 years ago ip-monitor: Fix the selection of rtnl groups when listening for all object types
Benjamin Poirier [Thu, 22 Sep 2022 06:19:38 +0000 (15:19 +0900)] 
ip-monitor: Fix the selection of rtnl groups when listening for all object types

Currently, when using `ip monitor`, family-specific rtnl multicast groups
(ex. RTNLGRP_IPV4_IFADDR) are used when specifying the '-family' option (or
one of its short forms) and an object type is specified (ex. `ip -4 monitor
addr`) but not when listening for changes to all object types (ex. `ip -4
monitor`). In that case, multicast groups for all families, regardless of
the '-family' option, are used. Depending on the object type, this leads to
ignoring the '-family' selection (MROUTE, ADDR, NETCONF), or printing stray
prefix headers with no event (ROUTE, RULE).

Rewrite the parameter parsing code so that per-family rtnl multicast groups
are selected in all cases.

The issue can be witnessed while running `ip -4 monitor label` at the same
time as the following command:
ip link add dummy0 address 02:00:00:00:00:01 up type dummy
The output includes:
[ROUTE][ROUTE][ADDR]9: dummy0    inet6 fe80::ff:fe00:1/64 scope link
       valid_lft forever preferred_lft forever
Notice the stray "[ROUTE]" labels (related to filtered out ipv6 routes) and
the ipv6 ADDR entry. Those do not appear if using `ip -4 monitor label
route address`.

Fixes: aba5acdfdb34 ("(Logical change 1.3)")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
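
To illustrate the rule that the commit above establishes (per-family groups are selected whether or not an object type is named), here is a minimal Python sketch. It is not the ipmonitor.c code: the function `select_groups` and the table layout are hypothetical, only the RTNLGRP_* names are real kernel identifiers, and the table is limited to IPv4/IPv6 and five object types for brevity.

```python
# Hypothetical model of the group selection; not the actual ipmonitor.c logic.
# Only the RTNLGRP_* names are real kernel identifiers.
FAMILY_GROUPS = {
    "addr":    {"inet": "RTNLGRP_IPV4_IFADDR",  "inet6": "RTNLGRP_IPV6_IFADDR"},
    "route":   {"inet": "RTNLGRP_IPV4_ROUTE",   "inet6": "RTNLGRP_IPV6_ROUTE"},
    "mroute":  {"inet": "RTNLGRP_IPV4_MROUTE",  "inet6": "RTNLGRP_IPV6_MROUTE"},
    "rule":    {"inet": "RTNLGRP_IPV4_RULE",    "inet6": "RTNLGRP_IPV6_RULE"},
    "netconf": {"inet": "RTNLGRP_IPV4_NETCONF", "inet6": "RTNLGRP_IPV6_NETCONF"},
}


def select_groups(family=None, objects=()):
    """Return the sorted multicast group names to join.

    family:  "inet", "inet6", or None when no '-family' option was given.
    objects: requested object types; empty means "all object types".
    """
    wanted = objects or FAMILY_GROUPS.keys()
    groups = set()
    for obj in wanted:
        per_family = FAMILY_GROUPS[obj]
        if family is None:
            # No family restriction: listen on every family's group.
            groups.update(per_family.values())
        else:
            # Family given: use the per-family group in ALL cases,
            # including when no object type was named.
            groups.add(per_family[family])
    return sorted(groups)
```

With this model, `select_groups("inet")` (the analogue of `ip -4 monitor`) yields only IPv4 groups, which is the behavior the commit message asks for.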
2 years ago ip-monitor: Include stats events in default and "all" cases
Benjamin Poirier [Thu, 22 Sep 2022 06:19:37 +0000 (15:19 +0900)] 
ip-monitor: Include stats events in default and "all" cases

It seems that stats were omitted from `ip monitor` and `ip monitor all`.
Since all other event types are included, include stats as well. Use the
same logic as for nexthops.

Fixes: a05a27c07cbf ("ipmonitor: Add monitoring support for stats events")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago ip-monitor: Do not listen for nexthops by default when specifying stats
Benjamin Poirier [Thu, 22 Sep 2022 06:19:36 +0000 (15:19 +0900)] 
ip-monitor: Do not listen for nexthops by default when specifying stats

`ip monitor stats` listens for changes to nexthops and stats. It should
listen for stats only.

Fixes: a05a27c07cbf ("ipmonitor: Add monitoring support for stats events")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago bridge: Do not print stray prefixes in monitor mode
Benjamin Poirier [Thu, 22 Sep 2022 06:19:35 +0000 (15:19 +0900)] 
bridge: Do not print stray prefixes in monitor mode

When using `bridge monitor` with the '-timestamp' option or the "all"
parameter, prefixes are printed before the actual event descriptions.
Currently, those prefixes are printed for each netlink message that's
received. However, some netlink messages do not lead to an event
description being printed. That's usually because a message is not related
to AF_BRIDGE. This results in stray prefixes being printed.

Restructure accept_msg() and its callees such that prefixes are only
printed after a message has been checked for eligibility.

The issue can be witnessed using the following commands:
ip link add dummy0 type dummy
# Start `bridge monitor all` now in another terminal.
# Cause a stray "[LINK]" to be printed (family 10).
# It does not appear yet because the output is line buffered.
ip link set dev dummy0 up
# Cause a stray "[NEIGH]" to be printed (family 2).
ip neigh add 10.0.0.1 lladdr 02:00:00:00:00:01 dev dummy0
# Cause a genuine entry to be printed, which flushes the previous
# output.
bridge fdb add 02:00:00:00:00:01 dev dummy0
# We now see:
# [LINK][NEIGH][NEIGH]02:00:00:00:00:01 dev dummy0 self permanent

Fixes: d04bc300c3e3 ("Add bridge command")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
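
The restructuring described in the commit above (print a prefix only after a message has been checked for eligibility) can be sketched in Python as follows. The message dictionaries and the function `format_events` are hypothetical stand-ins for netlink messages and accept_msg(); the family numbers 7 (AF_BRIDGE), 10 and 2 are the Linux address-family values, the latter two being the ones mentioned in the reproduction steps.

```python
AF_BRIDGE = 7  # Linux value of AF_BRIDGE


def format_events(messages, label_prefix=True):
    """Return one output line per eligible message.

    Each message is a hypothetical dict such as
    {"kind": "NEIGH", "family": 7, "text": "..."}.
    """
    lines = []
    for msg in messages:
        # Check eligibility first, so that unrelated messages
        # produce no output at all (not even a prefix).
        if msg["family"] != AF_BRIDGE:
            continue
        prefix = f"[{msg['kind']}]" if label_prefix else ""
        lines.append(prefix + msg["text"])
    return lines
```

Feeding it the three messages from the reproduction steps (a family 10 link message, a family 2 neighbour message, and the AF_BRIDGE fdb entry) produces a single line with a single "[NEIGH]" prefix.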
2 years ago uapi: update of if_tun.h
Stephen Hemminger [Fri, 30 Sep 2022 19:35:48 +0000 (12:35 -0700)] 
uapi: update of if_tun.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago rtnetlink: add new function rtnl_echo_talk()
Hangbin Liu [Thu, 29 Sep 2022 08:10:16 +0000 (16:10 +0800)] 
rtnetlink: add new function rtnl_echo_talk()

Add a new function rtnl_echo_talk() that could be used when the
sub-component supports NLM_F_ECHO flag. With this function we can
remove the redundant code added by commit b264b4c6568c7 ("ip: add
NLM_F_ECHO support").

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago devlink: fix typo in variable name in ifname_map_cb()
Jiri Pirko [Thu, 29 Sep 2022 10:24:36 +0000 (12:24 +0200)] 
devlink: fix typo in variable name in ifname_map_cb()

s/port_ifindex/port_index/

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago devlink: move use_iec into struct dl
Jiri Pirko [Thu, 29 Sep 2022 10:24:35 +0000 (12:24 +0200)] 
devlink: move use_iec into struct dl

Similar to other bool opts that could be set by the user, move the
global variable use_iec to be part of struct dl.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago tc/tc_monitor: print netlink extack message
Hangbin Liu [Tue, 27 Sep 2022 10:21:07 +0000 (18:21 +0800)] 
tc/tc_monitor: print netlink extack message

Upstream commit "sched: add extack for tfilter_notify" makes tc events
carry an extack message, which can be used for logging offloading
failures. Print the extack message in tc monitor.
e.g.

  # tc monitor
  added chain dev enp3s0f1np1 parent ffff: chain 0
  added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1
    ct_state +trk+new
    not_in_hw
          action order 1: gact action drop
           random type none pass val 0
           index 1 ref 1 bind 1

  Warning: mlx5_core: matching on ct_state +new isn't supported.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago libnetlink: add offset for nl_dump_ext_ack_done
Hangbin Liu [Tue, 27 Sep 2022 10:21:06 +0000 (18:21 +0800)] 
libnetlink: add offset for nl_dump_ext_ack_done

There is no rule that an error code must follow an NLMSG_DONE msg. The only
reason we have this offset is that the kernel function netlink_dump_done()
places an error code after the netlink message header.

Make nl_dump_ext_ack_done() take an offset parameter, so that we can adjust
this for NLMSG_DONE messages without an error code.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
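
As a rough illustration of why the offset described above matters, the following Python sketch parses an NLMSG_DONE message and skips `offset` payload bytes before looking for an extack string. It is not the libnetlink implementation: the function `extack_msg_from_done` is hypothetical, and only the layouts of struct nlmsghdr and struct nlattr and the constants NLMSG_DONE (3) and NLMSGERR_ATTR_MSG (1) are taken from the kernel's netlink headers.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # struct nlmsghdr: len, type, flags, seq, pid
NLATTR_HDR = struct.Struct("=HH")    # struct nlattr: nla_len, nla_type
NLMSG_DONE = 3
NLMSGERR_ATTR_MSG = 1


def extack_msg_from_done(buf, offset):
    """Return the extack string carried by an NLMSG_DONE message, or None.

    offset is the number of payload bytes to skip before the attributes:
    4 (the size of an int) when an error code follows the header,
    0 when no error code is present.
    """
    msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, 0)
    if msg_type != NLMSG_DONE:
        return None
    pos = NLMSG_HDR.size + offset
    while pos + NLATTR_HDR.size <= msg_len:
        nla_len, nla_type = NLATTR_HDR.unpack_from(buf, pos)
        if nla_len < NLATTR_HDR.size:
            break  # malformed attribute; stop rather than loop forever
        payload = buf[pos + NLATTR_HDR.size:pos + nla_len]
        if nla_type == NLMSGERR_ATTR_MSG:
            return payload.rstrip(b"\0").decode()
        pos += (nla_len + 3) & ~3  # attributes are aligned to 4 bytes
    return None
```

A caller that knows the message came from netlink_dump_done() would pass an offset of 4; a caller handling an NLMSG_DONE without an error code would pass 0.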
2 years ago ip link: add sub-command to view and change DSA conduit interface
Vladimir Oltean [Thu, 22 Sep 2022 22:06:55 +0000 (01:06 +0300)] 
ip link: add sub-command to view and change DSA conduit interface

Support the "dsa" kind of rtnl_link_ops exported by the kernel, and
export reads/writes to IFLA_DSA_MASTER.

Examples:

$ ip link set swp0 type dsa conduit eth1

$ ip -d link show dev swp0
    (...)
    dsa conduit eth0

$ ip -d -j link show swp0
[
{
"link": "eth1",
"linkinfo": {
"info_kind": "dsa",
"info_data": {
"conduit": "eth1"
}
},
}
]

Note that by construction and as shown in the example, the IFLA_LINK
reported by a DSA user port is identical to what is reported through
IFLA_DSA_MASTER. However IFLA_LINK is not writable, and overloading its
meaning to make it writable would clash with other users of IFLA_LINK
(vlan etc) for which writing this property does not make sense.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago link: display 'allmulti' counter
Nicolas Dichtel [Mon, 19 Sep 2022 08:31:36 +0000 (10:31 +0200)] 
link: display 'allmulti' counter

This counter is based on the same principle as the 'promiscuity' counter:
the flag ALLMULTI is displayed only when it is explicitly requested by
userland. This counter makes it possible to tell whether 'allmulti' is
configured on an interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago ip: add NLM_F_ECHO support
Hangbin Liu [Fri, 16 Sep 2022 03:34:28 +0000 (11:34 +0800)] 
ip: add NLM_F_ECHO support

When user space configures the kernel with netlink messages, it can set the
NLM_F_ECHO flag to request that the kernel send the applied configuration back
to the caller. This allows user space to retrieve configuration information
that is filled in by the kernel (either because these parameters can only be
set by the kernel or because user space lets the kernel choose a default
value).

NLM_F_ACK is also supplied in case the kernel doesn't support NLM_F_ECHO;
otherwise we would wait for the reply forever. This is just like the update
in iplink.c, for which I plan to post a kernel patch later.

A new parameter -echo is added for users who want to get feedback from the kernel.
e.g.

  # ip -echo addr add 192.168.0.1/24 dev eth1
  3: eth1    inet 192.168.0.1/24 scope global eth1
         valid_lft forever preferred_lft forever
  # ip -j -p -echo addr del 192.168.0.1/24 dev eth1
  [ {
          "deleted": true,
          "index": 3,
          "dev": "eth1",
          "family": "inet",
          "local": "192.168.0.1",
          "prefixlen": 24,
          "scope": "global",
          "label": "eth1",
          "valid_life_time": 4294967295,
          "preferred_life_time": 4294967295
      } ]

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
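
A script consuming the `-echo` output shown above might look like the following Python sketch. The JSON sample is copied from the commit message; the function `summarize` is hypothetical, and two points are assumptions rather than documented behavior: that an echoed "add" simply lacks the "deleted" key, and that the lifetime value 4294967295 (0xFFFFFFFF) is what the text output renders as "forever".

```python
import json

# Sample taken from the commit message above.
ECHO_OUTPUT = """
[ {
        "deleted": true,
        "index": 3,
        "dev": "eth1",
        "family": "inet",
        "local": "192.168.0.1",
        "prefixlen": 24,
        "scope": "global",
        "label": "eth1",
        "valid_life_time": 4294967295,
        "preferred_life_time": 4294967295
    } ]
"""

INFINITE_LIFETIME = 0xFFFFFFFF  # assumed to mean "forever"


def summarize(echo_json):
    """Return one human-readable line per echoed address entry."""
    out = []
    for entry in json.loads(echo_json):
        # Assumption: entries without a "deleted" key describe an addition.
        action = "deleted" if entry.get("deleted") else "added"
        lifetime = entry["valid_life_time"]
        life = "forever" if lifetime == INFINITE_LIFETIME else f"{lifetime}sec"
        addr = f"{entry['local']}/{entry['prefixlen']}"
        out.append(f"{action} {addr} on {entry['dev']} (valid {life})")
    return out
```

Calling `summarize(ECHO_OUTPUT)` gives a one-line summary of the deleted address without the caller having to issue a separate query.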
2 years ago seg6: add support for flavors in SRv6 End* behaviors
Paolo Lungaroni [Mon, 12 Sep 2022 17:39:23 +0000 (19:39 +0200)] 
seg6: add support for flavors in SRv6 End* behaviors

As described in RFC 8986 [1], processing operations carried out by SRv6
End, End.X and End.T (End* for short) behaviors can be modified or
extended using the "flavors" mechanism. This patch adds the support for
PSP,USP,USD flavors (defined in [1]) and for NEXT-C-SID flavor (defined
in [2]) in SRv6 End* behaviors. Specifically, we add a new optional
attribute named "flavors" that can be leveraged by the user to enable
specific flavors while creating an SRv6 End* behavior instance.
Multiple flavors can be specified together by separating them using
commas.

If a specific flavor (or a combination of flavors) is not supported by the
underlying Linux kernel, an error message is reported to the user and the
creation of the specific behavior instance is aborted.

When the flavors attribute is omitted, the regular SRv6 End* behavior is
performed.

Flavors such as PSP, USP and USD do not accept additional configuration
attributes. Conversely, the NEXT-C-SID flavor can be configured to support
user-provided Locator-Block and Locator-Node Function lengths using,
respectively, the lblen and the nflen attributes.

Both lblen and nflen values must be evenly divisible by 8 and their sum
must not exceed 128 bit (i.e. the C-SID container size).

If the lblen attribute is omitted, the default value chosen by the Linux
kernel is 32-bit. If the nflen attribute is omitted, the default value
chosen by the Linux kernel is 16-bit.

Some examples:
ip -6 route add 2001:db8::1 encap seg6local action End flavors next-csid dev eth0
ip -6 route add 2001:db8::2 encap seg6local action End flavors next-csid lblen 48 nflen 16 dev eth0

Standard Output:
ip -6 route show 2001:db8::2
2001:db8::2  encap seg6local action End flavors next-csid lblen 48 nflen 16 dev eth0 metric 1024 pref medium

JSON Output:
ip -6 -j -p route show 2001:db8::2
[ {
        "dst": "2001:db8::2",
        "encap": "seg6local",
        "action": "End",
        "flavors": [ "next-csid" ],
        "lblen": 48,
        "nflen": 16,
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

[1] - https://datatracker.ietf.org/doc/html/rfc8986
[2] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
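
The constraints on lblen and nflen stated in the commit above can be sketched as a small Python check. This is only an illustration of the stated rules (multiples of 8, sum of at most 128 bits, defaults of 32 and 16); the function name `check_next_csid_lengths` is hypothetical, the rejection of negative values is an assumption of this sketch, and the real validation may differ in detail.

```python
CSID_CONTAINER_BITS = 128  # C-SID container size from the commit message
DEFAULT_LBLEN = 32         # kernel default when lblen is omitted
DEFAULT_NFLEN = 16         # kernel default when nflen is omitted


def check_next_csid_lengths(lblen=None, nflen=None):
    """Apply defaults and validate NEXT-C-SID lengths; return (lblen, nflen)."""
    lblen = DEFAULT_LBLEN if lblen is None else lblen
    nflen = DEFAULT_NFLEN if nflen is None else nflen
    for name, value in (("lblen", lblen), ("nflen", nflen)):
        # Assumption of this sketch: lengths are non-negative.
        if value < 0 or value % 8 != 0:
            raise ValueError(f"{name} must be a non-negative multiple of 8")
    if lblen + nflen > CSID_CONTAINER_BITS:
        raise ValueError("lblen + nflen must not exceed 128 bits")
    return lblen, nflen
```

For the second example route above, `check_next_csid_lengths(48, 16)` passes, while a value such as lblen 12 would be rejected.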
2 years ago Update kernel headers
David Ahern [Thu, 22 Sep 2022 22:50:08 +0000 (15:50 -0700)] 
Update kernel headers

Update kernel headers to commit:
    0140a7168f8b ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago uapi: update bpf and virtio_net
Stephen Hemminger [Tue, 20 Sep 2022 22:58:41 +0000 (15:58 -0700)] 
uapi: update bpf and virtio_net

Update headers based on 6.0-rc6

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago macsec: add user manual description for extended packet number feature
Emeel Hakim [Sun, 11 Sep 2022 09:26:56 +0000 (12:26 +0300)] 
macsec: add user manual description for extended packet number feature

Update the user manual to describe how to use the extended packet number
(XPN) feature for macsec. Configuring XPN requires providing an ssci and a
salt, so update the user manual to show how to provide them as part of
the ip macsec command.

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago macsec: add Extended Packet Number support
Emeel Hakim [Sun, 11 Sep 2022 09:26:55 +0000 (12:26 +0300)] 
macsec: add Extended Packet Number support

This patch adds support for extended packet number (XPN).
XPN can be configured by passing 'cipher gcm-aes-xpn-128' as part of
the ip link add command using macsec type.
In addition, using 'xpn' keyword instead of the 'pn', passing a 12
bytes salt using the 'salt' keyword and passing short secure channel
id (ssci) using the 'ssci' keyword as part of the ip macsec command
is required (see example).

e.g.:

create a MACsec device on link eth0 with enabled xpn
  # ip link add link eth0 macsec0 type macsec port 11 \
        encrypt on cipher gcm-aes-xpn-128

configure a secure association on the device
  # ip macsec add macsec0 tx sa 0 xpn 1024 on ssci 5 \
        salt 838383838383838383838383 \
        key 01 81818181818181818181818181818181

configure a secure association on the device with ssci = 5
  # ip macsec add macsec0 tx sa 0 xpn 1024 on ssci 5 \
        salt 838383838383838383838383 \
        key 01 82828282828282828282828282828282

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
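
The commit above states that the salt is 12 bytes long, which corresponds to the 24 hex digits used in its examples. A minimal Python sketch of that check follows; the function `parse_xpn_salt` is hypothetical and is not how the ip macsec command itself parses its arguments.

```python
XPN_SALT_BYTES = 12  # salt length stated in the commit message


def parse_xpn_salt(hex_salt):
    """Decode a salt given as hex digits and check that it is 12 bytes long."""
    salt = bytes.fromhex(hex_salt)  # raises ValueError on non-hex input
    if len(salt) != XPN_SALT_BYTES:
        raise ValueError(
            f"salt must be {XPN_SALT_BYTES} bytes, got {len(salt)}"
        )
    return salt
```

A wrapper script could run such a check before invoking `ip macsec add`, so that a truncated salt is caught early.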
2 years ago man: devlink-region(8): document the 'new' subcommand
Baruch Siach [Fri, 2 Sep 2022 05:01:17 +0000 (08:01 +0300)] 
man: devlink-region(8): document the 'new' subcommand

Some drivers provide no region snapshot unless one is created first with the
'new' operation. Add documentation and an example.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago devlink: fix region-new usage message
Baruch Siach [Mon, 29 Aug 2022 07:49:13 +0000 (10:49 +0300)] 
devlink: fix region-new usage message

The snapshot parameter is optional.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago utils: extract CTRL_ATTR_MAXATTR and save it
Jacob Keller [Fri, 26 Aug 2022 18:17:41 +0000 (11:17 -0700)] 
utils: extract CTRL_ATTR_MAXATTR and save it

mnlu_gen_socket_open opens a socket and configures it for use with a
generic netlink family. As part of this process it sends a
CTRL_CMD_GETFAMILY to get the ID for the family name requested.

In addition to the family id, this command reports a few other useful
values including the maximum attribute. The maximum attribute is useful in
order to know whether a given attribute is supported and for knowing the
necessary size to allocate for other operations such as policy dumping.

Since we already have to issue a CTRL_CMD_GETFAMILY to get the id, we can
also store the maximum attribute as well. Modify the callback functions to
parse the maximum attribute NLA and store it in the mnlu_gen_socket
structure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago mnlg: remove unnused mnlg_socket structure
Jacob Keller [Fri, 26 Aug 2022 18:17:40 +0000 (11:17 -0700)] 
mnlg: remove unnused mnlg_socket structure

Commit 62ff25e51bb6 ("devlink: Use generic socket helpers from library")
removed all of the users of struct mnlg_socket, but didn't remove the
structure itself. Fix that.

Fixes: 62ff25e51bb6 ("devlink: Use generic socket helpers from library")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago Update kernel headers
David Ahern [Thu, 1 Sep 2022 02:42:52 +0000 (20:42 -0600)] 
Update kernel headers

Update kernel headers to commit:
    cb45a8bf4693 ("net: axienet: Switch to 64-bit RX/TX statistics")

Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago devlink: fix parallel flash notifications processing
Jiri Pirko [Thu, 25 Aug 2022 08:04:20 +0000 (10:04 +0200)] 
devlink: fix parallel flash notifications processing

Now that it is possible to flash multiple devlink instances in parallel,
the notification processing callback needs to account for the fact that it
may receive messages that belong to a different devlink instance. Handle
this gracefully and don't error out.

Reported-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago devlink: load port-ifname map on demand
Jiri Pirko [Thu, 25 Aug 2022 08:04:19 +0000 (10:04 +0200)] 
devlink: load port-ifname map on demand

So far, the port-ifname map was loaded during devlink init
whether or not it was actually needed. The port dump command used
for this takes a lock in the kernel for every devlink instance.
That may lead to commands being blocked unnecessarily.

Load the map only when it is needed to look up an ifname.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
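
The on-demand loading described in the commit above follows a common lazy-initialization pattern, sketched here in Python. The class `IfnameMap` and the `dump_ports` callable are hypothetical stand-ins for devlink's internal map and its port dump request; the sketch shows only the pattern, not devlink's code.

```python
class IfnameMap:
    """Map interface names to ports, loading the map on first lookup."""

    def __init__(self, dump_ports):
        # dump_ports: hypothetical callable that performs the (costly)
        # port dump and returns (ifname, port) pairs.
        self._dump_ports = dump_ports
        self._map = None  # not loaded yet

    def lookup(self, ifname):
        if self._map is None:
            # First lookup: pay the cost of the dump now, and only now.
            self._map = dict(self._dump_ports())
        return self._map.get(ifname)
```

Commands that never resolve an ifname never trigger the dump, so they never take the per-instance locks mentioned above.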
2 years ago man: fix a typo in devlink-dev(8)
Denis Ovsienko [Wed, 31 Aug 2022 17:07:25 +0000 (10:07 -0700)] 
man: fix a typo in devlink-dev(8)

Signed-off-by: Denis Ovsienko <denis@ovsienko.info>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago uapi: update headers for xfrm and virtio_ring.h
Stephen Hemminger [Wed, 31 Aug 2022 16:37:13 +0000 (09:37 -0700)] 
uapi: update headers for xfrm and virtio_ring.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 years ago Merge branch 'devlink-rm-dl_argv_parse_put' into next
David Ahern [Wed, 24 Aug 2022 15:54:16 +0000 (08:54 -0700)] 
Merge branch 'devlink-rm-dl_argv_parse_put' into next

Jacob Keller says:

====================

This series removes the dl_argv_parse_put function which both parses the
command line arguments and places them into the netlink header.

This was originally sent as an RFC at
https://lore.kernel.org/netdev/20220805234155.2878160-1-jacob.e.keller@intel.com/

Since there is some ongoing work around policy code being generated from
YAML, I thought it best to wait on the devlink policy portion of this series
for now.

Jiri mentioned he wanted to base some work on top of this, so I am sending
just the cleanup patches.

The primary motivation is that dl_argv_parse_put requires a netlink header,
meaning a command must have already been prepared. This prevents the
addition of a different netlink command to get the policy data, and thus
prevents us from using this variant while checking netlink policy.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2 years ago devlink: remove dl_argv_parse_put
Jacob Keller [Thu, 18 Aug 2022 21:15:21 +0000 (14:15 -0700)] 
devlink: remove dl_argv_parse_put

The dl_argv_parse_put function is used to extract arguments from the
command line and convert them to the appropriate netlink attributes. This
function is a combination of calling dl_argv_parse and dl_put_opts.

A future change is going to refactor dl_argv_parse to check the kernel's
netlink policy for the command. This requires issuing another netlink
message which requires calling dl_argv_parse before
mnlu_gen_socket_cmd_prepare. Otherwise, the get policy command issued in
dl_argv_parse would overwrite the prepared buffer.

This conflicts with dl_argv_parse_put which requires being called after
mnlu_gen_socket_cmd_prepare.

Remove dl_argv_parse_put and replace it with appropriate calls to
dl_argv_parse and dl_put_opts. This allows us to ensure dl_argv_parse is
called before mnlu_gen_socket_cmd_prepare while dl_put_opts is called
afterwards.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
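
The ordering constraint explained in the commit above (parse first, then prepare the command, then put the options) can be modeled with a toy Python example. Everything here is hypothetical: `FakeSocket`, `argv_parse`, `put_opts` and `run_command` only loosely mirror mnlu_gen_socket_cmd_prepare, dl_argv_parse and dl_put_opts, and the policy query is simulated by a second `cmd_prepare` call on the same buffer.

```python
class FakeSocket:
    """Stand-in for a generic netlink socket with one shared send buffer."""

    def __init__(self):
        self.buf = None

    def cmd_prepare(self, cmd):
        # Preparing a command overwrites whatever was in the buffer.
        self.buf = {"cmd": cmd, "attrs": {}}


def argv_parse(sock, argv):
    """Parse 'name value' pairs from argv.

    Models the planned policy check: it issues its own command first,
    which reuses (and therefore clobbers) the shared buffer.
    """
    sock.cmd_prepare("GET_POLICY")
    return dict(zip(argv[0::2], argv[1::2]))


def put_opts(sock, opts):
    """Add the parsed options to the currently prepared command."""
    sock.buf["attrs"].update(opts)


def run_command(sock, cmd, argv):
    opts = argv_parse(sock, argv)  # 1. parse before preparing
    sock.cmd_prepare(cmd)          # 2. prepare the real command
    put_opts(sock, opts)           # 3. put the options afterwards
    return sock.buf
```

Swapping steps 1 and 2 in this model would leave "GET_POLICY" in the buffer, which is the overwrite that the commit message warns about.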
2 years ago devlink: use dl_no_arg instead of checking dl_argc == 0
Jacob Keller [Thu, 18 Aug 2022 21:15:20 +0000 (14:15 -0700)] 
devlink: use dl_no_arg instead of checking dl_argc == 0

Use the dl_no_arg helper function to check whether the command has any
arguments.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago uapi: update headers from 6.0-rc1
Stephen Hemminger [Mon, 15 Aug 2022 02:25:21 +0000 (19:25 -0700)] 
uapi: update headers from 6.0-rc1

These are the post-merge networking user headers.
Note: this fixes compilation with gcc-12

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years ago vdpa: fix statistics API mismatch
Stephen Hemminger [Mon, 15 Aug 2022 02:22:37 +0000 (19:22 -0700)] 
vdpa: fix statistics API mismatch

The final vdpa.h header from upstream has a slightly different
definition of VDPA stats get.

Fixes: 6f97e9c9337b ("vdpa: Add support for reading vdpa device statistics")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years ago devlink: expose nested devlink for a line card object
Jiri Pirko [Tue, 9 Aug 2022 13:17:30 +0000 (15:17 +0200)] 
devlink: expose nested devlink for a line card object

If a line card object contains a nested devlink, expose it.

Example:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
      16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago Merge branch 'main' into next
David Ahern [Sun, 14 Aug 2022 17:31:10 +0000 (11:31 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago configure: Define _GNU_SOURCE when checking for setns
Khem Raj [Thu, 11 Aug 2022 05:34:40 +0000 (22:34 -0700)] 
configure: Define _GNU_SOURCE when checking for setns

glibc declares this function only as a GNU extension.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years ago ipstats: add missing headers
Stephen Hemminger [Tue, 9 Aug 2022 20:27:33 +0000 (13:27 -0700)] 
ipstats: add missing headers

IWYU reports several headers are not explicitly
included by ipstats.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years ago ipstats: Add param.h for musl
Changhyeok Bae [Tue, 9 Aug 2022 04:01:05 +0000 (04:01 +0000)] 
ipstats: Add param.h for musl

Fix build error for musl
| /usr/src/debug/iproute2/5.19.0-r0/iproute2-5.19.0/ip/ipstats.c:231: undefined reference to `MIN'

Signed-off-by: Changhyeok Bae <changhyeok.bae@gmail.com>
3 years ago Merge branch 'main' into next
David Ahern [Thu, 4 Aug 2022 18:38:31 +0000 (12:38 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago devlink: add support for running selftests
Vikas Gupta [Thu, 4 Aug 2022 09:18:02 +0000 (14:48 +0530)] 
devlink: add support for running selftests

Add commands and helper APIs to run selftests.
Include a selftest id for non-volatile memory, i.e. flash.
Also, update the man page and bash-completion for the selftests
commands.

Examples:
$ devlink dev selftests run pci/0000:03:00.0 id flash
pci/0000:03:00.0:
    flash:
      status passed

$ devlink dev selftests show pci/0000:03:00.0
pci/0000:03:00.0
      flash

$ devlink dev selftests show pci/0000:03:00.0 -j
{"selftests":{"pci/0000:03:00.0":["flash"]}}

$ devlink dev selftests run pci/0000:03:00.0 id flash -j
{"selftests":{"pci/0000:03:00.0":{"flash":{"status":"passed"}}}}

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
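
The JSON form of `devlink dev selftests run` shown above lends itself to scripted checks. The following Python sketch assumes the nesting seen in that example (device, then test name, then a "status" field); the function `failed_selftests` is hypothetical, and status values other than "passed" are not specified by the commit message.

```python
import json

# Sample output copied from the commit message above.
RUN_OUTPUT = '{"selftests":{"pci/0000:03:00.0":{"flash":{"status":"passed"}}}}'


def failed_selftests(run_output):
    """Return (device, test) pairs whose status is not "passed"."""
    report = json.loads(run_output)["selftests"]
    return [
        (dev, test)
        for dev, tests in report.items()
        for test, result in tests.items()
        if result.get("status") != "passed"
    ]
```

An empty list from `failed_selftests(RUN_OUTPUT)` means that every reported selftest passed.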
3 years ago v5.19.0 v5.19.0
Stephen Hemminger [Tue, 2 Aug 2022 18:36:33 +0000 (11:36 -0700)] 
v5.19.0

3 years ago Merge branch 'main' into next
David Ahern [Mon, 1 Aug 2022 15:42:31 +0000 (09:42 -0600)] 
Merge branch 'main' into next

Conflicts:
vdpa/include/uapi/linux/vdpa.h

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago seg6: add support for SRv6 Headend Reduced Encapsulation
Paolo Lungaroni [Wed, 27 Jul 2022 18:59:15 +0000 (20:59 +0200)] 
seg6: add support for SRv6 Headend Reduced Encapsulation

This patch adds the support for the reduced version of the H.Encaps and
H.L2Encaps behaviors as defined in RFC 8986 [1].

H.Encaps.Red and H.L2Encaps.Red SRv6 behaviors are an optimization of the
H.Encaps and H.L2Encaps aiming to reduce the length of the SID List carried
in the pushed SRH. Specifically, the reduced version of the behaviors
removes the first SID contained in the SID List (i.e. SRv6 Policy) by
storing it in the IPv6 Destination Address. When the SRv6 Policy is made of
only one SID, the reduced version of the behaviors omits the SRH altogether
and pushes that SID directly into the IPv6 DA.

Some examples:
ip -6 route add 2001:db8::1 encap seg6 mode encap.red segs fcf0:1::e,fcf0:2::d6 dev eth0
ip -6 route add 2001:db8::2 encap seg6 mode l2encap.red segs fcf0:1::d2 dev eth0

Standard Output:
ip -6 route show 2001:db8::1
2001:db8::1  encap seg6 mode encap.red segs 2 [ fcf0:1::e fcf0:2::d6 ] dev eth0 metric 1024 pref medium

JSON Output:
ip -6 -j -p route show 2001:db8::1
[ {
        "dst": "2001:db8::1",
        "encap": "seg6",
        "mode": "encap.red",
        "segs": [ "fcf0:1::e","fcf0:2::d6" ],
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
    } ]

[1] - https://datatracker.ietf.org/doc/html/rfc8986

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
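
The reduction described in the commit above can be expressed as a short Python sketch: the first SID of the policy becomes the IPv6 Destination Address, and the SRH carries only the remaining SIDs, or is omitted when there are none. The function `reduced_encap` is hypothetical and deliberately ignores details such as the on-wire ordering of the segment list.

```python
def reduced_encap(sid_list):
    """Return (ipv6_da, srh_sids) for H.Encaps.Red / H.L2Encaps.Red.

    srh_sids is None when no SRH is pushed at all.
    """
    if not sid_list:
        raise ValueError("an SRv6 policy needs at least one SID")
    first, rest = sid_list[0], list(sid_list[1:])
    # With a single SID there is nothing left for the SRH to carry.
    return first, (rest or None)
```

For the two example routes above, the first policy keeps one SID in the SRH, while the single-SID l2encap.red policy needs no SRH.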
3 years ago Update kernel headers
David Ahern [Sat, 30 Jul 2022 16:29:01 +0000 (10:29 -0600)] 
Update kernel headers

Update kernel headers to commit
63757225a933 ("Merge tag 'mlx5-updates-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago Merge branch 'pppoe-in-flower' into next
David Ahern [Fri, 29 Jul 2022 17:25:14 +0000 (11:25 -0600)] 
Merge branch 'pppoe-in-flower' into next

Wojciech Drewek says:

====================

This patchset implements support for matching
on PPPoE-specific fields using tc-flower.
The first patch introduces a small refactor which allows
the same mechanism to be used for finding
both ppp and ether protocols. The second patch
adds support for parsing ppp protocols.
The last patch adds parsing of PPPoE fields.

Kernel changes (merged):
https://lore.kernel.org/netdev/20220726203133.2171332-1-anthony.l.nguyen@intel.com/T/#t

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago f_flower: Introduce PPPoE support
Wojciech Drewek [Fri, 29 Jul 2022 08:50:35 +0000 (10:50 +0200)] 
f_flower: Introduce PPPoE support

Introduce PPPoE specific fields in tc-flower:
- session id (16 bits)
- ppp protocol (16 bits)
These fields can be provided only when the protocol is set to
ETH_P_PPP_SES. ppp_proto works similarly to vlan_ethtype, i.e.
ppp_proto overwrites eth_type. Thanks to that, fields from
encapsulated protocols (such as src_ip) can be specified.

e.g.
  # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
      flower \
        pppoe_sid 1234 \
        ppp_proto ip \
        dst_ip 127.0.0.1 \
        src_ip 127.0.0.2 \
      action drop

Vlan and cvlan are also supported; in this case cvlan_ethtype
or vlan_ethtype has to be set to ETH_P_PPP_SES.

e.g.
  # tc filter add dev ens6f0 ingress prio 1 protocol 802.1Q \
      flower \
        vlan_id 2 \
        vlan_ethtype ppp_ses \
        pppoe_sid 1234 \
        ppp_proto ip \
        dst_ip 127.0.0.1 \
        src_ip 127.0.0.2 \
      action drop

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago lib: Introduce ppp protocols
Wojciech Drewek [Fri, 29 Jul 2022 08:50:34 +0000 (10:50 +0200)] 
lib: Introduce ppp protocols

The PPP protocol field uses different values than the ethertype. Introduce
utilities for translating PPP protocols from strings to values
and vice versa. Use the generic API from utils in order to get
the proto id and name.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago lib: refactor ll_proto functions
Wojciech Drewek [Fri, 29 Jul 2022 08:50:33 +0000 (10:50 +0200)] 
lib: refactor ll_proto functions

Move the core logic of ll_proto_n2a and ll_proto_a2n
to utils.c and make it more generic by allowing a
table of protocols to be passed as an argument (proto_tb).
Introduce struct proto with protocol ID and name to
allow this. This will allow other use cases to use
those functions.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
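
The table-driven lookup that this refactor describes can be sketched in Python as below. This is not a port of the code in utils.c: the names `Proto`, `proto_n2a` and `proto_a2n` echo the commit message, but the tables are small illustrative subsets, and the fallbacks (a bracketed number, parsing a numeric string) are choices made for this sketch. The numeric values are the standard ethertypes for IPv4, IPv6 and PPPoE session (0x0800, 0x86DD, 0x8864) and the PPP protocol numbers for IPv4 and IPv6 (0x0021, 0x0057).

```python
from typing import NamedTuple


class Proto(NamedTuple):
    id: int
    name: str


# Illustrative subsets only; the real tables contain more entries.
LL_PROTO_TB = (Proto(0x0800, "ip"), Proto(0x86DD, "ipv6"), Proto(0x8864, "ppp_ses"))
PPP_PROTO_TB = (Proto(0x0021, "ip"), Proto(0x0057, "ipv6"))


def proto_n2a(proto_id, proto_tb):
    """Translate a protocol number to its name using the given table."""
    for p in proto_tb:
        if p.id == proto_id:
            return p.name
    return f"[{proto_id}]"  # fallback format chosen for this sketch


def proto_a2n(name, proto_tb):
    """Translate a protocol name (or numeric string) to its number, or None."""
    for p in proto_tb:
        if p.name.lower() == name.lower():
            return p.id
    try:
        return int(name, 0)  # accept "0x57" or "87" as a raw value
    except ValueError:
        return None
```

Because the table is an argument, the same two functions serve both namespaces: "ip" resolves to 0x0800 against the ethertype table and to 0x0021 against the PPP table.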
3 years ago Import posix_types.h uapi file from point of last kernel headers sync
David Ahern [Fri, 29 Jul 2022 17:21:52 +0000 (11:21 -0600)] 
Import posix_types.h uapi file from point of last kernel headers sync

__kernel_old_time_t definition is needed for pppoe-in-flower patches.

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago Import ppp_defs.h uapi file from point of last kernel headers sync
David Ahern [Thu, 28 Jul 2022 22:15:24 +0000 (16:15 -0600)] 
Import ppp_defs.h uapi file from point of last kernel headers sync

ppp_defs header file is needed by PPPoE in flower support.

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years ago bpf_glue: include errno.h
Juhee Kang [Mon, 18 Jul 2022 15:58:27 +0000 (00:58 +0900)] 
bpf_glue: include errno.h

If __NR_bpf is not defined, the bpf() function sets errno and returns -1.
Thus, this patch includes the header.

Fixes: ac4e0913beb1 ("bpf: Export bpf syscall wrapper")
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years ago devlink: add support for linecard show and type set
Jiri Pirko [Sat, 16 Jul 2022 11:24:51 +0000 (13:24 +0200)] 
devlink: add support for linecard show and type set

Introduce a new object "lc" to add devlink support for line cards with
two commands:
show - to get the info about the line card state, list of supported
       types as reported by kernel/driver.
set - to set/clear the line card type.

Example:
$ devlink lc
pci/0000:01:00.0:
  lc 1 state unprovisioned
    supported_types:
       16x100G
  lc 2 state unprovisioned
    supported_types:
       16x100G
  lc 3 state unprovisioned
    supported_types:
       16x100G
  lc 4 state unprovisioned
    supported_types:
       16x100G
  lc 5 state unprovisioned
    supported_types:
       16x100G
  lc 6 state unprovisioned
    supported_types:
       16x100G
  lc 7 state unprovisioned
    supported_types:
       16x100G
  lc 8 state unprovisioned
    supported_types:
       16x100G

To provision the slot #8:

$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
       16x100G

To unprovision the slot #8:

$ devlink lc set pci/0000:01:00.0 lc 8 notype

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Thu, 21 Jul 2022 15:16:51 +0000 (09:16 -0600)] 
Update kernel headers

Update kernel headers to commit:
    5588d6280270 ("net/cdc_ncm: Increase NTB max RX/TX values to 64kb")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agovdpa: Update man page to include vdpa statistics
Eli Cohen [Thu, 21 Jul 2022 06:00:07 +0000 (09:00 +0300)] 
vdpa: Update man page to include vdpa statistics

Update the man page to include vdpa statistics information introduced in
6f97e9c9337b ("vdpa: Add support for reading vdpa device statistics")

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agordma: update uapi/ib_user_verbs.h
Stephen Hemminger [Mon, 18 Jul 2022 16:58:28 +0000 (09:58 -0700)] 
rdma: update uapi/ib_user_verbs.h

Update from 5.19-rc7

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agovdpa: update uapi headers from 5.19-rc7
Stephen Hemminger [Mon, 18 Jul 2022 16:56:57 +0000 (09:56 -0700)] 
vdpa: update uapi headers from 5.19-rc7

Keep VDPA sanitized headers up to current kernel.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoRevert "uapi: add vdpa.h"
Stephen Hemminger [Mon, 18 Jul 2022 16:50:58 +0000 (09:50 -0700)] 
Revert "uapi: add vdpa.h"

This reverts commit 291898c5ff881d0dc5d947031def0528101476cb.

3 years agoip neigh: Fix memory leak when doing 'get'
Benjamin Poirier [Sun, 10 Jul 2022 23:52:54 +0000 (08:52 +0900)] 
ip neigh: Fix memory leak when doing 'get'

With the following command sequence:

ip link add dummy0 type dummy
ip neigh add 192.168.0.1 dev dummy0
ip neigh get 192.168.0.1 dev dummy0

when running the last command under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 2 of 2
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x17A0EC: rtnl_recvmsg (libnetlink.c:838)
   by 0x17A3D1: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x17B894: __rtnl_talk (libnetlink.c:1141)
   by 0x17B894: rtnl_talk (libnetlink.c:1147)
   by 0x12E49B: ipneigh_get (ipneigh.c:728)
   by 0x1174CB: do_cmd (ip.c:136)
   by 0x116F7C: main (ip.c:324)

Free the answer obtained from rtnl_talk().
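
The ownership rule behind this fix can be sketched as follows. fake_talk() is
a hypothetical stand-in that, like rtnl_talk() in the trace above, hands a
heap-allocated answer to its caller. It is not the libnetlink API, and
get_len() is likewise only an illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a request helper that returns its answer
 * in a buffer allocated with malloc(). */
static int fake_talk(const char *req, char **answer)
{
	size_t len = strlen(req) + 1;
	char *buf = malloc(len);

	if (!buf)
		return -1;
	memcpy(buf, req, len);
	*answer = buf;
	return 0;
}

/* The caller owns the answer and releases it once it is done. */
static int get_len(const char *req)
{
	char *answer = NULL;
	int len;

	if (fake_talk(req, &answer) < 0)
		return -1;

	len = (int)strlen(answer);
	free(answer);	/* the step a leaking call site is missing */
	return len;
}
```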

Fixes: 62842362370b ("ipneigh: neigh get support")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agomptcp: Fix memory leak when getting limits
Benjamin Poirier [Sun, 10 Jul 2022 23:52:53 +0000 (08:52 +0900)] 
mptcp: Fix memory leak when getting limits

When running the command `ip mptcp limits` under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 1 of 1
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x17A0BC: rtnl_recvmsg (libnetlink.c:838)
   by 0x17A3A1: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x17B864: __rtnl_talk (libnetlink.c:1141)
   by 0x17B864: rtnl_talk (libnetlink.c:1147)
   by 0x16837D: mptcp_limit_get_set (ipmptcp.c:436)
   by 0x1174CB: do_cmd (ip.c:136)
   by 0x116F7C: main (ip.c:324)

Free the answer obtained from rtnl_talk().

Fixes: 7e0767cd862b ("add support for mptcp netlink interface")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agomptcp: Fix memory leak when doing 'endpoint show'
Benjamin Poirier [Sun, 10 Jul 2022 23:52:52 +0000 (08:52 +0900)] 
mptcp: Fix memory leak when doing 'endpoint show'

With the following command sequence:

ip mptcp endpoint add 127.0.0.1 id 1
ip mptcp endpoint show id 1

when running the last command under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 2 of 2
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x17A0AC: rtnl_recvmsg (libnetlink.c:838)
   by 0x17A391: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x17B854: __rtnl_talk (libnetlink.c:1141)
   by 0x17B854: rtnl_talk (libnetlink.c:1147)
   by 0x168A56: mptcp_addr_show (ipmptcp.c:334)
   by 0x1174CB: do_cmd (ip.c:136)
   by 0x116F7C: main (ip.c:324)

Free the answer obtained from rtnl_talk().

Fixes: 7e0767cd862b ("add support for mptcp netlink interface")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agobridge: Fix memory leak when doing 'fdb get'
Benjamin Poirier [Sun, 10 Jul 2022 23:52:51 +0000 (08:52 +0900)] 
bridge: Fix memory leak when doing 'fdb get'

With the following command sequence:

ip link add br0 up type bridge
ip link add dummy0 up address 02:00:00:00:00:01 master br0 type dummy
bridge fdb get 02:00:00:00:00:01 br br0

when running the last command under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 2 of 2
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x11C1EC: rtnl_recvmsg (libnetlink.c:838)
   by 0x11C4D1: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x11D994: __rtnl_talk (libnetlink.c:1141)
   by 0x11D994: rtnl_talk (libnetlink.c:1147)
   by 0x10D336: fdb_get (fdb.c:652)
   by 0x48907FC: (below main) (libc-start.c:332)

Free the answer obtained from rtnl_talk().

Fixes: 4ed5ad7bd3c6 ("bridge: fdb get support")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip address: Fix memory leak when specifying device
Benjamin Poirier [Sun, 10 Jul 2022 23:52:50 +0000 (08:52 +0900)] 
ip address: Fix memory leak when specifying device

Running a command like `ip addr show dev lo` under valgrind informs us that

32,768 bytes in 1 blocks are definitely lost in loss record 4 of 4
   at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x16CBE2: rtnl_recvmsg (libnetlink.c:775)
   by 0x16CF04: __rtnl_talk_iov (libnetlink.c:954)
   by 0x16E257: __rtnl_talk (libnetlink.c:1059)
   by 0x16E257: rtnl_talk (libnetlink.c:1065)
   by 0x115CB1: ipaddr_link_get (ipaddress.c:1833)
   by 0x11A0D1: ipaddr_list_flush_or_save (ipaddress.c:2030)
   by 0x1152EB: do_cmd (ip.c:115)
   by 0x114D6F: main (ip.c:321)

After calling store_nlmsg(), the original buffer should be freed. That is
the pattern used elsewhere through the rtnl_dump_filter() call chain.

Fixes: 884709785057 ("ip address: Set device index in dump request")
Reported-by: Binu Gopalakrishnapillai <binug@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: add virtio_ring.h
Stephen Hemminger [Mon, 18 Jul 2022 16:33:52 +0000 (09:33 -0700)] 
uapi: add virtio_ring.h

When vdpa was updated, it included linux/virtio_ring.h but that
sanitized header file was not added.

Fixes: bd91c7647189 ("vdpa: Allow for printing negotiated features of a device")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: add vdpa.h
Stephen Hemminger [Mon, 18 Jul 2022 16:29:45 +0000 (09:29 -0700)] 
uapi: add vdpa.h

Iproute2 depends on kernel headers and all necessary kernel headers
should be in iproute tree.

Fixes: c2ecc82b9d4c ("vdpa: Add vdpa tool")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update bpf.h
Stephen Hemminger [Sat, 16 Jul 2022 16:55:07 +0000 (09:55 -0700)] 
uapi: update bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agolibbpf: add xdp program name support
Hangbin Liu [Tue, 5 Jul 2022 04:25:01 +0000 (12:25 +0800)] 
libbpf: add xdp program name support

In a bpf program, only the program name is unique. Before this patch, if there
are multiple programs with the same section name, only the first program
will be attached. With program name support, users could specify the exact
program they want to attach.

Note this feature is only supported when iproute2 is built with libbpf.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: Fix rx_otherhost_dropped support
Petr Machata [Tue, 28 Jun 2022 10:19:11 +0000 (12:19 +0200)] 
ip: Fix rx_otherhost_dropped support

The commit cited below added a new column to print_stats64(). However it
then updated only one size_columns() call site, neglecting to update the
remaining three. As a result, in those not-updated invocations,
size_columns() now accesses a vararg argument that is not being passed,
which is undefined behavior.

Fixes: cebf67a35d8a ("show rx_otherehost_dropped stat in ip link show")
CC: Tariq Toukan <tariqt@nvidia.com>
CC: Itay Aveksis <itayav@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'main' into next
David Ahern [Wed, 6 Jul 2022 14:46:12 +0000 (08:46 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: Fix size_columns() invocation that passes a 32-bit quantity
Petr Machata [Tue, 28 Jun 2022 10:17:31 +0000 (12:17 +0200)] 
ip: Fix size_columns() invocation that passes a 32-bit quantity

In print_stats64(), the last size_columns() invocation passes number of
carrier changes as one of the arguments. The value is decoded as a 32-bit
quantity, but size_columns() expects a 64-bit one. This is undefined
behavior.
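
A minimal sketch of the problem, assuming a variadic helper that reads every
argument back as a 64-bit value. max_u64() and widest() are hypothetical
names and not the size_columns() code. A 32-bit argument has to be widened
explicitly at the call site.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical variadic helper: returns the largest of n values, each
 * of which is fetched as a 64-bit quantity. */
static uint64_t max_u64(int n, ...)
{
	va_list ap;
	uint64_t max = 0;

	va_start(ap, n);
	while (n-- > 0) {
		uint64_t v = va_arg(ap, uint64_t);

		if (v > max)
			max = v;
	}
	va_end(ap);
	return max;
}

/* The cast makes the 32-bit counter match what va_arg() expects. */
static uint64_t widest(uint32_t carrier_changes, uint64_t rx_bytes)
{
	return max_u64(2, (uint64_t)carrier_changes, rx_bytes);
}
```

Without the cast, the callee would fetch 64 bits where only 32 were passed.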

The reason valgrind does not cite this is that the previous size_columns()
invocations prime the ABI area used for the value transfer. When these
other invocations are commented away, valgrind does complain that
"conditional jump or move depends on uninitialised value", as would be
expected.

Fixes: 49437375b6c1 ("ip: dynamically size columns when printing stats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman: tc-fq_codel: add drop_batch
Yuki Inoguchi [Tue, 28 Jun 2022 10:12:51 +0000 (19:12 +0900)] 
man: tc-fq_codel: add drop_batch

Let's describe the drop_batch parameter added to tc command
by commit 7868f802e2d9 ("tc: fq_codel: add drop_batch parameter")

Signed-off-by: Yuki Inoguchi <inoguchi.yuki@fujitsu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update mptcp.h
Stephen Hemminger [Fri, 1 Jul 2022 23:46:13 +0000 (16:46 -0700)] 
uapi: update mptcp.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'main' into next
David Ahern [Fri, 1 Jul 2022 14:39:43 +0000 (08:39 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: Fix size_columns() for very large values
Petr Machata [Mon, 27 Jun 2022 13:18:21 +0000 (15:18 +0200)] 
ip: Fix size_columns() for very large values

For values near the 64-bit boundary, the iterative application of
powi *= 10 causes powi to overflow without the termination condition of
powi >= val having ever been satisfied. Instead, when determining the
length of the number, iterate val /= 10 and terminate when it's a single
digit.
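
The division-based approach can be sketched as below. decimal_len() is a
hypothetical helper and not the actual size_columns() code. Because val
itself is divided, no intermediate power of ten can overflow.

```c
#include <stdint.h>

/* Hypothetical helper: number of decimal digits needed to print val. */
static unsigned int decimal_len(uint64_t val)
{
	unsigned int len = 1;

	/* Stop once a single digit is left. */
	while (val > 9) {
		val /= 10;
		len++;
	}
	return len;
}
```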

Fixes: 49437375b6c1 ("ip: dynamically size columns when printing stats")
CC: Tariq Toukan <tariqt@nvidia.com>
CC: Itay Aveksis <itayav@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoiplink: bond_slave: add per port prio support
Hangbin Liu [Tue, 21 Jun 2022 07:51:05 +0000 (15:51 +0800)] 
iplink: bond_slave: add per port prio support

Add per port priority support for active slave re-selection during
bonding failover. A higher number means higher priority.

This option is only valid for active-backup (1), balance-tlb (5) and
balance-alb (6) modes.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Sun, 26 Jun 2022 17:13:08 +0000 (11:13 -0600)] 
Update kernel headers

Update kernel headers to commit:
    ebeae54d3a77 ("net: pcs: xpcs: depends on PHYLINK in Kconfig")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoman: tc-ct.8: fix example
Andrea Claudi [Tue, 21 Jun 2022 16:59:06 +0000 (18:59 +0200)] 
man: tc-ct.8: fix example

tc-ct manpage provides a wrong command to add an ingress qdisc to an
interface:

$ tc qdisc add dev eth0 handle ingress
Error: argument "ingress" is wrong: invalid qdisc ID

Fix it by removing the useless "handle" keyword.

Fixes: 924c43778a84 ("man: tc-ct.8: Add manual page for ct tc action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agol2tp: fix typo in AF_INET6 checksum JSON print
Andrea Claudi [Tue, 21 Jun 2022 16:53:08 +0000 (18:53 +0200)] 
l2tp: fix typo in AF_INET6 checksum JSON print

In print_tunnel json output, a typo makes it impossible to know the
value of udp6_csum_rx, printing instead udp6_csum_tx two times.

Fix it by getting rid of the typo.

Fixes: 98453b65800f ("ip/l2tp: add JSON support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'lgtm'
Stephen Hemminger [Tue, 21 Jun 2022 22:35:03 +0000 (15:35 -0700)] 
Merge branch 'lgtm'

3 years agoman: tc-fq_codel: Fix a typo.
Yuki Inoguchi [Fri, 17 Jun 2022 09:28:12 +0000 (18:28 +0900)] 
man: tc-fq_codel: Fix a typo.

In tc-fq_codel man page, "length .B interval" should be "length interval."

Signed-off-by: Yuki Inoguchi <inoguchi.yuki@fujitsu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agovdpa: Add support for reading vdpa device statistics
Eli Cohen [Wed, 15 Jun 2022 03:46:16 +0000 (06:46 +0300)] 
vdpa: Add support for reading vdpa device statistics

Read statistics of a vdpa device. The specific data is received as a
pair of attribute name and attribute value.

Examples:
1. Read statistics for the virtqueue at index 1

$ vdpa dev vstats show vdpa-a qidx 1
vdpa-a:
vdpa-a: queue_type tx received_desc 321812 completed_desc 321812

2. Read statistics for the virtqueue at index 16
$ vdpa dev vstats show vdpa-a qidx 16
vdpa-a: queue_type control_vq received_desc 17 completed_desc 17

3. Read statistics for the virtqueue at index 0 with json output
$ vdpa -j dev vstats show vdpa-a qidx 0
{"vstats":{"vdpa-a":{"queue_type":"rx","received_desc":114855,"completed_desc":114617}}}

4. Read statistics for the virtqueue at index 0 with pretty json
   output
$ vdpa -jp dev vstats show vdpa-a qidx 0
{
    "vstats": {
        "vdpa-a": {
            "queue_type": "rx",
            "received_desc": 114855,
            "completed_desc": 114617
        }
    }
}

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc: declaration hides parameter
Stephen Hemminger [Thu, 2 Jun 2022 21:38:52 +0000 (14:38 -0700)] 
tc: declaration hides parameter

In several places (code reuse?), the variable handle
is a parameter to the function, but is then
redeclared inside the basic block for classid.
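
One plausible way to resolve this kind of shadowing is sketched below.
pick_id() and its arguments are hypothetical and not taken from the patch.
The block-local value gets its own name instead of redeclaring the parameter.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: 'handle' is a parameter, so the value parsed
 * inside the block is named 'classid' rather than 'handle'. */
static uint32_t pick_id(uint32_t handle, const char *classid_str)
{
	if (classid_str) {
		/* A local "uint32_t handle" here would hide the parameter. */
		uint32_t classid = (uint32_t)strtoul(classid_str, NULL, 16);

		return classid;
	}
	return handle;
}
```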

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agogenl: fix duplicate include guard
Stephen Hemminger [Thu, 2 Jun 2022 21:38:06 +0000 (14:38 -0700)] 
genl: fix duplicate include guard

Found by LGTM.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: change name for zerocopy sendfile in tls
Stephen Hemminger [Fri, 17 Jun 2022 03:14:23 +0000 (20:14 -0700)] 
uapi: change name for zerocopy sendfile in tls

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update socket.h
Stephen Hemminger [Sun, 5 Jun 2022 17:43:00 +0000 (10:43 -0700)] 
uapi: update socket.h

From 5.19-rc0

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoman: tc-fw: Document masked handle usage
Ido Schimmel [Tue, 14 Jun 2022 14:26:57 +0000 (17:26 +0300)] 
man: tc-fw: Document masked handle usage

The tc-fw filter can be used to match on the packet's fwmark by adding a
filter with a matching handle. It also supports matching on specific
bits of the fwmark by specifying the handle together with a mask. This
is documented in the usage message below, but not in the man page.

Document it in the man page together with an example.

 $ tc filter add fw help
 Usage: ... fw [ classid CLASSID ] [ indev DEV ] [ action ACTION_SPEC ]
         CLASSID := Push matching packets to the class identified by CLASSID with format X:Y
                 CLASSID is parsed as hexadecimal input.
         DEV := specify device for incoming device classification.
         ACTION_SPEC := Apply an action on matching packets.
         NOTE: handle is represented as HANDLE[/FWMASK].
                 FWMASK is 0xffffffff by default.
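
The masked comparison can be sketched as follows, assuming that a filter
matches when the fwmark, restricted to the mask bits, equals the handle.
fw_match() and FW_DEFAULT_MASK are hypothetical names. Only the default mask
value comes from the usage text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default mask, as stated in the usage message. */
#define FW_DEFAULT_MASK 0xffffffffu

/* Hypothetical check: compare only the bits selected by mask. */
static bool fw_match(uint32_t fwmark, uint32_t handle, uint32_t mask)
{
	return (fwmark & mask) == handle;
}
```

With handle 0x34 and mask 0xff, a packet marked 0x1234 matches under this
assumption, while the default mask requires the whole mark to equal the
handle.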

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoshow rx_otherehost_dropped stat in ip link show
Jeffrey Ji [Thu, 9 Jun 2022 21:05:16 +0000 (21:05 +0000)] 
show rx_otherehost_dropped stat in ip link show

This stat was added in commit 794c24e9921f ("net-core: rx_otherhost_dropped to core_stats")

Tested: sent packet with wrong MAC address from one
network namespace to another, verified that counter showed "1" in
`ip -s -s link sh` and `ip -s -s -j link sh`

Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoss: Shorter display format for TLS zerocopy sendfile
Maxim Mikityanskiy [Wed, 8 Jun 2022 15:34:45 +0000 (18:34 +0300)] 
ss: Shorter display format for TLS zerocopy sendfile

Commit 21c07b45688f ("ss: Show zerocopy sendfile status of TLS
sockets") started displaying the activation status of zerocopy sendfile
on TLS sockets, exposed via sock_diag. This commit makes the format more
compact: the flag's name is shorter and is printed only when the feature
is active, similar to other flag options.

The flag's name is also generalized ("sendfile" -> "tx") to embrace
possible future optimizations, and includes an explicit indication that
the underlying data must not be modified during transfer ("ro").

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Sun, 12 Jun 2022 15:43:54 +0000 (09:43 -0600)] 
Update kernel headers

Update kernel headers to commit:
    27f2533bcc6e ("nfp: flower: support to offload pedit of IPv6 flowinto fields")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'bridge-fdb-flush' into next
David Ahern [Fri, 10 Jun 2022 15:02:29 +0000 (09:02 -0600)] 
Merge branch 'bridge-fdb-flush' into next

Nikolay Aleksandrov says:

====================

Hi,
This set adds support for the new bulk delete flag to allow fdb flushing
for specific entries which are matched based on the supplied options.
The new bridge fdb subcommand is "flush", and as can be seen from the
commits it allows deleting entries based on many different criteria:
 - matching vlan
 - matching port
 - matching all sorts of flags (combinations are allowed)

There are also examples for each option in the respective commit messages.

Examples:
$ bridge fdb flush dev swp2 master vlan 100 dynamic
 [ delete all dynamic entries with port swp2 and vlan 100 ]
$ bridge fdb flush dev br0 vlan 1 static
 [ delete all static entries in br0's fdb table ]
$ bridge fdb flush dev swp2 master extern_learn nosticky
 [ delete all entries with port swp2 which have extern_learn set and
   don't have the sticky flag set ]
$ bridge fdb flush dev br0 brport br0 vlan 100 permanent
 [ delete all entries pointing to the bridge itself with vlan 100 ]
$ bridge fdb flush dev swp2 master nostatic nooffloaded
 [ delete all entries with port swp2 which are not static and not
   offloaded ]

If a keyword is specified and the matching nokeyword is specified after it,
the nokeyword overrides the keyword.

====================
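
The "last one wins" behaviour of keyword and nokeyword can be sketched with a
flags/mask pair. Everything in this sketch is hypothetical, including the
flag bits, struct flush_match and apply_keyword(). It is not the bridge
implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
#define SK_STATIC	0x1
#define SK_STICKY	0x2

struct flush_match {
	uint32_t flags;	/* wanted value of each selected flag */
	uint32_t mask;	/* which flags take part in the match */
};

static uint32_t keyword_bit(const char *kw)
{
	if (strcmp(kw, "static") == 0)
		return SK_STATIC;
	if (strcmp(kw, "sticky") == 0)
		return SK_STICKY;
	return 0;
}

/* Apply one command-line word; returns 0 on success, -1 if unknown. */
static int apply_keyword(struct flush_match *m, const char *arg)
{
	int negate = strncmp(arg, "no", 2) == 0;
	uint32_t bit = keyword_bit(negate ? arg + 2 : arg);

	if (!bit)
		return -1;
	m->mask |= bit;
	if (negate)
		m->flags &= ~bit;	/* a later "no" clears the wanted bit */
	else
		m->flags |= bit;
	return 0;
}
```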

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]offloaded entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:21 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]offloaded entry matching

Add flush support to match entries with or without (if "no" is
prepended) offloaded flag.

Examples:
$ bridge fdb flush dev br0 offloaded
This will delete all offloaded entries in br0's fdb table.

$ bridge fdb flush dev br0 nooffloaded
This will delete all entries except the ones with offloaded flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]sticky entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:20 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]sticky entry matching

Add flush support to match entries with or without (if "no" is
prepended) sticky flag.

Examples:
$ bridge fdb flush dev br0 sticky
This will delete all sticky entries in br0's fdb table.

$ bridge fdb flush dev br0 nosticky
This will delete all entries except the ones with sticky flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]extern_learn entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:19 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]extern_learn entry matching

Add flush support to match entries with or without (if "no" is
prepended) extern_learn flag.

Examples:
$ bridge fdb flush dev br0 extern_learn
This will delete all extern_learn entries in br0's fdb table.

$ bridge fdb flush dev br0 noextern_learn
This will delete all entries except the ones with extern_learn flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]added_by_user entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:18 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]added_by_user entry matching

Add flush support to match entries with or without (if "no" is
prepended) added_by_user flag. Note that NTF_USE is used internally
because there is no NTF_ flag that describes such entries.

Examples:
$ bridge fdb flush dev br0 added_by_user
This will delete all added_by_user entries in br0's fdb table.

$ bridge fdb flush dev br0 noadded_by_user
This will delete all entries except the ones with added_by_user flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]dynamic entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:17 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]dynamic entry matching

Add flush support to match dynamic entries or, if "no" is prepended,
non-dynamic (static or permanent) entries. Note that dynamic entries are
defined as fdbs without NUD_NOARP and NUD_PERMANENT set, and non-dynamic
entries are fdbs with NUD_NOARP set (that matches both static and
permanent entries).

Examples:
$ bridge fdb flush dev br0 dynamic
This will delete all dynamic entries in br0's fdb table.

$ bridge fdb flush dev br0 nodynamic
This will delete all entries except the dynamic ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]static entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:16 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]static entry matching

Add flush support to match static entries or, if "no" is prepended,
non-static entries. Note that static entries are only NUD_NOARP ones
without NUD_PERMANENT. Also, when matching non-static entries, permanent
entries are excluded as well (permanent entries by definition are also
static).
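
The state definitions used by the dynamic and static matching can be sketched
as three predicates. The SK_ constants mirror the values of NUD_NOARP and
NUD_PERMANENT from linux/neighbour.h but are redefined under hypothetical
names so the sketch stands alone. The helper names are also hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as NUD_NOARP and NUD_PERMANENT in linux/neighbour.h. */
#define SK_NUD_NOARP		0x40
#define SK_NUD_PERMANENT	0x80

/* Permanent: NUD_PERMANENT is set. */
static bool fdb_is_permanent(uint16_t state)
{
	return (state & SK_NUD_PERMANENT) != 0;
}

/* Static: NUD_NOARP is set and NUD_PERMANENT is not. */
static bool fdb_is_static(uint16_t state)
{
	return (state & SK_NUD_NOARP) && !(state & SK_NUD_PERMANENT);
}

/* Dynamic: neither NUD_NOARP nor NUD_PERMANENT is set. */
static bool fdb_is_dynamic(uint16_t state)
{
	return !(state & (SK_NUD_NOARP | SK_NUD_PERMANENT));
}
```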

Examples:
$ bridge fdb flush dev br0 static
This will delete all static entries in br0's fdb table.

$ bridge fdb flush dev br0 nostatic
This will delete all entries except the static ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush [no]permanent entry matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:15 +0000 (15:29 +0300)] 
bridge: fdb: add flush [no]permanent entry matching

Add flush support to match permanent entries or, if "no" is prepended,
non-permanent entries.

Examples:
$ bridge fdb flush dev br0 permanent
This will delete all permanent entries in br0's fdb table.

$ bridge fdb flush dev br0 nopermanent
This will delete all entries except the permanent ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush port matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:14 +0000 (15:29 +0300)] 
bridge: fdb: add flush port matching

Usually we match on the device specified after "dev" but there are
special cases where we need an additional device attribute for matching
such as when matching entries specifically pointing to the bridge device
itself. We use NDA_IFINDEX for that purpose.

Example:
$ bridge fdb flush dev br0 brport br0
This will flush only entries pointing to the bridge itself.

$ bridge fdb flush dev swp1 brport swp2 master
Note this will flush entries pointing to swp2 only. The NDA_IFINDEX
attribute overrides the dev argument. This is documented in the man
page.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add flush vlan matching
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:13 +0000 (15:29 +0300)] 
bridge: fdb: add flush vlan matching

Add flush support to match fdb entries in a specific vlan.
Example:
$ bridge fdb flush dev swp1 vlan 10 master
This will flush all fdb entries with port swp1 and vlan 10.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agobridge: fdb: add new flush command
Nikolay Aleksandrov [Wed, 8 Jun 2022 12:29:12 +0000 (15:29 +0300)] 
bridge: fdb: add new flush command

Add support for fdb bulk delete (aka flush) command. Currently it only
supports the self and master flags with the same semantics as fdb
add/del. The device is a mandatory argument.

Example:
$ bridge fdb flush dev br0
This will delete *all* fdb entries in br0's fdb table.

$ bridge fdb flush dev swp1 master
This will delete all fdb entries pointing to swp1.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'main' into next
David Ahern [Thu, 9 Jun 2022 15:12:36 +0000 (09:12 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: Convert non-constant initializers to macros
Petr Machata [Tue, 31 May 2022 11:35:48 +0000 (13:35 +0200)] 
ip: Convert non-constant initializers to macros

As per the C standard, "expressions in an initializer for an object that
has static or thread storage duration shall be constant expressions".
Aggregate objects are not constant expressions. Newer GCC doesn't mind, but
older GCC and LLVM do.

Therefore convert to a macro. And since all these macros will look very
similar, extract a generic helper, IPSTATS_STAT_DESC_XSTATS_LEAF, which
takes the leaf name as an argument and initializes the rest as appropriate
for an xstats descriptor.
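
The general pattern can be sketched as below. The struct layouts, KIND_LEAF
and STAT_DESC_LEAF are hypothetical and are not the definitions used by
IPSTATS_STAT_DESC_XSTATS_LEAF. A macro that expands to a brace-enclosed
initializer stays a constant expression, while naming another object does
not.

```c
/* Hypothetical descriptor types, for illustration only. */
struct stat_desc {
	const char *name;
	int kind;
};

struct xstats_desc {
	struct stat_desc desc;
	int attr;
};

#define KIND_LEAF 1

/*
 * Rejected by strict compilers, because 'base' is an object and not a
 * constant expression:
 *
 *	static const struct stat_desc base = { "bond", KIND_LEAF };
 *	static const struct xstats_desc bond = { .desc = base, .attr = 2 };
 */

/* The macro expands to a brace-enclosed initializer instead. */
#define STAT_DESC_LEAF(NAME) { .name = (NAME), .kind = KIND_LEAF }

static const struct xstats_desc bond_desc = {
	.desc = STAT_DESC_LEAF("bond"),
	.attr = 2,
};
```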

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>