git.ipfire.org Git - thirdparty/iproute2.git/log
Shay Drory [Sun, 11 Dec 2022 11:58:47 +0000 (13:58 +0200)] 
devlink: Support setting port function roce cap

Support port function commands to enable / disable RoCE. This is used to
control the port RoCE device capabilities.

When RoCE is disabled for a function of the port, the function cannot
create any RoCE-specific resources (e.g. GID table).
It also reduces system memory utilization. For example, disabling RoCE
enables a VF/SF to save 1 Mbyte of system memory per function.

Example of a PCI VF port which supports a port function:

$ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum
    0 vfnum 1
      function:
        hw_addr 00:00:00:00:00:00 roce enabled

$ devlink port function set pci/0000:06:00.0/2 roce disable

$ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum
    0 vfnum 1
      function:
        hw_addr 00:00:00:00:00:00 roce disabled

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Wed, 14 Dec 2022 15:54:03 +0000 (08:54 -0700)] 
Update kernel headers

Update kernel headers to commit:
    7e68dd7d07a2 ("Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

Signed-off-by: David Ahern <dsahern@kernel.org>
Jiri Pirko [Mon, 5 Dec 2022 12:21:58 +0000 (13:21 +0100)] 
devlink: update ifname map when message contains DEVLINK_ATTR_PORT_NETDEV_NAME

Recent kernels send a PORT_NEW message when the ifname changes,
so benefit from that by keeping the ifname map updated.

Whenever there is a message containing DEVLINK_ATTR_PORT_NETDEV_NAME
attribute, use it to update ifname map.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Jiri Pirko [Mon, 5 Dec 2022 12:21:57 +0000 (13:21 +0100)] 
devlink: push common code to __pr_out_port_handle_start_tb()

There is common code in pr_out_port_handle_start() and
pr_out_port_handle_start_arr(). As the next patch is going to extend it
even more, push the code into a common helper.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Jiri Pirko [Mon, 5 Dec 2022 12:21:56 +0000 (13:21 +0100)] 
devlink: get devlink port for ifname using RTNL get link command

Currently, when user specifies ifname as a handle on command line of
devlink, the related devlink port is looked-up in previously taken dump
of all devlink ports on the system. There are 3 problems with that:
1) The dump iterates over all devlink instances in kernel and takes a
   devlink instance lock for each.
2) Dumping all devlink ports would not scale.
3) Alternative ifnames are not exposed by devlink netlink interface.

Instead, benefit from RTNL get link command extension and get the
devlink port handle info from IFLA_DEVLINK_PORT attribute, if supported.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Jiri Pirko [Mon, 5 Dec 2022 12:21:55 +0000 (13:21 +0100)] 
devlink: add ifname_map_add/del() helpers

Add a couple of helpers to allocate/free a map object along with list
addition/removal.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Thu, 8 Dec 2022 16:23:56 +0000 (09:23 -0700)] 
Merge branch 'pcp-prio-apptrust' into next

Daniel Machon says:

====================

This patch series makes use of the newly introduced [1] DCB_APP_SEL_PCP
selector, for PCP/DEI prioritization, and DCB_ATTR_IEEE_APP_TRUST
attribute for configuring per-selector trust and trust-order.

========================================================================
New parameter "pcp-prio" to existing "app" subcommand:
========================================================================

A new pcp-prio parameter has been added to the app subcommand, which can
be used to classify traffic based on PCP and DEI from the VLAN header.
PCP and DEI are specified in a combination of numerical and symbolic
form, where 'de' (drop-eligible) means DEI=1 and 'nd' (not-drop-eligible)
means DEI=0.

Map PCP 1 and DEI 0 to priority 1
$ dcb app add dev eth0 pcp-prio 1nd:1

Map PCP 1 and DEI 1 to priority 1
$ dcb app add dev eth0 pcp-prio 1de:1

========================================================================
New apptrust subcommand for configuring per-selector trust and trust
order:
========================================================================

This new command currently has a single parameter, which lets you
specify an ordered list of trusted selectors. The microchip sparx5
driver is already enabled to offload said list of trusted selectors. The
new command has been given the name apptrust, to indicate that the trust
covers APP table selectors only. I found that 'apptrust' was better than
plain 'trust' as the latter does not indicate the scope of what is to be
trusted.

Example:

Trust selectors dscp and pcp, in that order:
$ dcb apptrust set dev eth0 order dscp pcp

Trust selectors ethtype, stream-port and pcp, in that order
$ dcb apptrust set dev eth0 order ethtype stream-port pcp

Show the trust order
$ dcb apptrust show dev eth0 order
order: ethtype stream-port pcp

A concern was raised here [2] that 'apptrust' would not work well with
matches(), so strcmp() has been used instead to match the new
subcommand, as suggested here [3]. The same goes for the pcp-prio
parameter of dcb app.

The man page for dcb_app has been extended to cover the new pcp-prio
parameter, and a new man page for dcb_apptrust has been created.

[1] https://lore.kernel.org/netdev/20221101094834.2726202-1-daniel.machon@microchip.com/
[2] https://lore.kernel.org/netdev/20220909080631.6941a770@hermes.local/
[3] https://lore.kernel.org/netdev/Y0fP+9C0tE7P2xyK@shredder/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Mon, 5 Dec 2022 22:21:45 +0000 (23:21 +0100)] 
dcb: add new subcommand for apptrust

Add new apptrust subcommand for the dcbnl apptrust extension object.

The apptrust command lets you specify a consecutive ordered list of
trusted selectors, which can be used by drivers to determine which
selectors are eligible (trusted) for packet prioritization, and in which
order.

Selectors are sent in a new nested attribute:
DCB_ATTR_IEEE_APP_TRUST_TABLE.  The nest contains trusted selectors
encapsulated in either DCB_ATTR_IEEE_APP or DCB_ATTR_DCB_APP attributes,
for standard and non-standard selectors, respectively.

Example:

Trust selectors dscp and pcp, in that order
$ dcb apptrust set dev eth0 order dscp pcp

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Mon, 5 Dec 2022 22:21:44 +0000 (23:21 +0100)] 
dcb: add new pcp-prio parameter to dcb app

Add new pcp-prio parameter to the app subcommand, which can be used to
classify traffic based on PCP and DEI from the VLAN header. PCP and DEI
are specified in a combination of numerical and symbolic form, where 'de'
(drop-eligible) means DEI=1 and 'nd' (not-drop-eligible) means DEI=0.

Map PCP 1 and DEI 0 to priority 1
$ dcb app add dev eth0 pcp-prio 1nd:1

Map PCP 1 and DEI 1 to priority 1
$ dcb app add dev eth0 pcp-prio 1de:1

Internally, PCP and DEI are encoded in the protocol field of the dcb_app
struct. Each combination of PCP and DEI maps to a priority, so the
protocol field needs a range of 0-15. A well-formed dcb_app entry for
PCP/DEI prioritization could look like:

    struct dcb_app pcp = {
        .selector = DCB_APP_SEL_PCP,
        .priority = 7,
        .protocol = 15
    };

This maps PCP=7 and DEI=1 to Prio=7.
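
The encoding can be sketched as follows. This is a minimal illustration,
assuming DEI occupies bit 3 and PCP the low three bits of the protocol
value, which is consistent with the PCP=7, DEI=1 -> 15 example; the helper
names are hypothetical and are not the functions added to dcb_app.c.

```c
#include <stdint.h>

/* Hypothetical helper: pack PCP (0-7) and DEI (0-1) into one value 0-15.
 * Assumption: DEI sits in bit 3 and PCP in bits 0-2. */
uint16_t pcp_dei_to_protocol(unsigned int pcp, unsigned int dei)
{
	return (uint16_t)(((dei & 0x1) << 3) | (pcp & 0x7));
}

/* Hypothetical inverse helper: extract PCP from the packed value. */
unsigned int protocol_to_pcp(uint16_t protocol)
{
	return protocol & 0x7;
}

/* Hypothetical inverse helper: extract DEI from the packed value. */
unsigned int protocol_to_dei(uint16_t protocol)
{
	return (protocol >> 3) & 0x1;
}
```

Under this assumption, "1nd" corresponds to protocol 1 and "1de" to
protocol 9.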

Also, three helper functions for translating between std and non-std APP
selectors have been added to dcb_app.c and exposed through dcb.h.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Jacob Keller [Mon, 5 Dec 2022 22:59:31 +0000 (14:59 -0800)] 
devlink: support direct region read requests

The kernel has gained support for reading from regions without needing to
create a snapshot. To use this support, the DEVLINK_ATTR_REGION_DIRECT
attribute must be added to the command.

For the "read" command, if the user did not specify a snapshot, add the new
attribute to request a direct read. The "dump" command will still require a
snapshot. While technically a dump could be performed without a snapshot, it
is not guaranteed to be atomic unless the region size is no larger than
256 bytes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Ido Schimmel [Thu, 8 Dec 2022 14:38:16 +0000 (16:38 +0200)] 
libnetlink: Fix wrong netlink header placement

The netlink header must be first in the netlink message, so move it
there.

Fixes: fee4a56f0191 ("Update kernel headers")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Michal Wilczynski [Thu, 1 Dec 2022 10:26:26 +0000 (11:26 +0100)] 
devlink: Add documentation for tx_prority and tx_weight

New netlink attributes tx_priority and tx_weight were added.
Update the man page for devlink-rate to account for new attributes.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Michal Wilczynski [Thu, 1 Dec 2022 10:26:25 +0000 (11:26 +0100)] 
devlink: Introduce new attribute 'tx_weight' to devlink-rate

To fully utilize the hierarchical QoS algorithm, a new attribute 'tx_weight'
needs to be introduced. The weight attribute allows the use of a Weighted
Fair Queuing arbitration scheme among siblings. This arbitration
scheme can be used simultaneously with strict priority.

Introduce the ability to configure tx_weight from the devlink userspace
utility. Make the new attribute optional.

Example commands:
$ devlink port function rate add pci/0000:4b:00.0/node_custom \
  tx_weight 50 parent node_0

$ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 20

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Michal Wilczynski [Thu, 1 Dec 2022 10:26:24 +0000 (11:26 +0100)] 
devlink: Introduce new attribute 'tx_priority' to devlink-rate

To fully utilize the hierarchical QoS algorithm, a new attribute 'tx_priority'
needs to be introduced. The priority attribute allows the use of a strict
priority arbiter among siblings. This arbitration scheme attempts to
schedule nodes based on their priority as long as the nodes remain within
their bandwidth limit.

Introduce the ability to configure tx_priority from the devlink userspace
utility. Make the new attribute optional.

Example commands:
$ devlink port function rate add pci/0000:4b:00.0/node_custom \
  tx_priority 5 parent node_0
$ devlink port function rate set pci/0000:4b:00.0/2 tx_priority 5

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Fri, 2 Dec 2022 15:59:28 +0000 (08:59 -0700)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Fri, 2 Dec 2022 15:57:25 +0000 (08:57 -0700)] 
Update kernel headers

Update kernel headers to commit:
    dbadae927287 ("tsnep: Rework RX buffer allocation")

Signed-off-by: David Ahern <dsahern@kernel.org>
Leonard Crestez [Thu, 1 Dec 2022 21:41:06 +0000 (23:41 +0200)] 
testsuite: Add test for ip --json neigh get

Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Leonard Crestez [Thu, 1 Dec 2022 21:41:05 +0000 (23:41 +0200)] 
ip neigh: Support --json on ip neigh get

The ip neigh command supports --json for "list" but not for "get". Add
json support for the "get" command so that it's possible to fetch
information about specific neighbors without regular expressions.

Fixes: aac7f725fa46 ("ipneigh: add color and json support")
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Jason Wang [Tue, 29 Nov 2022 04:28:16 +0000 (12:28 +0800)] 
vdpa: allow provisioning device features

This patch allows device features to be provisioned via vdpa. This
will be useful for preserving migration compatibility between source
and destination:

# vdpa dev add name dev1 mgmtdev pci/0000:02:00.0 device_features 0x300020000
# vdpa dev config show dev1
dev1: mac 52:54:00:12:34:56 link up link_announce false mtu 65535
      negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
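
The device_features value in the example is a 64-bit bitmask. The sketch
below shows how 0x300020000 decomposes into the three features listed in
the output, assuming the feature bit numbers from the virtio specification
(17 for VIRTIO_NET_F_CTRL_VQ, 32 for VIRTIO_F_VERSION_1 and 33 for
VIRTIO_F_ACCESS_PLATFORM). The macro and function names are hypothetical
and are not taken from the vdpa tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define FEATURE_BIT_NET_CTRL_VQ     17 /* VIRTIO_NET_F_CTRL_VQ */
#define FEATURE_BIT_VERSION_1       32 /* VIRTIO_F_VERSION_1 */
#define FEATURE_BIT_ACCESS_PLATFORM 33 /* VIRTIO_F_ACCESS_PLATFORM */

/* Returns true if feature bit `bit` is set in the 64-bit mask. */
bool feature_is_set(uint64_t features, unsigned int bit)
{
	return bit < 64 && ((features >> bit) & 1ULL);
}

/* Builds the mask used in the example command from its three bits. */
uint64_t example_device_features(void)
{
	return (1ULL << FEATURE_BIT_NET_CTRL_VQ) |
	       (1ULL << FEATURE_BIT_VERSION_1) |
	       (1ULL << FEATURE_BIT_ACCESS_PLATFORM);
}
```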

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Tan Tee Min [Fri, 2 Dec 2022 06:25:42 +0000 (14:25 +0800)] 
taprio: fix wrong for loop condition in add_tc_entries()

The for loop in add_tc_entries() mistakenly included the last entry
index+1. Fix it to correctly loop over the max_sdu entries from tc=0 to
num_max_sdu_entries-1.
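
The loop bound in question can be sketched as below. This is only an
illustration of the off-by-one pattern, not the actual add_tc_entries()
code; the function name and parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for walking a per-tc max SDU table.
 * Visits entries tc = 0 .. num_entries - 1, sums them into *sum and
 * returns how many entries were visited. */
size_t visit_max_sdu_entries(const unsigned int *max_sdu,
			     size_t num_entries, unsigned long *sum)
{
	size_t tc, visited = 0;

	*sum = 0;
	/* '<' stops at num_entries - 1; '<=' would also touch the entry
	 * at index num_entries, one past the last valid one. */
	for (tc = 0; tc < num_entries; tc++) {
		*sum += max_sdu[tc];
		visited++;
	}
	return visited;
}
```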

Fixes: b10a6509c195 ("taprio: support dumping and setting per-tc max SDU")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Stephen Hemminger [Thu, 1 Dec 2022 15:30:54 +0000 (07:30 -0800)] 
tc/basic: fix json output filter

The flowid and handle in basic were not using JSON routines to print.
 To reproduce the issue:

 $ tc qdisc add dev eth1 handle ffff: ingress
 $ tc filter add dev eth1 parent ffff: prio 20 protocol all u32 match ip dport 22 \
     0xffff action police conform-exceed drop/ok rate 100000 burst 15k flowid ffff:1

 $ tc filter add dev eth1 parent ffff: prio 255 protocol all basic action police \
     conform-exceed drop/ok rate 100000 burst 15k flowid ffff:3

Reported-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Hangbin Liu [Tue, 8 Nov 2022 12:43:44 +0000 (20:43 +0800)] 
ip: fix return value for rtnl_talk failures

Since my last commit "rtnetlink: add new function rtnl_echo_talk()" we
return the kernel rtnl exit code directly, which breaks some kernel
selftest checking. As there are still a lot of tests checking -2 as the
error return value, to keep backward compatibility, let's keep using
-2 for all the rtnl return values.

Reported-by: Ido Schimmel <idosch@idosch.org>
Fixes: 6c09257f1bf6 ("rtnetlink: add new function rtnl_echo_talk()")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Jiri Pirko [Wed, 9 Nov 2022 12:48:51 +0000 (13:48 +0100)] 
devlink: load ifname map on demand from ifname_map_rev_lookup() as well

Commit 5cddbb274eab ("devlink: load port-ifname map on demand") changed
the ifname map to be loaded on demand from ifname_map_lookup(). However,
it didn't put this on-demand loading into ifname_map_rev_lookup() which
causes ifname_map_rev_lookup() to return -ENOENT all the time.

Fix this by triggering on-demand ifname map load
from ifname_map_rev_lookup() as well.

Fixes: 5cddbb274eab ("devlink: load port-ifname map on demand")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Fri, 25 Nov 2022 18:48:02 +0000 (10:48 -0800)] 
tc: put size table options in json object

Missed this part from earlier change.

Fixes: 6af6f02cce42 ("tc: add json support to size table")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Lai Peter Jun Ann [Mon, 21 Nov 2022 02:29:09 +0000 (10:29 +0800)] 
tc_util: Change datatype for maj to avoid overflow issue

The return value of strtoul() is unsigned long int. Hence the datatype
for maj should be defined as unsigned long to avoid an overflow issue.

Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Lai Peter Jun Ann [Thu, 17 Nov 2022 05:33:17 +0000 (13:33 +0800)] 
tc_util: Fix no error return when large parent id used

This patch fixes the issue where no error is returned
when a large parent ID value is used. The return value of
strtoul() is unsigned long int. Hence the datatype for maj and min
should be defined as unsigned long to avoid an overflow issue.
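
A minimal sketch of the idea follows, assuming each half of a "maj:min"
handle is hexadecimal and must fit in 16 bits. The helper name is
hypothetical and this is not the tc_util.c code. Keeping the strtoul()
result in an unsigned long means an oversized value is still visible to
the range check; stored in a narrower type, it could be truncated before
the check and wrongly accepted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse one hexadecimal half of a "maj:min" handle.
 * Returns 0 and stores the value in *out on success, -1 on error. */
int parse_handle_part(const char *str, unsigned int *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, 16);	/* keep the full unsigned long result */
	if (end == str || *end != '\0' || errno == ERANGE)
		return -1;		/* no digits, trailing junk or overflow */
	if (val > 0xFFFFUL)
		return -1;		/* too large for a 16-bit major/minor */
	*out = (unsigned int)val;
	return 0;
}
```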

Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Wed, 23 Nov 2022 04:16:07 +0000 (20:16 -0800)] 
tc: add json support to size table

Fix the JSON output if a size adaptation table is used.

Example:
[ {
        "kind": "fq_codel",
        "handle": "1:",
        "dev": "enp2s0",
        "root": true,
        "refcnt": 2,
        "options": {
            "limit": 10240,
            "flows": 1024,
            "quantum": 1514,
            "target": 4999,
            "interval": 99999,
            "memory_limit": 33554432,
            "ecn": true,
            "drop_batch": 64
        },
        "stab": {
            "overhead": 30,
            "mpu": 68,
            "mtu": 2047,
            "tsize": 512
        }
    } ]

Remove the fixed prefix arg and the no-longer-needed fp arg.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Wed, 23 Nov 2022 03:01:23 +0000 (19:01 -0800)] 
remove #if 0 code

Let's not keep unused code. The YAGNI principle applies: this dead
code doesn't work now, and if it did, it would have to change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Tue, 22 Nov 2022 19:33:49 +0000 (11:33 -0800)] 
uapi: update for in.h and ip.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Mon, 21 Nov 2022 19:04:53 +0000 (11:04 -0800)] 
tc_stab: remove dead code

This code to print the STAB table is not supportable,
so it is not being converted to JSON.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Benjamin Poirier [Tue, 15 Nov 2022 21:07:15 +0000 (16:07 -0500)] 
bridge: Remove unused function argument

print_vnifilter_rtm() was probably modeled on print_vlan_rtm() but the
'monitor' argument is unused in the vnifilter case.

Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Mon, 14 Nov 2022 02:29:25 +0000 (19:29 -0700)] 
Merge branch 'main' into next

Conflicts:
include/uapi/linux/bpf.h

Signed-off-by: David Ahern <dsahern@kernel.org>
Ido Schimmel [Sun, 6 Nov 2022 11:39:57 +0000 (13:39 +0200)] 
man: bridge: Reword description of "locked" bridge port option

Adjust the description to mention the "no_linklocal_learn" bridge option
and make sure it is consistent between both the bridge(8) and ip-link(8)
man pages.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Hans Schultz [Sun, 6 Nov 2022 11:39:56 +0000 (13:39 +0200)] 
bridge: link: Add MAC Authentication Bypass (MAB) support

Add MAB support in bridge(8) and ip(8), allowing these utilities to
enable / disable MAB and display its current status.

Signed-off-by: Hans Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Hans Schultz [Sun, 6 Nov 2022 11:39:55 +0000 (13:39 +0200)] 
bridge: fdb: Add support for locked FDB entries

Print the "locked" FDB flag when it is set in the 'NDA_FLAGS_EXT'
attribute. Example output:

 # bridge fdb get 00:11:22:33:44:55 br br0
 00:11:22:33:44:55 dev swp1 locked master br0

 # bridge -j -p fdb get 00:11:22:33:44:55 br br0
 [ {
         "mac": "00:11:22:33:44:55",
         "ifname": "swp1",
         "flags": [ "locked" ],
         "master": "br0",
         "state": ""
     } ]

Signed-off-by: Hans Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Mon, 7 Nov 2022 15:47:16 +0000 (08:47 -0700)] 
Update kernel headers

Update kernel headers to commit:
    bf46390f39c6 ("Merge branch 'genetlink-per-op-type-policies'")

Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Thu, 3 Nov 2022 17:39:25 +0000 (18:39 +0100)] 
json: do not escape single quotes

The ECMA-404 standard does not include the single quote character among the
JSON escape sequences. This means single quotes do not need to be escaped.
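
The escaping rule can be sketched as below. This is a simplified,
hypothetical escaper written for illustration, not the jsonw_puts()
implementation: it escapes the double quote, the backslash and control
characters, and passes the single quote through unchanged.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: write the JSON string body for `in` into `out`,
 * escaping only what JSON requires.
 * Returns 0 on success, -1 if `out` is too small. */
int json_escape(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;
		char buf[8];
		int len, i;

		if (c == '"' || c == '\\')
			len = snprintf(buf, sizeof(buf), "\\%c", c);
		else if (c == '\n')
			len = snprintf(buf, sizeof(buf), "\\n");
		else if (c == '\t')
			len = snprintf(buf, sizeof(buf), "\\t");
		else if (c < 0x20)
			len = snprintf(buf, sizeof(buf), "\\u%04x", c);
		else	/* everything else, including ', is copied as-is */
			len = snprintf(buf, sizeof(buf), "%c", c);

		if (n + (size_t)len + 1 > outsz)
			return -1;	/* keep room for the terminating NUL */
		for (i = 0; i < len; i++)
			out[n++] = buf[i];
	}
	if (outsz == 0)
		return -1;
	out[n] = '\0';
	return 0;
}
```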

Indeed, the single quote escape produces invalid JSON output:

$ ip link add "john's" type dummy
$ ip link show "john's"
9: john's: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:8e:53:f6:a3:4b brd ff:ff:ff:ff:ff:ff
$ ip -j link | jq .
parse error: Invalid escape at line 1, column 765

This can be fixed by removing the single quote escape in jsonw_puts.
With this patch in place:

$ ip -j link | jq .[].ifname
"lo"
"john's"

Fixes: fcc16c2287bf ("provide common json output formatter")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Vladimir Oltean [Fri, 28 Oct 2022 11:50:53 +0000 (14:50 +0300)] 
taprio: support dumping and setting per-tc max SDU

The 802.1Q queueMaxSDU table is technically implemented in Linux as
the TCA_TAPRIO_TC_ENTRY_MAX_SDU attribute of the TCA_TAPRIO_ATTR_TC_ENTRY
nest. Multiple TCA_TAPRIO_ATTR_TC_ENTRY nests may appear in the netlink
message, one per traffic class. Other configuration items that are per
traffic class are also supposed to go there.

This is done for future extensibility of the netlink interface (I have
the feeling that the struct tc_mqprio_qopt passed through
TCA_TAPRIO_ATTR_PRIOMAP is not exactly extensible, which kind of defeats
the purpose of using netlink). But otherwise, the max-sdu is parsed from
the user, and printed, just like any other fixed-size 16 element array.

I've modified the example for a fully offloaded configuration (flags 2)
to also show a max-sdu use case. The gate intervals were 0x80 (for TC 7),
0xa0 (for TCs 7 and 5) and 0xdf (for TCs 7, 6, 4, 3, 2, 1, 0).
I modified the last gate to exclude TC 7 (df -> 5f), so that TC 7 now
only interferes with TC 5.
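
How the gate masks map to traffic classes can be sketched as follows,
assuming bit N of the mask corresponds to TC N, which matches the values
listed above (0x80 is TC 7, 0xa0 is TCs 7 and 5). The helper names are
hypothetical and are not part of tc.

```c
#include <stdint.h>

/* Returns 1 if traffic class `tc` has its gate open in `gatemask`
 * (assumption: bit N set means TC N is open), 0 otherwise. */
int gate_is_open(uint32_t gatemask, unsigned int tc)
{
	return tc < 32 && ((gatemask >> tc) & 1U);
}

/* Counts how many traffic classes are open in `gatemask`. */
unsigned int open_gate_count(uint32_t gatemask)
{
	unsigned int count = 0;

	for (; gatemask; gatemask >>= 1)
		count += gatemask & 1U;
	return count;
}
```

With these helpers, 0xdf opens seven classes (all but TC 5), while 0x5f
opens six, since TC 7 is no longer included.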

Output after running the full offload command from the man page example
(the new attribute is "max-sdu"):

$ tc qdisc show dev swp0 root
qdisc taprio 8002: root tc 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1 offset 4 count 1 offset 5 count 1 offset 6 count 1 offset 7 count 1
 flags 0x2      base-time 200 cycle-time 100000 cycle-time-extension 0
        index 0 cmd S gatemask 0x80 interval 20000
        index 1 cmd S gatemask 0xa0 interval 20000
        index 2 cmd S gatemask 0x5f interval 60000
max-sdu 0 0 0 0 0 200 0 0 0 0 0 0 0 0 0 0

$ tc -j -p qdisc show dev eno0 root
[ {
        "kind": "taprio",
        "handle": "8002:",
        "root": true,
        "options": {
            "tc": 8,
            "map": [ 0,1,2,3,4,5,6,7,0,0,0,0,0,0,0,0 ],
            "queues": [ {
                    "offset": 0,
                    "count": 1
                },{
                    "offset": 1,
                    "count": 1
                },{
                    "offset": 2,
                    "count": 1
                },{
                    "offset": 3,
                    "count": 1
                },{
                    "offset": 4,
                    "count": 1
                },{
                    "offset": 5,
                    "count": 1
                },{
                    "offset": 6,
                    "count": 1
                },{
                    "offset": 7,
                    "count": 1
                } ],
            "flags": "0x2",
            "base_time": 200,
            "cycle_time": 100000,
            "cycle_time_extension": 0,
            "schedule": [ {
                    "index": 0,
                    "cmd": "S",
                    "gatemask": "0x80",
                    "interval": 20000
                },{
                    "index": 1,
                    "cmd": "S",
                    "gatemask": "0xa0",
                    "interval": 20000
                },{
                    "index": 2,
                    "cmd": "S",
                    "gatemask": "0x5f",
                    "interval": 60000
                } ],
            "max-sdu": [ 0,0,0,0,0,200,0,0,0,0,0,0,0,0,0,0 ]
        }
    } ]

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Benjamin Poirier [Wed, 26 Oct 2022 06:49:07 +0000 (15:49 +0900)] 
ip-monitor: Do not error out when RTNLGRP_STATS is not available

Following commit 4e8a9914c4d4 ("ip-monitor: Include stats events in default
and "all" cases"), `ip monitor` fails to start on kernels which do not
contain linux.git commit 5fd0b838efac ("net: rtnetlink: Add UAPI toggle for
IFLA_OFFLOAD_XSTATS_L3_STATS") because the netlink group RTNLGRP_STATS
doesn't exist:

 $ ip monitor
 Failed to add stats group to list

When "stats" is not explicitly requested, ignore the error so that `ip
monitor` and `ip monitor all` continue to work on older kernels.

Note that the same change is not done for RTNLGRP_NEXTHOP because its value
is 32 and group numbers <= 32 are always supported; see the comment above
netlink_change_ngroups() in the kernel source. Therefore
NETLINK_ADD_MEMBERSHIP 32 does not error out even on kernels which do not
support RTNLGRP_NEXTHOP.

v2:
* Silently ignore a failure to implicitly add the stats group, instead of
  printing a warning.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: 4e8a9914c4d4 ("ip-monitor: Include stats events in default and "all" cases")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Sun, 23 Oct 2022 16:41:01 +0000 (18:41 +0200)] 
genl: remove unused vars in Makefile

Neither GENLLIB nor LIBUTIL is used in the genl Makefile, so let's get rid
of them.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Sun, 23 Oct 2022 15:37:11 +0000 (17:37 +0200)] 
testsuite: fix build failure

After commit 6c09257f1bf6 ("rtnetlink: add new function
rtnl_echo_talk()") "make check" results in:

$ make check

make -C testsuite
make -C iproute2 configure
make -C testsuite alltests
make -C tools
    CC       generate_nlmsg
/usr/bin/ld: /tmp/cc6YaGBM.o: in function `rtnl_echo_talk':
libnetlink.c:(.text+0x25bd): undefined reference to `new_json_obj'
/usr/bin/ld: libnetlink.c:(.text+0x25c7): undefined reference to `open_json_object'
/usr/bin/ld: libnetlink.c:(.text+0x25e3): undefined reference to `close_json_object'
/usr/bin/ld: libnetlink.c:(.text+0x25e8): undefined reference to `delete_json_obj'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:6: generate_nlmsg] Error 1
make[1]: *** [Makefile:40: generate_nlmsg] Error 2
make: *** [Makefile:130: check] Error 2

This is because the json functions being called live in libutil and not in
libnetlink. Fix this by adding libutil.a to the tools Makefile, and linking
against libcap as required by libutil itself.

Fixes: 6c09257f1bf6 ("rtnetlink: add new function rtnl_echo_talk()")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matthieu Baerts [Mon, 17 Oct 2022 17:03:08 +0000 (19:03 +0200)] 
ss: re-add TIPC query support

TIPC support was introduced in 'iproute2-master' (not -next) in
commit 5caf79a0 ("ss: Add support for TIPC socket diag in ss tool"). At
the same time, a refactoring introducing filter_db_parse() was done; see
commit 67d5fd55 ("ss: Put filter DB parsing into a separate function")
from iproute2-next.

When the two commits got merged, the support for TIPC was
apparently accidentally dropped.

This simply adds the missing entry for TIPC.

Fixes: 2c62a64d ("Merge branch 'iproute2-master' into iproute2-next")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matthieu Baerts [Mon, 17 Oct 2022 17:03:07 +0000 (19:03 +0200)] 
ss: usage: add missing parameters

These query entries were in the man page but not in 'ss -h':

- packet_raw
- packet_dgram
- dccp
- sctp
- xdp (+ the --xdp option)

I only created one commit with all: this fixes multiple commits but all
on the same line.

The only exception is with '--xdp' parameter which is linked to
commit 2abc3d76 ("ss: add AF_XDP support").

Fixes: aba5acdf ("(Logical change 1.3)") # packet raw/dgram
Fixes: 351efcde ("Update header files to 2.6.14") # dccp
Fixes: f89d46ad ("ss: Add support for SCTP protocol") # sctp
Fixes: 2abc3d76 ("ss: add AF_XDP support") # xdp
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matthieu Baerts [Mon, 17 Oct 2022 17:03:06 +0000 (19:03 +0200)] 
ss: man: add missing entries for TIPC

'ss -h' was mentioning TIPC but not the man page.

Fixes: 5caf79a0 ("ss: Add support for TIPC socket diag in ss tool")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matthieu Baerts [Mon, 17 Oct 2022 17:03:05 +0000 (19:03 +0200)] 
ss: man: add missing entries for MPTCP

'ss -h' was mentioning MPTCP but not the man page.

While at it, also add the missing '.' at the end of the list, before the
new sentence.

Fixes: 9c3be2c0 ("ss: mptcp: add msk diag interface support")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodcb: unblock mnl_socket_recvfrom if not message received
Junxin Chen [Wed, 19 Oct 2022 01:20:08 +0000 (09:20 +0800)] 
dcb: unblock mnl_socket_recvfrom if not message received

Currently, the dcb command queries the kernel through netlink to obtain
information. However, if the kernel fails to obtain the information or
does not process the request, the dcb command hangs.

For example, if we don't implement dcbnl_ops->ieee_getpfc in the
kernel, the command "dcb pfc show dev eth1" will be stuck and subsequent
commands cannot be executed.

This patch adds the NLM_F_ACK flag to the netlink in mnlu_msg_prepare
to ensure that the kernel responds to user requests.
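
As a rough illustration of the fix described above (not the actual
mnlu_msg_prepare code), the sketch below packs a struct nlmsghdr with
NLM_F_ACK set next to NLM_F_REQUEST, so that the kernel always sends at
least an acknowledgement. The helper name build_nlmsghdr and the message
type value are placeholders chosen for this example; the flag values are
the ones from <linux/netlink.h>.

```python
import struct

# Flag values from <linux/netlink.h>.
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04

NLMSG_HDRLEN = 16  # sizeof(struct nlmsghdr)

# Placeholder message type; the flag handling does not depend on it.
EXAMPLE_MSG_TYPE = 0x4E


def build_nlmsghdr(msg_type, flags, seq, payload_len=0, pid=0):
    """Pack a struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid."""
    return struct.pack("=IHHII", NLMSG_HDRLEN + payload_len,
                       msg_type, flags, seq, pid)


# Without NLM_F_ACK a request the kernel does not answer leaves the caller
# blocked in recv; with it, an ACK (or error) message always comes back.
header = build_nlmsghdr(EXAMPLE_MSG_TYPE, NLM_F_REQUEST | NLM_F_ACK, seq=1)
_, _, packed_flags, _, _ = struct.unpack("=IHHII", header)
print(hex(packed_flags))
```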

After the problem is solved, the execution result is as follows:
$ dcb pfc show dev eth1
Attribute not found: Success

Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb")
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoiplink_can: add missing `]' of the bitrate, dbitrate and termination arrays
Vincent Mailhol [Mon, 10 Oct 2022 14:16:38 +0000 (23:16 +0900)] 
iplink_can: add missing `]' of the bitrate, dbitrate and termination arrays

The command "ip --details link show canX" misses the closing bracket
`]' of the bitrate, the dbitrate and the termination arrays. The --json
output is not impacted.

Change the first argument of close_json_array() from PRINT_JSON to
PRINT_ANY to fix the problem. The second argument was already set
correctly.

Fixes: 67f3c7a5cc0d ("iplink_can: use PRINT_ANY to factorize code and fix signedness")
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agou32: fix json formatting of flowid
Stephen Hemminger [Thu, 13 Oct 2022 15:30:34 +0000 (08:30 -0700)] 
u32: fix json formatting of flowid

The flow id was printed without going through the JSON print code.
This would lead to incorrectly formatted JSON output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update from 6.1 pre rc1
Stephen Hemminger [Tue, 11 Oct 2022 14:17:28 +0000 (07:17 -0700)] 
uapi: update from 6.1 pre rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'merge' of ../iproute2-next
Stephen Hemminger [Tue, 11 Oct 2022 14:14:19 +0000 (07:14 -0700)] 
Merge branch 'merge' of ../iproute2-next

3 years agof_flower: Introduce L2TPv3 support
Wojciech Drewek [Fri, 7 Oct 2022 07:51:01 +0000 (09:51 +0200)] 
f_flower: Introduce L2TPv3 support

Add support for matching on L2TPv3 session ID.
Session ID can be specified only when ip proto was
set to IPPROTO_L2TP.

L2TPv3 might be transported over IP or over UDP,
this implementation is only about L2TPv3 over IP.
IPv6 is also supported, in this case next header
is set to IPPROTO_L2TP.

Example filter:
  # tc filter add dev eth0 ingress prio 1 protocol ip \
      flower \
        ip_proto l2tp \
        l2tpv3_sid 1234 \
        skip_sw \
      action drop

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotaprio: don't print the clockid if invalid
Vladimir Oltean [Tue, 4 Oct 2022 12:00:27 +0000 (15:00 +0300)] 
taprio: don't print the clockid if invalid

The clockid will not be reported by the kernel if the qdisc is fully
offloaded, since it is implicitly the PTP Hardware Clock of the device.

Currently "tc qdisc show" points us to a "clockid invalid" for a qdisc
created with "flags 0x2", let's hide that attribute instead.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoman: ss.8: fix a typo
Andrea Claudi [Tue, 4 Oct 2022 14:25:03 +0000 (16:25 +0200)] 
man: ss.8: fix a typo

Fixes: f76ad635f21d ("man: break long lines in man page sources")
Reported-by: Prijesh Patel <prpatel@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
3 years agov6.0.0 v6.0.0
Stephen Hemminger [Tue, 4 Oct 2022 15:17:15 +0000 (08:17 -0700)] 
v6.0.0

3 years agoss: fix duplicate include
Stephen Hemminger [Tue, 4 Oct 2022 15:11:01 +0000 (08:11 -0700)] 
ss: fix duplicate include

No need to include rt_names.h twice.

Fixes: 31f45088c9c8 ("build: fix build failure with -fno-common")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge remote-tracking branch 'main/main' into next
David Ahern [Mon, 3 Oct 2022 14:51:23 +0000 (08:51 -0600)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: xfrm: support adding xfrm metadata as lwtunnel info in routes
Eyal Birger [Mon, 3 Oct 2022 09:12:12 +0000 (12:12 +0300)] 
ip: xfrm: support adding xfrm metadata as lwtunnel info in routes

Support for xfrm metadata as lwtunnel metadata was added in kernel commit
2c2493b9da91 ("xfrm: lwtunnel: add lwtunnel support for xfrm interfaces in collect_md mode")

This commit adds the respective support in lwt routes.

Example use (consider ipsec1 as an xfrm interface in "external" mode):

ip route add 10.1.0.0/24 dev ipsec1 encap xfrm if_id 1

Or in the context of vrf, one can also specify the "link" property:

ip route add 10.1.0.0/24 dev ipsec1 encap xfrm if_id 1 link_dev eth15

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: xfrm: support "external" (`collect_md`) mode in xfrm interfaces
Eyal Birger [Mon, 3 Oct 2022 09:12:11 +0000 (12:12 +0300)] 
ip: xfrm: support "external" (`collect_md`) mode in xfrm interfaces

Support for collect metadata mode was introduced in kernel commit
abc340b38ba2 ("xfrm: interface: support collect metadata mode")

This commit adds support for creating xfrm interfaces in this
mode.

Example use:

ip link add ipsec1 type xfrm external

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Mon, 3 Oct 2022 14:42:41 +0000 (08:42 -0600)] 
Update kernel headers

Update kernel headers to commit:
    62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoiplink_bridge: Add no_linklocal_learn option support
Ido Schimmel [Sat, 1 Oct 2022 14:35:51 +0000 (17:35 +0300)] 
iplink_bridge: Add no_linklocal_learn option support

Kernel commit 70e4272b4c81 ("net: bridge: add no_linklocal_learn bool
option") added the no_linklocal_learn bridge option that can be set via
sysfs or netlink.

Add iproute2 support, allowing it to query and set the option via
netlink.

The option is useful, for example, in scenarios where we want the bridge
to be able to refresh dynamic FDB entries that were added by user space
and are pointing to locked bridge ports, but do not want the bridge to
populate its FDB from EAPOL frames used for authentication.

Example:

 $ ip -j -d link show dev br0 | jq ".[][\"linkinfo\"][\"info_data\"][\"no_linklocal_learn\"]"
 0
 $ cat /sys/class/net/br0/bridge/no_linklocal_learn
 0

 # ip link set dev br0 type bridge no_linklocal_learn 1

 $ ip -j -d link show dev br0 | jq ".[][\"linkinfo\"][\"info_data\"][\"no_linklocal_learn\"]"
 1
 $ cat /sys/class/net/br0/bridge/no_linklocal_learn
 1

 # ip link set dev br0 type bridge no_linklocal_learn 0

 $ ip -j -d link show dev br0 | jq ".[][\"linkinfo\"][\"info_data\"][\"no_linklocal_learn\"]"
 0
 $ cat /sys/class/net/br0/bridge/no_linklocal_learn
 0
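
For scripts that cannot rely on jq, the same lookup can be done on the
JSON output directly. This is only a sketch: the sample string is a
hypothetical, heavily trimmed stand-in for the real
`ip -j -d link show dev br0` output, and the function name is made up
for this example.

```python
import json

# Hypothetical, trimmed stand-in for `ip -j -d link show dev br0` output.
sample = (
    '[{"ifname": "br0", "linkinfo": {"info_kind": "bridge", '
    '"info_data": {"no_linklocal_learn": 1}}}]'
)


def no_linklocal_learn_values(ip_json):
    """Mirror the jq path .[]["linkinfo"]["info_data"]["no_linklocal_learn"]."""
    return [
        link["linkinfo"]["info_data"]["no_linklocal_learn"]
        for link in json.loads(ip_json)
    ]


print(no_linklocal_learn_values(sample))
```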

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Sun, 2 Oct 2022 22:45:25 +0000 (16:45 -0600)] 
Update kernel headers

Update kernel headers to commit:
    bc37b24ee05e ("Merge branch 'mlx5-xsk-updates-part3-2022-09-30'")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: fix man page for linecard
Stephen Hemminger [Fri, 30 Sep 2022 19:40:44 +0000 (12:40 -0700)] 
devlink: fix man page for linecard

Doing make check on iproute2 runs several checks including man page
checks for common errors. Recent addition of linecard support to
devlink introduced this error.

Checking manpages for syntax errors...
an-old.tmac: <standard input>: line 31: 'R' is a string (producing the registered sign), not a macro.
Error in devlink-lc.8

Fixes: 4cb0bec3744a ("devlink: add support for linecard show and type set")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip-monitor: Fix the selection of rtnl groups when listening for all object types
Benjamin Poirier [Thu, 22 Sep 2022 06:19:38 +0000 (15:19 +0900)] 
ip-monitor: Fix the selection of rtnl groups when listening for all object types

Currently, when using `ip monitor`, family-specific rtnl multicast groups
(ex. RTNLGRP_IPV4_IFADDR) are used when specifying the '-family' option (or
one of its short forms) and an object type is specified (ex. `ip -4 monitor
addr`) but not when listening for changes to all object types (ex. `ip -4
monitor`). In that case, multicast groups for all families, regardless of
the '-family' option, are used. Depending on the object type, this leads to
ignoring the '-family' selection (MROUTE, ADDR, NETCONF), or printing stray
prefix headers with no event (ROUTE, RULE).

Rewrite the parameter parsing code so that per-family rtnl multicast groups
are selected in all cases.
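
The intended selection logic can be pictured with the simplified model
below. It is not the ip-monitor parser: the table only covers the object
types named above, the group names are kept as strings, and
select_groups is a name invented for this sketch.

```python
# Per-family rtnl multicast groups for the object types discussed above.
PER_FAMILY_GROUPS = {
    "addr": {"inet": "RTNLGRP_IPV4_IFADDR", "inet6": "RTNLGRP_IPV6_IFADDR"},
    "route": {"inet": "RTNLGRP_IPV4_ROUTE", "inet6": "RTNLGRP_IPV6_ROUTE"},
    "mroute": {"inet": "RTNLGRP_IPV4_MROUTE", "inet6": "RTNLGRP_IPV6_MROUTE"},
    "rule": {"inet": "RTNLGRP_IPV4_RULE", "inet6": "RTNLGRP_IPV6_RULE"},
    "netconf": {"inet": "RTNLGRP_IPV4_NETCONF", "inet6": "RTNLGRP_IPV6_NETCONF"},
}


def select_groups(family=None, objects=None):
    """Return the groups to join; no object list means all object types."""
    if not objects:
        objects = list(PER_FAMILY_GROUPS)
    groups = []
    for obj in objects:
        by_family = PER_FAMILY_GROUPS[obj]
        if family is None:
            groups.extend(by_family.values())
        else:
            # The family filter applies even when listening for everything.
            groups.append(by_family[family])
    return groups


print(select_groups("inet"))
```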

The issue can be witnessed while running `ip -4 monitor label` at the same
time as the following command:
ip link add dummy0 address 02:00:00:00:00:01 up type dummy
The output includes:
[ROUTE][ROUTE][ADDR]9: dummy0    inet6 fe80::ff:fe00:1/64 scope link
       valid_lft forever preferred_lft forever
Notice the stray "[ROUTE]" labels (related to filtered out ipv6 routes) and
the ipv6 ADDR entry. Those do not appear if using `ip -4 monitor label
route address`.

Fixes: aba5acdfdb34 ("(Logical change 1.3)")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip-monitor: Include stats events in default and "all" cases
Benjamin Poirier [Thu, 22 Sep 2022 06:19:37 +0000 (15:19 +0900)] 
ip-monitor: Include stats events in default and "all" cases

It seems that stats were omitted from `ip monitor` and `ip monitor all`.
Since all other event types are included, include stats as well. Use the
same logic as for nexthops.

Fixes: a05a27c07cbf ("ipmonitor: Add monitoring support for stats events")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoip-monitor: Do not listen for nexthops by default when specifying stats
Benjamin Poirier [Thu, 22 Sep 2022 06:19:36 +0000 (15:19 +0900)] 
ip-monitor: Do not listen for nexthops by default when specifying stats

`ip monitor stats` listens for changes to nexthops and stats. It should
listen for stats only.

Fixes: a05a27c07cbf ("ipmonitor: Add monitoring support for stats events")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agobridge: Do not print stray prefixes in monitor mode
Benjamin Poirier [Thu, 22 Sep 2022 06:19:35 +0000 (15:19 +0900)] 
bridge: Do not print stray prefixes in monitor mode

When using `bridge monitor` with the '-timestamp' option or the "all"
parameter, prefixes are printed before the actual event descriptions.
Currently, those prefixes are printed for each netlink message that's
received. However, some netlink messages do not lead to an event
description being printed. That's usually because a message is not related
to AF_BRIDGE. This results in stray prefixes being printed.

Restructure accept_msg() and its callees such that prefixes are only
printed after a message has been checked for eligibility.
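
The ordering change can be sketched as follows. This is a simplified
model, not the accept_msg() code: messages are plain dictionaries with
hypothetical keys, and eligibility is reduced to a single family check
(AF_BRIDGE is 7 on Linux).

```python
AF_BRIDGE = 7


def format_events(messages):
    """Emit "[LABEL]text" only for messages that pass the eligibility check."""
    lines = []
    for msg in messages:
        # Check eligibility first; printing the prefix before this point is
        # what produced the stray "[LINK]" and "[NEIGH]" labels.
        if msg["family"] != AF_BRIDGE:
            continue
        lines.append("[{}]{}".format(msg["label"], msg["text"]))
    return lines


events = [
    {"family": 10, "label": "LINK", "text": "ignored"},
    {"family": 2, "label": "NEIGH", "text": "ignored"},
    {"family": AF_BRIDGE, "label": "NEIGH",
     "text": "02:00:00:00:00:01 dev dummy0 self permanent"},
]
print("\n".join(format_events(events)))
```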

The issue can be witnessed using the following commands:
ip link add dummy0 type dummy
# Start `bridge monitor all` now in another terminal.
# Cause a stray "[LINK]" to be printed (family 10).
# It does not appear yet because the output is line buffered.
ip link set dev dummy0 up
# Cause a stray "[NEIGH]" to be printed (family 2).
ip neigh add 10.0.0.1 lladdr 02:00:00:00:00:01 dev dummy0
# Cause a genuine entry to be printed, which flushes the previous
# output.
bridge fdb add 02:00:00:00:00:01 dev dummy0
# We now see:
# [LINK][NEIGH][NEIGH]02:00:00:00:00:01 dev dummy0 self permanent

Fixes: d04bc300c3e3 ("Add bridge command")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update of if_tun.h
Stephen Hemminger [Fri, 30 Sep 2022 19:35:48 +0000 (12:35 -0700)] 
uapi: update of if_tun.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agortnetlink: add new function rtnl_echo_talk()
Hangbin Liu [Thu, 29 Sep 2022 08:10:16 +0000 (16:10 +0800)] 
rtnetlink: add new function rtnl_echo_talk()

Add a new function rtnl_echo_talk() that could be used when the
sub-component supports NLM_F_ECHO flag. With this function we can
remove the redundant code added by commit b264b4c6568c7 ("ip: add
NLM_F_ECHO support").

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: fix typo in variable name in ifname_map_cb()
Jiri Pirko [Thu, 29 Sep 2022 10:24:36 +0000 (12:24 +0200)] 
devlink: fix typo in variable name in ifname_map_cb()

s/port_ifindex/port_index/

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: move use_iec into struct dl
Jiri Pirko [Thu, 29 Sep 2022 10:24:35 +0000 (12:24 +0200)] 
devlink: move use_iec into struct dl

Similar to other bool opts that could be set by the user, move the
global variable use_iec to be part of struct dl.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agotc/tc_monitor: print netlink extack message
Hangbin Liu [Tue, 27 Sep 2022 10:21:07 +0000 (18:21 +0800)] 
tc/tc_monitor: print netlink extack message

Upstream commit "sched: add extack for tfilter_notify" makes tc
events carry an extack message, which can be used for logging
offloading failures. Let's print the extack message in tc monitor.
e.g.

  # tc monitor
  added chain dev enp3s0f1np1 parent ffff: chain 0
  added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1
    ct_state +trk+new
    not_in_hw
          action order 1: gact action drop
           random type none pass val 0
           index 1 ref 1 bind 1

  Warning: mlx5_core: matching on ct_state +new isn't supported.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agolibnetlink: add offset for nl_dump_ext_ack_done
Hangbin Liu [Tue, 27 Sep 2022 10:21:06 +0000 (18:21 +0800)] 
libnetlink: add offset for nl_dump_ext_ack_done

There is no rule requiring an error code after an NLMSG_DONE message. The only
reason we have this offset is that the kernel function netlink_dump_done() puts
an error code after the netlink message header.

Make nl_dump_ext_ack_done() take an offset parameter so that we can adjust
this for NLMSG_DONE messages without an error code.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip link: add sub-command to view and change DSA conduit interface
Vladimir Oltean [Thu, 22 Sep 2022 22:06:55 +0000 (01:06 +0300)] 
ip link: add sub-command to view and change DSA conduit interface

Support the "dsa" kind of rtnl_link_ops exported by the kernel, and
export reads/writes to IFLA_DSA_MASTER.

Examples:

$ ip link set swp0 type dsa conduit eth1

$ ip -d link show dev swp0
    (...)
    dsa conduit eth0

$ ip -d -j link show swp0
[
{
"link": "eth1",
"linkinfo": {
"info_kind": "dsa",
"info_data": {
"conduit": "eth1"
}
},
}
]

Note that by construction and as shown in the example, the IFLA_LINK
reported by a DSA user port is identical to what is reported through
IFLA_DSA_MASTER. However IFLA_LINK is not writable, and overloading its
meaning to make it writable would clash with other users of IFLA_LINK
(vlan etc) for which writing this property does not make sense.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agolink: display 'allmulti' counter
Nicolas Dichtel [Mon, 19 Sep 2022 08:31:36 +0000 (10:31 +0200)] 
link: display 'allmulti' counter

This counter is based on the same principle as the 'promiscuity' counter:
the flag ALLMULTI is displayed only when it is explicitly requested by
userland. This counter makes it possible to know whether 'allmulti' is
configured on an interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoip: add NLM_F_ECHO support
Hangbin Liu [Fri, 16 Sep 2022 03:34:28 +0000 (11:34 +0800)] 
ip: add NLM_F_ECHO support

When user space configures the kernel with netlink messages, it can set the
NLM_F_ECHO flag to request the kernel to send the applied configuration back
to the caller. This allows user space to retrieve configuration information
that are filled by the kernel (either because these parameters can only be
set by the kernel or because user space let the kernel choose a default
value).

NLM_F_ACK is also supplied in case the kernel doesn't support NLM_F_ECHO;
otherwise we would wait for the reply forever. Just like the update in
iplink.c, for which I plan to post a kernel patch later.

A new parameter -echo is added for when the user wants to get feedback from the kernel.
e.g.

  # ip -echo addr add 192.168.0.1/24 dev eth1
  3: eth1    inet 192.168.0.1/24 scope global eth1
         valid_lft forever preferred_lft forever
  # ip -j -p -echo addr del 192.168.0.1/24 dev eth1
  [ {
          "deleted": true,
          "index": 3,
          "dev": "eth1",
          "family": "inet",
          "local": "192.168.0.1",
          "prefixlen": 24,
          "scope": "global",
          "label": "eth1",
          "valid_life_time": 4294967295,
          "preferred_life_time": 4294967295
      } ]
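
A consumer of the JSON echo output might summarize it along these lines.
This is a sketch only: describe() is a name invented here, the input is
the sample output shown above, and 4294967295 (0xFFFFFFFF) is treated as
the "forever" lifetime, matching the plain-text output.

```python
import json

INFINITY_LIFE_TIME = 0xFFFFFFFF  # shown as "forever" in the text output

echo_output = """
[ {
        "deleted": true,
        "index": 3,
        "dev": "eth1",
        "family": "inet",
        "local": "192.168.0.1",
        "prefixlen": 24,
        "scope": "global",
        "label": "eth1",
        "valid_life_time": 4294967295,
        "preferred_life_time": 4294967295
    } ]
"""


def describe(entry):
    """Turn one echoed address entry into a one-line summary."""
    action = "deleted" if entry.get("deleted") else "added"
    lft = entry["valid_life_time"]
    lifetime = "forever" if lft == INFINITY_LIFE_TIME else "{}sec".format(lft)
    return "{} {}/{} on {} (valid_lft {})".format(
        action, entry["local"], entry["prefixlen"], entry["dev"], lifetime)


for item in json.loads(echo_output):
    print(describe(item))
```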

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoseg6: add support for flavors in SRv6 End* behaviors
Paolo Lungaroni [Mon, 12 Sep 2022 17:39:23 +0000 (19:39 +0200)] 
seg6: add support for flavors in SRv6 End* behaviors

As described in RFC 8986 [1], processing operations carried out by SRv6
End, End.X and End.T (End* for short) behaviors can be modified or
extended using the "flavors" mechanism. This patch adds the support for
PSP,USP,USD flavors (defined in [1]) and for NEXT-C-SID flavor (defined
in [2]) in SRv6 End* behaviors. Specifically, we add a new optional
attribute named "flavors" that can be leveraged by the user to enable
specific flavors while creating an SRv6 End* behavior instance.
Multiple flavors can be specified together by separating them using
commas.

If a specific flavor (or a combination of flavors) is not supported by the
underlying Linux kernel, an error message is reported to the user and the
creation of the specific behavior instance is aborted.

When the flavors attribute is omitted, the regular SRv6 End* behavior is
performed.

Flavors such as PSP, USP and USD do not accept additional configuration
attributes. Conversely, the NEXT-C-SID flavor can be configured to support
user-provided Locator-Block and Locator-Node Function lengths using,
respectively, the lblen and the nflen attributes.

Both lblen and nflen values must be evenly divisible by 8 and their sum
must not exceed 128 bit (i.e. the C-SID container size).

If the lblen attribute is omitted, the default value chosen by the Linux
kernel is 32-bit. If the nflen attribute is omitted, the default value
chosen by the Linux kernel is 16-bit.
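
The length rules above can be written down as a small check. This is an
illustrative sketch, not the iproute2 or kernel validation code; the
function and constant names are invented here, and only the constraints
stated in this message are modelled.

```python
DEFAULT_LBLEN = 32          # kernel default when lblen is omitted
DEFAULT_NFLEN = 16          # kernel default when nflen is omitted
CSID_CONTAINER_BITS = 128   # C-SID container size


def check_next_csid_lengths(lblen=DEFAULT_LBLEN, nflen=DEFAULT_NFLEN):
    """Validate lblen/nflen for the NEXT-C-SID flavor and return them."""
    for name, value in (("lblen", lblen), ("nflen", nflen)):
        if value % 8 != 0:
            raise ValueError("{} must be evenly divisible by 8".format(name))
    if lblen + nflen > CSID_CONTAINER_BITS:
        raise ValueError("lblen + nflen must not exceed 128 bits")
    return lblen, nflen


print(check_next_csid_lengths(48, 16))
```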

Some examples:
ip -6 route add 2001:db8::1 encap seg6local action End flavors next-csid dev eth0
ip -6 route add 2001:db8::2 encap seg6local action End flavors next-csid lblen 48 nflen 16 dev eth0

Standard Output:
ip -6 route show 2001:db8::2
2001:db8::2  encap seg6local action End flavors next-csid lblen 48 nflen 16 dev eth0 metric 1024 pref medium

JSON Output:
ip -6 -j -p route show 2001:db8::2
[ {
        "dst": "2001:db8::2",
        "encap": "seg6local",
        "action": "End",
        "flavors": [ "next-csid" ],
        "lblen": 48,
        "nflen": 16,
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

[1] - https://datatracker.ietf.org/doc/html/rfc8986
[2] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Thu, 22 Sep 2022 22:50:08 +0000 (15:50 -0700)] 
Update kernel headers

Update kernel headers to commit:
    0140a7168f8b ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agouapi: update bpf and virtio_net
Stephen Hemminger [Tue, 20 Sep 2022 22:58:41 +0000 (15:58 -0700)] 
uapi: update bpf and virtio_net

Update headers based on 6.0-rc6

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agomacsec: add user manual description for extended packet number feature
Emeel Hakim [Sun, 11 Sep 2022 09:26:56 +0000 (12:26 +0300)] 
macsec: add user manual description for extended packet number feature

Update the user manual to describe how to use the extended packet number (XPN)
feature for macsec. Configuring XPN requires providing an ssci and a salt,
so update the user manual on how to provide them as part of
the ip macsec command.

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agomacsec: add Extended Packet Number support
Emeel Hakim [Sun, 11 Sep 2022 09:26:55 +0000 (12:26 +0300)] 
macsec: add Extended Packet Number support

This patch adds support for extended packet number (XPN).
XPN can be configured by passing 'cipher gcm-aes-xpn-128' as part of
the ip link add command using macsec type.
In addition, using 'xpn' keyword instead of the 'pn', passing a 12
bytes salt using the 'salt' keyword and passing short secure channel
id (ssci) using the 'ssci' keyword as part of the ip macsec command
is required (see example).

e.g:

create a MACsec device on link eth0 with enabled xpn
  # ip link add link eth0 macsec0 type macsec port 11 \
        encrypt on cipher gcm-aes-xpn-128

configure a secure association on the device
  # ip macsec add macsec0 tx sa 0 xpn 1024 on ssci 5 \
        salt 838383838383838383838383 \
        key 01 81818181818181818181818181818181

configure a secure association on the device with ssci = 5
  # ip macsec add macsec0 tx sa 0 xpn 1024 on ssci 5 \
        salt 838383838383838383838383 \
        key 01 82828282828282828282828282828282
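
Since the salt must be exactly 12 bytes, a wrapper script could check
the hex string before calling ip macsec. The sketch below is an
assumption about how such a pre-check might look, not code from
iproute2; parse_xpn_salt is a made-up name.

```python
XPN_SALT_BYTES = 12  # the 'salt' keyword takes a 12-byte value


def parse_xpn_salt(hex_salt):
    """Decode the hex salt and reject anything that is not 12 bytes long."""
    salt = bytes.fromhex(hex_salt)
    if len(salt) != XPN_SALT_BYTES:
        raise ValueError("salt must be {} bytes, got {}".format(
            XPN_SALT_BYTES, len(salt)))
    return salt


print(len(parse_xpn_salt("838383838383838383838383")))
```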

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoman: devlink-region(8): document the 'new' subcommand
Baruch Siach [Fri, 2 Sep 2022 05:01:17 +0000 (08:01 +0300)] 
man: devlink-region(8): document the 'new' subcommand

Some drivers provide no region snapshot unless one is created first with the
'new' operation. Add documentation and an example.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: fix region-new usage message
Baruch Siach [Mon, 29 Aug 2022 07:49:13 +0000 (10:49 +0300)] 
devlink: fix region-new usage message

The snapshot parameter is optional.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoutils: extract CTRL_ATTR_MAXATTR and save it
Jacob Keller [Fri, 26 Aug 2022 18:17:41 +0000 (11:17 -0700)] 
utils: extract CTRL_ATTR_MAXATTR and save it

mnlu_gen_socket_open opens a socket and configures it for use with a
generic netlink family. As part of this process it sends a
CTRL_CMD_GETFAMILY to get the ID for the family name requested.

In addition to the family id, this command reports a few other useful
values including the maximum attribute. The maximum attribute is useful in
order to know whether a given attribute is supported and for knowing the
necessary size to allocate for other operations such as policy dumping.

Since we already have to issue a CTRL_CMD_GETFAMILY to get the id, we can
also store the maximum attribute as well. Modify the callback functions to
parse the maximum attribute NLA and store it in the mnlu_gen_socket
structure.
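
To make the attribute handling concrete, here is a minimal sketch of
walking the netlink attributes of a CTRL_CMD_GETFAMILY reply and keeping
both the family id and the maximum attribute. It is not the
mnlu_gen_socket callback: the helper names are invented, the reply is
built locally by pack_attr instead of coming from the kernel, and the
sample values 21 and 170 are arbitrary. The attribute numbers are those
of <linux/genetlink.h>.

```python
import struct

CTRL_ATTR_FAMILY_ID = 1   # u16
CTRL_ATTR_MAXATTR = 5     # u32
NLA_HDRLEN = 4            # struct nlattr: u16 nla_len, u16 nla_type


def nla_align(length):
    """Round up to the 4-byte netlink attribute alignment."""
    return (length + 3) & ~3


def pack_attr(attr_type, payload):
    """Build one attribute (header, payload, padding) for the example."""
    length = NLA_HDRLEN + len(payload)
    padding = b"\0" * (nla_align(length) - length)
    return struct.pack("=HH", length, attr_type) + payload + padding


def parse_family_attrs(data):
    """Extract family id and max attribute from a run of attributes."""
    result = {}
    offset = 0
    while offset + NLA_HDRLEN <= len(data):
        length, attr_type = struct.unpack_from("=HH", data, offset)
        if length < NLA_HDRLEN or offset + length > len(data):
            break  # malformed attribute, stop parsing
        payload = data[offset + NLA_HDRLEN:offset + length]
        if attr_type == CTRL_ATTR_FAMILY_ID:
            result["family_id"] = struct.unpack("=H", payload)[0]
        elif attr_type == CTRL_ATTR_MAXATTR:
            result["maxattr"] = struct.unpack("=I", payload)[0]
        offset += nla_align(length)
    return result


reply = (pack_attr(CTRL_ATTR_FAMILY_ID, struct.pack("=H", 21)) +
         pack_attr(CTRL_ATTR_MAXATTR, struct.pack("=I", 170)))
print(parse_family_attrs(reply))
```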

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agomnlg: remove unnused mnlg_socket structure
Jacob Keller [Fri, 26 Aug 2022 18:17:40 +0000 (11:17 -0700)] 
mnlg: remove unnused mnlg_socket structure

Commit 62ff25e51bb6 ("devlink: Use generic socket helpers from library")
removed all of the users of struct mnlg_socket, but didn't remove the
structure itself. Fix that.

Fixes: 62ff25e51bb6 ("devlink: Use generic socket helpers from library")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoUpdate kernel headers
David Ahern [Thu, 1 Sep 2022 02:42:52 +0000 (20:42 -0600)] 
Update kernel headers

Update kernel headers to commit:
    cb45a8bf4693 ("net: axienet: Switch to 64-bit RX/TX statistics")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: fix parallel flash notifications processing
Jiri Pirko [Thu, 25 Aug 2022 08:04:20 +0000 (10:04 +0200)] 
devlink: fix parallel flash notifications processing

Now that it is possible to flash multiple devlink instances in parallel,
the notification processing callback needs to take into account that it
may receive messages that belong to a different devlink instance. Handle
that gracefully and don't error out.
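
In outline, the callback now behaves like the sketch below: a
notification for another instance is skipped instead of being treated as
an error. The dictionary keys and the function name are hypothetical and
only stand in for the parsed netlink attributes.

```python
def handle_flash_notification(msg, bus_name, dev_name):
    """Return the status text for our device, or None for other instances."""
    if (msg["bus_name"], msg["dev_name"]) != (bus_name, dev_name):
        # Another devlink instance is being flashed in parallel: not an
        # error, just not a message for us.
        return None
    return msg["status"]


ours = {"bus_name": "pci", "dev_name": "0000:01:00.0", "status": "Flashing"}
other = {"bus_name": "pci", "dev_name": "0000:02:00.0", "status": "Erasing"}
print(handle_flash_notification(ours, "pci", "0000:01:00.0"))
print(handle_flash_notification(other, "pci", "0000:01:00.0"))
```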

Reported-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: load port-ifname map on demand
Jiri Pirko [Thu, 25 Aug 2022 08:04:19 +0000 (10:04 +0200)] 
devlink: load port-ifname map on demand

So far, the port-ifname map was loaded during devlink init
whether or not it was actually needed. The port dump cmd which is used
for this in the kernel takes a lock for every devlink instance.
That may lead to unnecessary blocking of commands.

Load the map only when it is needed to look up an ifname.
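
The on-demand pattern is roughly the following. This is a generic
lazy-loading sketch, not the devlink code: the class, the loader
callback and the sample mapping are all invented for illustration.

```python
class IfnameMap:
    """Port-ifname map that is populated on first lookup only."""

    def __init__(self, loader):
        self._loader = loader   # callable performing the expensive port dump
        self._map = None        # not loaded until someone needs it

    def lookup(self, ifname):
        if self._map is None:
            self._map = self._loader()
        return self._map.get(ifname)


dump_calls = []


def fake_port_dump():
    # Stand-in for the port dump; records how often it runs.
    dump_calls.append(1)
    return {"eth0": ("pci", "0000:01:00.0", 1)}


ifmap = IfnameMap(fake_port_dump)
print(len(dump_calls))            # nothing loaded at init time
print(ifmap.lookup("eth0"))
print(ifmap.lookup("eth1"))
print(len(dump_calls))            # loaded once, reused afterwards
```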

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoman: fix a typo in devlink-dev(8)
Denis Ovsienko [Wed, 31 Aug 2022 17:07:25 +0000 (10:07 -0700)] 
man: fix a typo in devlink-dev(8)

Signed-off-by: Denis Ovsienko <denis@ovsienko.info>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agouapi: update headers for xfrm and virtio_ring.h
Stephen Hemminger [Wed, 31 Aug 2022 16:37:13 +0000 (09:37 -0700)] 
uapi: update headers for xfrm and virtio_ring.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoMerge branch 'devlink-rm-dl_argv_parse_put' into next
David Ahern [Wed, 24 Aug 2022 15:54:16 +0000 (08:54 -0700)] 
Merge branch 'devlink-rm-dl_argv_parse_put' into next

Jacob Keller  says:

====================

This series removes the dl_argv_parse_put function which both parses the
command line arguments and places them into the netlink header.

This was originally sent as an RFC at
https://lore.kernel.org/netdev/20220805234155.2878160-1-jacob.e.keller@intel.com/

Since there is some ongoing work around policy code being generated from
YAML, I thought it best to wait on the devlink policy portion of this series
for now.

Jiri mentioned he wanted to base some work on top of this, so I am sending
just the cleanup patches.

The primary motivation for this is due to the fact that dl_argv_parse_put
requires a netlink header, meaning a command must have already been
prepared. This prevents addition of a different netlink command to get the
policy data, and thus prevents us from using this variant while checking
netlink policy.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: remove dl_argv_parse_put
Jacob Keller [Thu, 18 Aug 2022 21:15:21 +0000 (14:15 -0700)] 
devlink: remove dl_argv_parse_put

The dl_argv_parse_put function is used to extract arguments from the
command line and convert them to the appropriate netlink attributes. This
function is a combination of calling dl_argv_parse and dl_put_opts.

A future change is going to refactor dl_argv_parse to check the kernel's
netlink policy for the command. This requires issuing another netlink
message which requires calling dl_argv_parse before
mnlu_gen_socket_cmd_prepare. Otherwise, the get policy command issued in
dl_argv_parse would overwrite the prepared buffer.

This conflicts with dl_argv_parse_put which requires being called after
mnlu_gen_socket_cmd_prepare.

Remove dl_argv_parse_put and replace it with appropriate calls to
dl_argv_parse and dl_put_opts. This allows us to ensure dl_argv_parse is
called before mnlu_gen_socket_cmd_prepare while dl_put_opts is called
afterwards.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agodevlink: use dl_no_arg instead of checking dl_argc == 0
Jacob Keller [Thu, 18 Aug 2022 21:15:20 +0000 (14:15 -0700)] 
devlink: use dl_no_arg instead of checking dl_argc == 0

Use the helper dl_no_arg function to check for whether the command has any
arguments.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agouapi: update headers from 6.0-rc1
Stephen Hemminger [Mon, 15 Aug 2022 02:25:21 +0000 (19:25 -0700)] 
uapi: update headers from 6.0-rc1

These are the post-merge networking user headers.
Note: this fixes compilation with gcc-12

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agovdpa: fix statistics API mismatch
Stephen Hemminger [Mon, 15 Aug 2022 02:22:37 +0000 (19:22 -0700)] 
vdpa: fix statistics API mismatch

The final vdpa.h header from upstream has a slightly different
definition of VDPA stats get.

Fixes: 6f97e9c9337b ("vdpa: Add support for reading vdpa device statistics")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agodevlink: expose nested devlink for a line card object
Jiri Pirko [Tue, 9 Aug 2022 13:17:30 +0000 (15:17 +0200)] 
devlink: expose nested devlink for a line card object

If a line card object contains a nested devlink, expose it.

Example:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
      16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoMerge branch 'main' into next
David Ahern [Sun, 14 Aug 2022 17:31:10 +0000 (11:31 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
3 years agoconfigure: Define _GNU_SOURCE when checking for setns
Khem Raj [Thu, 11 Aug 2022 05:34:40 +0000 (22:34 -0700)] 
configure: Define _GNU_SOURCE when checking for setns

glibc defines this function only as a GNU extension.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoipstats: add missing headers
Stephen Hemminger [Tue, 9 Aug 2022 20:27:33 +0000 (13:27 -0700)] 
ipstats: add missing headers

IWYU reports several headers are not explicitly
included by ipstats.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 years agoipstats: Add param.h for musl
Changhyeok Bae [Tue, 9 Aug 2022 04:01:05 +0000 (04:01 +0000)] 
ipstats: Add param.h for musl

Fix build error for musl
| /usr/src/debug/iproute2/5.19.0-r0/iproute2-5.19.0/ip/ipstats.c:231: undefined reference to `MIN'

Signed-off-by: Changhyeok Bae <changhyeok.bae@gmail.com>
3 years agoMerge branch 'main' into next
David Ahern [Thu, 4 Aug 2022 18:38:31 +0000 (12:38 -0600)] 
Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>