git.ipfire.org Git - thirdparty/iproute2.git/log

devlink: use dl_no_arg instead of checking dl_argc == 0

Use the helper dl_no_arg function to check for whether the command has any
arguments.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

devlink: expose nested devlink for a line card object

If line card object contains a nested devlink, expose it.

Example:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
      16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

configure: Define _GNU_SOURCE when checking for setns

glibc defines this function only as gnu extention

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ipstats: add missing headers

IWYU reports several headers are not explicitly
included by ipstats.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ipstats: Add param.h for musl

Fix build error for musl
| /usr/src/debug/iproute2/5.19.0-r0/iproute2-5.19.0/ip/ipstats.c:231: undefined reference to `MIN'

Signed-off-by: Changhyeok Bae <changhyeok.bae@gmail.com>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

devlink: add support for running selftests

Add commands and helper APIs to run selftests.
Include a selftest id for a non volatile memory i.e. flash.
Also, update the man page and bash-completion for selftests
commands.

Examples:
$ devlink dev selftests run pci/0000:03:00.0 id flash
pci/0000:03:00.0:
    flash:
      status passed

$ devlink dev selftests show pci/0000:03:00.0
pci/0000:03:00.0
      flash

$ devlink dev selftests show pci/0000:03:00.0 -j
{"selftests":{"pci/0000:03:00.0":["flash"]}}

$ devlink dev selftests run pci/0000:03:00.0 id flash -j
{"selftests":{"pci/0000:03:00.0":{"flash":{"status":"passed"}}}}

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

v5.19.0

Merge branch 'main' into next

Conflicts:
vdpa/include/uapi/linux/vdpa.h

Signed-off-by: David Ahern <dsahern@kernel.org>

seg6: add support for SRv6 Headend Reduced Encapsulation

This patch adds the support for the reduced version of the H.Encaps and
H.L2Encaps behaviors as defined in RFC 8986 [1].

H.Encaps.Red and H.L2Encaps.Red SRv6 behaviors are an optimization of the
H.Encaps and H.L2Encaps aiming to reduce the length of the SID List carried
in the pushed SRH. Specifically, the reduced version of the behaviors
removes the first SID contained in the SID List (i.e. SRv6 Policy) by
storing it into the IPv6 Destination Address. When SRv6 Policy is made of
only one SID, the reduced version of the behaviors omits the SRH at all and
pushes that SID directly into the IPv6 DA.

Some examples:
ip -6 route add 2001:db8::1 encap seg6 mode encap.red segs fcf0:1::e,fcf0:2::d6 dev eth0
ip -6 route add 2001:db8::2 encap seg6 mode l2encap.red segs fcf0:1::d2 dev eth0

Standard Output:
ip -6 route show 2001:db8::1
2001:db8::1  encap seg6 mode encap.red segs 2 [ fcf0:1::e fcf0:2::d6 ] dev eth0 metric 1024 pref medium

JSON Output:
ip -6 -j -p route show 2001:db8::1
[ {
        "dst": "2001:db8::1",
        "encap": "seg6",
        "mode": "encap.red",
        "segs": [ "fcf0:1::e","fcf0:2::d6" ],
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
    } ]

[1] - https://datatracker.ietf.org/doc/html/rfc8986

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit
63757225a933 ("Merge tag 'mlx5-updates-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'pppoe-in-flower' into next

Wojciech Drewek says:

====================

This patchset implements support for matching
on PPPoE specific fields using tc-flower.
First patch introduces small refactor which allows
to use same mechanism of finding protocol for
both ppp and ether protocols. Second patch
adds support for parsing ppp protocols.
Last patch is about parsing PPPoE fields.

Kernel changes (merged):
https://lore.kernel.org/netdev/20220726203133.2171332-1-anthony.l.nguyen@intel.com/T/#t

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

f_flower: Introduce PPPoE support

Introduce PPPoE specific fields in tc-flower:
- session id (16 bits)
- ppp protocol (16 bits)
Those fields can be provided only when protocol was set to
ETH_P_PPP_SES. ppp_proto works similar to vlan_ethtype, i.e.
ppp_proto overwrites eth_type. Thanks to that, fields from
encapsulated protocols (such as src_ip) can be specified.

e.g.
  # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
      flower \
        pppoe_sid 1234 \
        ppp_proto ip \
        dst_ip 127.0.0.1 \
        src_ip 127.0.0.2 \
      action drop

Vlan and cvlan is also supported, in this case cvlan_ethtype
or vlan_ethtype has to be set to ETH_P_PPP_SES.

e.g.
  # tc filter add dev ens6f0 ingress prio 1 protocol 802.1Q \
      flower \
        vlan_id 2 \
        vlan_ethtype ppp_ses \
        pppoe_sid 1234 \
        ppp_proto ip \
        dst_ip 127.0.0.1 \
        src_ip 127.0.0.2 \
      action drop

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

lib: Introduce ppp protocols

PPP protocol field uses different values than ethertype. Introduce
utilities for translating PPP protocols from strings to values
and vice versa. Use generic API from utils in order to get
proto id and name.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

lib: refactor ll_proto functions

Move core logic of ll_proto_n2a and ll_proto_a2n
to utils.c and make it more generic by allowing to
pass table of protocols as argument (proto_tb).
Introduce struct proto with protocol ID and name to
allow this. This wil allow to use those functions by
other use cases.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Import posix_types.h uapi file from point of last kernel headers sync

__kernel_old_time_t definition is needed for pppoe-in-flower patches.

Signed-off-by: David Ahern <dsahern@kernel.org>

Import ppp_defs.h uapi file from point of last kernel headers sync

ppp_defs header file is needed by PPPoE in flower support.

Signed-off-by: David Ahern <dsahern@kernel.org>

bpf_glue: include errno.h

If __NR_bpf is not enabled, bpf() function set errno and return -1. Thus,
this patch includes the header.

Fixes: ac4e0913beb1 ("bpf: Export bpf syscall wrapper")
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

devlink: add support for linecard show and type set

Introduce a new object "lc" to add devlink support for line cards with
two commands:
show - to get the info about the line card state, list of supported
       types as reported by kernel/driver.
set - to set/clear the line card type.

Example:
$ devlink lc
pci/0000:01:00.0:
  lc 1 state unprovisioned
    supported_types:
       16x100G
  lc 2 state unprovisioned
    supported_types:
       16x100G
  lc 3 state unprovisioned
    supported_types:
       16x100G
  lc 4 state unprovisioned
    supported_types:
       16x100G
  lc 5 state unprovisioned
    supported_types:
       16x100G
  lc 6 state unprovisioned
    supported_types:
       16x100G
  lc 7 state unprovisioned
    supported_types:
       16x100G
  lc 8 state unprovisioned
    supported_types:
       16x100G

To provision the slot #8:

$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
       16x100G

To uprovision the slot #8:

$ devlink lc set pci/0000:01:00.0 lc 8 notype

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
5588d6280270 ("net/cdc_ncm: Increase NTB max RX/TX values to 64kb")

Signed-off-by: David Ahern <dsahern@kernel.org>

vdpa: Update man page to include vdpa statistics

Update the man page to include vdpa statistics information inroduce in
6f97e9c9337b ("vdpa: Add support for reading vdpa device statistics")

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

rdma: update uapi/ib_user_verbs.h

Update from 5.19-rc7

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vdpa: update uapi headers from 5.19-rc7

Keep VDPA sanitized headers up to current kernel.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Revert "uapi: add vdpa.h"

This reverts commit 291898c5ff881d0dc5d947031def0528101476cb.

ip neigh: Fix memory leak when doing 'get'

With the following command sequence:

ip link add dummy0 type dummy
ip neigh add 192.168.0.1 dev dummy0
ip neigh get 192.168.0.1 dev dummy0

when running the last command under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 2 of 2
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x17A0EC: rtnl_recvmsg (libnetlink.c:838)
   by 0x17A3D1: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x17B894: __rtnl_talk (libnetlink.c:1141)
   by 0x17B894: rtnl_talk (libnetlink.c:1147)
   by 0x12E49B: ipneigh_get (ipneigh.c:728)
   by 0x1174CB: do_cmd (ip.c:136)
   by 0x116F7C: main (ip.c:324)

Free the answer obtained from rtnl_talk().

Fixes: 62842362370b ("ipneigh: neigh get support")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

mptcp: Fix memory leak when getting limits

When running the command `ip mptcp limits` under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 1 of 1
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x17A0BC: rtnl_recvmsg (libnetlink.c:838)
   by 0x17A3A1: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x17B864: __rtnl_talk (libnetlink.c:1141)
   by 0x17B864: rtnl_talk (libnetlink.c:1147)
   by 0x16837D: mptcp_limit_get_set (ipmptcp.c:436)
   by 0x1174CB: do_cmd (ip.c:136)
   by 0x116F7C: main (ip.c:324)

Free the answer obtained from rtnl_talk().

Fixes: 7e0767cd862b ("add support for mptcp netlink interface")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

mptcp: Fix memory leak when doing 'endpoint show'

With the following command sequence:

ip mptcp endpoint add 127.0.0.1 id 1
ip mptcp endpoint show id 1

when running the last command under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 2 of 2
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x17A0AC: rtnl_recvmsg (libnetlink.c:838)
   by 0x17A391: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x17B854: __rtnl_talk (libnetlink.c:1141)
   by 0x17B854: rtnl_talk (libnetlink.c:1147)
   by 0x168A56: mptcp_addr_show (ipmptcp.c:334)
   by 0x1174CB: do_cmd (ip.c:136)
   by 0x116F7C: main (ip.c:324)

Free the answer obtained from rtnl_talk().

Fixes: 7e0767cd862b ("add support for mptcp netlink interface")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

bridge: Fix memory leak when doing 'fdb get'

With the following command sequence:

ip link add br0 up type bridge
ip link add dummy0 up address 02:00:00:00:00:01 master br0 type dummy
bridge fdb get 02:00:00:00:00:01 br br0

when running the last command under valgrind, it reports

32,768 bytes in 1 blocks are definitely lost in loss record 2 of 2
   at 0x483F7B5: malloc (vg_replace_malloc.c:381)
   by 0x11C1EC: rtnl_recvmsg (libnetlink.c:838)
   by 0x11C4D1: __rtnl_talk_iov.constprop.0 (libnetlink.c:1040)
   by 0x11D994: __rtnl_talk (libnetlink.c:1141)
   by 0x11D994: rtnl_talk (libnetlink.c:1147)
   by 0x10D336: fdb_get (fdb.c:652)
   by 0x48907FC: (below main) (libc-start.c:332)

Free the answer obtained from rtnl_talk().

Fixes: 4ed5ad7bd3c6 ("bridge: fdb get support")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip address: Fix memory leak when specifying device

Running a command like `ip addr show dev lo` under valgrind informs us that

32,768 bytes in 1 blocks are definitely lost in loss record 4 of 4
   at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x16CBE2: rtnl_recvmsg (libnetlink.c:775)
   by 0x16CF04: __rtnl_talk_iov (libnetlink.c:954)
   by 0x16E257: __rtnl_talk (libnetlink.c:1059)
   by 0x16E257: rtnl_talk (libnetlink.c:1065)
   by 0x115CB1: ipaddr_link_get (ipaddress.c:1833)
   by 0x11A0D1: ipaddr_list_flush_or_save (ipaddress.c:2030)
   by 0x1152EB: do_cmd (ip.c:115)
   by 0x114D6F: main (ip.c:321)

After calling store_nlmsg(), the original buffer should be freed. That is
the pattern used elsewhere through the rtnl_dump_filter() call chain.

Fixes: 884709785057 ("ip address: Set device index in dump request")
Reported-by: Binu Gopalakrishnapillai <binug@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: add virtio_ring.h

When vdpa was updated, it included linux/virtio_ring.h but that
sanitized header file was not added.

Fixes: bd91c7647189 ("vdpa: Allow for printing negotiated features of a device")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: add vdpa.h

Iproute2 depends on kernel headers and all necessary kernel headers
should be in iproute tree.

Fixes: c2ecc82b9d4c ("vdpa: Add vdpa tool")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

libbpf: add xdp program name support

In bpf program, only the program name is unique. Before this patch, if there
are multiple programs with the same section name, only the first program
will be attached. With program name support, users could specify the exact
program they want to attach.

Note this feature is only supported when iproute2 build with libbpf.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Fix rx_otherhost_dropped support

The commit cited below added a new column to print_stats64(). However it
then updated only one size_columns() call site, neglecting to update the
remaining three. As a result, in those not-updated invocations,
size_columns() now accesses a vararg argument that is not being passed,
which is undefined behavior.

Fixes: cebf67a35d8a ("show rx_otherehost_dropped stat in ip link show")
CC: Tariq Toukan <tariqt@nvidia.com>
CC: Itay Aveksis <itayav@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Fix size_columns() invocation that passes a 32-bit quantity

In print_stats64(), the last size_columns() invocation passes number of
carrier changes as one of the arguments. The value is decoded as a 32-bit
quantity, but size_columns() expects a 64-bit one. This is undefined
behavior.

The reason valgrind does not cite this is that the previous size_columns()
invocations prime the ABI area used for the value transfer. When these
other invocations are commented away, valgrind does complain that
"conditional jump or move depends on uninitialised value", as would be
expected.

Fixes: 49437375b6c1 ("ip: dynamically size columns when printing stats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: tc-fq_codel: add drop_batch

Let's describe the drop_batch parameter added to tc command
by Commit 7868f802e2d9 ("tc: fq_codel: add drop_batch parameter")

Signed-off-by: Yuki Inoguchi <inoguchi.yuki@fujitsu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update mptcp.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Fix size_columns() for very large values

For values near the 64-bit boundary, the iterative application of
powi *= 10 causes powi to overflow without the termination condition of
powi >= val having ever been satisfied. Instead, when determining the
length of the number, iterate val /= 10 and terminate when it's a single
digit.

Fixes: 49437375b6c1 ("ip: dynamically size columns when printing stats")
CC: Tariq Toukan <tariqt@nvidia.com>
CC: Itay Aveksis <itayav@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: bond_slave: add per port prio support

Add per port priority support for active slave re-selection during
bonding failover. A higher number means higher priority.

This option is only valid for active-backup(1), balance-tlb (5) and
balance-alb (6) mode.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
ebeae54d3a77 ("net: pcs: xpcs: depends on PHYLINK in Kconfig")

Signed-off-by: David Ahern <dsahern@kernel.org>

man: tc-ct.8: fix example

tc-ct manpage provides a wrong command to add an ingress qdisc to an
interface:

$ tc qdisc add dev eth0 handle ingress
Error: argument "ingress" is wrong: invalid qdisc ID

Fix it removing the useless "handle" keyword.

Fixes: 924c43778a84 ("man: tc-ct.8: Add manual page for ct tc action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

l2tp: fix typo in AF_INET6 checksum JSON print

In print_tunnel json output, a typo makes it impossible to know the
value of udp6_csum_rx, printing instead udp6_csum_tx two times.

Fixed getting rid of the typo.

Fixes: 98453b65800f ("ip/l2tp: add JSON support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'lgtm'

man: tc-fq_codel: Fix a typo.

In tc-fq_codel man page, "length .B interval" should be "length interval."

Signed-off-by: Yuki Inoguchi <inoguchi.yuki@fujitsu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vdpa: Add support for reading vdpa device statistics

Read statistics of a vdpa device. The specific data is a received as a
pair of attribute name and attribute value.

Examples:
1. Read statistics for the virtqueue at index 1

$ vdpa dev vstats show vdpa-a qidx 1
vdpa-a:
vdpa-a: queue_type tx received_desc 321812 completed_desc 321812

2. Read statistics for the virtqueue at index 16
$ vdpa dev vstats show vdpa-a qidx 16
vdpa-a: queue_type control_vq received_desc 17 completed_desc 17

3. Read statisitics for the virtqueue at index 0 with json output
$ vdpa -j dev vstats show vdpa-a qidx 0
{"vstats":{"vdpa-a":{"queue_type":"rx","received_desc":114855,"completed_desc":114617}}}

4. Read statistics for the virtqueue at index 0 with preety json
   output
$ vdpa -jp dev vstats show vdpa-a qidx 0
vdpa -jp dev vstats show vdpa-a qidx 0
{
    "vstats": {
        "vdpa-a": {
            "queue_type": "rx",
            "received_desc": 114855,
            "completed_desc": 114617
        }
    }
}

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

tc: declaration hides parameter

In several places (code reuse?), the variable handle
is a parameter to the function, but then
is defined inside basic block for classid.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

genl: fix duplicate include guard

Found by LGTM.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: change name for zerocopy sendfile in tls

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update socket.h

From 5.19-rc0

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: tc-fw: Document masked handle usage

The tc-fw filter can be used to match on the packet's fwmark by adding a
filter with a matching handle. It also supports matching on specific
bits of the fwmark by specifying the handle together with a mask. This
is documented in the usage message below, but not in the man page.

Document it in the man page together with an example.

$ tc filter add fw help
Usage: ... fw [ classid CLASSID ] [ indev DEV ] [ action ACTION_SPEC ]
         CLASSID := Push matching packets to the class identified by CLASSID with format X:Y
                 CLASSID is parsed as hexadecimal input.
         DEV := specify device for incoming device classification.
         ACTION_SPEC := Apply an action on matching packets.
         NOTE: handle is represented as HANDLE[/FWMASK].
                 FWMASK is 0xffffffff by default.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

show rx_otherehost_dropped stat in ip link show

This stat was added in commit 794c24e9921f ("net-core: rx_otherhost_dropped to core_stats")

Tested: sent packet with wrong MAC address from 1
network namespace to another, verified that counter showed "1" in
`ip -s -s link sh` and `ip -s -s -j link sh`

Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Shorter display format for TLS zerocopy sendfile

Commit 21c07b45688f ("ss: Show zerocopy sendfile status of TLS
sockets") started displaying the activation status of zerocopy sendfile
on TLS sockets, exposed via sock_diag. This commit makes the format more
compact: the flag's name is shorter and is printed only when the feature
is active, similar to other flag options.

The flag's name is also generalized ("sendfile" -> "tx") to embrace
possible future optimizations, and includes an explicit indication that
the underlying data must not be modified during transfer ("ro").

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
27f2533bcc6e ("nfp: flower: support to offload pedit of IPv6 flowinto fields")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'bridge-fdb-flush' into next

Nikolay Aleksandrov  says:

====================

Hi,
This set adds support for the new bulk delete flag to allow fdb flushing
for specific entries which are matched based on the supplied options.
The new bridge fdb subcommand is "flush", and as can be seen from the
commits it allows to delete entries based on many different criteria:
- matching vlan
- matching port
- matching all sorts of flags (combinations are allowed)

There are also examples for each option in the respective commit messages.

Examples:
$ bridge fdb flush dev swp2 master vlan 100 dynamic
[ delete all dynamic entries with port swp2 and vlan 100 ]
$ bridge fdb flush dev br0 vlan 1 static
[ delete all static entries in br0's fdb table ]
$ bridge fdb flush dev swp2 master extern_learn nosticky
[ delete all entries with port swp2 which have extern_learn set and
   don't have the sticky flag set ]
$ bridge fdb flush dev br0 brport br0 vlan 100 permanent
[ delete all entries pointing to the bridge itself with vlan 100 ]
$ bridge fdb flush dev swp2 master nostatic nooffloaded
[ delete all entries with port swp2 which are not static and not
   offloaded ]

If keyword is specified and after that nokeyword is specified obviously
the nokeyword would override keyword.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]offloaded entry matching

Add flush support to match entries with or without (if "no" is
prepended) offloaded flag.

Examples:
$ bridge fdb flush dev br0 offloaded
This will delete all offloaded entries in br0's fdb table.

$ bridge fdb flush dev br0 nooffloaded
This will delete all entries except the ones with offloaded flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]sticky entry matching

Add flush support to match entries with or without (if "no" is
prepended) sticky flag.

Examples:
$ bridge fdb flush dev br0 sticky
This will delete all sticky entries in br0's fdb table.

$ bridge fdb flush dev br0 nosticky
This will delete all entries except the ones with sticky flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]extern_learn entry matching

Add flush support to match entries with or without (if "no" is
prepended) extern_learn flag.

Examples:
$ bridge fdb flush dev br0 extern_learn
This will delete all extern_learn entries in br0's fdb table.

$ bridge fdb flush dev br0 noextern_learn
This will delete all entries except the ones with extern_learn flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]added_by_user entry matching

Add flush support to match entries with or without (if "no" is
prepended) added_by_user flag. Note that NTF_USE is used internally
because there is no NTF_ flag that describes such entries.

Examples:
$ bridge fdb flush dev br0 added_by_user
This will delete all added_by_user entries in br0's fdb table.

$ bridge fdb flush dev br0 noadded_by_user
This will delete all entries except the ones with added_by_user flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]dynamic entry matching

Add flush support to match dynamic or non-dynamic (static or permanent)
entries if "no" is prepended respectively. Note that dynamic entries are
defined as fdbs without NUD_NOARP and NUD_PERMANENT set, and non-dynamic
entries are fdbs with NUD_NOARP set (that matches both static and
permanent entries).

Examples:
$ bridge fdb flush dev br0 dynamic
This will delete all dynamic entries in br0's fdb table.

$ bridge fdb flush dev br0 nodynamic
This will delete all entries except the dynamic ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]static entry matching

Add flush support to match static or non-static entries if "no" is
prepended respectively. Note that static entries are only NUD_NOARP ones
without NUD_PERMANENT, also when matching non-static entries exclude
permanent entries as well (permanent entries by definition are also
static).

Examples:
$ bridge fdb flush dev br0 static
This will delete all static entries in br0's fdb table.

$ bridge fdb flush dev br0 nostatic
This will delete all entries except the static ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]permanent entry matching

Add flush support to match permanent or non-permanent entries if "no" is
prepended respectively.

Examples:
$ bridge fdb flush dev br0 permanent
This will delete all permanent entries in br0's fdb table.

$ bridge fdb flush dev br0 nopermanent
This will delete all entries except the permanent ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush port matching

Usually we match on the device specified after "dev" but there are
special cases where we need an additional device attribute for matching
such as when matching entries specifically pointing to the bridge device
itself. We use NDA_IFINDEX for that purpose.

Example:
$ bridge fdb flush dev br0 brport br0
This will flush only entries pointing to the bridge itself.

$ bridge fdb flush dev swp1 brport swp2 master
Note this will flush entries pointing to swp2 only. The NDA_IFINDEX
attribute overrides the dev argument. This is documented in the man
page.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush vlan matching

Add flush support to match fdb entries in a specific vlan.
Example:
$ bridge fdb flush dev swp1 vlan 10 master
This will flush all fdb entries with port swp1 and vlan 10.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add new flush command

Add support for fdb bulk delete (aka flush) command. Currently it only
supports the self and master flags with the same semantics as fdb
add/del. The device is a mandatory argument.

Example:
$ bridge fdb flush dev br0
This will delete *all* fdb entries in br0's fdb table.

$ bridge fdb flush dev swp1 master
This will delete all fdb entries pointing to swp1.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Convert non-constant initializers to macros

As per the C standard, "expressions in an initializer for an object that
has static or thread storage duration shall be constant expressions".
Aggregate objects are not constant expressions. Newer GCC doesn't mind, but
older GCC and LLVM do.

Therefore convert to a macro. And since all these macros will look very
similar, extract a generic helper, IPSTATS_STAT_DESC_XSTATS_LEAF, which
takes the leaf name as an argument and initializes the rest as appropriate
for an xstats descriptor.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'ss-threads' into next

Peilin Ye  says:

====================

From: Peilin Ye <peilin.ye@bytedance.com>

This patchset adds a new ss option, -T (--threads), to show thread
information.  It extends the -p (--processes) option, and should be useful
for debugging, monitoring multi-threaded applications.  Example output:

  $ ss -ltT "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,tid=2932548,fd=3),("test",pid=2932547,tid=2932547,fd=3))

It implies -p i.e. it outputs all threads in the thread group, including
the thread group leader.  When -T is used, -Z and -z also show SELinux
contexts for threads.

[1-5/7] are small clean-ups for the user_ent_hash_build() function.  [6/7]
factors out logic iterating $PROC_ROOT/$PID/fd/ from user_ent_hash_build()
to make [7/7] easier.  [7/7] actually implements the feature.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Introduce -T, --threads option

The -p, -Z and -z options only show process (thread group leader)
information.  For example, if the thread group leader has exited, but
another thread in the group is still using a socket, ss -[pZz] does not
show it.

Add a new option, -T (--threads), to show thread information.  It implies
the -p option.  For example, imagine process A and thread B (in the same
group) using the same socket.  ss -p only shows A:

  $ ss -ltp "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,fd=3))

ss -T shows A and B:

  $ ss -ltT "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,tid=2932548,fd=3),("test",pid=2932547,tid=2932547,fd=3))

If -T is used, -Z and -z also show SELinux contexts for threads.

Rename some variables (from "process" to "task", for example) since we
use them for both processes and threads.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Factor out fd iterating logic from user_ent_hash_build()

We are planning to add a thread version of the -p, --process option.
Move the logic iterating $PROC_ROOT/$PID/fd/ into a new function,
user_ent_hash_build_task(), to make it easier.

Since we will use this function for both processes and threads, rename
local variables as such (e.g. from "process" to "task").

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Fix coding style issues in user_ent_hash_build()

Make checkpatch.pl --strict happy about user_ent_hash_build().

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Delete unnecessary call to snprintf() in user_ent_hash_build()

'name' is already $PROC_ROOT/$PID/fd/$FD there, no need to rebuild the
string.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Do not call user_ent_hash_build() more than once

Call user_ent_hash_build() once after the getopt_long() loop if -p, -z
or -Z is used.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Remove unnecessary stack variable 'p' in user_ent_hash_build()

Commit 116ac9270b6d ("ss: Add support for retrieving SELinux contexts")
added an unnecessary stack variable, 'char *p', in
user_ent_hash_build(). Delete it for readability.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Use assignment-suppression character in sscanf()

Use the '*' assignment-suppression character, instead of an
inappropriately named temporary variable.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Show zerocopy sendfile status of TLS sockets

Print the activation status of zerocopy sendfile on TLS sockets.
Zerocopy sendfile was recently added to Linux and exposed via sock_diag.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: report tso_max_size and tso_max_segs

New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report device TSO limits to user-space.

ip -d link sh dev eth0
...
tso_max_size 65536 tso_max_segs 65535

ip -d link sh dev lo
...
tso_max_size 524280 tso_max_segs 65535

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next

v5.18.0

tipc: fix keylen check

Key length check in str2key() is wrong for hex. Fix this using the
proper hex key length.

Fixes: 28ee49e5153b ("tipc: bail out if key is abnormally long")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: remove GSO_MAX_SIZE definition

David removed the check using GSO_MAX_SIZE
in commit f1d18e2e6ec5 ("Update kernel headers").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

doc: fix 'infact' --> 'in fact' typo

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: fix some typos

In dcb-app man page, 'direcly' should be 'directly'
In dcb-dcbx man page, 'respecively' should be 'respectively'
In devlink-dev man page, 'unspecificed' should be 'unspecified'

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: devlink-region: fix typo in example

devlink-region does not accept the legth param, but the length one.

Fixes: 8b4fbf0bed8e ("devlink: Add support for devlink-region access")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: em_u32: fix offset parsing

tc u32 ematch offset parsing might fail even if nexthdr offset is
aligned to 4. The issue can be reproduced with the following script:

tc qdisc del dev dummy0 root
tc qdisc add dev dummy0 root handle 1: htb r2q 1 default 1
tc class add dev dummy0 parent 1:1 classid 1:108 htb quantum 1000000 \
rate 1.00mbit ceil 10.00mbit burst 6k

while true; do
if ! tc filter add dev dummy0 protocol all parent 1: prio 1 basic match \
"meta(vlan mask 0xfff eq 1)" and "u32(u32 0x20011002 0xffffffff \
at nexthdr+8)" flowid 1:108; then
exit 0
fi
done

which we expect to produce an endless loop.
With the current code, instead, this ends with:

u32: invalid offset alignment, must be aligned to 4.
... meta(vlan mask 0xfff eq 1) and >>u32(u32 0x20011002 0xffffffff at nexthdr+8)<< ...
... u32(u32 0x20011002 0xffffffff at >>nexthdr+8<<)...
Usage: u32(ALIGN VALUE MASK at [ nexthdr+ ] OFFSET)
where: ALIGN := { u8 | u16 | u32 }

Example: u32(u16 0x1122 0xffff at nexthdr+4)
Illegal "ematch"

This is caused by memcpy copying into buf an unterminated string.

Fix it using strncpy instead of memcpy.

Fixes: commit 311b41454dc4 ("Add new extended match files.")
Reported-by: Alfred Yang <alf.redyoung@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update of virtio_ids

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Update kernel headers

Update kernel headers to commit:
a65cc8435540 ("Merge branch 'bnxt_en-next'")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'support-xstats-afstats' into next

Petr Machata says:

====================

The RTM_GETSTATS response attributes IFLA_STATS_LINK_XSTATS and
IFLA_STATS_LINK_XSTATS_SLAVE are used to carry statistics related to,
respectively, netdevices of a certain type, and netdevices enslaved to
netdevices of a certain type. IFLA_STATS_AF_SPEC are similarly used to
carry statistics specific to a certain address family.

In this patch set, add support for three new stats groups that cover the
above attributes: xstats, xstats_slave and afstats. Add bridge and bond
subgroups to the former two groups, and mpls subgroup to the latter one.

Now "group" is used for selecting the top-level attribute, and subgroup
for the link-type or address-family nest below it (bridge, bond, mpls in
this patchset). But xstats (both master and slave) are further
subdivided. E.g. in the case of bridge statistics, the two subdivisions
are called "stp" and "mcast". To make it possible to pick these sets,
add to the two selector levels of group and subgroup a third level,
suite, which is filtered in the userspace.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

man: ip-stats.8: Describe groups xstats, xstats_slave and afstats

Add description of the newly-added statistics groups.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Expose bond stats in ipstats

Describe xstats and xstats_slave subgroups for bond netdevices.

For example:

# ip stats show dev swp1 group xstats_slave subgroup bond
56: swp1: group xstats_slave subgroup bond suite 802.3ad
                     LACPDU Rx 0
                     LACPDU Tx 0
                     LACPDU Unknown type Rx 0
                     LACPDU Illegal Rx 0
                     Marker Rx 0
                     Marker Tx 0
                     Marker response Rx 0
                     Marker response Tx 0
                     Marker unknown type Rx 0

# ip -j stats show dev swp1 group xstats_slave subgroup bond | jq
[
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bond",
     "suite": "802.3ad",
     "802.3ad": {
       "lacpdu_rx": 0,
       "lacpdu_tx": 0,
       "lacpdu_unknown_rx": 0,
       "lacpdu_illegal_rx": 0,
       "marker_rx": 0,
       "marker_tx": 0,
       "marker_response_rx": 0,
       "marker_response_tx": 0,
       "marker_unknown_rx": 0
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Expose bridge stats in ipstats

Bridge supports two suites, STP and IGMP, carried by attributes
BRIDGE_XSTATS_STP and BRIDGE_XSTATS_MCAST. Expose them as suites "stp" and
"mcast" (to correspond to the attribute name).

For example:

# ip stats show dev swp1 group xstats_slave subgroup bridge
56: swp1: group xstats_slave subgroup bridge suite mcast
                     IGMP queries:
                       RX: v1 0 v2 0 v3 0
                       TX: v1 0 v2 0 v3 0
                     IGMP reports:
                       RX: v1 0 v2 0 v3 0
                       TX: v1 0 v2 0 v3 0
                     IGMP leaves: RX: 0 TX: 0
                     IGMP parse errors: 0
                     MLD queries:
                       RX: v1 0 v2 0
                       TX: v1 0 v2 0
                     MLD reports:
                       RX: v1 0 v2 0
                       TX: v1 0 v2 0
                     MLD leaves: RX: 0 TX: 0
                     MLD parse errors: 0

56: swp1: group xstats_slave subgroup bridge suite stp
                     STP BPDU:  RX: 0 TX: 0
                     STP TCN:   RX: 0 TX: 0
                     STP Transitions: Blocked: 1 Forwarding: 0

# ip -j stats show dev swp1 group xstats_slave subgroup bridge | jq
[
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bridge",
     "suite": "mcast",
     "multicast": {
       "igmp_queries": {
         "rx_v1": 0,
         "rx_v2": 0,
         "rx_v3": 0,
         "tx_v1": 0,
         "tx_v2": 0,
         "tx_v3": 0
       },
       "igmp_reports": {
         "rx_v1": 0,
         "rx_v2": 0,
         "rx_v3": 0,
         "tx_v1": 0,
         "tx_v2": 0,
         "tx_v3": 0
       },
       "igmp_leaves": {
         "rx": 0,
         "tx": 0
       },
       "igmp_parse_errors": 0,
       "mld_queries": {
         "rx_v1": 0,
         "rx_v2": 0,
         "tx_v1": 0,
         "tx_v2": 0
       },
       "mld_reports": {
         "rx_v1": 0,
         "rx_v2": 0,
         "tx_v1": 0,
         "tx_v2": 0
       },
       "mld_leaves": {
         "rx": 0,
         "tx": 0
       },
       "mld_parse_errors": 0
     }
   },
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bridge",
     "suite": "stp",
     "stp": {
       "rx_bpdu": 0,
       "tx_bpdu": 0,
       "rx_tcn": 0,
       "tx_tcn": 0,
       "transition_blk": 1,
       "transition_fwd": 0
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink_bridge: Split bridge_print_stats_attr()

Extract from bridge_print_stats_attr() two helpers, one for dumping the
multicast attribute, one for dumping the STP attribute.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add groups "xstats", "xstats_slave"

The RTM_GETSTATS response attributes IFLA_STATS_LINK_XSTATS and
IFLA_STATS_LINK_XSTATS_SLAVE are used to carry statistics related to,
respectively, netdevices of a certain type, and netdevices enslaved to
netdevices of a certain type. Inside the nest is then link-type specific
attribute (e.g. LINK_XSTATS_TYPE_BRIDGE), and inside that nest further
attributes for individual type-specific statistical suites.

Under the "ip stats" model, that corresponds to groups "xstats" and
"xstats_slave", link-type specific subgroup, e.g. "bridge", and one or more
link-type specific suites, such as "stp".

Link-type specific stats are currently supported through struct link_util
and in particular the callbacks parse_ifla_xstats and print_ifla_xstats.

The role of parse_ifla_xstats is to establish which statistical suite to
display, and on which device. "ip stats" has framework for both of these
tasks, which obviates the need for custom parsing. Therefore the module
should instead provide a subgroup descriptor, which "ip stats" will then
use as any other.

The second link_util callback, print_ifla_xstats, is for response
dissection. In "ip stats" model, this belongs to leaf descriptors.

Eventually, the link-specific leaf descriptors will be similar to each
other: either master or slave top-level nest needs to be parsed, and
link-type attribute underneath that, and suite attribute underneath that.

To support this commonality, add struct ipstats_stat_desc_xstats to
describe the xstats suites. Further, expose ipstats_stat_desc_pack_xstats()
and ipstats_stat_desc_show_xstats(), which can be used at leaf descriptors
and do the appropriate thing according to the configuration in
ipstats_stat_desc_xstats.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a third level of stats hierarchy, a "suite"

To show statistics nested under IFLA_STATS_LINK_XSTATS_SLAVE or
IFLA_STATS_LINK_XSTATS, one would use "group" to select the top-level
attribute, then "subgroup" to select the link type, which is itself a nest,
and then would lack a way to denote which attribute to select out of the
link-type nest.

To that end, add the selector level "suite", which is filtered in the
userspace.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: Add JSON support to MPLS stats formatter

MPLS stats currently do not support dumping in JSON format. Recognize when
JSON is requested and dump in an obvious manner:

# ip -n ns0-2G8Ozd9z -j stats show dev veth01 group afstats | jq
[
   {
     "ifindex": 3,
     "ifname": "veth01",
     "group": "afstats",
     "subgroup": "mpls",
     "mpls_stats": {
       "rx": {
         "bytes": 0,
         "packets": 0,
         "errors": 0,
         "dropped": 0,
         "noroute": 0
       },
       "tx": {
         "bytes": 216,
         "packets": 2,
         "errors": 0,
         "dropped": 0
       }
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a group "afstats", subgroup "mpls"

Add a new group, "afstats", for showing counters from the
IFLA_STATS_AF_SPEC nest, and a subgroup "mpls" for the AF_MPLS
specifically.

For example:

# ip -n ns0-NrdgY9sx stats show dev veth01 group afstats
3: veth01: group afstats subgroup mpls
     RX: bytes packets errors dropped noroute
             0       0      0       0       0
     TX: bytes packets errors dropped
           108       1      0       0

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: Publish a function to format MPLS stats

Extract from print_mpls_stats() a new function, print_mpls_link_stats(),
make it non-static and publish in the header file.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>