git.ipfire.org Git - thirdparty/iproute2.git/log

Merge branch 'bridge-fdb-flush' into next

Nikolay Aleksandrov says:

====================

Hi,
This set adds support for the new bulk delete flag, which allows flushing
the fdb entries that match the supplied options. The new bridge fdb
subcommand is "flush" and, as can be seen from the commits, it allows
deleting entries based on many different criteria:
- matching vlan
- matching port
- matching all sorts of flags (combinations are allowed)

There are also examples for each option in the respective commit messages.

Examples:
$ bridge fdb flush dev swp2 master vlan 100 dynamic
[ delete all dynamic entries with port swp2 and vlan 100 ]
$ bridge fdb flush dev br0 vlan 1 static
[ delete all static entries in br0's fdb table ]
$ bridge fdb flush dev swp2 master extern_learn nosticky
[ delete all entries with port swp2 which have extern_learn set and
   don't have the sticky flag set ]
$ bridge fdb flush dev br0 brport br0 vlan 100 permanent
[ delete all entries pointing to the bridge itself with vlan 100 ]
$ bridge fdb flush dev swp2 master nostatic nooffloaded
[ delete all entries with port swp2 which are not static and not
   offloaded ]

If a keyword is followed by its nokeyword counterpart, the nokeyword
overrides the keyword.
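
The override rule can be sketched as a value/mask pair in which a later
keyword rewrites the bit set by an earlier one. This is only an
illustrative model, not the bridge implementation: the bit values in
FLAG_BITS and the helpers parse_flag_keywords() and entry_matches() are
assumptions made up for this example.

```python
# Hypothetical bit assignments, chosen only for this sketch; the real
# flag values come from the kernel headers.
FLAG_BITS = {
    "sticky": 0x1,
    "offloaded": 0x2,
    "extern_learn": 0x4,
    "added_by_user": 0x8,
}


def parse_flag_keywords(words):
    """Return (value, mask) for a list of 'flag' / 'noflag' keywords."""
    value = 0
    mask = 0
    for word in words:
        negated = word.startswith("no") and word[2:] in FLAG_BITS
        name = word[2:] if negated else word
        if name not in FLAG_BITS:
            raise ValueError(f"unknown keyword: {word}")
        bit = FLAG_BITS[name]
        mask |= bit          # this flag now takes part in matching
        if negated:
            value &= ~bit    # the entry must not have the flag
        else:
            value |= bit     # the entry must have the flag
    return value, mask


def entry_matches(entry_flags, value, mask):
    """An entry matches when every masked bit equals the requested value."""
    return (entry_flags & mask) == value
```

With this model, "sticky nosticky" ends up requesting entries without the
sticky bit, because the second keyword clears what the first one set.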

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]offloaded entry matching

Add flush support to match entries with or without (if "no" is
prepended) offloaded flag.

Examples:
$ bridge fdb flush dev br0 offloaded
This will delete all offloaded entries in br0's fdb table.

$ bridge fdb flush dev br0 nooffloaded
This will delete all entries except the ones with offloaded flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]sticky entry matching

Add flush support to match entries with or without (if "no" is
prepended) sticky flag.

Examples:
$ bridge fdb flush dev br0 sticky
This will delete all sticky entries in br0's fdb table.

$ bridge fdb flush dev br0 nosticky
This will delete all entries except the ones with sticky flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]extern_learn entry matching

Add flush support to match entries with or without (if "no" is
prepended) extern_learn flag.

Examples:
$ bridge fdb flush dev br0 extern_learn
This will delete all extern_learn entries in br0's fdb table.

$ bridge fdb flush dev br0 noextern_learn
This will delete all entries except the ones with extern_learn flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]added_by_user entry matching

Add flush support to match entries with or without (if "no" is
prepended) added_by_user flag. Note that NTF_USE is used internally
because there is no NTF_ flag that describes such entries.

Examples:
$ bridge fdb flush dev br0 added_by_user
This will delete all added_by_user entries in br0's fdb table.

$ bridge fdb flush dev br0 noadded_by_user
This will delete all entries except the ones with added_by_user flag in
br0's fdb table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]dynamic entry matching

Add flush support to match dynamic entries, or non-dynamic (static or
permanent) entries if "no" is prepended. Note that dynamic entries are
defined as fdbs without NUD_NOARP and NUD_PERMANENT set, and non-dynamic
entries are fdbs with NUD_NOARP set (that matches both static and
permanent entries).

Examples:
$ bridge fdb flush dev br0 dynamic
This will delete all dynamic entries in br0's fdb table.

$ bridge fdb flush dev br0 nodynamic
This will delete all entries except the dynamic ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]static entry matching

Add flush support to match static entries, or non-static entries if "no"
is prepended. Note that static entries are only the NUD_NOARP ones
without NUD_PERMANENT. Matching non-static entries also excludes
permanent entries (permanent entries are by definition also static).
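
As a rough illustration of the note above, the two predicates can be
written as checks on the NUD_* state bits. This is a sketch, not the
kernel or bridge code: the helper names is_static() and is_nostatic() are
invented here, and it assumes a permanent entry carries NUD_PERMANENT in
its state.

```python
NUD_NOARP = 0x40      # value from linux/neighbour.h
NUD_PERMANENT = 0x80  # value from linux/neighbour.h

STATE_MASK = NUD_NOARP | NUD_PERMANENT


def is_static(state):
    """Static: NUD_NOARP is set and NUD_PERMANENT is not."""
    return (state & STATE_MASK) == NUD_NOARP


def is_nostatic(state):
    """Non-static as described above: neither static nor permanent."""
    return (state & STATE_MASK) == 0
```

An entry with both bits set is therefore matched by neither "static" nor
"nostatic" in this model.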

Examples:
$ bridge fdb flush dev br0 static
This will delete all static entries in br0's fdb table.

$ bridge fdb flush dev br0 nostatic
This will delete all entries except the static ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush [no]permanent entry matching

Add flush support to match permanent entries, or non-permanent entries if
"no" is prepended.

Examples:
$ bridge fdb flush dev br0 permanent
This will delete all permanent entries in br0's fdb table.

$ bridge fdb flush dev br0 nopermanent
This will delete all entries except the permanent ones in br0's fdb
table.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush port matching

Usually we match on the device specified after "dev" but there are
special cases where we need an additional device attribute for matching
such as when matching entries specifically pointing to the bridge device
itself. We use NDA_IFINDEX for that purpose.

Example:
$ bridge fdb flush dev br0 brport br0
This will flush only entries pointing to the bridge itself.

$ bridge fdb flush dev swp1 brport swp2 master
Note this will flush entries pointing to swp2 only. The NDA_IFINDEX
attribute overrides the dev argument. This is documented in the man
page.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add flush vlan matching

Add flush support to match fdb entries in a specific vlan.
Example:
$ bridge fdb flush dev swp1 vlan 10 master
This will flush all fdb entries with port swp1 and vlan 10.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: add new flush command

Add support for fdb bulk delete (aka flush) command. Currently it only
supports the self and master flags with the same semantics as fdb
add/del. The device is a mandatory argument.

Example:
$ bridge fdb flush dev br0
This will delete *all* fdb entries in br0's fdb table.

$ bridge fdb flush dev swp1 master
This will delete all fdb entries pointing to swp1.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Convert non-constant initializers to macros

As per the C standard, "expressions in an initializer for an object that
has static or thread storage duration shall be constant expressions".
Aggregate objects are not constant expressions. Newer GCC doesn't mind, but
older GCC and LLVM do.

Therefore convert to a macro. And since all these macros will look very
similar, extract a generic helper, IPSTATS_STAT_DESC_XSTATS_LEAF, which
takes the leaf name as an argument and initializes the rest as appropriate
for an xstats descriptor.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'ss-threads' into next

Peilin Ye says:

====================

From: Peilin Ye <peilin.ye@bytedance.com>

This patchset adds a new ss option, -T (--threads), to show thread
information.  It extends the -p (--processes) option, and should be useful
for debugging and monitoring multi-threaded applications.  Example output:

  $ ss -ltT "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,tid=2932548,fd=3),("test",pid=2932547,tid=2932547,fd=3))

It implies -p, i.e. it outputs all threads in the thread group, including
the thread group leader.  When -T is used, -Z and -z also show SELinux
contexts for threads.

[1-5/7] are small clean-ups for the user_ent_hash_build() function.  [6/7]
factors out logic iterating $PROC_ROOT/$PID/fd/ from user_ent_hash_build()
to make [7/7] easier.  [7/7] actually implements the feature.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Introduce -T, --threads option

The -p, -Z and -z options only show process (thread group leader)
information.  For example, if the thread group leader has exited, but
another thread in the group is still using a socket, ss -[pZz] does not
show it.

Add a new option, -T (--threads), to show thread information.  It implies
the -p option.  For example, imagine process A and thread B (in the same
group) using the same socket.  ss -p only shows A:

  $ ss -ltp "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,fd=3))

ss -T shows A and B:

  $ ss -ltT "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,tid=2932548,fd=3),("test",pid=2932547,tid=2932547,fd=3))

If -T is used, -Z and -z also show SELinux contexts for threads.
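
For scripts that consume this output, the users:(...) field can be split
into one record per task. The following is a minimal sketch based only on
the two sample lines above; the helper name parse_users() is made up, and
the pattern does not try to handle the extra context fields printed with
-Z or -z.

```python
import re

# One ("comm",pid=N[,tid=N],fd=N) tuple, as seen in the sample output.
USER_RE = re.compile(
    r'\("(?P<comm>[^"]*)",pid=(?P<pid>\d+)'
    r'(?:,tid=(?P<tid>\d+))?,fd=(?P<fd>\d+)\)'
)


def parse_users(field):
    """Return a list of dicts; 'tid' is None when ss was run without -T."""
    entries = []
    for match in USER_RE.finditer(field):
        tid = match.group("tid")
        entries.append({
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
            "tid": int(tid) if tid is not None else None,
            "fd": int(match.group("fd")),
        })
    return entries
```

Entries whose tid equals their pid correspond to the thread group leader.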

Rename some variables (from "process" to "task", for example) since we
use them for both processes and threads.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Factor out fd iterating logic from user_ent_hash_build()

We are planning to add a thread version of the -p, --processes option.
Move the logic iterating $PROC_ROOT/$PID/fd/ into a new function,
user_ent_hash_build_task(), to make it easier.

Since we will use this function for both processes and threads, rename
local variables as such (e.g. from "process" to "task").

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Fix coding style issues in user_ent_hash_build()

Make checkpatch.pl --strict happy about user_ent_hash_build().

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Delete unnecessary call to snprintf() in user_ent_hash_build()

'name' is already $PROC_ROOT/$PID/fd/$FD there, no need to rebuild the
string.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Do not call user_ent_hash_build() more than once

Call user_ent_hash_build() once after the getopt_long() loop if -p, -z
or -Z is used.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Remove unnecessary stack variable 'p' in user_ent_hash_build()

Commit 116ac9270b6d ("ss: Add support for retrieving SELinux contexts")
added an unnecessary stack variable, 'char *p', in
user_ent_hash_build(). Delete it for readability.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Use assignment-suppression character in sscanf()

Use the '*' assignment-suppression character, instead of an
inappropriately named temporary variable.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ss: Show zerocopy sendfile status of TLS sockets

Print the activation status of zerocopy sendfile on TLS sockets.
Zerocopy sendfile was recently added to Linux and exposed via sock_diag.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: report tso_max_size and tso_max_segs

New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report device TSO limits to user-space.

ip -d link sh dev eth0
...
tso_max_size 65536 tso_max_segs 65535

ip -d link sh dev lo
...
tso_max_size 524280 tso_max_segs 65535

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next

v5.18.0

tipc: fix keylen check

Key length check in str2key() is wrong for hex. Fix this using the
proper hex key length.

Fixes: 28ee49e5153b ("tipc: bail out if key is abnormally long")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: remove GSO_MAX_SIZE definition

David removed the check using GSO_MAX_SIZE
in commit f1d18e2e6ec5 ("Update kernel headers").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

doc: fix 'infact' --> 'in fact' typo

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: fix some typos

In dcb-app man page, 'direcly' should be 'directly'
In dcb-dcbx man page, 'respecively' should be 'respectively'
In devlink-dev man page, 'unspecificed' should be 'unspecified'

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: devlink-region: fix typo in example

devlink-region does not accept the legth param, but the length one.

Fixes: 8b4fbf0bed8e ("devlink: Add support for devlink-region access")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: em_u32: fix offset parsing

tc u32 ematch offset parsing might fail even if nexthdr offset is
aligned to 4. The issue can be reproduced with the following script:

tc qdisc del dev dummy0 root
tc qdisc add dev dummy0 root handle 1: htb r2q 1 default 1
tc class add dev dummy0 parent 1:1 classid 1:108 htb quantum 1000000 \
rate 1.00mbit ceil 10.00mbit burst 6k

while true; do
if ! tc filter add dev dummy0 protocol all parent 1: prio 1 basic match \
"meta(vlan mask 0xfff eq 1)" and "u32(u32 0x20011002 0xffffffff \
at nexthdr+8)" flowid 1:108; then
exit 0
fi
done

which we expect to produce an endless loop.
With the current code, instead, this ends with:

u32: invalid offset alignment, must be aligned to 4.
... meta(vlan mask 0xfff eq 1) and >>u32(u32 0x20011002 0xffffffff at nexthdr+8)<< ...
... u32(u32 0x20011002 0xffffffff at >>nexthdr+8<<)...
Usage: u32(ALIGN VALUE MASK at [ nexthdr+ ] OFFSET)
where: ALIGN := { u8 | u16 | u32 }

Example: u32(u16 0x1122 0xffff at nexthdr+4)
Illegal "ematch"

This is caused by memcpy copying into buf an unterminated string.

Fix it using strncpy instead of memcpy.

Fixes: commit 311b41454dc4 ("Add new extended match files.")
Reported-by: Alfred Yang <alf.redyoung@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update of virtio_ids

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Update kernel headers

Update kernel headers to commit:
a65cc8435540 ("Merge branch 'bnxt_en-next'")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'support-xstats-afstats' into next

Petr Machata says:

====================

The RTM_GETSTATS response attributes IFLA_STATS_LINK_XSTATS and
IFLA_STATS_LINK_XSTATS_SLAVE are used to carry statistics related to,
respectively, netdevices of a certain type, and netdevices enslaved to
netdevices of a certain type. IFLA_STATS_AF_SPEC are similarly used to
carry statistics specific to a certain address family.

In this patch set, add support for three new stats groups that cover the
above attributes: xstats, xstats_slave and afstats. Add bridge and bond
subgroups to the former two groups, and mpls subgroup to the latter one.

Now "group" is used for selecting the top-level attribute, and subgroup
for the link-type or address-family nest below it (bridge, bond, mpls in
this patchset). But xstats (both master and slave) are further
subdivided. E.g. in the case of bridge statistics, the two subdivisions
are called "stp" and "mcast". To make it possible to pick these sets,
add to the two selector levels of group and subgroup a third level,
suite, which is filtered in the userspace.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

man: ip-stats.8: Describe groups xstats, xstats_slave and afstats

Add description of the newly-added statistics groups.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Expose bond stats in ipstats

Describe xstats and xstats_slave subgroups for bond netdevices.

For example:

# ip stats show dev swp1 group xstats_slave subgroup bond
56: swp1: group xstats_slave subgroup bond suite 802.3ad
                     LACPDU Rx 0
                     LACPDU Tx 0
                     LACPDU Unknown type Rx 0
                     LACPDU Illegal Rx 0
                     Marker Rx 0
                     Marker Tx 0
                     Marker response Rx 0
                     Marker response Tx 0
                     Marker unknown type Rx 0

# ip -j stats show dev swp1 group xstats_slave subgroup bond | jq
[
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bond",
     "suite": "802.3ad",
     "802.3ad": {
       "lacpdu_rx": 0,
       "lacpdu_tx": 0,
       "lacpdu_unknown_rx": 0,
       "lacpdu_illegal_rx": 0,
       "marker_rx": 0,
       "marker_tx": 0,
       "marker_response_rx": 0,
       "marker_response_tx": 0,
       "marker_unknown_rx": 0
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Expose bridge stats in ipstats

Bridge supports two suites, STP and IGMP, carried by attributes
BRIDGE_XSTATS_STP and BRIDGE_XSTATS_MCAST. Expose them as suites "stp" and
"mcast" (to correspond to the attribute name).

For example:

# ip stats show dev swp1 group xstats_slave subgroup bridge
56: swp1: group xstats_slave subgroup bridge suite mcast
                     IGMP queries:
                       RX: v1 0 v2 0 v3 0
                       TX: v1 0 v2 0 v3 0
                     IGMP reports:
                       RX: v1 0 v2 0 v3 0
                       TX: v1 0 v2 0 v3 0
                     IGMP leaves: RX: 0 TX: 0
                     IGMP parse errors: 0
                     MLD queries:
                       RX: v1 0 v2 0
                       TX: v1 0 v2 0
                     MLD reports:
                       RX: v1 0 v2 0
                       TX: v1 0 v2 0
                     MLD leaves: RX: 0 TX: 0
                     MLD parse errors: 0

56: swp1: group xstats_slave subgroup bridge suite stp
                     STP BPDU:  RX: 0 TX: 0
                     STP TCN:   RX: 0 TX: 0
                     STP Transitions: Blocked: 1 Forwarding: 0

# ip -j stats show dev swp1 group xstats_slave subgroup bridge | jq
[
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bridge",
     "suite": "mcast",
     "multicast": {
       "igmp_queries": {
         "rx_v1": 0,
         "rx_v2": 0,
         "rx_v3": 0,
         "tx_v1": 0,
         "tx_v2": 0,
         "tx_v3": 0
       },
       "igmp_reports": {
         "rx_v1": 0,
         "rx_v2": 0,
         "rx_v3": 0,
         "tx_v1": 0,
         "tx_v2": 0,
         "tx_v3": 0
       },
       "igmp_leaves": {
         "rx": 0,
         "tx": 0
       },
       "igmp_parse_errors": 0,
       "mld_queries": {
         "rx_v1": 0,
         "rx_v2": 0,
         "tx_v1": 0,
         "tx_v2": 0
       },
       "mld_reports": {
         "rx_v1": 0,
         "rx_v2": 0,
         "tx_v1": 0,
         "tx_v2": 0
       },
       "mld_leaves": {
         "rx": 0,
         "tx": 0
       },
       "mld_parse_errors": 0
     }
   },
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bridge",
     "suite": "stp",
     "stp": {
       "rx_bpdu": 0,
       "tx_bpdu": 0,
       "rx_tcn": 0,
       "tx_tcn": 0,
       "transition_blk": 1,
       "transition_fwd": 0
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink_bridge: Split bridge_print_stats_attr()

Extract from bridge_print_stats_attr() two helpers, one for dumping the
multicast attribute, one for dumping the STP attribute.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add groups "xstats", "xstats_slave"

The RTM_GETSTATS response attributes IFLA_STATS_LINK_XSTATS and
IFLA_STATS_LINK_XSTATS_SLAVE are used to carry statistics related to,
respectively, netdevices of a certain type, and netdevices enslaved to
netdevices of a certain type. Inside the nest is then a link-type specific
attribute (e.g. LINK_XSTATS_TYPE_BRIDGE), and inside that nest are further
attributes for individual type-specific statistical suites.

Under the "ip stats" model, that corresponds to groups "xstats" and
"xstats_slave", link-type specific subgroup, e.g. "bridge", and one or more
link-type specific suites, such as "stp".

Link-type specific stats are currently supported through struct link_util
and in particular the callbacks parse_ifla_xstats and print_ifla_xstats.

The role of parse_ifla_xstats is to establish which statistical suite to
display, and on which device. "ip stats" has a framework for both of these
tasks, which obviates the need for custom parsing. Therefore the module
should instead provide a subgroup descriptor, which "ip stats" will then
use as any other.

The second link_util callback, print_ifla_xstats, is for response
dissection. In the "ip stats" model, this belongs to leaf descriptors.

Eventually, the link-specific leaf descriptors will be similar to each
other: either the master or the slave top-level nest needs to be parsed,
then the link-type attribute underneath that, and the suite attribute
underneath that.

To support this commonality, add struct ipstats_stat_desc_xstats to
describe the xstats suites. Further, expose ipstats_stat_desc_pack_xstats()
and ipstats_stat_desc_show_xstats(), which can be used at leaf descriptors
and do the appropriate thing according to the configuration in
ipstats_stat_desc_xstats.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a third level of stats hierarchy, a "suite"

To show statistics nested under IFLA_STATS_LINK_XSTATS_SLAVE or
IFLA_STATS_LINK_XSTATS, one would use "group" to select the top-level
attribute, then "subgroup" to select the link type, which is itself a nest,
and then would lack a way to denote which attribute to select out of the
link-type nest.

To that end, add the selector level "suite", which is filtered in the
userspace.
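
The idea of filtering in userspace can be illustrated on the JSON form of
the output: the whole subgroup is received, and records of other suites
are simply discarded. This is a sketch only; select_suite() is a made-up
helper and SAMPLE is a trimmed, hand-written stand-in for
"ip -j stats show" output.

```python
import json

# Trimmed stand-in for the JSON printed by "ip -j stats show ...".
SAMPLE = json.loads("""
[
  {"ifindex": 56, "ifname": "swp1", "group": "xstats_slave",
   "subgroup": "bridge", "suite": "mcast",
   "multicast": {"igmp_parse_errors": 0, "mld_parse_errors": 0}},
  {"ifindex": 56, "ifname": "swp1", "group": "xstats_slave",
   "subgroup": "bridge", "suite": "stp",
   "stp": {"rx_bpdu": 0, "tx_bpdu": 0, "transition_blk": 1}}
]
""")


def select_suite(records, suite):
    """Keep only the records that belong to the requested suite."""
    return [record for record in records if record.get("suite") == suite]
```

Group and subgroup still determine what is requested; only the last
selector level is applied after the response has arrived.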

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: Add JSON support to MPLS stats formatter

MPLS stats currently do not support dumping in JSON format. Recognize when
JSON is requested and dump in an obvious manner:

# ip -n ns0-2G8Ozd9z -j stats show dev veth01 group afstats | jq
[
   {
     "ifindex": 3,
     "ifname": "veth01",
     "group": "afstats",
     "subgroup": "mpls",
     "mpls_stats": {
       "rx": {
         "bytes": 0,
         "packets": 0,
         "errors": 0,
         "dropped": 0,
         "noroute": 0
       },
       "tx": {
         "bytes": 216,
         "packets": 2,
         "errors": 0,
         "dropped": 0
       }
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a group "afstats", subgroup "mpls"

Add a new group, "afstats", for showing counters from the
IFLA_STATS_AF_SPEC nest, and a subgroup "mpls" for the AF_MPLS
specifically.

For example:

# ip -n ns0-NrdgY9sx stats show dev veth01 group afstats
3: veth01: group afstats subgroup mpls
     RX: bytes packets errors dropped noroute
             0       0      0       0       0
     TX: bytes packets errors dropped
           108       1      0       0

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: Publish a function to format MPLS stats

Extract from print_mpls_stats() a new function, print_mpls_link_stats(),
make it non-static and publish in the header file.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: Fix formatting of MPLS stats

Currently, MPLS stats are formatted this way:

# ip -n ns0-DBZpxj8I link afstats dev veth01
3: veth01
     mpls:
         RX: bytes  packets  errors  dropped  noroute
                  0        0       0        0       0
         TX: bytes  packets  errors  dropped
                216        2       0       0

Note how most numbers are not aligned properly under their column headers.
Fix by converting the code to use size_columns() to dynamically determine
the necessary width of individual columns, which also takes care of
formatting the table properly in case the counter values are high.
After the fix, the formatting looks as follows:

# ip -n ns0-Y1PyEc55 link afstats dev veth01
3: veth01
     mpls:
         RX: bytes packets errors dropped noroute
                 0       0      0       0       0
         TX: bytes packets errors dropped
               108       1      0       0
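
The sizing idea can be sketched as follows: each column is made as wide as
the longest of its header and its values, and every cell is right-aligned
to that width. This is not the size_columns() code from iproute2; the
helpers column_widths() and format_row() are assumptions for illustration.

```python
def column_widths(headers, rows):
    """Width of each column: the longest header or value in that column."""
    widths = [len(header) for header in headers]
    for row in rows:
        for index, value in enumerate(row):
            widths[index] = max(widths[index], len(str(value)))
    return widths


def format_row(cells, widths):
    """Right-align every cell to its column width, separated by one space."""
    return " ".join(str(cell).rjust(width) for cell, width in zip(cells, widths))
```

Because the widths are computed from the data, a large counter widens its
own column instead of pushing the remaining values out of alignment.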

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: ipstats: Do not assume length of response attribute payload

In Linux kernel commit 794c24e9921f ("net-core: rx_otherhost_dropped to
core_stats"), struct rtnl_link_stats64 got a new member. This change got to
iproute2 through commit bba95837524d ("Update kernel headers").

"ip stats" makes the assumption that the payload of attributes that carry
structures is at least as long as the size of the given structure as
iproute2 knows it. But that will not hold when a newer iproute2 is used
against an older kernel: since such kernel misses some fields on the tail
end of the structure, "ip stats" bails out:

# ip stats show group link
1: lo: group link
Error: attribute payload too shortDump terminated

Instead, be tolerant of responses that are both longer and shorter than
what is expected. Instead of forming a pointer directly into the payload,
allocate the stats structure on the stack, zero it, and then copy over the
portion from the response.
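
The zero-then-copy approach can be modelled in a few lines. This is a
sketch of the technique only: the three-counter layout in STATS_FORMAT is
hypothetical and much smaller than struct rtnl_link_stats64, and
unpack_tolerant() is a name invented for the example.

```python
import struct

# Hypothetical layout: three 64-bit counters as the tool knows them.
STATS_FORMAT = "=3Q"
STATS_SIZE = struct.calcsize(STATS_FORMAT)


def unpack_tolerant(payload):
    """Decode a payload that may be shorter or longer than expected."""
    buf = bytearray(STATS_SIZE)           # zeroed copy of the structure
    count = min(len(payload), STATS_SIZE)
    buf[:count] = payload[:count]         # copy only what both sides know
    return struct.unpack(STATS_FORMAT, bytes(buf))
```

Fields missing from a short response read as zero, and trailing fields
from a longer response are ignored.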

Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'bridge-vxlan-vni-filtering' into next

Roopa Prabhu says:

====================

This series adds a bridge command to manage the recently added vnifilter
on a collect metadata (external) vxlan device. It also includes per-vni
stats support.

examples:
$ bridge vni add dev vxlan0 vni 400

$ bridge vni add dev vxlan0 vni 200 group 239.1.1.101

$ bridge vni del dev vxlan0 vni 400

$ bridge vni show

$ bridge -s vni show

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: vni: add support for stats dumping

Add support for "-s" option which causes bridge vni to dump per-vni
statistics. Note that it disables vni range compression.

Example:
$ bridge -s vni | more
dev               vni              group/remote
vxlan0             1024  239.1.1.1
                     RX: bytes 0 pkts 0 drops 0 errors 0
                     TX: bytes 0 pkts 0 drops 0 errors 0
                    1025  239.1.1.1
                     RX: bytes 0 pkts 0 drops 0 errors 0
                     TX: bytes 0 pkts 0 drops 0 errors 0

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: iplink_vxlan: add support to set vnifiltering flag on vxlan device

This patch adds an option to set the vnifilter flag on a vxlan device.
vnifilter is only supported on a collect metadata device.

example: set vnifilter flag
$ ip link add vxlan0 type vxlan external vnifilter local 172.16.0.1

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: vxlan device vnifilter support

This patch adds a bridge command to manage the recently added vnifilter
on a collect metadata vxlan device.

examples:
$ bridge vni add dev vxlan0 vni 400

$ bridge vni add dev vxlan0 vni 200 group 239.1.1.101

$ bridge vni del dev vxlan0 vni 400

$ bridge vni show

$ bridge -s vni show

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

libbpf: Remove use of bpf_map_is_offload_neutral

bpf_map_is_offload_neutral is deprecated as of v0.8+;
import definition to maintain backwards compatibility.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

libbpf: Remove use of bpf_program__set_priv and bpf_program__priv

bpf_program__set_priv and bpf_program__priv are deprecated as of
libbpf v0.7+. Rather than store the map as priv on the program,
change find_legacy_tail_calls to take an argument to return a reference
to the map.

find_legacy_tail_calls is invoked twice from load_bpf_object - the
first time to check for programs that should be loaded. In this case
a reference to the map is not needed, but it does validate the map
exists. The second is invoked from update_legacy_tail_call_maps where
the map pointer is needed.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

libbpf: Use bpf_object__load instead of bpf_object__load_xattr

bpf_object__load_xattr is deprecated as of v0.8+; remove it
in favor of bpf_object__load.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

f_flower: add number of vlans man entry

The documentation was missing in the number of vlans commit.

Fixes: 5ba31bcf (f_flower: Add num of vlans parameter)
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'flower-vlans' into next

Boris Sukholitko says:

====================

Our customers in the fiber telecom world have network configurations
where they would like to control their traffic according to the number
of tags appearing in the packet.

For example, TR247 GPON conformance test suite specification mostly
talks about untagged, single, double tagged packets and gives lax
guidelines on the vlan protocol vs. number of vlan tags.

This is different from the common IT networks where the 802.1Q and 802.1ad
protocols usually describe single and double tagged packets. The GPON
configurations that we work with have an arbitrary mix of the above
protocols and number of vlan tags in the packet.

The following patch series implements a number-of-vlans flower filter. It
adds the num_of_vlans flower filter as an alternative to vlan ethtype protocol
matching. The end result is that the following command becomes possible:

tc filter add dev eth1 ingress flower \
  num_of_vlans 1 vlan_prio 5 action drop

Also, from our logs, we have redirect rules such that:

tc filter add dev $GPON ingress flower num_of_vlans $N \
     action mirred egress redirect dev $DEV

where $N can range from 0 to 3 and $DEV is a function of $N.

Also there are rules setting skb mark based on the number of vlans:

tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
    $P action skbedit mark $M

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

f_flower: Check args with num_of_vlans

Having more than one vlan allows matching on the vlan tag parameters.
This patch changes vlan key validation to take number of vlan tags into
account.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

f_flower: Add num of vlans parameter

Our customers in the fiber telecom world have network configurations
where they would like to control their traffic according to the number
of tags appearing in the packet.

For example, TR247 GPON conformance test suite specification mostly
talks about untagged, single, double tagged packets and gives lax
guidelines on the vlan protocol vs. number of vlan tags.

This is different from the common IT networks where the 802.1Q and 802.1ad
protocols usually describe single and double tagged packets. The GPON
configurations that we work with have an arbitrary mix of the above
protocols and number of vlan tags in the packet.

This patch adds num_of_vlans flower key and associated print and parse
routines. The following command becomes possible:

tc filter add dev eth1 ingress flower num_of_vlans 1 action drop

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
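
To make "number of vlans" concrete, here is a minimal Python sketch that
counts the VLAN tags in a raw Ethernet frame by walking the TPID fields.
It is an illustration only, not the flower implementation: it assumes that
0x8100 (802.1Q) and 0x88a8 (802.1ad) are the only tag protocols of
interest, and the name count_vlan_tags is hypothetical.

```python
import struct

# TPIDs treated as VLAN tags in this sketch (assumption: only these two).
VLAN_TPIDS = {0x8100, 0x88A8}


def count_vlan_tags(frame: bytes) -> int:
    """Count consecutive VLAN tags that follow the two MAC addresses."""
    offset = 12  # skip destination and source MAC (6 bytes each)
    count = 0
    while offset + 4 <= len(frame):
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype not in VLAN_TPIDS:
            break
        count += 1
        offset += 4  # TPID (2 bytes) + TCI (2 bytes)
    return count


# Hand-built frames: zeroed MACs, then tags, then the IPv4 ethertype.
macs = bytes(12)
ipv4 = struct.pack("!H", 0x0800)
untagged = macs + ipv4
single = macs + struct.pack("!HH", 0x8100, 100) + ipv4
double = (macs + struct.pack("!HH", 0x88A8, 200)
          + struct.pack("!HH", 0x8100, 100) + ipv4)

for name, frame in (("untagged", untagged), ("single", single), ("double", double)):
    print(name, count_vlan_tags(frame))
```

The count does not depend on which of the two protocols carries each tag,
which is the property the GPON use case above relies on.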

Merge branch 'ip-stats' into next

Petr Machata says:

====================

A new rtnetlink message, RTM_SETSTATS, has been added recently in kernel
commit ca0a53dcec94 ("Merge branch 'net-hw-counters-for-soft-devices'").

At the same time, RTM_GETSTATS has been around for a while. The users of
this API are spread in a couple different places: "ip link xstats" reads
stats from the IFLA_STATS_LINK_XSTATS and _XSTATS_SLAVE subgroups, "ip
link afstats" then reads IFLA_STATS_AF_SPEC.

Finally, to read IFLA_STATS_LINK_OFFLOAD_XSTATS, one would use ifstats.
This does not seem to be a good fit for IFLA_OFFLOAD_XSTATS_HW_S_INFO in
particular.

The obvious place to expose all these offload stats suites would be
under a new link subcommand "ip link offload_xstats", or similar, which
would then have syntax for both showing stats and setting them.

However, this looks like a good opportunity to introduce a new top-level
command, "ip stats", that would be the go-to place to access anything
backed by RTM_GETSTATS and RTM_SETSTATS.

This patchset therefore does the following:

- It adds the new "stats" infrastructure

- It adds specifically the ability to toggle and show the suites that
  were recently added to Linux, IFLA_OFFLOAD_XSTATS_HW_S_INFO and
  IFLA_OFFLOAD_XSTATS_L3_STATS.

- It adds support to dump IFLA_OFFLOAD_XSTATS_CPU_HIT, which was not
  available under "ip" at all.

- Does all this in a way that is easy to extend for new stats suites.

The patchset proceeds as follows:

- Patches #1 and #2 lay some groundwork and tweak existing code.

- Patch #3 adds the shell of the new "ip stats" command.

- Patch #4 adds "ip stats set" and the ability to toggle l3_stats in
  particular.

- Patch #5 adds "ip stats show", but no actual stats suites.

- Patches #6-#9 add support for showing individual stats suites:
  respectively, IFLA_STATS_LINK_64, IFLA_OFFLOAD_XSTATS_CPU_HIT,
  IFLA_OFFLOAD_XSTATS_HW_S_INFO and IFLA_OFFLOAD_XSTATS_L3_STATS.

- Patch #10 adds support for monitoring stats events to "ip monitor".

- Patch #11 adds man page verbiage for the above.

The plan is to contribute support for afstats and xstats in a follow-up
patch set.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

man: Add man pages for the "stats" functions

Add a man page for the new "stats" command.
Also mention the new "stats" group in ip-monitor.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipmonitor: Add monitoring support for stats events

Toggles and offloads of HW statistics cause emission of an RTM_NEWSTATS
event. Add support to "ip monitor" for these events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add offload subgroup "l3_stats"

Add into the group "offload" a subgroup "l3_stats" for showing
L3 statistics.

For example:

# ip stats show dev swp2.200 group offload subgroup l3_stats
4212: swp2.200: group offload subgroup l3_stats on used on
     RX: bytes packets errors dropped   mcast
          1920      21      1       0       0
     TX: bytes packets errors dropped
           756       9      0       0

# ip -j stats show dev swp2.200 group offload subgroup l3_stats | jq
[
   {
     "ifindex": 4212,
     "ifname": "swp2.200",
     "group": "offload",
     "subgroup": "l3_stats",
     "info": {
       "request": true,
       "used": true
     },
     "stats64": {
       "rx": {
         "bytes": 1920,
         "packets": 21,
         "errors": 1,
         "dropped": 0,
         "multicast": 0
       },
       "tx": {
         "bytes": 756,
         "packets": 9,
         "errors": 0,
         "dropped": 0
       }
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
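
The JSON form shown above is convenient for scripting. As a minimal sketch
in Python, the following totals packets and errors per interface from such
output. The sample string is copied from the example in this commit
message; the function name summarize is hypothetical, and in practice the
text would come from running "ip -j stats show ..." rather than from a
constant.

```python
import json

# Sample copied from the commit message example above.
SAMPLE = """
[{"ifindex": 4212, "ifname": "swp2.200", "group": "offload",
  "subgroup": "l3_stats", "info": {"request": true, "used": true},
  "stats64": {
    "rx": {"bytes": 1920, "packets": 21, "errors": 1, "dropped": 0, "multicast": 0},
    "tx": {"bytes": 756, "packets": 9, "errors": 0, "dropped": 0}}}]
"""


def summarize(text):
    """Return (ifname, total packets, total errors) for each entry."""
    rows = []
    for entry in json.loads(text):
        stats = entry.get("stats64", {})
        rx = stats.get("rx", {})
        tx = stats.get("tx", {})
        packets = rx.get("packets", 0) + tx.get("packets", 0)
        errors = rx.get("errors", 0) + tx.get("errors", 0)
        rows.append((entry["ifname"], packets, errors))
    return rows


for ifname, packets, errors in summarize(SAMPLE):
    print(f"{ifname}: packets={packets} errors={errors}")
```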

ipstats: Add offload subgroup "hw_stats_info"

Add into the group "offload" a subgroup "hw_stats_info" for showing
information about HW statistics counters.

For example:

# ip stats show dev swp1 group offload subgroup hw_stats_info
4178: swp1: group offload subgroup hw_stats_info
     l3_stats on used off
# ip -j stats show dev swp1 group offload subgroup hw_stats_info | jq
[
   {
     "ifindex": 4178,
     "ifname": "swp1",
     "group": "offload",
     "subgroup": "hw_stats_info",
     "info": {
       "l3_stats": {
         "request": true,
         "used": false
       }
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
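
A typical use of this subgroup is to spot suites that were requested but
are not actually in use by the hardware. The Python sketch below does that
for the JSON shown above. The sample is copied from this commit message;
the function name requested_but_unused is hypothetical, and the sketch
assumes every member of "info" has the "request"/"used" shape shown in the
example.

```python
import json

# Sample copied from the commit message example above.
SAMPLE = """
[{"ifindex": 4178, "ifname": "swp1", "group": "offload",
  "subgroup": "hw_stats_info",
  "info": {"l3_stats": {"request": true, "used": false}}}]
"""


def requested_but_unused(text):
    """Return (ifname, suite) pairs that are requested but not used."""
    result = []
    for entry in json.loads(text):
        for suite, state in entry.get("info", {}).items():
            if state.get("request") and not state.get("used"):
                result.append((entry["ifname"], suite))
    return result


print(requested_but_unused(SAMPLE))
```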

ipstats: Add a group "offload", subgroup "cpu_hit"

Add a new group, "offload", for showing counters from the
IFLA_STATS_LINK_OFFLOAD_XSTATS nest, and a subgroup "cpu_hit" for the
IFLA_OFFLOAD_XSTATS_CPU_HIT stats suite.

For example:

# ip stats show dev swp1 group offload subgroup cpu_hit
4178: swp1: group offload subgroup cpu_hit
     RX:  bytes packets errors dropped  missed   mcast
          45522     353      0       0       0       0
     TX:  bytes packets errors dropped carrier collsns
          46054     355      0       0       0       0

# ip -j stats show dev swp1 group offload subgroup cpu_hit | jq
[
   {
     "ifindex": 4178,
     "ifname": "swp1",
     "group": "offload",
     "subgroup": "cpu_hit",
     "stats64": {
       "rx": {
         "bytes": 45522,
         "packets": 353,
         "errors": 0,
         "dropped": 0,
         "over_errors": 0,
         "multicast": 0
       },
       "tx": {
         "bytes": 46054,
         "packets": 355,
         "errors": 0,
         "dropped": 0,
         "carrier_errors": 0,
         "collisions": 0
       }
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a group "link"

Add the "link" top-level group for showing IFLA_STATS_LINK_64 statistics.
For example:

# ip stats show dev swp1 group link
4178: swp1: group link
     RX:  bytes packets errors dropped  missed   mcast
          47048     354      0       0       0      64
     TX:  bytes packets errors dropped carrier collsns
          47474     355      0       0       0       0

# ip -j stats show dev swp1 group link | jq
[
   {
     "ifindex": 4178,
     "ifname": "swp1",
     "group": "link",
     "stats64": {
       "rx": {
         "bytes": 47048,
         "packets": 354,
         "errors": 0,
         "dropped": 0,
         "over_errors": 0,
         "multicast": 64
       },
       "tx": {
         "bytes": 47474,
         "packets": 355,
         "errors": 0,
         "dropped": 0,
         "carrier_errors": 0,
         "collisions": 0
       }
     }
   }
]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a shell of "show" command

Add the scaffolding necessary for adding individual stats suites to show.

Expose some ipstats artifacts in ip_common.h. These will be used to support
"xstats" in a follow-up patchset.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ipstats: Add a "set" command

Add a command to allow toggling HW stats. An example usage:

# ip stats set dev swp1 l3_stats on

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Add a new family of commands, "stats"

Add a core of a new frontend tool for interfacing with the RTM_*STATS
family of messages. The following patches will add subcommands for showing
and setting individual statistics suites.

Note that in this patch, "ip stats" is made to be an invalid command line.
This will be changed in later patches to default to "show" when that is
introduced.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: Publish functions for stats formatting

Formatting struct rtnl_link_stats64 will be useful outside of iplink.c as
well. Extract from __print_link_stats() a new function, print_stats64(),
make it non-static and publish in the header file.

Additionally, publish the helper size_columns(), which will be useful for
formatting the new struct rtnl_hw_stats64.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

libnetlink: Add filtering to rtnl_statsdump_req_filter()

A number of functions in the rtnl_*_req family accept a caller-provided
callback to set up arbitrary filtering. rtnl_statsdump_req_filter()
currently only allows setting a field in the IFSM header, not custom
attributes. So far these were not necessary, but with introduction of more
detailed filtering settings, the callback becomes necessary.

To that end, add filter_fn and filter_data arguments to the function.
Unlike the other filters, this one is typed to expect an IFSM pointer, to
permit tweaking the header itself as well.

Pass NULLs in the existing callers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

devlink: introduce -[he]x cmdline option to allow dumping numbers in hex format

For health reporter dumps it is quite convenient to have the numbers in
hexadecimal format. Introduce a command line option that lets the user
request that output.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
cc271ab86606 ("wwan_hwsim: Avoid flush_scheduled_work() usage")

Signed-off-by: David Ahern <dsahern@kernel.org>

devlink: fix "devlink health dump" command without arg

Fix bug when user calls "devlink health dump" without "show" or "clear":
$ devlink health dump
Command "(null)" not found

Put the dump command into a separate helper, as is usual in the rest
of the code. Also, treat no cmd as "show", as is common for other
devlink objects.

Fixes: 041e6e651a8e ("devlink: Add devlink health dump show command")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip-link: put types on man page in alphabetic order

Let's try to keep man pages in alphabetic order; it looks like
it started that way and then drifted.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip/iplink_virt_wifi: add support for virt_wifi

Add support for creating devices of the virt_wifi type.

Syntax:
$ ip link add link eth0 name wlan0 type virt_wifi

Signed-off-by: Baligh Gasmi <gasmibal@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

man: use quote instead of acute accent

Lintian complains:

I: iproute2: acute-accent-in-manual-page usr/share/man/man8/tc-bpf.8.gz:220

"This manual page uses the \' groff sequence. Usually, the intent is to
generate an apostrophe, but that sequence actually renders as an acute
accent.
For an apostrophe or a single closing quote, use plain '. For single
opening quote, i.e. a straight downward line ' like the one used in
shell commands, use '\(aq'."

Before:

´s,c t f k,c t f k,c t f k,...´

After:

's,c t f k,c t f k,c t f k,...'

Signed-off-by: Luca Boccassi <bluca@debian.org>

man: 'allow to' -> 'allow one to'

Lintian warnings:

I: iproute2: typo-in-manual-page usr/share/man/man8/tc-ctinfo.8.gz line 61 "allow to" "allow one to"
I: iproute2: typo-in-manual-page usr/share/man/man8/tc-netem.8.gz line 70 "allow to" "allow one to"
I: iproute2: typo-in-manual-page usr/share/man/man8/tc-netem.8.gz line 90 "allow to" "allow one to"
I: iproute2: typo-in-manual-page usr/share/man/man8/tc-pedit.8.gz line 307 "allow to" "allow one to"
I: iproute2: typo-in-manual-page usr/share/man/man8/tc-skbmod.8.gz line 66 "allow to" "allow one to"
I: iproute2: typo-in-manual-page usr/share/man/man8/tc-vlan.8.gz line 48 "allow to" "allow one to"
I: iproute2: typo-in-manual-page usr/share/man/man8/tc.8.gz line 346 "allow to" "allow one to"
Signed-off-by: Luca Boccassi <bluca@debian.org>

uapi: upstream update to stddef.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update from 5.18-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'ss-rpcinfo' into next

Andrea Claudi says:

====================

ss uses rpcinfo to get info about rpc service sockets. However, rpcinfo
is not part of iproute2 and it's an implicit dependency for ss.

This series uses libtirpc[1] API to implement the same feature of
rpcinfo for ss. This makes it possible to get info about rpc sockets,
provided ss is compiled with libtirpc support.

As a nice byproduct, this makes ss provide info about some ipv6 rpc
sockets that are not displayed using 'rpcinfo -p'.

- patch 1 adds a configure function to check for libtirpc;
- patch 2 actually reworks ss to use libtirpc.

[1] https://git.linux-nfs.org/?p=steved/libtirpc.git

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

ss: remove an implicit dependency on rpcinfo

ss uses rpcinfo to get info about rpc service sockets. This makes it
dependent on a tool not included in iproute2, and makes it impossible to
get info on rpc sockets if rpcinfo is not installed.

This reworks init_service_resolver() to use libtirpc, thus avoiding the
implicit dependency on rpcinfo. Moreover, this also makes it possible
to display info about ipv6 rpc sockets that are not included in the
rpcinfo -p output.

For example, before this patch:
$ ss -rtap
LISTEN          0               5                                                        localhost:ipp                                        [::]:*                     users:(("cupsd",pid=1600,fd=9))
LISTEN          0               64                                                            [::]:34265                                      [::]:*
LISTEN          0               64                                                            [::]:rpc.nfs_acl                                [::]:*
LISTEN          0               128                                                           [::]:42253                                      [::]:*                     users:(("rpc.statd",pid=146164,fd=12))

After this patch:
$ ss -rtap
LISTEN          0               5                                                        localhost:ipp                                        [::]:*                     users:(("cupsd",pid=1600,fd=9))
LISTEN          0               64                                                            [::]:rpc.nlockmgr                               [::]:*
LISTEN          0               64                                                            [::]:rpc.nfs_acl                                [::]:*
LISTEN          0               128                                                           [::]:rpc.status                                 [::]:*                     users:(("rpc.statd",pid=146164,fd=12))

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

configure: add check_libtirpc()

This patch adds a configure function to check if libtirpc is installed
on the build system. If it is, iproute2 is compiled with libtirpc
support.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

v5.17.0

ip/geneve: add support for IFLA_GENEVE_INNER_PROTO_INHERIT

Add support for creating devices with this property.
Since it cannot be changed, not adding a [no] option.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'gtp-netdev' into next

Wojciech Drewek says:

====================

This patch series introduces GTP support to iproute2. With this patch
series it is possible to create net devices of the GTP type. Those
devices can then be used in tc in order to offload GTP packets. A new
field in tc flower (gtp_opts) can be used to match on QFI and PDU type.

Kernel changes (merged):
https://lore.kernel.org/netdev/164708701228.11169.15700740251869229843.git-patchwork-notify@kernel.org/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

f_flower: Implement gtp options support

Add support for parsing TCA_FLOWER_KEY_ENC_OPTS_GTP.
Options are as follows: PDU_TYPE:QFI where each
option is represented as 8-bit hexadecimal value.

e.g.
  # ip link add gtp_dev type gtp role sgsn
  # tc qdisc add dev gtp_dev ingress
  # tc filter add dev gtp_dev protocol ip parent ffff: \
      flower \
        enc_key_id 11 \
        gtp_opts 1:8/ff:ff \
      action mirred egress redirect dev eth0

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
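
As an illustration of the PDU_TYPE:QFI format described above, here is a
minimal Python sketch that parses a gtp_opts argument such as "1:8/ff:ff".
It is not the tc parser. It assumes that the part after "/" is a mask in
the same PDU_TYPE:QFI layout, and that a missing mask means "match all
bits" (ff:ff); both points are assumptions drawn from the example, and the
name parse_gtp_opts is hypothetical.

```python
def parse_gtp_opts(text):
    """Parse 'PDU_TYPE:QFI[/PDU_TYPE_MASK:QFI_MASK]' with 8-bit hex fields."""

    def parse_pair(part):
        # Exactly two colon-separated hexadecimal fields are expected.
        pdu_type, qfi = (int(field, 16) for field in part.split(":"))
        for value in (pdu_type, qfi):
            if not 0 <= value <= 0xFF:
                raise ValueError(f"field out of 8-bit range: {value:#x}")
        return pdu_type, qfi

    value, _, mask = text.partition("/")
    pdu_type, qfi = parse_pair(value)
    # Assumption: an omitted mask is treated as an exact match.
    pdu_type_mask, qfi_mask = parse_pair(mask) if mask else (0xFF, 0xFF)
    return {
        "pdu_type": pdu_type,
        "qfi": qfi,
        "pdu_type_mask": pdu_type_mask,
        "qfi_mask": qfi_mask,
    }


print(parse_gtp_opts("1:8/ff:ff"))
```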

ip: GTP support in ip link

Support for creating GTP devices through ip link. Two arguments
can be specified by the user when adding a device of the GTP type:
- role (sgsn or ggsn) - indicates whether we are on the GGSN or SGSN
- hsize - indicates the size of the hash table where PDP sessions
  are stored

The IFLA_GTP_FD0 and IFLA_GTP_FD1 arguments are not provided. Those
are file descriptors for sockets created in userspace. Since
we are not going to create sockets in ip link, we don't have to
provide them.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

man: bridge: document per-port mcast_router settings

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>

bridge: support for controlling mcast_router per port

The bridge vlan command supports setting mcast_router per-port and
per-vlan. What is missing, however, is the ability to set the per-port
mcast_router options, e.g. when VLAN filtering is disabled.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>

Update kernel headers

Update kernel headers to commit:
092d992b76ed ("Merge tag 'mlx5-updates-2022-03-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>

vdpa: Update man page with added support to configure max vq pair

Update the man page to include information on how to configure the max
virtqueue pairs for a vdpa device when creating one.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

link_xfrm: if_id must be non zero

Since kernel upstream commit
8dce43919566 ("xfrm: interface with if_id 0 should return error")
if_id must be non zero.

Fix the usage and return error for if_id 0.

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

testsuite: link xfrm delete no if_id test

Since kernel commit 8dce43919566 ("xfrm: interface with if_id 0 should return error")
if_id should be non zero.
Delete the test without if_id, which defaulted if_id to zero.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vdpa: Support reading device features

When showing the available management devices, check if
VDPA_ATTR_DEV_SUPPORTED_FEATURES feature is available and print the
supported features for a management device.

Examples:
$ vdpa mgmtdev show
auxiliary/mlx5_core.sf.1:
  supported_classes net
  max_supported_vqs 257
  dev_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ MQ \
               CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM

$ vdpa -jp mgmtdev show
{
    "mgmtdev": {
        "auxiliary/mlx5_core.sf.1": {
            "supported_classes": [ "net" ],
            "max_supported_vqs": 257,
            "dev_features": [
"CSUM","GUEST_CSUM","MTU","HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ","MQ",\
"CTRL_MAC_ADDR","VERSION_1","ACCESS_PLATFORM" ]
        }
    }
}

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

vdpa: Support for configuring max VQ pairs for a device

Use VDPA_ATTR_DEV_MGMTDEV_MAX_VQS to specify max number of virtqueue
pairs to configure for a vdpa device when adding a device.

Examples:
1. Create a device with 3 virtqueue pairs:
$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 3

2. Read the configuration of a vdpa device
$ vdpa dev config show vdpa-a
  vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 3 \
          mtu 1500
  negotiated_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS \
                      CTRL_VQ MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

vdpa: Allow for printing negotiated features of a device

When reading the configuration of a vdpa device, check if the
VDPA_ATTR_DEV_NEGOTIATED_FEATURES is available. If it is, parse the
feature bits and print a string representation of each of the feature
bits.

We keep the strings in two different arrays: one for net device specific
feature bits and one for generic feature bits.

In this patch we parse only net device specific features. Support for
other devices can be added later. If the device queried is not a net
device, we print its bit number only.

Examples:
1. Standard presentation
$ vdpa dev config show vdpa-a
vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 2 mtu 9000
  negotiated_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS \
CTRL_VQ MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM

2. json output
$ vdpa -j dev config show vdpa-a
{"config":{"vdpa-a":{"mac":"00:00:00:00:88:88","link":"up","link_announce":false,\
"max_vq_pairs":2,"mtu":9000,"negotiated_features":["CSUM","GUEST_CSUM",\
"MTU","MAC","HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ","MQ","CTRL_MAC_ADDR",\
"VERSION_1","ACCESS_PLATFORM"]}}}

3. Pretty json
$ vdpa -jp dev config show vdpa-a
{
    "config": {
        "vdpa-a": {
            "mac": "00:00:00:00:88:88",
            "link": "up",
            "link_announce": false,
            "max_vq_pairs": 2,
            "mtu": 9000,
            "negotiated_features": [
"CSUM","GUEST_CSUM","MTU","MAC","HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ",\
"MQ","CTRL_MAC_ADDR","VERSION_1","ACCESS_PLATFORM" ]
        }
    }
}

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
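
The two-table lookup described above can be sketched in Python as follows.
This is an illustration, not the vdpa source: the tables list only the
features that appear in the example output, the bit positions are the ones
I recall from the virtio specification and should be checked against it,
and the "bit_N" fallback label and the name feature_names are hypothetical.

```python
# Net-device-specific feature bits (subset; positions assumed from the
# virtio specification).
NET_FEATURES = {
    0: "CSUM", 1: "GUEST_CSUM", 3: "MTU", 5: "MAC",
    11: "HOST_TSO4", 12: "HOST_TSO6", 16: "STATUS", 17: "CTRL_VQ",
    22: "MQ", 23: "CTRL_MAC_ADDR",
}

# Generic (device-independent) feature bits (subset).
GENERIC_FEATURES = {32: "VERSION_1", 33: "ACCESS_PLATFORM"}


def feature_names(features, is_net=True):
    """Turn a 64-bit feature mask into a list of names, lowest bit first."""
    names = []
    for bit in range(64):
        if not features & (1 << bit):
            continue
        name = GENERIC_FEATURES.get(bit)
        if name is None and is_net:
            name = NET_FEATURES.get(bit)
        # Unknown bits, or device-specific bits of a non-net device,
        # fall back to the bit number.
        names.append(name or f"bit_{bit}")
    return names


# Mask built from the features in the example output above.
mask = sum(1 << bit for bit in (0, 1, 3, 5, 11, 12, 16, 17, 22, 23, 32, 33))
print(" ".join(feature_names(mask)))
```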

vdpa: Remove unsupported command line option

"-v[erbose]" option is not supported.
Remove it.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>