git.ipfire.org Git - thirdparty/iproute2.git/log
5 days ago ip: ipmaddr.c: Fix possible integer underflow in read_igmp() main
Anton Moryakov [Sun, 20 Jul 2025 15:38:43 +0000 (18:38 +0300)] 
ip: ipmaddr.c: Fix possible integer underflow in read_igmp()

Static analyzer pointed out a potential error:

Possible integer underflow: left operand is tainted. An integer underflow
may occur due to arithmetic operation (unsigned subtraction) between variable
'len' and value '1', when 'len' is tainted { [0, 18446744073709551615] }

The fix adds a check for 'len == 0' before accessing the last character of
the name, and skips the current line in such cases to avoid the underflow.
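
A minimal sketch of the guarded access, assuming a hypothetical helper
strip_trailing_colon() (this is not the code from read_igmp()). The
length is tested before 'len - 1' is computed, so an empty name cannot
wrap around:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the code from ip/ipmaddr.c.
 * Removes a trailing ':' from name. Returns -1 for an empty name so the
 * caller can skip the line, 0 otherwise.
 */
static int strip_trailing_colon(char *name)
{
	size_t len = strlen(name);

	/* len is unsigned: with len == 0, len - 1 would wrap to SIZE_MAX */
	if (len == 0)
		return -1;

	if (name[len - 1] == ':')
		name[len - 1] = '\0';
	return 0;
}
```

A caller would skip the current input line whenever the helper
returns -1.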

Reported-by: SVACE static analyzer
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 days ago misc: fix memory leak in ifstat.c
Anton Moryakov [Sat, 19 Jul 2025 10:42:12 +0000 (13:42 +0300)] 
misc: fix memory leak in ifstat.c

A memory leak was detected by the static analyzer SVACE in the function
get_nlmsg_extended(). The issue occurred when parsing extended interface
statistics failed due to a missing nested attribute. In this case,
memory allocated for 'n->name' via strdup() was not freed before returning,
resulting in a leak.

The fix adds an explicit 'free(n->name)' call before freeing the containing
structure in the error path.
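
The error path can be sketched as follows. The 'struct ent' type, the
ent_new() function and its 'have_nested' flag are hypothetical
stand-ins, not the code from get_nlmsg_extended(), and dup_string()
stands in for strdup():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the entry type used in misc/ifstat.c */
struct ent {
	char *name;
};

/* Stands in for strdup() so the sketch only needs ISO C */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Builds an entry. have_nested models whether the nested attribute was
 * found. On failure everything allocated so far is released.
 */
static struct ent *ent_new(const char *name, int have_nested)
{
	struct ent *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;

	n->name = dup_string(name);
	if (!n->name) {
		free(n);
		return NULL;
	}

	if (!have_nested) {
		/* free the member first, then the containing structure */
		free(n->name);
		free(n);
		return NULL;
	}
	return n;
}

static void ent_free(struct ent *n)
{
	if (!n)
		return;
	free(n->name);
	free(n);
}
```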

Reported-by: SVACE static analyzer
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 days ago misc: ss.c: fix logical error in main function
Anton Moryakov [Sat, 19 Jul 2025 16:31:22 +0000 (19:31 +0300)] 
misc: ss.c: fix logical error in main function

The line 'if (!dump_tcpdiag) {' contained a logical error in checking
the descriptor: the static analyzer reported that the condition is
always false.

Fix it by replacing !dump_tcpdiag with !dump_fp.
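
A minimal sketch of the intended check, assuming a hypothetical helper
open_dump() (this is not the code from main() in misc/ss.c). The value
tested after opening is the returned stream, not the path:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the code from misc/ss.c. "-" selects stdout;
 * any other path is opened for writing.
 */
static FILE *open_dump(const char *path)
{
	FILE *fp;

	if (path[0] == '-' && path[1] == '\0')
		return stdout;

	fp = fopen(path, "w");
	/* test the stream returned by fopen(), not the path argument */
	if (!fp) {
		perror("fopen dump file");
		return NULL;
	}
	return fp;
}
```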

Reported-by: SVACE static analyzer
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
6 days ago bridge: fdb: Add support for FDB activity notification control
Ido Schimmel [Thu, 17 Jul 2025 13:05:09 +0000 (16:05 +0300)] 
bridge: fdb: Add support for FDB activity notification control

Add support for FDB activity notification control [1].

Users can use this to enable activity notifications on a new FDB entry
that was learned on an ES (Ethernet Segment) peer and mark it as locally
inactive:

 # bridge fdb add 00:11:22:33:44:55 dev bond1 master static activity_notify inactive
 $ bridge -d fdb get 00:11:22:33:44:55 br br1
 00:11:22:33:44:55 dev bond1 activity_notify inactive master br1 static
 $ bridge -d -j -p fdb get 00:11:22:33:44:55 br br1
 [ {
         "mac": "00:11:22:33:44:55",
         "ifname": "bond1",
         "activity_notify": true,
         "inactive": true,
         "flags": [ ],
         "master": "br1",
         "state": "static"
     } ]

User space will receive a notification when the entry becomes active and
the control plane will be able to mark the entry as locally active.

It is also possible to enable activity notifications on an existing
dynamic entry:

 $ bridge -d -s -j -p fdb get 00:aa:bb:cc:dd:ee br br1
 [ {
         "mac": "00:aa:bb:cc:dd:ee",
         "ifname": "bond1",
         "used": 8,
         "updated": 8,
         "flags": [ ],
         "master": "br1",
         "state": ""
     } ]
 # bridge fdb replace 00:aa:bb:cc:dd:ee dev bond1 master static activity_notify norefresh
 $ bridge -d -s -j -p fdb get 00:aa:bb:cc:dd:ee br br1
 [ {
         "mac": "00:aa:bb:cc:dd:ee",
         "ifname": "bond1",
         "activity_notify": true,
         "used": 3,
         "updated": 23,
         "flags": [ ],
         "master": "br1",
         "state": "static"
     } ]

The "norefresh" keyword is used to avoid resetting the entry's last
active time (i.e., "updated" time).

User space will receive a notification when the entry becomes inactive
and the control plane will be able to mark the entry as locally
inactive. Note that the entry was converted from a dynamic entry to a
static entry to prevent the kernel from automatically deleting it upon
inactivity.

An existing inactive entry can only be marked as active by the kernel or
by disabling and enabling activity notifications:

 $ bridge -d fdb get 00:11:22:33:44:55 br br1
 00:11:22:33:44:55 dev bond1 activity_notify inactive master br1 static
 # bridge fdb replace 00:11:22:33:44:55 dev bond1 master static activity_notify
 $ bridge -d fdb get 00:11:22:33:44:55 br br1
 00:11:22:33:44:55 dev bond1 activity_notify inactive master br1 static
 # bridge fdb replace 00:11:22:33:44:55 dev bond1 master static
 # bridge fdb replace 00:11:22:33:44:55 dev bond1 master static activity_notify
 $ bridge -d fdb get 00:11:22:33:44:55 br br1
 00:11:22:33:44:55 dev bond1 activity_notify master br1 static

Marking an entry as inactive while activity notifications are disabled
does not make sense and will be rejected by the kernel:

 # bridge fdb replace 00:11:22:33:44:55 dev bond1 master static inactive
 RTNETLINK answers: Invalid argument

[1] https://lore.kernel.org/netdev/20200623204718.1057508-1-nikolay@cumulusnetworks.com/

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
6 days ago Merge remote-tracking branch 'main/main' into next
David Ahern [Mon, 28 Jul 2025 16:45:23 +0000 (16:45 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
6 days ago Update kernel headers
David Ahern [Mon, 28 Jul 2025 16:44:18 +0000 (16:44 +0000)] 
Update kernel headers

Update kernel headers to commit:
    fa582ca7e187 ("dpll: zl3073x: Fix build failure")

Signed-off-by: David Ahern <dsahern@kernel.org>
6 days ago devlink: Update TC bandwidth parsing
Carolina Jubran [Mon, 28 Jul 2025 15:44:38 +0000 (18:44 +0300)] 
devlink: Update TC bandwidth parsing

Kernel commit 1bbdb81a9836 ("devlink: Fix excessive stack usage in rate TC bandwidth parsing")
introduced a dedicated attribute set (DEVLINK_RATE_TC_ATTR_*) for entries nested
under DEVLINK_ATTR_RATE_TC_BWS.

Update the parser to reflect this change by validating the nested
attributes and sync the UAPI header to include the changes.

Fixes: c83d1477f8b2 ("Add support for 'tc-bw' attribute in devlink-rate")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 days ago v6.16.0 v6.16.0
Stephen Hemminger [Mon, 28 Jul 2025 05:22:17 +0000 (22:22 -0700)] 
v6.16.0

3 weeks ago Add support for 'tc-bw' attribute in devlink-rate
Carolina Jubran [Fri, 4 Jul 2025 12:27:53 +0000 (15:27 +0300)] 
Add support for 'tc-bw' attribute in devlink-rate

Introduce a new attribute 'tc-bw' to devlink-rate, allowing users to
set the bandwidth allocation per traffic class. The new attribute
enables fine-grained QoS configurations by assigning relative bandwidth
shares to each traffic class, which allows more precise traffic shaping
and bandwidth management across traffic streams.

Add support for configuring 'tc-bw' via the devlink userspace utility
and parse the 'tc-bw' arguments for accurate bandwidth assignment per
traffic class.

This feature supports 8 traffic classes as defined by the IEEE 802.1Qaz
standard.

Example commands:
- devlink port function rate add pci/0000:08:00.0/group \
  tx_share 10Gbit tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0

- devlink port function rate set pci/0000:08:00.0/group \
  tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0
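
The '<tc>:<share>' arguments above could be parsed along these lines.
This is only a sketch: parse_tc_bw() is a hypothetical name, not the
devlink implementation, and the upper bound of 100 on a share is an
assumption based on the example values:

```c
#include <assert.h>
#include <stdio.h>

#define TC_COUNT 8	/* eight traffic classes, as stated above */
#define SHARE_MAX 100	/* assumption: shares are percentages */

/*
 * Hypothetical parser, not the code from devlink/devlink.c. Parses one
 * "<tc>:<share>" argument and stores the share in bw[tc].
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_tc_bw(const char *arg, unsigned int bw[TC_COUNT])
{
	unsigned int tc, share;
	char extra;

	/* exactly two conversions: a third one means trailing garbage */
	if (sscanf(arg, "%u:%u%c", &tc, &share, &extra) != 2)
		return -1;
	if (tc >= TC_COUNT || share > SHARE_MAX)
		return -1;

	bw[tc] = share;
	return 0;
}
```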

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 weeks ago Update kernel headers
David Ahern [Fri, 11 Jul 2025 16:40:36 +0000 (16:40 +0000)] 
Update kernel headers

Update kernel headers to commit:
    fadd1e6231b1 ("Merge branch 'hv-msi-parent-domain' into main")

Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago Merge remote-tracking branch 'main/main' into next
David Ahern [Wed, 2 Jul 2025 14:52:48 +0000 (14:52 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago ip neigh: Add support for "extern_valid" flag
Ido Schimmel [Tue, 1 Jul 2025 14:42:16 +0000 (17:42 +0300)] 
ip neigh: Add support for "extern_valid" flag

Add support for the recently added "extern_valid" flag that can be used
to indicate to the kernel that a neighbor entry was learned and
determined to be valid externally. The kernel will not remove or
invalidate the entry, but it can probe the entry and notify user space
when the entry becomes reachable. The kernel will return the entry to
stale state if it did not receive a confirmation after probing the
entry.

Example usage and output:

 # ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid
 Error: Cannot create externally validated neighbor with an invalid state.
 # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid
 $ ip neigh show dev br0.10
 192.0.2.1 lladdr 00:11:22:33:44:55 extern_valid STALE
 $ ip -j -p neigh show dev br0.10
 [ {
         "dst": "192.0.2.1",
         "lladdr": "00:11:22:33:44:55",
         "extern_valid": null,
         "state": [ "STALE" ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago Update kernel headers
David Ahern [Wed, 2 Jul 2025 14:49:00 +0000 (14:49 +0000)] 
Update kernel headers

Update kernel headers to commit:
    e96ee511c906 ("net: tulip: Rename PCI driver struct to end in _driver")

Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago Merge branch 'bridge-mcast-state-vlan' into next
David Ahern [Wed, 2 Jul 2025 14:36:20 +0000 (14:36 +0000)] 
Merge branch 'bridge-mcast-state-vlan' into next

Fabian Pfitzner says:

====================

Dump the multicast querier state per vlan.
This commit is almost identical to [1].

The querier state can be seen with:

bridge -d vlan global

The options for vlan filtering and vlan mcast snooping have to be enabled
in order to see the output:

ip link set [dev] type bridge mcast_vlan_snooping 1 vlan_filtering 1

The querier state shows the following information for IPv4 and IPv6
respectively:

1) The IP address of the current querier in the network. This could be
   ourselves or an external querier.
2) The port on which the querier was seen
3) Querier timeout in seconds

[1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=16aa4494d7fc6543e5e92beb2ce01648b79f8fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago bridge: refactor bridge mcast querier function
Fabian Pfitzner [Wed, 25 Jun 2025 08:39:15 +0000 (10:39 +0200)] 
bridge: refactor bridge mcast querier function

Make code more readable and consistent with other functions.

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago bridge: dump mcast querier per vlan
Fabian Pfitzner [Wed, 25 Jun 2025 08:39:14 +0000 (10:39 +0200)] 
bridge: dump mcast querier per vlan

Dump the multicast querier state per vlan.
This commit is almost identical to [1].

The querier state can be seen with:

bridge -d vlan global

The options for vlan filtering and vlan mcast snooping have to be enabled
in order to see the output:

ip link set [dev] type bridge mcast_vlan_snooping 1 vlan_filtering 1

The querier state shows the following information for IPv4 and IPv6
respectively:

1) The IP address of the current querier in the network. This could be
   ourselves or an external querier.
2) The port on which the querier was seen
3) Querier timeout in seconds

[1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=16aa4494d7fc6543e5e92beb2ce01648b79f8fa2

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 weeks ago bridge: move mcast querier dumping code into a shared function
Fabian Pfitzner [Wed, 25 Jun 2025 08:39:13 +0000 (10:39 +0200)] 
bridge: move mcast querier dumping code into a shared function

Put mcast querier dumping code into a shared function. This function
will be called from the bridge utility in a later patch.

Adapt the code such that the vtb parameter is used
instead of tb[IFLA_BR_MCAST_QUERIER_STATE].

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 weeks ago uapi: update from 6.16-rc4
Stephen Hemminger [Mon, 30 Jun 2025 05:32:31 +0000 (22:32 -0700)] 
uapi: update from 6.16-rc4

MPTCP comments changed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 weeks ago bond: fix stack smash in xstats
Stephen Hemminger [Thu, 26 Jun 2025 13:50:17 +0000 (06:50 -0700)] 
bond: fix stack smash in xstats

Building with stack smashing detection finds an off-by-one error
in the bond xstats attribute parsing.

$ ip link xstats type bond dev bond0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bond0
                    LACPDU Rx 0
                    LACPDU Tx 0
                    LACPDU Unknown type Rx 0
                    LACPDU Illegal Rx 0
                    Marker Rx 0
                    Marker Tx 0
                    Marker response Rx 0
                    Marker response Tx 0
                    Marker unknown type Rx 0
*** stack smashing detected ***: terminated

Program received signal SIGABRT, Aborted.

Reported-by: z30015464 <zhongxuan2@huawei.com>
Fixes: 440c5075d662 ("ip: bond: add xstats support")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
6 weeks ago ip: VXLAN: Add support for IFLA_VXLAN_MC_ROUTE
Petr Machata [Wed, 18 Jun 2025 15:44:43 +0000 (17:44 +0200)] 
ip: VXLAN: Add support for IFLA_VXLAN_MC_ROUTE

The flag controls whether underlay packets should be MC-routed or (default)
sent to the indicated physical netdevice.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
6 weeks ago Update kernel headers
David Ahern [Sun, 22 Jun 2025 16:51:13 +0000 (16:51 +0000)] 
Update kernel headers

Update kernel headers to commit:
    14966a8df77e ("selftest: add selftest for anycast notifications")

Signed-off-by: David Ahern <dsahern@kernel.org>
7 weeks ago Merge remote-tracking branch 'main/main' into next
David Ahern [Mon, 16 Jun 2025 02:16:40 +0000 (02:16 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
7 weeks ago Merge branch 'bridge-vlan-stats' into next
David Ahern [Mon, 16 Jun 2025 02:15:27 +0000 (02:15 +0000)] 
Merge branch 'bridge-vlan-stats' into next

Petr Machata says:

====================

ip stats displays bridge-related multicast and STP stats, but not VLAN
stats. There is code for requesting, decoding and formatting these stats
accessible through `bridge -s vlan', but the `ip stats' suite lacks it. In
this patchset, extract the `bridge vlan' code to a generally accessible
place and extend `ip stats' to use it.

This reuses the existing display and JSON format, and plugs it into the
existing `ip stats' hierarchy:

 # ip stats show dev v2 group xstats_slave subgroup bridge suite vlan
 2: v2: group xstats_slave subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

 # ip -j -p stats show dev v2 group xstats_slave subgroup bridge suite vlan
 [ {
         "ifindex": 2,
         "ifname": "v2",
         "group": "xstats_slave",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

Similarly for the master stats:

 # ip stats show dev br1 group xstats subgroup bridge suite vlan
 211: br1: group xstats subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

 # ip -j -p stats show dev br1 group xstats subgroup bridge suite vlan
 [ {
         "ifindex": 211,
         "ifname": "br1",
         "group": "xstats",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "flags": [ ],
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "flags": [ ],
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
7 weeks ago ip: iplink_bridge: Support bridge VLAN stats in `ip stats'
Petr Machata [Tue, 10 Jun 2025 15:51:27 +0000 (17:51 +0200)] 
ip: iplink_bridge: Support bridge VLAN stats in `ip stats'

Add support for displaying bridge VLAN statistics in `ip stats'.
Reuse the existing `bridge vlan' display and JSON format:

 # ip stats show dev v2 group xstats_slave subgroup bridge suite vlan
 2: v2: group xstats_slave subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

 # ip -j -p stats show dev v2 group xstats_slave subgroup bridge suite vlan
 [ {
         "ifindex": 2,
         "ifname": "v2",
         "group": "xstats_slave",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

Similarly for the master stats:

 # ip stats show dev br1 group xstats subgroup bridge suite vlan
 211: br1: group xstats subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

 # ip -j -p stats show dev br1 group xstats subgroup bridge suite vlan
 [ {
         "ifindex": 211,
         "ifname": "br1",
         "group": "xstats",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "flags": [ ],
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "flags": [ ],
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 weeks ago lib: bridge: Add a module for bridge-related helpers
Petr Machata [Tue, 10 Jun 2025 15:51:26 +0000 (17:51 +0200)] 
lib: bridge: Add a module for bridge-related helpers

`ip stats' displays a range of bridge_slave-related statistics, but not
the VLAN stats. `bridge vlan' actually has code to show these. Extract the
code to libutil so that it can be reused between the bridge and ip stats
tools.

Rename them reasonably so as not to litter the global namespace.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 weeks ago ip: ip_common: Drop ipstats_stat_desc_xstats::inner_max
Petr Machata [Tue, 10 Jun 2025 15:51:25 +0000 (17:51 +0200)] 
ip: ip_common: Drop ipstats_stat_desc_xstats::inner_max

After the previous patch, this field is not read anymore. Drop it.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 weeks ago ip: ipstats: Iterate all xstats attributes
Petr Machata [Tue, 10 Jun 2025 15:51:24 +0000 (17:51 +0200)] 
ip: ipstats: Iterate all xstats attributes

ipstats_stat_desc_show_xstats() operates by first parsing the attribute
stream into a type-indexed table, and then accessing the right attribute.
But bridge VLAN stats are given as several BRIDGE_XSTATS_VLAN attributes,
one per VLAN. With the above approach to parsing, only one of these
attributes would be shown. Instead, iterate the stream of attributes and
call the show_cb for each one with a matching type.
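
The difference between the two approaches can be sketched with a
simplified attribute type. 'struct attr', parse_table() and show_all()
are hypothetical and do not use the real rtattr layout or the actual
ipstats code:

```c
#include <assert.h>
#include <stddef.h>

#define ATTR_TYPE_MAX 3	/* illustrative, not a kernel constant */

/* Simplified stand-in for a netlink attribute */
struct attr {
	int type;
	int value;
};

/*
 * Table-indexed parse: one slot per type, so of several attributes
 * with the same type only the first is kept in this sketch.
 */
static void parse_table(const struct attr *attrs, size_t n,
			const struct attr *tb[ATTR_TYPE_MAX + 1])
{
	size_t i;

	for (i = 0; i <= ATTR_TYPE_MAX; i++)
		tb[i] = NULL;
	for (i = 0; i < n; i++) {
		int type = attrs[i].type;

		if (type >= 0 && type <= ATTR_TYPE_MAX && !tb[type])
			tb[type] = &attrs[i];
	}
}

/* Stream iteration: cb runs once for every attribute of the given type */
static int show_all(const struct attr *attrs, size_t n, int type,
		    void (*cb)(const struct attr *attr, void *ctx), void *ctx)
{
	size_t i;
	int shown = 0;

	for (i = 0; i < n; i++) {
		if (attrs[i].type != type)
			continue;
		cb(&attrs[i], ctx);
		shown++;
	}
	return shown;
}

/* Example callback: adds up the values it is shown */
static void sum_cb(const struct attr *attr, void *ctx)
{
	*(int *)ctx += attr->value;
}
```

With two attributes of the same type in the stream, the table exposes a
single entry while the iteration calls the callback twice.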

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
8 weeks ago uapi: update headers to 6.16-rc1
Stephen Hemminger [Mon, 9 Jun 2025 01:37:06 +0000 (18:37 -0700)] 
uapi: update headers to 6.16-rc1

Change to bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 weeks ago Parse FQ band weights correctly
Hemanth Malla [Thu, 5 Jun 2025 15:56:07 +0000 (08:56 -0700)] 
Parse FQ band weights correctly

Currently, NEXT_ARG() is called twice resulting in the first
weight being skipped. This results in the following errors:

$ sudo tc qdisc replace dev enP64183s1 root fq weights 589824 196608 65536
Not enough elements in weights

$ sudo tc qdisc replace dev enP64183s1 root fq weights 589824 196608 65536 nopacing
Illegal "weights" element, positive number expected
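
A minimal sketch of a parser that advances the argument cursor exactly
once per weight. parse_weights() and NBANDS are hypothetical, chosen to
match the three weights in the commands above, and are not the code
from tc:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define NBANDS 3	/* assumed band count, matching the three weights above */

/*
 * Hypothetical parser, not the code from tc/q_fq.c. argv[*pos] is the
 * "weights" keyword; the cursor advances exactly once per value.
 * Returns 0 on success, -1 if an element is missing or not positive.
 */
static int parse_weights(int argc, char **argv, int *pos, int weights[NBANDS])
{
	int i;

	for (i = 0; i < NBANDS; i++) {
		char *end;
		long val;

		/* single advance: stepping twice here would skip a value */
		if (++*pos >= argc)
			return -1;

		val = strtol(argv[*pos], &end, 10);
		if (end == argv[*pos] || *end != '\0' ||
		    val <= 0 || val > INT_MAX)
			return -1;
		weights[i] = (int)val;
	}
	return 0;
}
```

On return the cursor points at the last weight, so a following keyword
such as nopacing is left for the caller.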

Fixes: 567eb4e41045 ("tc: fq: add TCA_FQ_WEIGHTS handling")
Signed-off-by: Hemanth Malla <vmalla@microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
8 weeks ago uapi: update headers
Stephen Hemminger [Thu, 5 Jun 2025 15:53:52 +0000 (08:53 -0700)] 
uapi: update headers

Update headers from 6.16 pre rc1.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 months ago ip: support setting multiple features
Stanislav Fomichev [Tue, 27 May 2025 21:55:06 +0000 (14:55 -0700)] 
ip: support setting multiple features

Commit a043bea75002 ("ip route: add support for TCP usec TS") added
support for tcp_usec_ts but the existing code was not adjusted
to handle multiple features in the same invocation:

$ ip route add .. dev .. features tcp_usec_ts ecn
Error: either "to" is duplicate, or "ecn" is garbage.

The code exits the while loop as soon as it encounters any feature.
Make it more flexible. Tested with the following:

$ ip route add .. dev .. features tcp_usec_ts ecn
$ ip route add .. dev .. features tcp_usec_ts ecn quickack 1
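
The loop can be sketched as follows. parse_features() and the FEAT_*
bit values are hypothetical and do not reflect the actual ip route code
or the kernel's feature bits:

```c
#include <assert.h>
#include <string.h>

#define FEAT_ECN		(1u << 0)	/* illustrative bit values */
#define FEAT_TCP_USEC_TS	(1u << 1)

/*
 * Hypothetical parser, not the code from ip/iproute.c. argv[*pos] is
 * the "features" keyword. Consumes every following feature name and
 * stops at the first token that is not one, leaving *pos on the last
 * consumed token.
 */
static unsigned int parse_features(int argc, char **argv, int *pos)
{
	unsigned int features = 0;

	while (*pos + 1 < argc) {
		const char *arg = argv[*pos + 1];

		if (strcmp(arg, "ecn") == 0)
			features |= FEAT_ECN;
		else if (strcmp(arg, "tcp_usec_ts") == 0)
			features |= FEAT_TCP_USEC_TS;
		else
			break;	/* not a feature: leave it for the caller */
		++*pos;
	}
	return features;
}
```

The loop keeps going after the first match, so both features are
collected and the quickack token stays available to the caller.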

Cc: Stephen Hemminger <stephen@networkplumber.org>
Fixes: a043bea75002 ("ip route: add support for TCP usec TS")
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 months ago ip: filter by group before printing
Jean Thomas [Tue, 20 May 2025 14:02:48 +0000 (16:02 +0200)] 
ip: filter by group before printing

Filter the output using the requested group, if necessary.

This avoids printing an empty JSON object for each existing item
that does not match the group filter when the --json option is used.

Before:
$ ip --json address list group test
[{},{},{},{},{},{},{},{},{},{},{},{}]

After:
$ ip --json address list group test
[]
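
The ordering can be sketched as follows. 'struct item' and print_group()
are hypothetical and much simpler than the real address listing code;
the point is only that the group test runs before an object is opened:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of one listed item */
struct item {
	const char *name;
	int group;
};

/*
 * Hypothetical printer, not the code from ip/ipaddress.c. Prints a
 * JSON array with the items that belong to group and returns how many
 * objects were printed.
 */
static int print_group(FILE *out, const struct item *items, size_t n,
		       int group)
{
	size_t i;
	int printed = 0;

	fputc('[', out);
	for (i = 0; i < n; i++) {
		/* filter first: skipped items must not open an object */
		if (items[i].group != group)
			continue;
		fprintf(out, "%s{\"ifname\":\"%s\"}",
			printed ? "," : "", items[i].name);
		printed++;
	}
	fputs("]\n", out);
	return printed;
}
```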

Signed-off-by: Jean Thomas <jean.thomas@wifirst.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2 months ago Merge remote-tracking branch 'main/main' into next
David Ahern [Thu, 29 May 2025 01:57:16 +0000 (01:57 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
2 months ago v6.15.0 v6.15.0
Stephen Hemminger [Mon, 26 May 2025 15:19:09 +0000 (08:19 -0700)] 
v6.15.0

2 months ago iproute2: bugfix - restore ip monitor backward compatibility.
Yuyang Huang [Fri, 23 May 2025 03:25:18 +0000 (12:25 +0900)] 
iproute2: bugfix - restore ip monitor backward compatibility.

The current ip monitor implementation fails on older kernels that lack
newer RTNLGRP_* definitions. As ip monitor is expected to maintain
backward compatibility, this commit updates the code to check if errno
is not EINVAL when rtnl_add_nl_group() fails. This change restores ip
monitor's backward compatibility with older kernel versions.
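
The errno check can be sketched as follows. join_group_compat() is a
hypothetical wrapper and 'join' stands in for a call such as
rtnl_add_nl_group(); the assumption is that the call returns a negative
value and sets errno on failure:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical wrapper, not the code from ip/ipmonitor.c. Treats
 * EINVAL as "this kernel does not know the group" and carries on.
 */
static int join_group_compat(int (*join)(unsigned int), unsigned int group)
{
	if (join(group) < 0 && errno != EINVAL)
		return -1;	/* real failure */

	/* success, or EINVAL from a kernel that lacks this group */
	return 0;
}

/* Stubs used to exercise the wrapper */
static int join_ok(unsigned int group)
{
	(void)group;
	return 0;
}

static int join_einval(unsigned int group)
{
	(void)group;
	errno = EINVAL;
	return -1;
}

static int join_eperm(unsigned int group)
{
	(void)group;
	errno = EPERM;
	return -1;
}
```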

Cc: David Ahern <dsahern@kernel.org>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Reported-by: Adel Belhouane <bugs.a.b@free.fr>
Fixes: 19514606dce3 ("iproute2: add 'ip monitor maddress' support")
Closes: https://lore.kernel.org/netdev/CADXeF1GgJ_1tee3hc7gca2Z21Lyi3mzxq52sSfMg3mFQd2rGWQ@mail.gmail.com/T/#t
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 months ago uapi: update bpf.h
Stephen Hemminger [Tue, 20 May 2025 14:44:32 +0000 (07:44 -0700)] 
uapi: update bpf.h

Minor comment from upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2 months ago Merge remote-tracking branch 'main/main' into next
David Ahern [Tue, 13 May 2025 17:55:23 +0000 (17:55 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
2 months ago ip ntable: Add support for "mcast_reprobes" parameter
Ido Schimmel [Thu, 8 May 2025 11:13:01 +0000 (14:13 +0300)] 
ip ntable: Add support for "mcast_reprobes" parameter

Kernel commit 8da86466b837 ("net: neighbour: Add mcast_resolicit to
configure the number of multicast resolicitations in PROBE state.")
added the "NDTPA_MCAST_REPROBES" netlink attribute that allows user
space to set / get the number of multicast probes that are sent by the
kernel in PROBE state after unicast probes did not solicit a response.

Add support for this parameter in iproute2.

Example usage and output:

 $ ip ntable show dev dummy0 name arp_cache
 inet arp_cache
     dev dummy0
     refcnt 1 reachable 43430 base_reachable 30000 retrans 1000
     gc_stale 60000 delay_probe 5000 queue 101
     app_probes 0 ucast_probes 3 mcast_probes 3 mcast_reprobes 0
     anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000

 # ip ntable change name arp_cache dev dummy0 mcast_reprobes 5
 $ ip ntable show dev dummy0 name arp_cache
 inet arp_cache
     dev dummy0
     refcnt 1 reachable 43430 base_reachable 30000 retrans 1000
     gc_stale 60000 delay_probe 5000 queue 101
     app_probes 0 ucast_probes 3 mcast_probes 3 mcast_reprobes 5
     anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000

 $ ip -j -p ntable show dev dummy0 name arp_cache
 [ {
         "family": "inet",
         "name": "arp_cache",
         "dev": "dummy0",
         "refcnt": 1,
         "reachable": 43430,
         "base_reachable": 30000,
         "retrans": 1000,
         "gc_stale": 60000,
         "delay_probe": 5000,
         "queue": 101,
         "app_probes": 0,
         "ucast_probes": 3,
         "mcast_probes": 3,
         "mcast_reprobes": 5,
         "anycast_delay": 1000,
         "proxy_delay": 800,
         "proxy_queue": 64,
         "locktime": 1000
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 months ago iplink_bridge: Add mdb_offload_fail_notification
Joseph Huang [Tue, 15 Apr 2025 14:43:06 +0000 (10:43 -0400)] 
iplink_bridge: Add mdb_offload_fail_notification

Add mdb_offload_fail_notification option support.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 months ago bridge: mdb: Support offload failed flag
Joseph Huang [Tue, 15 Apr 2025 14:43:05 +0000 (10:43 -0400)] 
bridge: mdb: Support offload failed flag

Add support for the MDB_FLAGS_OFFLOAD_FAILED flag to indicate that
an attempt to offload an mdb entry to switchdev has failed.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
3 months ago Update kernel headers
David Ahern [Tue, 22 Apr 2025 22:37:26 +0000 (22:37 +0000)] 
Update kernel headers

Update kernel headers to commit
    45bd443bfd86 ("net: 802: Remove unused p8022 code")

Signed-off-by: David Ahern <dsahern@kernel.org>
3 months ago nstat: NULL Dereference when no entries specified
ZiAo Li [Wed, 9 Apr 2025 15:03:30 +0000 (23:03 +0800)] 
nstat: NULL Dereference when no entries specified

A NULL pointer dereference can occur in load_ugly_table() in
misc/nstat.c. It can be triggered as follows:
1. db is set to NULL at struct nstat_ent *db = NULL;
2. n is set to NULL at n = db;
3. NULL dereference of variable n happens at sscanf(p+1, "%llu", &n->val) != 1
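
One way to guard the access is sketched below. store_value() is a
hypothetical helper and the reduced 'struct nstat_ent' only carries the
fields needed here; this is not necessarily how the commit fixes
load_ugly_table():

```c
#include <assert.h>
#include <stdio.h>

/* Reduced stand-in for the entry type in misc/nstat.c */
struct nstat_ent {
	struct nstat_ent *next;
	unsigned long long val;
};

/*
 * Hypothetical helper, not the actual fix. Parses a counter value into
 * the first entry of db and returns -1 when the list is empty or the
 * text is not a number.
 */
static int store_value(struct nstat_ent *db, const char *text)
{
	struct nstat_ent *n = db;

	/* no entries: n is NULL and must not be dereferenced */
	if (!n)
		return -1;
	if (sscanf(text, "%llu", &n->val) != 1)
		return -1;
	return 0;
}
```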

Signed-off-by: ZiAo Li <23110240084@m.fudan.edu.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 months ago MAINTAINERS: update bridge entry
Nikolay Aleksandrov [Sat, 5 Apr 2025 10:25:04 +0000 (13:25 +0300)] 
MAINTAINERS: update bridge entry

Sync with the kernel and update the bridge entry with the current bridge
maintainers. Roopa decided to withdraw and Ido has agreed to step in.

Link: https://lore.kernel.org/netdev/20250314100631.40999-1-razor@blackwall.org/
CC: Roopa Prabhu <roopa@nvidia.com>
CC: Ido Schimmel <idosch@nvidia.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
3 months ago uapi: update from 6.15-rc1
Stephen Hemminger [Mon, 7 Apr 2025 15:01:46 +0000 (08:01 -0700)] 
uapi: update from 6.15-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
3 months ago Merge branch 'color' into next
David Ahern [Sun, 6 Apr 2025 16:55:30 +0000 (16:55 +0000)] 
Merge branch 'color' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
4 months ago color: Do not use dark blue in dark-background palette
Ben Hutchings [Wed, 26 Mar 2025 14:08:56 +0000 (15:08 +0100)] 
color: Do not use dark blue in dark-background palette

In GNOME Terminal's default dark colour schemes, the default (dark)
blue on a black background is barely readable.  Light blue is
significantly more readable to me, and is also easily readable on a
white background.

In Konsole, rxvt, and xterm, I can see little if any difference
between dark and light blue in the default dark colour schemes.

So replace dark blue with light blue in the dark-background palette.

Signed-off-by: Ben Hutchings <benh@debian.org>
4 months agocolor: Assume background is dark if unknown
Ben Hutchings [Wed, 26 Mar 2025 14:08:29 +0000 (15:08 +0100)] 
color: Assume background is dark if unknown

We rely on the COLORFGBG environment variable to tell us whether the
background is dark.  This variable is set by Konsole and rxvt but not
by GNOME Terminal or xterm.  This means we use the wrong set of
colours when GNOME Terminal or xterm is configured with a dark
background.

It appears to me that the dark-background colour palette works better
on a light background than vice versa.  So it is better to assume a
dark background if we cannot find this out from $COLORFGBG.

- Change the initial value of is_dark_bg to 1.
- In set_color_palette(), conditionally set is_dark_bg to 0 with an
  inverted test of the colour.
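
A minimal sketch of the "dark unless proven light" logic described above. The helper name guess_dark_bg() is hypothetical, and the set of colour indices treated as light (7 and 9 to 15) is an assumption for illustration; the real test in set_color_palette() may differ.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 for a dark (or unknown) background, 0 for a light one.
 * colorfgbg is the value of $COLORFGBG, or NULL if it is unset.
 * The background is taken from the field after the last ';'.
 */
static int guess_dark_bg(const char *colorfgbg)
{
	const char *p;
	char *end;
	long bg;

	if (colorfgbg == NULL)
		return 1;		/* unknown: assume dark */
	p = strrchr(colorfgbg, ';');
	if (p == NULL || p[1] == '\0')
		return 1;		/* malformed: assume dark */
	bg = strtol(p + 1, &end, 10);
	if (*end != '\0')
		return 1;		/* not a number, e.g. "default" */
	/* Assumption: indices 7 and 9..15 are the light colours. */
	if (bg == 7 || (bg >= 9 && bg <= 15))
		return 0;
	return 1;
}
```

A caller would pass getenv("COLORFGBG"); terminals that do not set the variable then get the dark-background palette.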

Signed-off-by: Ben Hutchings <benh@debian.org>
4 months agoip: display the 'netns-immutable' property
Nicolas Dichtel [Fri, 28 Mar 2025 09:58:26 +0000 (10:58 +0100)] 
ip: display the 'netns-immutable' property

The user needs to specify '-details' to have it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agoUpdate kernel headers
David Ahern [Tue, 1 Apr 2025 03:34:30 +0000 (03:34 +0000)] 
Update kernel headers

Update kernel headers to commit
    1a9239bb4253 ("Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agoMerge remote-tracking branch 'main/main' into next
David Ahern [Tue, 1 Apr 2025 03:25:01 +0000 (03:25 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agoMerge ../iproute2-next
Stephen Hemminger [Thu, 27 Mar 2025 15:36:27 +0000 (08:36 -0700)] 
Merge ../iproute2-next

4 months agov6.14.0 v6.14.0
Stephen Hemminger [Mon, 24 Mar 2025 16:04:44 +0000 (09:04 -0700)] 
v6.14.0

4 months agocolor: Handle NO_COLOR environment variable in default_color_opt()
Ben Hutchings [Wed, 19 Mar 2025 21:51:57 +0000 (22:51 +0100)] 
color: Handle NO_COLOR environment variable in default_color_opt()

The NO_COLOR environment variable is a widely supported way for users
to disable coloured text output.  See <https://no-color.org/>.  In
case iproute2 is configured to use colours by default, allow this to
be overridden by setting NO_COLOR.

This is done in default_color_opt() so that colours can still be
explicitly enabled with a command-line option.
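
As a rough sketch of the behaviour described above: the helper below is hypothetical (it is not the body of default_color_opt()), and it follows the convention from no-color.org that NO_COLOR counts only when it is set to a non-empty value.

```c
#include <stddef.h>

/*
 * Pick the default colour setting.
 * configured_default: the build-time default (non-zero means colour on).
 * no_color: the value of $NO_COLOR, or NULL if it is unset.
 */
static int pick_default_color(int configured_default, const char *no_color)
{
	if (no_color != NULL && no_color[0] != '\0')
		return 0;	/* user asked for no colour */
	return configured_default;
}
```

A caller would pass getenv("NO_COLOR") and parse command-line options afterwards, so an explicit colour option still overrides the result.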

Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agocolor: Introduce and use default_color_opt() function
Ben Hutchings [Wed, 19 Mar 2025 21:51:01 +0000 (22:51 +0100)] 
color: Introduce and use default_color_opt() function

As a preparatory step for supporting the NO_COLOR environment
variable, replace the direct use of CONF_COLOR with a
default_color_opt() function which initially returns CONF_COLOR.

Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agoMerge remote-tracking branch 'main/main' into next
David Ahern [Mon, 24 Mar 2025 02:47:51 +0000 (02:47 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agoMerge branch 'rdma-optional-counters' into next
David Ahern [Mon, 24 Mar 2025 02:47:33 +0000 (02:47 +0000)] 
Merge branch 'rdma-optional-counters' into next

Patrisious Haddad says:

====================

Add optional-counters binding support together with new packets/bytes
counters. Previously, optional-counters were on a per-link basis; this
series allows users to bind optional-counters to a specific counter,
which allows tracking optional-counters over a specific QP group.

The support is added for both binding modes, automatic and manual.
In both cases the bound optional counters are those that are currently
configured over the link when trying to bind the QP.

In addition, introduce four new optional-counters:
rdma_tx_bytes, rdma_tx_packets, rdma_rx_bytes, rdma_rx_packets
which, as their names imply, allow tracking RDMA egress and ingress
traffic.

This is exposed to users through the iproute2 package which needs to be
updated as well to provide the support for this feature.

Example commands:
- rdma stat set link rocep8s0f0/1 optional-counters
  rdma_tx_bytes,rdma_rx_packets
        Enables rdma_tx_bytes and rdma_rx_packets optional-counters over
        the link.

- rdma stat qp set link rocep8s0f0/1 auto type on optional-counters on
        Enables link automatic counter binding for QPs of the same type,
        with optional-counter binding support.

- rdma stat qp bind link rocep8s0f0/1 lqpn 134
        Manually bind QP number 134 to all available counters.

- rdma stat qp bind link rocep8s0f0/1 lqpn 134 cntn 4
        Manually bind QP number 134 to counter number 4 depending on its
        configured counters.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agordma: Add optional-counter option to rdma stat bind commands
Patrisious Haddad [Wed, 19 Mar 2025 08:25:26 +0000 (10:25 +0200)] 
rdma: Add optional-counter option to rdma stat bind commands

Add a new optional filter named optional-counter to commands:
rdma stat qp set link [link_name] auto

The new filter value can be either on or off and it must be the last
provided filter in the command. Not providing it is the same as off.

It indicates that when binding counters to a QP we also want the
currently enabled optional-counters on the link to be bound as well.

In addition, adjust the rdma statistic man page to reflect the new
optional-counter changes.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agordma: update uapi headers
Patrisious Haddad [Wed, 19 Mar 2025 08:25:25 +0000 (10:25 +0200)] 
rdma: update uapi headers

Update rdma_netlink.h file up to kernel commit da3711074f52
("RDMA/core: Add support to optional-counters binding configuration")

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
4 months agotc: nat: ffs should operation on host byte ordered data
Torben Nielsen [Thu, 6 Mar 2025 11:25:20 +0000 (12:25 +0100)] 
tc: nat: ffs should operation on host byte ordered data

In print_nat the mask length is calculated as

len = ffs(sel->mask);
len = len ? 33 - len : 0;

The mask is stored in network byte order, so it should be converted
to host byte order before calculating the first bit set.

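The conversion described above can be sketched as follows. The helper name mask_to_prefix_len() is hypothetical; only the ffs() expression comes from the message.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <strings.h>

/*
 * Compute the prefix length of a contiguous netmask that is stored in
 * network byte order. ffs() returns the 1-based index of the lowest set
 * bit, so it must see the mask in host byte order.
 */
static int mask_to_prefix_len(uint32_t mask_net)
{
	int len = ffs((int)ntohl(mask_net));

	return len ? 33 - len : 0;
}
```

Without the ntohl() call, a /24 mask on a little-endian host is seen as 0x00ffffff, the lowest set bit is bit 1, and the result is 32 instead of 24.
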
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 months agotc: nat: Fix mask calculation
Torben Nielsen [Thu, 6 Mar 2025 11:25:19 +0000 (12:25 +0100)] 
tc: nat: Fix mask calculation

In parse_nat_args the network mask is calculated as

        sel->mask = htonl(~0u << (32 - addr.bitlen));

According to ISO/IEC 9899:TC3 6.5.7 Bitwise shift operators:
The integer promotions are performed on each of the operands.
The type of the result is that of the promoted left operand.
If the value of the right operand is negative or is greater
than or equal to the width of the promoted left operand,
the behavior is undefined.

Specifically, this means that the mask is undefined for
addr.bitlen = 0, where the shift count is 32.
On x86_64 the result is 0xffffffff, on armv7l it is 0.

Promoting the left operand of the shift operator solves this issue.
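
One way to widen the left operand is to shift a 64-bit value and truncate afterwards. The sketch below assumes that approach and a hypothetical helper name, prefix_to_mask_net(); the exact expression used in parse_nat_args may differ.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Build a netmask in network byte order from a prefix length.
 * bitlen is assumed to be in the range 0..32. Shifting a 64-bit
 * operand keeps the shift count (at most 32) below the operand
 * width, so the bitlen == 0 case is well defined and yields 0.
 */
static uint32_t prefix_to_mask_net(unsigned int bitlen)
{
	return htonl((uint32_t)(~0ULL << (32 - bitlen)));
}
```

For bitlen = 0 the low 32 bits of the shifted value are all zero on every architecture, which removes the x86_64/armv7l difference noted above.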

Signed-off-by: Torben Nielsen <torben.nielsen@prevas.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 months agoss: mptcp: subflow: display seq counters as decimal
Matthieu Baerts (NGI0) [Wed, 26 Feb 2025 11:08:44 +0000 (12:08 +0100)] 
ss: mptcp: subflow: display seq counters as decimal

This is similar to commit cfa70237 ("ss: mptcp: display seq related
counters as decimal") but for the subflow info this time. This is also
aligned with what is printed for TCP sockets.

It is better to do the same with the subflow info (ss -ti), to
compare with the MPTCP info (ss -Mi), or for those who want to easily
count how many bytes have been exchanged between two runs without having
to think in hexadecimal.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
4 months agoman: document tunnel options in ip-route.8.in
Xin Long [Mon, 10 Mar 2025 14:35:46 +0000 (10:35 -0400)] 
man: document tunnel options in ip-route.8.in

Add missing tunnel options [GENEVE_OPTS | VXLAN_OPTS | ERSPAN_OPTS] and
their descriptions to ip-route.8.in.

Additionally, document each parameter of the ip ENCAPHDR section, aligning
it with the style used for other ENCAPHDR descriptions. Most of the
content is adapted from tc-tunnel_key.8 for consistency.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 months agoAdd OVN to rt_protos
Jonas Gottlieb [Wed, 26 Feb 2025 10:28:28 +0000 (11:28 +0100)] 
Add OVN to rt_protos

- OVN is using ID 84 for routes it installs
- Kernel has accepted 84 in `rtnetlink.h`
- For more information: https://github.com/ovn-org/ovn

Signed-off-by: Jonas Gottlieb <jonas.gottlieb@stackit.cloud>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoiprule: Add DSCP mask support
Ido Schimmel [Tue, 25 Feb 2025 09:09:17 +0000 (11:09 +0200)] 
iprule: Add DSCP mask support

Add DSCP mask support, allowing users to specify a DSCP value with an
optional mask. Example:

 # ip rule add dscp 1 table 100
 # ip rule add dscp 0x02/0x3f table 200
 # ip rule add dscp AF42/0x3f table 300
 # ip rule add dscp 0x10/0x30 table 400

In non-JSON output, the DSCP mask is not printed in case of exact match
and the DSCP value is printed in hexadecimal format in case of inexact
match:

 $ ip rule show
 0:      from all lookup local
 32762:  from all lookup 400 dscp 0x10/0x30
 32763:  from all lookup 300 dscp AF42
 32764:  from all lookup 200 dscp 2
 32765:  from all lookup 100 dscp 1
 32766:  from all lookup main
 32767:  from all lookup default

Dump can be filtered by DSCP value and mask:

 $ ip rule show dscp 1
 32765:  from all lookup 100 dscp 1
 $ ip rule show dscp AF42
 32763:  from all lookup 300 dscp AF42
 $ ip rule show dscp 0x10/0x30
 32762:  from all lookup 400 dscp 0x10/0x30

In JSON output, the DSCP mask is printed as a hexadecimal string to be
consistent with other masks. The DSCP value is printed as an integer in
order not to break existing scripts:

 $ ip -j -p -N rule show dscp 0x10/0x30
 [ {
         "priority": 32762,
         "src": "all",
         "table": "400",
         "dscp": "16",
         "dscp_mask": "0x30"
     } ]

The mask attribute is only sent to the kernel in case of inexact match
so that iproute2 will continue working with kernels that do not support
the attribute.
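
The value/mask semantics and the "only send the mask when needed" rule can be sketched like this. Both helpers are hypothetical illustrations; they are not iproute2 or kernel functions, and the kernel's actual comparison may be written differently.

```c
#include <stdint.h>

#define DSCP_FULL_MASK 0x3f	/* DSCP is a 6-bit field */

/* A packet matches when it agrees with the rule on every masked bit. */
static int dscp_rule_matches(uint8_t pkt_dscp, uint8_t value, uint8_t mask)
{
	return (pkt_dscp & mask) == (value & mask);
}

/*
 * An exact match (full mask) needs no mask attribute, which keeps
 * requests compatible with kernels that do not know the attribute.
 */
static int dscp_mask_needed(uint8_t mask)
{
	return (mask & DSCP_FULL_MASK) != DSCP_FULL_MASK;
}
```

With the rule "dscp 0x10/0x30" from the example, a packet carrying DSCP 0x12 matches because only the two top bits are compared.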

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoiprule: Add port mask support
Ido Schimmel [Tue, 25 Feb 2025 09:09:16 +0000 (11:09 +0200)] 
iprule: Add port mask support

Add port mask support, allowing users to specify a source or destination
port with an optional mask. Example:

 # ip rule add sport 80 table 100
 # ip rule add sport 90/0xffff table 200
 # ip rule add dport 1000-2000 table 300
 # ip rule add sport 0x123/0xfff table 400
 # ip rule add dport 0x4/0xff table 500
 # ip rule add dport 0x8/0xf table 600
 # ip rule del dport 0x8/0xf table 600

In non-JSON output, the mask is not printed in case of exact match:

 $ ip rule show
 0:      from all lookup local
 32761:  from all dport 0x4/0xff lookup 500
 32762:  from all sport 0x123/0xfff lookup 400
 32763:  from all dport 1000-2000 lookup 300
 32764:  from all sport 90 lookup 200
 32765:  from all sport 80 lookup 100
 32766:  from all lookup main
 32767:  from all lookup default

Dump can be filtered by port value and mask:

 $ ip rule show sport 80
 32765:  from all sport 80 lookup 100
 $ ip rule show sport 90
 32764:  from all sport 90 lookup 200
 $ ip rule show sport 0x123/0x0fff
 32762:  from all sport 0x123/0xfff lookup 400
 $ ip rule show dport 4/0xff
 32761:  from all dport 0x4/0xff lookup 500

In JSON output, the port mask is printed as a hexadecimal string to be
consistent with other masks. The port value is printed as an integer in
order not to break existing scripts:

 $ ip -j -p rule show sport 0x123/0xfff table 400
 [ {
         "priority": 32762,
         "src": "all",
         "sport": 291,
         "sport_mask": "0xfff",
         "table": "400"
     } ]

The mask attribute is only sent to the kernel in case of inexact match
so that iproute2 will continue working with kernels that do not support
the attribute.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoiprule: Allow specifying ports in hexadecimal notation
Ido Schimmel [Tue, 25 Feb 2025 09:09:15 +0000 (11:09 +0200)] 
iprule: Allow specifying ports in hexadecimal notation

This will be useful when enabling port masks in the next patch.

Before:

 # ip rule add sport 0x1 table 100
 Invalid "sport"

After:

 # ip rule add sport 0x1 table 100
 $ ip rule show sport 0x1
 32765:  from all sport 1 lookup 100

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoiprule: Move port parsing to a function
Ido Schimmel [Tue, 25 Feb 2025 09:09:14 +0000 (11:09 +0200)] 
iprule: Move port parsing to a function

In preparation for adding port mask support, move port parsing to a
function.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agotc: Fix rounding in tc_calc_xmittime and tc_calc_xmitsize.
Jonathan Lennox [Wed, 26 Feb 2025 18:53:21 +0000 (18:53 +0000)] 
tc: Fix rounding in tc_calc_xmittime and tc_calc_xmitsize.

Currently, tc_calc_xmittime and tc_calc_xmitsize round from double to
int three times — once when they call tc_core_time2tick /
tc_core_tick2time (whose argument is int), once when those functions
return (their return value is int), and then finally when the tc_calc_*
functions return.  This leads to extremely granular and inaccurate
conversions.

As a result, for example, on my test system (where tick_in_usec=15.625,
clock_factor=1, and hz=1000000000) for a bitrate of 1Gbps, all tc htb
burst values between 0 and 999 bytes get encoded as 0 ticks; all values
between 1000 and 1999 bytes get encoded as 15 ticks (equivalent to 960
bytes); all values between 2000 and 2999 bytes as 31 ticks (1984 bytes);
etc.

The patch changes the code so these calculations are done internally in
floating-point, and only rounded to integer values when the value is
returned. It also changes tc_calc_xmittime to round its calculated value
up, rather than down, to ensure that the calculated time is actually
sufficient for the requested size.
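
The effect of the intermediate truncation can be reproduced with a simplified model. Both functions below are hypothetical sketches, not the tc_core code. The rate is assumed to be in bytes per second; reading "1Gbps" as 1e9 bytes per second is the interpretation that reproduces the figures above.

```c
#define TIME_UNITS_PER_SEC 1000000.0

/* Old behaviour: truncate to whole microseconds, then to whole ticks. */
static unsigned int xmittime_truncating(double tick_in_usec,
					double rate_bytes_per_sec,
					unsigned int size)
{
	int usec = (int)(TIME_UNITS_PER_SEC * size / rate_bytes_per_sec);
	int ticks = (int)(usec * tick_in_usec);

	return (unsigned int)ticks;
}

/* New behaviour: stay in floating point and round up once at the end. */
static unsigned int xmittime_rounded_up(double tick_in_usec,
					double rate_bytes_per_sec,
					unsigned int size)
{
	double ticks = TIME_UNITS_PER_SEC * size / rate_bytes_per_sec *
		       tick_in_usec;
	unsigned int whole = (unsigned int)ticks;

	if ((double)whole < ticks)	/* round up without needing libm */
		whole++;
	return whole;
}
```

With tick_in_usec = 15.625 and a rate of 1e9 bytes per second, the truncating version maps 999 bytes to 0 ticks and 1000 bytes to 15 ticks, while the rounded-up version maps 1000 bytes to 16 ticks.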

Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoUpdate kernel headers
David Ahern [Fri, 28 Feb 2025 15:39:16 +0000 (15:39 +0000)] 
Update kernel headers

Update kernel headers to commit
    56794b5862c5 ("Merge branch 'mlx5-health-syndrome'")

Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoip: link: netkit: Support scrub options
Jordan Rife [Wed, 26 Feb 2025 17:06:13 +0000 (09:06 -0800)] 
ip: link: netkit: Support scrub options

Add "scrub" option to configure IFLA_NETKIT_SCRUB and
IFLA_NETKIT_PEER_SCRUB when setting up a link. Add "scrub" and
"peer scrub" to device details as well when printing.

$ sudo ./ip/ip link add jordan type netkit scrub default peer scrub none
$ ./ip/ip -details link show jordan
43: jordan@nk0: <BROADCAST,MULTICAST,NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    netkit mode l3 type primary policy forward peer policy forward scrub default peer scrub none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536

v2->v3: Updated man page.
v1->v2: Added some spaces around "scrub SCRUB" in the help message.

Link: https://lore.kernel.org/netdev/20241004101335.117711-1-daniel@iogearbox.net/
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoREADME.devel: clarify patch rules and syntax
Michal Koutný [Fri, 21 Feb 2025 09:29:27 +0000 (10:29 +0100)] 
README.devel: clarify patch rules and syntax

Patches should follow kernel style and have Signed-Off-By:

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
5 months agoss: Tone down cgroup path resolution
Michal Koutný [Fri, 21 Feb 2025 09:29:26 +0000 (10:29 +0100)] 
ss: Tone down cgroup path resolution

Sockets and cgroups have different lifetimes (e.g. fd passing between
cgroups) so obtaining a cgroup id to a removed cgroup dir is not an
error. Furthermore, the message is printed for each such socket (which
is redundant, since each such socket's cgroup is shown as 'unreachable').
Improve user experience by silencing these specific errors.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
5 months agolib: remove redundant checks in get_u64 and get_s64
Anton Moryakov [Mon, 17 Feb 2025 16:21:53 +0000 (19:21 +0300)] 
lib: remove redundant checks in get_u64 and get_s64

Static analyzer reported:
1. if (res > 0xFFFFFFFFFFFFFFFFULL)
Expression 'res > 0xFFFFFFFFFFFFFFFFULL' is always false , which may be caused by a logical error:
'res' has a type 'unsigned long long' with minimum value '0' and a maximum value '18446744073709551615'

2. if (res > INT64_MAX || res < INT64_MIN)
Expression 'res > INT64_MAX' is always false , which may be caused by a logical error: 'res' has a type 'long long'
with minimum value '-9223372036854775808' and a maximum value '9223372036854775807'
Expression 'res < INT64_MIN' is always false , which may be caused by a logical error: 'res' has a type 'long long'
with minimum value '-9223372036854775808' and a maximum value '9223372036854775807'

Corrections explained:
- Removed redundant check `res > 0xFFFFFFFFFFFFFFFFULL` in `get_u64`,
  as `res` cannot exceed this value due to its type.
- Removed redundant checks `res > INT64_MAX` and `res < INT64_MIN` in `get_s64`,
  as `res` cannot exceed the range of `long long`.
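
Since a comparison against the type's own limits can never be true, overflow has to be detected through errno instead. A minimal sketch of that pattern, using a hypothetical parse_u64() rather than the real get_u64():

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse an unsigned 64-bit number.
 * Returns 0 on success, -1 on empty input, trailing garbage or overflow.
 */
static int parse_u64(unsigned long long *val, const char *arg, int base)
{
	unsigned long long res;
	char *end;

	if (arg == NULL || *arg == '\0')
		return -1;

	errno = 0;
	res = strtoull(arg, &end, base);
	if (end == arg || *end != '\0')
		return -1;	/* no digits, or trailing characters */

	/* Overflow is reported as ULLONG_MAX together with ERANGE. */
	if (res == ULLONG_MAX && errno == ERANGE)
		return -1;

	*val = res;
	return 0;
}
```

The largest representable value still parses successfully; only input beyond it is rejected.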

Triggers found by static analyzer Svace.

Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoip: remove duplicate condition in ila_csum_name2mode in
Anton Moryakov [Mon, 17 Feb 2025 16:21:52 +0000 (19:21 +0300)] 
ip: remove duplicate condition in ila_csum_name2mode in

Static analyzer reported:
expression is identical to previous condition

Corrections explained:
The condition checking for "neutral-map-auto" was duplicated in the
ila_csum_name2mode function. This commit removes the redundant check
to improve code readability and maintainability.

Triggers found by static analyzer Svace.

Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoip: handle NULL return from localtime in strxf_time in
Anton Moryakov [Mon, 17 Feb 2025 16:21:51 +0000 (19:21 +0300)] 
ip: handle NULL return from localtime in strxf_time in

Static analyzer reported:
Pointer 'tp', returned from function 'localtime' at ipxfrm.c:352, may be NULL
and is dereferenced at ipxfrm.c:354 by calling function 'strftime'.

Corrections explained:
The function localtime() may return NULL if the provided time value is
invalid. This commit adds a check for NULL and handles the error case
by copying "invalid-time" into the output buffer.
This is unlikely, but localtime() may still fail.
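
The check can be sketched as below. format_time() is a hypothetical helper with an assumed time format; only the "invalid-time" fallback string comes from the message.

```c
#include <stdio.h>
#include <time.h>

/*
 * Format t as local time into buf. If localtime() fails, write the
 * placeholder "invalid-time" instead of passing NULL to strftime().
 */
static void format_time(char *buf, size_t len, time_t t)
{
	struct tm *tp = localtime(&t);

	if (tp == NULL) {
		snprintf(buf, len, "invalid-time");
		return;
	}
	/* strftime() returns 0 if the result does not fit in buf. */
	if (strftime(buf, len, "%Y-%m-%d %H:%M:%S", tp) == 0 && len > 0)
		buf[0] = '\0';
}
```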

Triggers found by static analyzer Svace.

Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoip: check return value of iproute_flush_cache() in irpoute.c
Anton Moryakov [Mon, 17 Feb 2025 16:21:50 +0000 (19:21 +0300)] 
ip: check return value of iproute_flush_cache() in irpoute.c

Static analyzer reported:
Return value of function 'iproute_flush_cache', called at iproute.c:1732,
is not checked. The return value is obtained from function 'open64' and possibly contains an error code.

Corrections explained:
The function iproute_flush_cache() may return an error code, which was
previously ignored. This could lead to unexpected behavior if the cache
flush fails. Added error handling to ensure the function fails gracefully
when iproute_flush_cache() returns an error.

Triggers found by static analyzer Svace.

Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agotc_util: Add support for 64-bit hardware packets counter
Ido Schimmel [Tue, 4 Feb 2025 12:31:43 +0000 (14:31 +0200)] 
tc_util: Add support for 64-bit hardware packets counter

The netlink nest that carries tc action statistics looks as follows:

[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_BASIC_HW]

Where 'TCA_STATS_BASIC' carries the combined software and hardware
packets (32-bit) and bytes (64-bit) counters and 'TCA_STATS_BASIC_HW'
carries the hardware statistics.

When the number of packets exceeds 0xffffffff, the kernel emits the
'TCA_STATS_PKT64' attribute:

[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_PKT64]
[TCA_STATS_BASIC_HW]
[TCA_STATS_PKT64]

This layout is not ideal as the only way for user space to know what
each 'TCA_STATS_PKT64' attribute carries is to check which attribute
precedes it, which is exactly what some applications are doing [1].

Do the same in iproute2 so that users with existing kernels could read
the 64-bit hardware packets counter of tc actions instead of reading the
truncated 32-bit counter.
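
The "check which attribute precedes it" approach can be modelled on a flat list. The types and the collect_packets() function below are hypothetical simplifications; the real code walks netlink attributes rather than an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the attribute types in the nest. */
enum attr_type { ATTR_BASIC, ATTR_BASIC_HW, ATTR_PKT64, ATTR_OTHER };

struct attr {
	enum attr_type type;
	uint64_t packets;
};

struct pkt_stats {
	uint64_t total;	/* software + hardware */
	uint64_t hw;	/* hardware only */
};

/*
 * Walk the attributes in order. A PKT64 attribute overrides the packet
 * counter of the basic attribute that immediately precedes it.
 */
static struct pkt_stats collect_packets(const struct attr *attrs, size_t n)
{
	struct pkt_stats st = { 0, 0 };
	uint64_t *last = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (attrs[i].type) {
		case ATTR_BASIC:
			st.total = attrs[i].packets;
			last = &st.total;
			break;
		case ATTR_BASIC_HW:
			st.hw = attrs[i].packets;
			last = &st.hw;
			break;
		case ATTR_PKT64:
			if (last != NULL)
				*last = attrs[i].packets;
			last = NULL;
			break;
		default:
			last = NULL;
			break;
		}
	}
	return st;
}
```

Fed with the truncated value 1465799775 followed by a 64-bit value of 5760767071 for both counters, the hardware counter ends up at 5760767071 rather than the truncated figure.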

Before:

$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
  skip_sw
  in_hw in_hw_count 1
        action order 1: mirred (Egress Redirect to device swp1) stolen
        index 1 ref 1 bind 1 installed 47 sec used 23 sec
        Action statistics:
        Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 368689092544 bytes 1465799775 pkt
        backlog 0b 0p requeues 0
        used_hw_stats immediate

Where 5760767071 - 1465799775 = 0x100000000

After:

$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
  skip_sw
  in_hw in_hw_count 1
        action order 1: mirred (Egress Redirect to device swp1) stolen
        index 1 ref 1 bind 1 installed 71 sec used 47 sec
        Action statistics:
        Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 368689092544 bytes 5760767071 pkt
        backlog 0b 0p requeues 0
        used_hw_stats immediate

[1] https://github.com/openvswitch/ovs/commit/006e1c6dbfbadf474c17c8fa1ea358918d371588

Reported-by: Joe Botha <joe@atomic.ac>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agoMerge remote-tracking branch 'main/main' into next
David Ahern [Fri, 7 Feb 2025 21:00:44 +0000 (21:00 +0000)] 
Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>
5 months agouapi: update bpf.h
Stephen Hemminger [Tue, 4 Feb 2025 14:59:30 +0000 (06:59 -0800)] 
uapi: update bpf.h

Autogenerated from 6.14-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 months agoMerge ssh://gitolite.kernel.org/pub/scm/network/iproute2/iproute2-next
Stephen Hemminger [Tue, 21 Jan 2025 15:03:48 +0000 (07:03 -0800)] 
Merge ssh://gitolite.kernel.org/pub/scm/network/iproute2/iproute2-next

6 months agov6.13.0 v6.13.0
Stephen Hemminger [Mon, 20 Jan 2025 18:49:12 +0000 (10:49 -0800)] 
v6.13.0

6 months agoip: vxlan: Support IFLA_VXLAN_RESERVED_BITS
Petr Machata [Mon, 20 Jan 2025 15:43:06 +0000 (16:43 +0100)] 
ip: vxlan: Support IFLA_VXLAN_RESERVED_BITS

A new attribute, IFLA_VXLAN_RESERVED_BITS, was added in Linux kernel
commit 6c11379b104e ("vxlan: Add an attribute to make VXLAN header
validation configurable") (See the link below for the full patchset).

The payload is a 64-bit binary field that covers the VXLAN header. The set
bits indicate which bits in a VXLAN packet header should be allowed to
carry 1's. Support the new attribute through a CLI keyword "reserved_bits".

Link: https://patch.msgid.link/173378643250.273075.13832548579412179113.git-patchwork-notify@kernel.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
6 months agoiproute2: add 'ip monitor acaddress' support
Yuyang Huang [Fri, 17 Jan 2025 03:20:41 +0000 (12:20 +0900)] 
iproute2: add 'ip monitor acaddress' support

Enhanced the 'ip monitor' command to track changes in IPv6
anycast addresses. This update allows the command to listen for
events related to anycast address additions and deletions by
registering to the newly introduced RTNLGRP_IPV6_ACADDR netlink group.

This patch depends on the kernel patch that adds RTNLGRP_IPV6_ACADDR
being merged first.

Here is an example usage:

root@uml-x86-64:/# ip monitor acaddress
2: if2    inet6 any 2001:db8:7b:0:528e:a53a:9224:c9c5 scope global
       valid_lft forever preferred_lft forever
Deleted 2: if2    inet6 any 2001:db8:7b:0:528e:a53a:9224:c9c5 scope global
       valid_lft forever preferred_lft forever

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
6 months agoUpdate kernel headers
David Ahern [Mon, 20 Jan 2025 00:19:42 +0000 (00:19 +0000)] 
Update kernel headers

Update kernel headers to commit:
59372af69d4d ("Merge tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge")

Signed-off-by: David Ahern <dsahern@kernel.org>
6 months agoiproute2: Fix grammar in duplicate argument error message
Neil Svedberg [Sat, 28 Dec 2024 22:33:46 +0000 (17:33 -0500)] 
iproute2: Fix grammar in duplicate argument error message

Change "is a garbage" to "is garbage". Because garbage is a collective
noun, it does not need the indefinite article.

Signed-off-by: Neil Svedberg <neil.svedberg@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
6 months agouapi: update kernel headers
Stephen Hemminger [Tue, 7 Jan 2025 23:18:42 +0000 (15:18 -0800)] 
uapi: update kernel headers

Update for 6.13-rc6

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
7 months agoiprule: Add flow label support
Ido Schimmel [Mon, 30 Dec 2024 08:58:10 +0000 (10:58 +0200)] 
iprule: Add flow label support

Add support for 'flowlabel' selector in ip-rule.

Rules can be added with or without a mask. Without a mask, exact match
is used:

 # ip -6 rule add flowlabel 0x12345 table 100
 # ip -6 rule add flowlabel 0x11/0xff table 200
 # ip -6 rule add flowlabel 0x54321 table 300
 # ip -6 rule del flowlabel 0x54321 table 300

Dump output:

 $ ip -6 rule show
 0:      from all lookup local
 32764:  from all lookup 200 flowlabel 0x11/0xff
 32765:  from all lookup 100 flowlabel 0x12345
 32766:  from all lookup main

Dump can be filtered by flow label value and mask:

 $ ip -6 rule show flowlabel 0x12345
 32765:  from all lookup 100 flowlabel 0x12345
 $ ip -6 rule show flowlabel 0x11/0xff
 32764:  from all lookup 200 flowlabel 0x11/0xff

JSON output:

 $ ip -6 -j -p rule show flowlabel 0x12345
 [ {
         "priority": 32765,
         "src": "all",
         "table": "100",
         "flowlabel": "0x12345",
         "flowlabel_mask": "0xfffff"
     } ]
 $ ip -6 -j -p rule show flowlabel 0x11/0xff
 [ {
         "priority": 32764,
         "src": "all",
         "table": "200",
         "flowlabel": "0x11",
         "flowlabel_mask": "0xff"
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agoip: route: Add IPv6 flow label support
Ido Schimmel [Mon, 30 Dec 2024 08:58:09 +0000 (10:58 +0200)] 
ip: route: Add IPv6 flow label support

Allow specifying an IPv6 flow label when performing a route lookup.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agotc: fq: add support for TCA_FQ_OFFLOAD_HORIZON attribute
Eric Dumazet [Mon, 30 Dec 2024 19:47:57 +0000 (19:47 +0000)] 
tc: fq: add support for TCA_FQ_OFFLOAD_HORIZON attribute

In linux-6.13, we added the ability to offload pacing on
capable devices.

tc qdisc add ... fq ... offload_horizon 100ms

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agoUpdate kernel headers
David Ahern [Wed, 1 Jan 2025 00:55:59 +0000 (00:55 +0000)] 
Update kernel headers

Update kernel headers to commit:
    9268abe611b0 ("Merge branch 'net-lan969x-add-rgmii-support'")

Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agoman: fix two small typos on xdp manipulations
Alexis Lothoré [Fri, 20 Dec 2024 16:56:01 +0000 (17:56 +0100)] 
man: fix two small typos on xdp manipulations

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
7 months agoiproute2: add 'ip monitor maddress' support
Yuyang Huang [Wed, 11 Dec 2024 08:24:53 +0000 (17:24 +0900)] 
iproute2: add 'ip monitor maddress' support

Enhanced the 'ip monitor' command to track changes in IPv4 and IPv6
multicast addresses. This update allows the command to listen for
events related to multicast address additions and deletions by
registering to the newly introduced RTNLGRP_IPV4_MCADDR and
RTNLGRP_IPV6_MCADDR netlink groups.

This patch depends on the kernel patch that adds RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR being merged first.

Here is an example usage:

root@uml-x86-64:/# ip monitor maddress
9: nettest123    inet6 mcast ff01::1 scope global
       valid_lft forever preferred_lft forever
9: nettest123    inet6 mcast ff02::1 scope global
       valid_lft forever preferred_lft forever
9: nettest123    inet mcast 224.0.0.1 scope global
       valid_lft forever preferred_lft forever
9: nettest123    inet6 mcast ff02::1:ff00:7b01 scope global
       valid_lft forever preferred_lft forever
Deleted 9: nettest123    inet mcast 224.0.0.1 scope global
       valid_lft forever preferred_lft forever
Deleted 9: nettest123    inet6 mcast ff02::1:ff00:7b01 scope global
       valid_lft forever preferred_lft forever
Deleted 9: nettest123    inet6 mcast ff02::1 scope global
       valid_lft forever preferred_lft forever

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agoUpdate kernel headers
David Ahern [Mon, 16 Dec 2024 03:03:25 +0000 (03:03 +0000)] 
Update kernel headers

Update kernel headers to commit:
  92c932b9946c ("Merge branch 'mptcp-pm-userspace-misc-cleanups'")

Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agoip: link: rmnet: add support for flag handling
Robert Marko [Fri, 13 Dec 2024 12:51:00 +0000 (13:51 +0100)] 
ip: link: rmnet: add support for flag handling

Extend the current rmnet support to allow enabling or disabling
IFLA_RMNET_FLAGS via ip link as well as printing the current settings.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: David Ahern <dsahern@kernel.org>
7 months agouapi: remove no longer used linux/limits.h
Stephen Hemminger [Thu, 12 Dec 2024 22:18:35 +0000 (14:18 -0800)] 
uapi: remove no longer used linux/limits.h

Code is now using limits.h instead.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
7 months agoflower: replace XATTR_SIZE_MAX
Stephen Hemminger [Thu, 12 Dec 2024 22:15:59 +0000 (14:15 -0800)] 
flower: replace XATTR_SIZE_MAX

The flower tc parser was using XATTR_SIZE_MAX from linux/limits.h,
but this constant is intended for extended filesystem attributes,
not for TC.  Replace it with a local define.

This fixes an issue on systems with musl, where XATTR_SIZE_MAX is not
defined in limits.h.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
7 months agocg_map: use limits.h
Stephen Hemminger [Thu, 12 Dec 2024 19:29:44 +0000 (11:29 -0800)] 
cg_map: use limits.h

Prefer limits.h from system headers over linux/limits.h.
Fixes build with musl.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
7 months agoip: rearrange and prune header files
Stephen Hemminger [Tue, 10 Dec 2024 21:38:08 +0000 (13:38 -0800)] 
ip: rearrange and prune header files

The recent report of issues with missing limits.h impacting musl
suggested looking at what files are and are not included in ip code.

The standard practice is to put standard headers first, then system,
then local headers. Used iwyu to get suggestions about missing
and extraneous headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
7 months agordma: add missing header for basename
Stephen Hemminger [Thu, 12 Dec 2024 19:21:56 +0000 (11:21 -0800)] 
rdma: add missing header for basename

The prototype for the basename function is in libgen.h.
Fixes build on musl.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
7 months agolibnetlink: add missing endian.h
Stephen Hemminger [Thu, 12 Dec 2024 19:24:18 +0000 (11:24 -0800)] 
libnetlink: add missing endian.h

Need endian.h to get htobe64 with musl.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>