git.ipfire.org Git - thirdparty/iproute2.git/log

netshaper: fix build failure

netshaper fails to build from sources with this error:

$ make
netshaper
CC netshaper.o
LINK netshaper
/usr/bin/ld: ../lib/libutil.a(utils_math.o): in function `get_rate':
utils_math.c:(.text+0x97): undefined reference to `floor'
/usr/bin/ld: ../lib/libutil.a(utils_math.o): in function `get_size64':
utils_math.c:(.text+0x2a8): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:10: netshaper] Error 1
make: *** [Makefile:81: all] Error 2

Fix this by simply linking against the math C library, similarly to what we
already did with commit 1a22ad2721fb ("build: Fix link errors on some
systems").

Fixes: 6f7779ad4ef6 ("netshaper: Add netshaper command")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

devlink: fix devlink flash error reporting

Currently, devlink silently exits when a non-existent device is specified
for flashing or when the user lacks sufficient permissions. This makes it
hard to diagnose the problem.

Print an appropriate error message in these cases to improve user feedback.

Prior:
$ devlink dev flash foo/bar file test
$ sudo devlink dev flash foo/bar file test
$

After patch:
$ devlink/devlink dev flash foo/bar file test
devlink answers: Operation not permitted
$ sudo devlink/devlink dev flash foo/bar file test
devlink answers: No such device

Fixes: 9b13cddfe268 ("devlink: implement flash status monitoring")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Revert "mptcp: add implicit flag to the 'ip mptcp' inline help"

This reverts commit 14749b22dd8f2246511c6622c2a4646adfc5b184.
Only kernel can create implicit endpoints.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

mptcp: add implicit flag to the 'ip mptcp' inline help

ip mptcp supports the implicit flag since commit 3a2535a41854 ("mptcp:
add support for implicit flag"), however this flag is not listed in the
command inline help.

Add the implicit flag to the inline help.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

dcb: fix tc-maxrate unit conversions

The ieee_maxrate UAPI is defined as kbps, but dcb_maxrate uses Bps.
This patch converts Bps to kbps when parsing by dividing by 125,
and converts kbps to Bps for print_rate() by multiplying by 125.
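
The factor of 125 follows from 1 kbit/s = 1000 bit/s = 125 bytes/s. A minimal Python sketch of the two directions is shown here; the function names are hypothetical, and the truncating integer division is an assumption, not necessarily how dcb rounds:

```python
# 1 kbit/s = 1000 bit/s = 125 bytes/s
BYTES_PER_SEC_PER_KBPS = 1000 // 8


def bps_to_kbps(rate_bytes_per_sec: int) -> int:
    """Parse direction: bytes/s (tool side) to kbit/s (ieee_maxrate UAPI)."""
    return rate_bytes_per_sec // BYTES_PER_SEC_PER_KBPS


def kbps_to_bps(rate_kbps: int) -> int:
    """Print direction: kbit/s from the kernel back to bytes/s."""
    return rate_kbps * BYTES_PER_SEC_PER_KBPS


# 1 Gbit/s is 125,000,000 bytes/s, which is 1,000,000 kbit/s.
print(bps_to_kbps(125_000_000))
print(kbps_to_bps(1_000_000))
```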

Fixes: 117939d9bd89 ("dcb: Add a subtool for the DCB maxrate object")
Signed-off-by: Yijing Zeng <yijingzeng@meta.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

netshaper: update include files

Use iwyu to make sure all includes are listed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

netshaper: Add netshaper command

Add support for the netshaper Generic Netlink family to
iproute2. Introduce a new command for configuring netshaper
parameters directly from userspace.

This interface allows users to set shaping attributes which
are passed to the kernel to perform the corresponding netshaper
operation.

Example usage:
$ netshaper { set | show | delete } dev DEV \
           handle scope SCOPE [id ID] \
           [ bw-max BW_MAX ]

Internally, this triggers a kernel call to apply the shaping
configuration to the specified network device.

Currently, the tool supports the following functionalities:
- Setting bandwidth in Mbps, enabling bandwidth clamping for
  a network device that supports netshaper operations.
- Deleting the current configuration.
- Querying the existing configuration.

Additional netshaper operations will be integrated into the tool
as needed.

This change enables easy and scriptable configuration of bandwidth
shaping for devices that use the netshaper Netlink family.

Corresponding net-next patches:
1) https://lore.kernel.org/all/cover.1728460186.git.pabeni@redhat.com/
2) https://lore.kernel.org/lkml/1750144656-2021-1-git-send-email-ernis@linux.microsoft.com/

Install pkg-config and libmnl* packages to print kernel extack
errors to stdout.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update headers to 6.18-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

lib: bridge: avoid redefinition of in6_addr

On musl libc, which does not use the kernel definitions of in6_addr, including
the libc headers after the kernel (UAPI) headers would cause a redefinition
error. The opposite order avoids the redefinition.

Fixes: 9e89d5b94d749f37525cd8778311e1c9f28f172a
Signed-off-by: Yureka <yureka@cyberchaos.dev>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge ../iproute2-next

ip/bond: add broadcast_neighbor support

This option has no effect in modes other than 802.3ad mode.
When this option is enabled, the bond device will broadcast ARP/ND
packets to all active slaves.

Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: iplink_bridge: Support fdb_local_vlan_0

Add support for the new bridge option BR_BOOLOPT_FDB_LOCAL_VLAN_0.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
e835faaed2f8: ("net/mlx5: Expose uar access and odp page fault counters")

Signed-off-by: David Ahern <dsahern@kernel.org>

v6.17.0

uapi: update to 6.17

Some last minute changes to mptcp in 6.17

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man8: tc: fix incorrect long FORMAT identifier for json

Signed-off-by: Lieuwe Rooijakkers <lieuwerooijakkers@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip: fix minor style issue

Replace 'const char*' with 'const char *'.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

rdma: fix minor style issue

Replace 'const char*' with 'const char *' to be consistent.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink_can: fix coding style for pointer format

checkpatch.pl complains about the pointer symbol * being attached to the
type instead of being attached to the variable:

  ERROR: "foo* bar" should be "foo *bar"
  #85: FILE: ip/iplink_can.c:85:
  +        const char* name)

  ERROR: "foo* bar" should be "foo *bar"
  #93: FILE: ip/iplink_can.c:93:
  +static void print_ctrlmode(enum output_type t, __u32 flags, const char* key)

Fix those two warnings.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

mptcp: fix event attributes type

The 'backup' and 'error' attributes are unsigned.

Even if, for the moment, >2^7 values are not expected, they should be
printed as unsigned (%u) and not as signed (%d).

Fixes: ff619e4f ("mptcp: add support for event monitoring")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Add tc entry to MAINTAINERS file

Add Jamal as a maintainer of tc files.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Update email address

Use kernel.org address everywhere in place of gmail.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: gred: fix debug print

When built with -DDEBUG, the tc build fails with:
q_gred.c: In function ‘init_gred’:
q_gred.c:53:17: error: passing argument 2 of ‘fprintf’ from incompatible pointer type [-Wincompatible-pointer-types]
   53 |                 DPRINTF(stderr, "init_gred: invoked with %s\n", *argv);
      |                 ^~~~~~~
      |                 |
      |                 FILE *

This is due to the DPRINTF macro call. Indeed DPRINTF is defined as a
two-args macro when -DDEBUG is used, while it uses 3 args in this call.

Fix it by simply dropping the useless first arg.

Fixes: aba5acdfdb34 ("(Logical change 1.3)")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man8: ip-sr: Document that passphrase must be high-entropy

'ip sr hmac set' takes a newline-terminated "passphrase", but it fails
to stretch it. The "passphrase" actually gets used directly as the key.
This makes it difficult to use securely.

I recommend deprecating this command and replacing it with a command
that either stretches the passphrase or explicitly takes a key instead
of a passphrase. But for now, let's at least document this pitfall.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: add dualpi2 scheduler module

DUALPI2 AQM is a combination of the DUALQ Coupled-AQM with a PI2
base-AQM. The PI2 AQM is in turn both an extension and a simplification
of the PIE AQM. PI2 makes quite a few PIE heuristics unnecessary, while
being able to control scalable congestion controls like TCP-Prague.
With PI2, both Reno/Cubic can be used in parallel with Prague,
maintaining window fairness. DUALQ provides latency separation between
low latency Prague flows and Reno/Cubic flows that need a bigger queue.

This patch adds support to tc to configure it through its netlink
interface.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Olga Albisser <olga@albisser.org>
Signed-off-by: Olga Albisser <olga@albisser.org>
Co-developed-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Co-developed-by: Oliver Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Oliver Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Bob Briscoe <research@bobbriscoe.net>
Co-developed-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Henrik Steen <henrist@henrist.net>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Add get_float_min_max() in lib/utils.c

get_float_min_max() is based on get_float() and additionally checks that
the parsed value lies strictly between the minimum and maximum values.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'tc-police-64b-burst' into next

Jay Vosburgh says:

====================

In summary, this patchset changes the user space handling of the
tc police burst parameter to permit burst sizes that exceed 4 GB when the
specified rate is high enough that the kernel API for burst can accommodate
them.

Additionally, if the burst exceeds the upper limit of the kernel
API, this is now flagged as an error.  The existing behavior silently
overflows, resulting in arbitrary values passed to the kernel.

In detail, as presently implemented, the tc police burst option
limits the size of the burst to 4 GB, i.e., UINT_MAX for a 32 bit
unsigned int.  This is a reasonable limit for the low rates common when
this was developed.  However, the underlying implementation of burst is
computed as "time at the specified rate," and for higher rates, a burst
size exceeding 4 GB is feasible without modification to the kernel.

The burst size provided on the command line is translated into a
duration, representing how much time is required at the specified rate to
transmit the given burst size.

This time is calculated in units of "psched ticks," each of which
is 64 nsec[0].  The computed number of psched ticks is sent to the kernel
as a __u32 value.

Because burst is ultimately calculated as a time duration, the
real upper limit for a burst is UINT_MAX psched ticks, i.e.,

UINT_MAX * psched tick duration / NSEC_PER_SEC
(2^32-1) *         64           / 1E9

which is roughly 274.88 seconds (274.8779...).

At low rates, e.g., 5 Mbit/sec, UINT_MAX psched ticks does not
correspond to a burst size in excess of 4 GB, so the above is moot, e.g.,

5 Mbit/sec / 8 = 625000 bytes/sec
625000 * ~274.88 seconds = ~171800000 bytes max burst size, below UINT_MAX

Thus, the burst size at 5Mbit/sec is limited by the __u32 size of
the psched tick field in the kernel API, not the 4 GB limit of the tc
police burst user space API.

However, at higher rates, e.g., 10 Gbit/sec, the burst size is
currently limited by the 4 GB maximum for the burst command line parameter
value, rather than UINT_MAX psched ticks:

10 Gbit/sec / 8 = 1250000000 bytes/sec
1250000000 * ~274.88 seconds = ~343600000000 bytes, more than UINT_MAX

Here, the maximum duration of a burst the kernel can handle
exceeds 4 GB of burst size.
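
The two rate examples above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and exact integer arithmetic is used for clarity, whereas tc's own computation may round differently.

```python
UINT_MAX = 2**32 - 1      # largest psched tick count the __u32 field can carry
PSCHED_TICK_NS = 64       # duration of one psched tick, per the text
NSEC_PER_SEC = 10**9


def max_burst_seconds() -> float:
    """Longest burst duration expressible in UINT_MAX psched ticks."""
    return UINT_MAX * PSCHED_TICK_NS / NSEC_PER_SEC


def max_burst_bytes(rate_bits_per_sec: int) -> int:
    """Bytes that can be sent at the given rate in UINT_MAX psched ticks."""
    rate_bytes_per_sec = rate_bits_per_sec // 8
    return rate_bytes_per_sec * UINT_MAX * PSCHED_TICK_NS // NSEC_PER_SEC


print(max_burst_seconds())
for label, rate in (("5 Mbit/s", 5_000_000), ("10 Gbit/s", 10_000_000_000)):
    limit = max_burst_bytes(rate)
    verdict = "exceeds UINT_MAX bytes" if limit > UINT_MAX else "below UINT_MAX bytes"
    print(label, limit, verdict)
```

At 5 Mbit/s the tick field is the binding limit; at 10 Gbit/s the 32-bit byte count is.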

While the above maximum may be an excessively large burst value,
at 10 Gbit/sec, a 4 GB burst size corresponds to just under 3.5 seconds in
duration:

2^32 bytes / 10 Gbit/sec
2^32 bytes / 1250000000 bytes/sec
equals ~3.44 sec

So, at higher rates, burst sizes exceeding 4 GB are both
reasonable and feasible, up to the UINT_MAX limit for psched ticks.
Enabling this requires changes only to the user space processing of the
burst size parameter in tc.

In principle, the other packet schedulers utilizing psched ticks
for burst sizing, htb and tbf, could be similarly changed to permit larger
burst sizes, but this patch set does not do so.

Separately, for the burst duration calculation overflow (i.e.,
that the number of psched ticks exceeds UINT_MAX), under the current
implementation, one example of overflow is as follows:

# /sbin/tc filter add dev eth0 protocol ip prio 1 parent ffff: handle 1 fw police rate 1Mbit peakrate 10Gbit burst 34375000 mtu 64Kb conform-exceed reclassify

# /sbin/tc -raw filter get dev eth0 ingress protocol ip pref 1 handle 1 fw
filter ingress protocol ip pref 1 fw chain 0 handle 0x1  police 0x1 rate 1Mbit burst 15261b mtu 64Kb [001d1bf8] peakrate 10Gbit action reclassify overhead 0b
        ref 1 bind 1

Note that the returned burst value is 15261b, which does not match
the supplied value of 34375000.  With this patch set applied, this
situation is flagged as an error.
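
The value 15261b is consistent with a 32-bit wraparound of the tick count. The following Python sketch reproduces it under stated assumptions: the helper names are hypothetical, the arithmetic is done in exact integers rather than the way tc computes it, and PSCHED_TICKS_PER_SEC is taken from footnote [0].

```python
PSCHED_TICKS_PER_SEC = 10**9 >> 6   # 15625000, i.e. 64 ns per tick


def burst_to_ticks(burst_bytes: int, rate_bits_per_sec: int) -> int:
    """Time needed to send burst_bytes at the rate, in psched ticks (untruncated)."""
    rate_bytes_per_sec = rate_bits_per_sec // 8
    return burst_bytes * PSCHED_TICKS_PER_SEC // rate_bytes_per_sec


def ticks_to_burst(ticks: int, rate_bits_per_sec: int) -> int:
    """Inverse conversion: bytes sent at the rate during the given ticks."""
    rate_bytes_per_sec = rate_bits_per_sec // 8
    return ticks * rate_bytes_per_sec // PSCHED_TICKS_PER_SEC


ticks = burst_to_ticks(34_375_000, 1_000_000)  # 275 seconds worth of ticks
wrapped = ticks & 0xFFFFFFFF                   # what survives in a __u32
print(ticks, wrapped, ticks_to_burst(wrapped, 1_000_000))
```

A burst of 34375000 bytes at 1 Mbit/sec takes 275 seconds, just past the ~274.88 second limit, so only the low 32 bits of the tick count reach the kernel.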

[0] psched ticks are defined in the kernel in include/net/pkt_sched.h:

#define PSCHED_SHIFT                    6
#define PSCHED_TICKS2NS(x)              ((s64)(x) << PSCHED_SHIFT)
#define PSCHED_NS2TICKS(x)              ((x) >> PSCHED_SHIFT)

#define PSCHED_TICKS_PER_SEC            PSCHED_NS2TICKS(NSEC_PER_SEC)

where PSCHED_TICKS_PER_SEC is 15625000.

These values are exported to user space via /proc/net/psched, the
second field being PSCHED_TICKS2NS(1), which at present is 64 (0x40).  tc
uses this value to compute its internal "tick_in_usec" variable containing
the number of psched ticks per usec (15.625) used for the psched tick
computations.

Lastly, note that PSCHED_SHIFT was previously 10, and changed to 6
in commit a4a710c4a7490 in 2009.  I have not tested backwards
compatibility of these changes with kernels of that era.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

tc/police: enable use of 64 bit burst parameter

Modify tc police to permit burst sizes up to the limit of the
kernel API, which may exceed 4 GB of burst size at higher rates.

As presently implemented, the tc police burst option limits the
size of the burst to 4 GB.  This is a reasonable limit for the
rates common when this was developed.  However, the underlying
implementation of burst is expressed in terms of time at the specified
rate, and for higher rates, a burst size exceeding 4 GB is feasible
without modification to the kernel.

The kernel API specifies the burst size as the number of "psched
ticks" needed to send the burst at the specified rate.  As each psched
tick is 64 nsec, the actual kernel limit on burst size is approximately
274.88 seconds (UINT_MAX * 64 / NSEC_PER_SEC).

For example, at a rate of 10 Gbit/sec, the current 4 GB size limit
corresponds to just under 3.5 seconds.

Additionally, overflows (burst values that exceed UINT_MAX psched
ticks) are now correctly detected, and flagged as an error, rather than
passing arbitrary psched tick values to the kernel.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

tc: Expand tc_calc_xmittime, tc_calc_xmitsize to u64

In preparation for accepting 64-bit burst sizes, modify
tc_calc_xmittime and tc_calc_xmitsize to handle 64-bit values.

tc_calc_xmittime continues to return a 32-bit value, as its range
is limited by the kernel API, but overflow is now detected and the return
value is limited to UINT_MAX.
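
As an illustration of the saturating behaviour described above, here is a Python sketch. The names calc_xmittime_ticks and burst_fits are hypothetical, the integer arithmetic is a simplification, and treating a result of UINT_MAX as "overflowed" on the caller side is an assumption rather than the confirmed tc logic.

```python
UINT_MAX = 2**32 - 1
PSCHED_TICKS_PER_SEC = 15_625_000   # 64 ns per psched tick


def calc_xmittime_ticks(rate_bytes_per_sec: int, size_bytes: int) -> int:
    """Ticks needed to send size_bytes; saturates at UINT_MAX instead of wrapping."""
    ticks = size_bytes * PSCHED_TICKS_PER_SEC // rate_bytes_per_sec
    return min(ticks, UINT_MAX)


def burst_fits(rate_bytes_per_sec: int, size_bytes: int) -> bool:
    """Hypothetical caller-side check: a saturated result means the burst is too large."""
    return calc_xmittime_ticks(rate_bytes_per_sec, size_bytes) < UINT_MAX


# 4 GiB at 10 Gbit/s fits; 34375000 bytes at 1 Mbit/s does not.
print(burst_fits(1_250_000_000, 2**32))
print(burst_fits(125_000, 34_375_000))
```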

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

tc: Add get_size64 and get_size64_and_cell

In preparation for accepting 64 bit burst sizes, create 64-bit
versions of get_size and get_size_and_cell. The 32-bit versions become
wrappers around the 64-bit versions.
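
The wrapper pattern can be sketched in Python as follows. The names parse_size64 and parse_size32 are hypothetical stand-ins, the parser accepts plain byte counts only (no unit suffixes), and rejecting values above UINT_MAX in the 32-bit wrapper is an assumption about how the real wrappers behave.

```python
UINT_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


def parse_size64(text: str) -> int:
    """64-bit parser: the single place where the string is interpreted."""
    value = int(text, 10)
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"size out of 64-bit range: {text!r}")
    return value


def parse_size32(text: str) -> int:
    """32-bit wrapper: reuse the 64-bit parser, then narrow with a range check."""
    value = parse_size64(text)
    if value > UINT_MAX:
        raise ValueError(f"size does not fit in 32 bits: {text!r}")
    return value


print(parse_size64("8589934592"))   # 8 GiB parses in the 64-bit version
print(parse_size32("4294967295"))   # UINT_MAX still fits the 32-bit wrapper
```

Keeping the parsing logic in one function means existing 32-bit callers continue to work unchanged while new callers can opt in to the wider range.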

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

lib: Update backend of print_size to accept 64 bit size

In preparation for accepting 64 bit burst sizes, modify
sprint_size, the formatting function behind print_size, to accept __u64 as
its size parameter. Also include a "Gb" size category.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

iplink: bond_slave: add support for actor_port_prio

Add support for the actor_port_prio option for bond slaves.
This per-port priority can be used by the bonding driver in ad_select to
choose the higher-priority aggregator during failover.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
5adf6f2b9972: ("Merge branch 'ipv4-icmp-fix-source-ip-derivation-in-presence-of-vrfs'")

Signed-off-by: David Ahern <dsahern@kernel.org>

scripts: Add uapi header import script

Add a script to automate importing Linux UAPI headers from kernel source.
The script handles dependency resolution and creates a commit with proper
attribution, similar to the ethtool project approach.

Usage:
$ LINUX_GIT="$LINUX_PATH" iproute2-import-uapi [commit]

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

tc: gred: fix debug print

When built with -DDEBUG, the tc build fails with:
q_gred.c: In function ‘init_gred’:
q_gred.c:53:17: error: passing argument 2 of ‘fprintf’ from incompatible pointer type [-Wincompatible-pointer-types]
   53 |                 DPRINTF(stderr, "init_gred: invoked with %s\n", *argv);
      |                 ^~~~~~~
      |                 |
      |                 FILE *

This is due to the DPRINTF macro call. Indeed DPRINTF is defined as a
two-args macro when -DDEBUG is used, while it uses 3 args in this call.

Fix it by simply dropping the useless first arg.

Fixes: aba5acdfdb34 ("(Logical change 1.3)")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>

man8: ip-sr: Document that passphrase must be high-entropy

'ip sr hmac set' takes a newline-terminated "passphrase", but it fails
to stretch it. The "passphrase" actually gets used directly as the key.
This makes it difficult to use securely.

I recommend deprecating this command and replacing it with a command
that either stretches the passphrase or explicitly takes a key instead
of a passphrase. But for now, let's at least document this pitfall.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Move get_float() from ip/iplink_can.c to lib/utils.c

No functional change.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

tc: add dualpi2 scheduler module

DUALPI2 AQM is a combination of the DUALQ Coupled-AQM with a PI2
base-AQM. The PI2 AQM is in turn both an extension and a simplification
of the PIE AQM. PI2 makes quite a few PIE heuristics unnecessary, while
being able to control scalable congestion controls like TCP-Prague.
With PI2, both Reno/Cubic can be used in parallel with Prague,
maintaining window fairness. DUALQ provides latency separation between
low latency Prague flows and Reno/Cubic flows that need a bigger queue.

This patch adds support to tc to configure it through its netlink
interface.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Olga Albisser <olga@albisser.org>
Signed-off-by: Olga Albisser <olga@albisser.org>
Co-developed-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Co-developed-by: Oliver Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Oliver Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Bob Briscoe <research@bobbriscoe.net>
Co-developed-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Henrik Steen <henrist@henrist.net>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>

Add get_float_min_max() in lib/utils.c

get_float_min_max() is based on get_float() and additionally checks that
the parsed value lies strictly between the minimum and maximum values.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

uapi: update kernel headers

Add net_shaper.h and updates to bpf.h and capability.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip: ipmaddr.c: Fix possible integer underflow in read_igmp()

Static analyzer pointed out a potential error:

Possible integer underflow: left operand is tainted. An integer underflow
may occur due to arithmetic operation (unsigned subtraction) between variable
'len' and value '1', when 'len' is tainted { [0, 18446744073709551615] }

The fix adds a check for 'len == 0' before accessing the last character of
the name, and skips the current line in such cases to avoid the underflow.
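
The upper bound in the analyzer message is what an unsigned 'len - 1' evaluates to when 'len' is 0. A small Python sketch emulates this, assuming a 64-bit size_t; both helper names are hypothetical and the guarded variant only mirrors the shape of the fix, not the actual ipmaddr.c code.

```python
SIZE_MAX = 2**64 - 1   # assumes a 64-bit size_t, matching the analyzer's range


def last_index_unchecked(length: int) -> int:
    """Emulate unsigned 'len - 1': wraps to SIZE_MAX when length is 0."""
    return (length - 1) & SIZE_MAX


def last_char(name: str):
    """Guarded access: return None for an empty name so the caller can skip the line."""
    if len(name) == 0:
        return None
    return name[len(name) - 1]


print(last_index_unchecked(0))   # the wrapped index the analyzer warns about
print(last_char("eth0:"))
print(last_char(""))
```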

Reported-by: SVACE static analyzer
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

misc: fix memory leak in ifstat.c

A memory leak was detected by the static analyzer SVACE in the function
get_nlmsg_extended(). The issue occurred when parsing extended interface
statistics failed due to a missing nested attribute. In this case,
memory allocated for 'n->name' via strdup() was not freed before returning,
resulting in a leak.

The fix adds an explicit 'free(n->name)' call before freeing the containing
structure in the error path.

Reported-by: SVACE static analyzer
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

misc: ss.c: fix logical error in main function

In the line if (!dump_tcpdiag) { the descriptor check contained a
logical error, which the static analyzer flagged (the condition is
always false).

Fix it by replacing !dump_tcpdiag with !dump_fp.

Reported-by: SVACE static analyzer
Signed-off-by: Anton Moryakov <ant.v.moryakov@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: fdb: Add support for FDB activity notification control

Add support for FDB activity notification control [1].

Users can use this to enable activity notifications on a new FDB entry
that was learned on an ES (Ethernet Segment) peer and mark it as locally
inactive:

# bridge fdb add 00:11:22:33:44:55 dev bond1 master static activity_notify inactive
$ bridge -d fdb get 00:11:22:33:44:55 br br1
00:11:22:33:44:55 dev bond1 activity_notify inactive master br1 static
$ bridge -d -j -p fdb get 00:11:22:33:44:55 br br1
[ {
         "mac": "00:11:22:33:44:55",
         "ifname": "bond1",
         "activity_notify": true,
         "inactive": true,
         "flags": [ ],
         "master": "br1",
         "state": "static"
     } ]

User space will receive a notification when the entry becomes active and
the control plane will be able to mark the entry as locally active.
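
Since the JSON output is meant for scripts, a control plane could pick out locally inactive entries as sketched below. This is an illustration only: the sample is the output shown above embedded as a string, and the function name is hypothetical.

```python
import json

# Output of 'bridge -d -j -p fdb get ...' as shown above.
SAMPLE = """
[ {
    "mac": "00:11:22:33:44:55",
    "ifname": "bond1",
    "activity_notify": true,
    "inactive": true,
    "flags": [ ],
    "master": "br1",
    "state": "static"
} ]
"""


def locally_inactive_macs(entries):
    """MACs of entries with activity notifications enabled that are marked inactive."""
    return [
        entry["mac"]
        for entry in entries
        if entry.get("activity_notify") and entry.get("inactive")
    ]


entries = json.loads(SAMPLE)
print(locally_inactive_macs(entries))
```

Using .get() matters because both keys are simply absent when the corresponding flag is not set, as the later dumps in this message show.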

It is also possible to enable activity notifications on an existing
dynamic entry:

$ bridge -d -s -j -p fdb get 00:aa:bb:cc:dd:ee br br1
[ {
         "mac": "00:aa:bb:cc:dd:ee",
         "ifname": "bond1",
         "used": 8,
         "updated": 8,
         "flags": [ ],
         "master": "br1",
         "state": ""
     } ]
# bridge fdb replace 00:aa:bb:cc:dd:ee dev bond1 master static activity_notify norefresh
$ bridge -d -s -j -p fdb get 00:aa:bb:cc:dd:ee br br1
[ {
         "mac": "00:aa:bb:cc:dd:ee",
         "ifname": "bond1",
         "activity_notify": true,
         "used": 3,
         "updated": 23,
         "flags": [ ],
         "master": "br1",
         "state": "static"
     } ]

The "norefresh" keyword is used to avoid resetting the entry's last
active time (i.e., "updated" time).

User space will receive a notification when the entry becomes inactive
and the control plane will be able to mark the entry as locally
inactive. Note that the entry was converted from a dynamic entry to a
static entry to prevent the kernel from automatically deleting it upon
inactivity.

An existing inactive entry can only be marked as active by the kernel or
by disabling and enabling activity notifications:

$ bridge -d fdb get 00:11:22:33:44:55 br br1
00:11:22:33:44:55 dev bond1 activity_notify inactive master br1 static
# bridge fdb replace 00:11:22:33:44:55 dev bond1 master static activity_notify
$ bridge -d fdb get 00:11:22:33:44:55 br br1
00:11:22:33:44:55 dev bond1 activity_notify inactive master br1 static
# bridge fdb replace 00:11:22:33:44:55 dev bond1 master static
# bridge fdb replace 00:11:22:33:44:55 dev bond1 master static activity_notify
$ bridge -d fdb get 00:11:22:33:44:55 br br1
00:11:22:33:44:55 dev bond1 activity_notify master br1 static

Marking an entry as inactive while activity notifications are disabled
does not make sense and will be rejected by the kernel:

# bridge fdb replace 00:11:22:33:44:55 dev bond1 master static inactive
RTNETLINK answers: Invalid argument

[1] https://lore.kernel.org/netdev/20200623204718.1057508-1-nikolay@cumulusnetworks.com/

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
fa582ca7e187 ("dpll: zl3073x: Fix build failure")

Signed-off-by: David Ahern <dsahern@kernel.org>

devlink: Update TC bandwidth parsing

Kernel commit 1bbdb81a9836 ("devlink: Fix excessive stack usage in rate TC bandwidth parsing")
introduced a dedicated attribute set (DEVLINK_RATE_TC_ATTR_*) for entries nested
under DEVLINK_ATTR_RATE_TC_BWS.

Update the parser to reflect this change by validating the nested
attributes and sync the UAPI header to include the changes.

Fixes: c83d1477f8b2 ("Add support for 'tc-bw' attribute in devlink-rate")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

v6.16.0

Add support for 'tc-bw' attribute in devlink-rate

Introduce a new attribute 'tc-bw' to devlink-rate, allowing users to
set the bandwidth allocation per traffic class. The new attribute
enables fine-grained QoS configurations by assigning relative bandwidth
shares to each traffic class, supporting more precise traffic shaping,
which helps in achieving more precise bandwidth management across
traffic streams.

Add support for configuring 'tc-bw' via the devlink userspace utility
and parse the 'tc-bw' arguments for accurate bandwidth assignment per
traffic class.

This feature supports 8 traffic classes as defined by the IEEE 802.1Qaz
standard.

Example commands:
- devlink port function rate add pci/0000:08:00.0/group \
tx_share 10Gbit tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0

- devlink port function rate set pci/0000:08:00.0/group \
tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0
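
The 'tc-bw' argument list in the examples is a sequence of index:share pairs, one per traffic class. A hypothetical Python parser for that shape is sketched below; requiring each of the 8 classes exactly once and rejecting negative shares are assumptions for illustration, not confirmed devlink behaviour.

```python
NUM_TCS = 8   # traffic classes per IEEE 802.1Qaz, as stated above


def parse_tc_bw(tokens):
    """Turn ['0:20', '1:0', ...] into a list of 8 shares indexed by traffic class."""
    shares = [None] * NUM_TCS
    for token in tokens:
        index_text, sep, share_text = token.partition(":")
        if not sep:
            raise ValueError(f"expected INDEX:SHARE, got {token!r}")
        index, share = int(index_text), int(share_text)
        if not 0 <= index < NUM_TCS:
            raise ValueError(f"traffic class out of range: {index}")
        if shares[index] is not None:
            raise ValueError(f"traffic class given twice: {index}")
        if share < 0:
            raise ValueError(f"negative share for class {index}")
        shares[index] = share
    if None in shares:
        raise ValueError("every traffic class must be given a share")
    return shares


print(parse_tc_bw("0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0".split()))
```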

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
fadd1e6231b1 ("Merge branch 'hv-msi-parent-domain' into main")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

ip neigh: Add support for "extern_valid" flag

Add support for the recently added "extern_valid" flag that can be used
to indicate to the kernel that a neighbor entry was learned and
determined to be valid externally. The kernel will not remove or
invalidate the entry, but it can probe the entry and notify user space
when the entry becomes reachable. The kernel will return the entry to
the stale state if it does not receive a confirmation after probing the
entry.

Example usage and output:

# ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid
Error: Cannot create externally validated neighbor with an invalid state.
# ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid
$ ip neigh show dev br0.10
192.0.2.1 lladdr 00:11:22:33:44:55 extern_valid STALE
$ ip -j -p neigh show dev br0.10
[ {
         "dst": "192.0.2.1",
         "lladdr": "00:11:22:33:44:55",
         "extern_valid": null,
         "state": [ "STALE" ]
     } ]
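
Note that in the JSON output above the flag appears as a key with a null value. A script consuming it should therefore test for the presence of the key rather than its truthiness, as in this Python sketch (the sample is the output above embedded as a string; the helper name is hypothetical):

```python
import json

# Output of 'ip -j -p neigh show dev br0.10' as shown above.
SAMPLE = """
[ {
    "dst": "192.0.2.1",
    "lladdr": "00:11:22:33:44:55",
    "extern_valid": null,
    "state": [ "STALE" ]
} ]
"""


def has_flag(entry: dict, flag: str) -> bool:
    """A flag is set when its key is present, even though its value is null."""
    return flag in entry


entry = json.loads(SAMPLE)[0]
print(has_flag(entry, "extern_valid"))   # key is present
print(bool(entry.get("extern_valid")))   # truthiness test would miss the flag
```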

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
e96ee511c906 ("net: tulip: Rename PCI driver struct to end in _driver")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'bridge-mcast-state-vlan' into next

Fabian Pfitzner says:

====================

Dump the multicast querier state per vlan.
This commit is almost identical to [1].

The querier state can be seen with:

bridge -d vlan global

The options for vlan filtering and vlan mcast snooping have to be enabled
in order to see the output:

ip link set [dev] type bridge mcast_vlan_snooping 1 vlan_filtering 1

The querier state shows the following information for IPv4 and IPv6
respectively:

1) The ip address of the current querier in the network. This could be
ourselves or an external querier.
2) The port on which the querier was seen
3) Querier timeout in seconds

[1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=16aa4494d7fc6543e5e92beb2ce01648b79f8fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: refactor bridge mcast querier function

Make code more readable and consistent with other functions.

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: dump mcast querier per vlan

Dump the multicast querier state per vlan.
This commit is almost identical to [1].

The querier state can be seen with:

bridge -d vlan global

The options for vlan filtering and vlan mcast snooping have to be enabled
in order to see the output:

ip link set [dev] type bridge mcast_vlan_snooping 1 vlan_filtering 1

The querier state shows the following information for IPv4 and IPv6
respectively:

1) The ip address of the current querier in the network. This could be
ourselves or an external querier.
2) The port on which the querier was seen
3) Querier timeout in seconds

[1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=16aa4494d7fc6543e5e92beb2ce01648b79f8fa2

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: move mcast querier dumping code into a shared function

Put mcast querier dumping code into a shared function. This function
will be called from the bridge utility in a later patch.

Adapt the code such that the vtb parameter is used
instead of tb[IFLA_BR_MCAST_QUERIER_STATE].

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Fabian Pfitzner <f.pfitzner@pengutronix.de>
Signed-off-by: David Ahern <dsahern@kernel.org>

uapi: update from 6.16-rc4

MPTCP comments changed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

bond: fix stack smash in xstats

Building with stack smashing detection finds an off-by-one error
in the bond xstats attribute parsing.

$ ip link xstats type bond dev bond0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bond0
                    LACPDU Rx 0
                    LACPDU Tx 0
                    LACPDU Unknown type Rx 0
                    LACPDU Illegal Rx 0
                    Marker Rx 0
                    Marker Tx 0
                    Marker response Rx 0
                    Marker response Tx 0
                    Marker unknown type Rx 0
*** stack smashing detected ***: terminated

Program received signal SIGABRT, Aborted.
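
The message does not quote the faulty declaration, so the following is only a generic sketch of this class of bug, not the actual bond code: a table indexed by attribute type 0..MAX needs MAX + 1 slots. `DEMO_ATTR_MAX`, `struct demo_attr`, `demo_parse()` and `demo_last_value()` are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute space: valid types are 0..DEMO_ATTR_MAX inclusive. */
#define DEMO_ATTR_MAX 9

struct demo_attr {
	unsigned short type;
	int value;
};

/*
 * Fill a type-indexed table from a list of attributes.
 * tb must have max + 1 entries, because index max itself is written.
 * Declaring the table as tb[max] would overflow by one slot when an
 * attribute of type == max arrives.
 */
void demo_parse(const struct demo_attr **tb, int max,
		const struct demo_attr *attrs, size_t n)
{
	size_t i;

	memset(tb, 0, sizeof(*tb) * (max + 1));
	for (i = 0; i < n; i++) {
		if (attrs[i].type <= max)
			tb[attrs[i].type] = &attrs[i];
	}
}

/* Return the value stored for the highest valid type, or -1 if absent. */
int demo_last_value(const struct demo_attr *attrs, size_t n)
{
	const struct demo_attr *tb[DEMO_ATTR_MAX + 1]; /* note the + 1 */

	demo_parse(tb, DEMO_ATTR_MAX, attrs, n);
	return tb[DEMO_ATTR_MAX] ? tb[DEMO_ATTR_MAX]->value : -1;
}
```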

Reported-by: z30015464 <zhongxuan2@huawei.com>
Fixes: 440c5075d662 ("ip: bond: add xstats support")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>

ip: VXLAN: Add support for IFLA_VXLAN_MC_ROUTE

The flag controls whether underlay packets should be MC-routed or (default)
sent to the indicated physical netdevice.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit:
14966a8df77e ("selftest: add selftest for anycast notifications")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'bridge-vlan-stats' into next

Petr Machata says:

====================

ip stats displays bridge-related multicast and STP stats, but not VLAN
stats. There is code for requesting, decoding and formatting these stats
accessible through `bridge -s vlan', but the `ip stats' suite lacks it. In
this patchset, extract the `bridge vlan' code to a generally accessible
place and extend `ip stats' to use it.

This reuses the existing display and JSON format, and plugs it into the
existing `ip stats' hierarchy:

# ip stats show dev v2 group xstats_slave subgroup bridge suite vlan
2: v2: group xstats_slave subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

# ip -j -p stats show dev v2 group xstats_slave subgroup bridge suite vlan
[ {
         "ifindex": 2,
         "ifname": "v2",
         "group": "xstats_slave",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

Similarly for the master stats:

# ip stats show dev br1 group xstats subgroup bridge suite vlan
211: br1: group xstats subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

# ip -j -p stats show dev br1 group xstats subgroup bridge suite vlan
[ {
         "ifindex": 211,
         "ifname": "br1",
         "group": "xstats",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "flags": [ ],
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "flags": [ ],
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

ip: iplink_bridge: Support bridge VLAN stats in `ip stats'

Add support for displaying bridge VLAN statistics in `ip stats'.
Reuse the existing `bridge vlan' display and JSON format:

# ip stats show dev v2 group xstats_slave subgroup bridge suite vlan
2: v2: group xstats_slave subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

# ip -j -p stats show dev v2 group xstats_slave subgroup bridge suite vlan
[ {
         "ifindex": 2,
         "ifname": "v2",
         "group": "xstats_slave",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

Similarly for the master stats:

# ip stats show dev br1 group xstats subgroup bridge suite vlan
211: br1: group xstats subgroup bridge suite vlan
                   10
                     RX: 3376 bytes 50 packets
                     TX: 2824 bytes 44 packets

                   20
                     RX: 684 bytes 7 packets
                     TX: 0 bytes 0 packets

# ip -j -p stats show dev br1 group xstats subgroup bridge suite vlan
[ {
         "ifindex": 211,
         "ifname": "br1",
         "group": "xstats",
         "subgroup": "bridge",
         "suite": "vlan",
         "vlans": [ {
                 "vid": 10,
                 "flags": [ ],
                 "rx_bytes": 3376,
                 "rx_packets": 50,
                 "tx_bytes": 2824,
                 "tx_packets": 44
             },{
                 "vid": 20,
                 "flags": [ ],
                 "rx_bytes": 684,
                 "rx_packets": 7,
                 "tx_bytes": 0,
                 "tx_packets": 0
             } ]
     } ]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

lib: bridge: Add a module for bridge-related helpers

`ip stats' displays a range of bridge_slave-related statistics, but not
the VLAN stats. `bridge vlan' actually has code to show these. Extract the
code to libutil so that it can be reused between the bridge and ip stats
tools.

Rename them reasonably so as not to litter the global namespace.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: ip_common: Drop ipstats_stat_desc_xstats::inner_max

After the previous patch, this field is not read anymore. Drop it.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: ipstats: Iterate all xstats attributes

ipstats_stat_desc_show_xstats() operates by first parsing the attribute
stream into a type-indexed table, and then accessing the right attribute.
But bridge VLAN stats are given as several BRIDGE_XSTATS_VLAN attributes,
one per VLAN. With the above approach to parsing, only one of these
attributes would be shown. Instead, iterate the stream of attributes and
call the show_cb for each one with a matching type.
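
The difference between the two parsing strategies can be sketched as follows. This is a simplified model, not the ipstats code: `struct demo_xattr`, `DEMO_XSTATS_VLAN`, `demo_count_via_table()` and `demo_for_each()` are hypothetical stand-ins for the netlink attribute handling.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a netlink attribute. */
struct demo_xattr {
	unsigned short type;
	int vid;
};

#define DEMO_XSTATS_VLAN 1	/* assumed type value, for illustration only */
#define DEMO_TYPE_MAX    3

/*
 * Table-based approach: each slot holds one attribute, so a later
 * attribute of the same type overwrites the earlier one.
 * Returns how many attributes of the wanted type remain visible (0 or 1).
 */
int demo_count_via_table(const struct demo_xattr *attrs, size_t n,
			 unsigned short want)
{
	const struct demo_xattr *tb[DEMO_TYPE_MAX + 1] = { NULL };
	size_t i;

	for (i = 0; i < n; i++)
		if (attrs[i].type <= DEMO_TYPE_MAX)
			tb[attrs[i].type] = &attrs[i];

	return (want <= DEMO_TYPE_MAX && tb[want]) ? 1 : 0;
}

/*
 * Iterating approach: walk the stream and invoke the callback for every
 * attribute whose type matches. Returns the number of matches.
 */
int demo_for_each(const struct demo_xattr *attrs, size_t n,
		  unsigned short want,
		  void (*show_cb)(const struct demo_xattr *, void *),
		  void *arg)
{
	size_t i;
	int shown = 0;

	for (i = 0; i < n; i++) {
		if (attrs[i].type != want)
			continue;
		if (show_cb)
			show_cb(&attrs[i], arg);
		shown++;
	}
	return shown;
}
```

With two VLAN attributes in the stream, the table exposes one of them while the iteration visits both.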

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

uapi: update headers to 6.16-rc1

Change to bpf.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Parse FQ band weights correctly

Currently, NEXT_ARG() is called twice, so the first weight is
skipped. This causes the following errors:

$ sudo tc qdisc replace dev enP64183s1 root fq weights 589824 196608 65536
Not enough elements in weights

$ sudo tc qdisc replace dev enP64183s1 root fq weights 589824 196608 65536 nopacing
Illegal "weights" element, positive number expected
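
Both errors follow from the extra advance, which a small model of the argument cursor can illustrate. This is a sketch under assumptions, not the tc code: `demo_parse_weights()` and its return codes are hypothetical, and the `skip_twice` switch exists only to reproduce the reported behaviour.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_BANDS 3	/* three weights, as in the commands above */

/*
 * Parse "weights W1 W2 W3" starting at argv[0] == "weights".
 * skip_twice != 0 mimics the reported bug: the cursor is advanced once
 * before the loop and again inside it, so W1 is never read.
 * Returns 0 on success, -1 if arguments run out, -2 on a bad element.
 */
int demo_parse_weights(int argc, char **argv, int skip_twice,
		       long weights[DEMO_BANDS])
{
	int i;

	if (argc < 1 || strcmp(argv[0], "weights") != 0)
		return -2;
	if (skip_twice) {	/* the extra advance */
		argc--;
		argv++;
	}
	for (i = 0; i < DEMO_BANDS; i++) {
		char *end;

		argc--;		/* advance to the next element */
		argv++;
		if (argc <= 0)
			return -1;	/* not enough elements */
		weights[i] = strtol(argv[0], &end, 10);
		if (end == argv[0] || *end != '\0' || weights[i] <= 0)
			return -2;	/* positive number expected */
	}
	return 0;
}
```

In this model, the buggy variant runs out of arguments for the first command and reads "nopacing" as the third weight for the second one.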

Fixes: 567eb4e41045 ("tc: fq: add TCA_FQ_WEIGHTS handling")
Signed-off-by: Hemanth Malla <vmalla@microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update headers

Update headers from 6.16 pre rc1.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip: support setting multiple features

Commit a043bea75002 ("ip route: add support for TCP usec TS") added
support for tcp_usec_ts but the existing code was not adjusted
to handle multiple features in the same invocation:

$ ip route add .. dev .. features tcp_usec_ts ecn
Error: either "to" is duplicate, or "ecn" is garbage.

The code exits the while loop as soon as it encounters any feature.
Make it more flexible. Tested with the following:

$ ip route add .. dev .. features tcp_usec_ts ecn
$ ip route add .. dev .. features tcp_usec_ts ecn quickack 1
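
A loop that keeps consuming feature keywords could look roughly like the sketch below. It is not the iproute2 parser: `demo_parse_features()` and the `DEMO_FEATURE_*` bits are hypothetical, and the real flag values come from the kernel uapi headers.

```c
#include <string.h>

/* Hypothetical flag bits; the real values live in the kernel uapi headers. */
#define DEMO_FEATURE_ECN         (1u << 0)
#define DEMO_FEATURE_TCP_USEC_TS (1u << 1)

/*
 * Consume feature keywords from argv[0..argc) and OR them into *features.
 * Stops at the first token that is not a known feature, leaving it for
 * the caller, and returns the number of tokens consumed.
 */
int demo_parse_features(int argc, char **argv, unsigned int *features)
{
	int used = 0;

	*features = 0;
	while (used < argc) {
		if (strcmp(argv[used], "ecn") == 0)
			*features |= DEMO_FEATURE_ECN;
		else if (strcmp(argv[used], "tcp_usec_ts") == 0)
			*features |= DEMO_FEATURE_TCP_USEC_TS;
		else
			break;	/* e.g. "quickack": not a feature */
		used++;
	}
	return used;
}
```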

Cc: Stephen Hemminger <stephen@networkplumber.org>
Fixes: a043bea75002 ("ip route: add support for TCP usec TS")
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

ip: filter by group before printing

Filter the output using the requested group, if necessary.

This avoids printing an empty JSON object for each existing item
that does not match the group filter when the --json option is used.

Before:
$ ip --json address list group test
[{},{},{},{},{},{},{},{},{},{},{},{}]

After:
$ ip --json address list group test
[]
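
The ordering can be illustrated with a small sketch in which the group check runs before anything is emitted for an item. `struct demo_link` and `demo_print_group()` are hypothetical and only model the idea; they are not the ip address listing code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, minimal description of a link. */
struct demo_link {
	const char *name;
	int group;
};

/*
 * Write a JSON array of the links in the requested group into buf.
 * Non-matching links are skipped before any output is produced for
 * them, so they leave no empty "{}" behind.
 * Returns the number of objects written.
 */
int demo_print_group(char *buf, size_t len,
		     const struct demo_link *links, size_t n, int group)
{
	size_t off = 0, i;
	int count = 0;

	if (len == 0)
		return 0;
	off += snprintf(buf + off, len - off, "[");
	for (i = 0; i < n && off < len; i++) {
		if (links[i].group != group)
			continue;	/* filter first, print nothing */
		off += snprintf(buf + off, len - off,
				"%s{\"ifname\":\"%s\"}",
				count ? "," : "", links[i].name);
		count++;
	}
	if (off < len)
		snprintf(buf + off, len - off, "]");
	return count;
}
```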

Signed-off-by: Jean Thomas <jean.thomas@wifirst.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

v6.15.0

iproute2: bugfix - restore ip monitor backward compatibility.

The current ip monitor implementation fails on older kernels that lack
newer RTNLGRP_* definitions. As ip monitor is expected to maintain
backward compatibility, this commit updates the code to check if errno
is not EINVAL when rtnl_add_nl_group() fails. This change restores ip
monitor's backward compatibility with older kernel versions.
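
The check described above amounts to treating EINVAL as non-fatal. A minimal sketch, assuming a join call that returns a negative value and sets errno on failure; `demo_join_is_fatal()` is a hypothetical helper, not a libnetlink function:

```c
#include <errno.h>

/*
 * Decide whether a failed group subscription should abort.
 * ret is the return value of the join call and err the errno it left
 * behind. EINVAL is taken to mean that the running kernel does not know
 * the group, which is tolerated; any other error stays fatal.
 */
int demo_join_is_fatal(int ret, int err)
{
	return ret < 0 && err != EINVAL;
}
```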

Cc: David Ahern <dsahern@kernel.org>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Reported-by: Adel Belhouane <bugs.a.b@free.fr>
Fixes: 19514606dce3 ("iproute2: add 'ip monitor maddress' support")
Closes: https://lore.kernel.org/netdev/CADXeF1GgJ_1tee3hc7gca2Z21Lyi3mzxq52sSfMg3mFQd2rGWQ@mail.gmail.com/T/#t
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

uapi: update bpf.h

Minor comment from upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

ip ntable: Add support for "mcast_reprobes" parameter

Kernel commit 8da86466b837 ("net: neighbour: Add mcast_resolicit to
configure the number of multicast resolicitations in PROBE state.")
added the "NDTPA_MCAST_REPROBES" netlink attribute that allows user
space to set / get the number of multicast probes that are sent by the
kernel in PROBE state after unicast probes did not solicit a response.

Add support for this parameter in iproute2.

Example usage and output:

$ ip ntable show dev dummy0 name arp_cache
inet arp_cache
     dev dummy0
     refcnt 1 reachable 43430 base_reachable 30000 retrans 1000
     gc_stale 60000 delay_probe 5000 queue 101
     app_probes 0 ucast_probes 3 mcast_probes 3 mcast_reprobes 0
     anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000

# ip ntable change name arp_cache dev dummy0 mcast_reprobes 5
$ ip ntable show dev dummy0 name arp_cache
inet arp_cache
     dev dummy0
     refcnt 1 reachable 43430 base_reachable 30000 retrans 1000
     gc_stale 60000 delay_probe 5000 queue 101
     app_probes 0 ucast_probes 3 mcast_probes 3 mcast_reprobes 5
     anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000

$ ip -j -p ntable show dev dummy0 name arp_cache
[ {
         "family": "inet",
         "name": "arp_cache",
         "dev": "dummy0",
         "refcnt": 1,
         "reachable": 43430,
         "base_reachable": 30000,
         "retrans": 1000,
         "gc_stale": 60000,
         "delay_probe": 5000,
         "queue": 101,
         "app_probes": 0,
         "ucast_probes": 3,
         "mcast_probes": 3,
         "mcast_reprobes": 5,
         "anycast_delay": 1000,
         "proxy_delay": 800,
         "proxy_queue": 64,
         "locktime": 1000
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

iplink_bridge: Add mdb_offload_fail_notification

Add mdb_offload_fail_notification option support.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

bridge: mdb: Support offload failed flag

Add support for the MDB_FLAGS_OFFLOAD_FAILED flag to indicate that
an attempt to offload an mdb entry to switchdev has failed.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit
45bd443bfd86 ("net: 802: Remove unused p8022 code")

Signed-off-by: David Ahern <dsahern@kernel.org>

nstat: NULL Dereference when no entries specified

A NULL pointer dereference vulnerability exists in load_ugly_table() in
misc/nstat.c, in the latest version of iproute2.
The vulnerability can be triggered by:
1. db is set to NULL at struct nstat_ent *db = NULL;
2. n is set to NULL at n = db;
3. NULL dereference of variable n happens at sscanf(p+1, "%llu", &n->val) != 1
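
The message does not show the fix itself. One way to guard step 3 is sketched below, with a hypothetical `struct demo_ent` standing in for struct nstat_ent and a hypothetical helper `demo_read_val()`:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, trimmed-down counterpart of struct nstat_ent. */
struct demo_ent {
	struct demo_ent *next;
	unsigned long long val;
};

/*
 * Parse the number that follows the first character of p into n->val.
 * Returns 0 on success and -1 on malformed input or when the entry list
 * is empty (n == NULL), instead of dereferencing a NULL pointer.
 */
int demo_read_val(struct demo_ent *n, const char *p)
{
	if (!n || !p || !*p)
		return -1;
	if (sscanf(p + 1, "%llu", &n->val) != 1)
		return -1;
	return 0;
}
```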

Signed-off-by: ZiAo Li <23110240084@m.fudan.edu.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

MAINTAINERS: update bridge entry

Sync with the kernel and update the bridge entry with the current bridge
maintainers. Roopa decided to withdraw and Ido has agreed to step in.

Link: https://lore.kernel.org/netdev/20250314100631.40999-1-razor@blackwall.org/
CC: Roopa Prabhu <roopa@nvidia.com>
CC: Ido Schimmel <idosch@nvidia.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>

uapi: update from 6.15-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'color' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

color: Do not use dark blue in dark-background palette

In GNOME Terminal's default dark colour schemes, the default (dark)
blue on a black background is barely readable. Light blue is
significantly more readable to me, and is also easily readable on a
white background.

In Konsole, rxvt, and xterm, I can see little if any difference
between dark and light blue in the default dark colour schemes.

So replace dark blue with light blue in the dark-background palette.

Signed-off-by: Ben Hutchings <benh@debian.org>

color: Assume background is dark if unknown

We rely on the COLORFGBG environment variable to tell us whether the
background is dark.  This variable is set by Konsole and rxvt but not
by GNOME Terminal or xterm.  This means we use the wrong set of
colours when GNOME Terminal or xterm is configured with a dark
background.

It appears to me that the dark-background colour palette works better
on a light background than vice versa.  So it is better to assume a
dark background if we cannot find this out from $COLORFGBG.

- Change the initial value of is_dark_bg to 1.
- In set_color_palette(), conditionally set is_dark_bg to 0 with an
  inverted test of the colour.
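
The two changes can be sketched as a pure function of the $COLORFGBG value. This is an illustration under assumptions rather than the code in lib/color.c: `demo_is_dark_bg()` is a hypothetical name, and the rule that a last field of 0-6 or 8 means a dark background is assumed here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 for a dark background, 0 for a light one.
 * colorfgbg is the value of $COLORFGBG, or NULL if it is unset.
 * Assumption: the last ';'-separated field is the background colour,
 * and a single digit 0-6 or 8 denotes a dark colour.
 */
int demo_is_dark_bg(const char *colorfgbg)
{
	int is_dark_bg = 1;	/* default when the background is unknown */
	const char *p = colorfgbg ? strrchr(colorfgbg, ';') : NULL;

	/* Inverted test: only a value that is not dark clears the flag. */
	if (p && !(((p[1] >= '0' && p[1] <= '6') || p[1] == '8') &&
		   p[2] == '\0'))
		is_dark_bg = 0;

	return is_dark_bg;
}
```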

Signed-off-by: Ben Hutchings <benh@debian.org>

ip: display the 'netns-immutable' property

The user needs to specify '-details' to see it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

Update kernel headers

Update kernel headers to commit
1a9239bb4253 ("Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge ../iproute2-next

v6.14.0

color: Handle NO_COLOR environment variable in default_color_opt()

The NO_COLOR environment variable is a widely supported way for users
to disable coloured text output. See <https://no-color.org/>. In
case iproute2 is configured to use colours by default, allow this to
be overridden by setting NO_COLOR.

This is done in default_color_opt() so that colours can still be
explicitly enabled with a command-line option.
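
A minimal sketch of such a default, assuming the no-color.org convention that NO_COLOR counts only when set to a non-empty string. The enum, `DEMO_CONF_COLOR` and both function names are hypothetical, and whether iproute2 also checks for an empty value is an assumption.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the colour options and the build default. */
enum demo_color_opt { DEMO_COLOR_NEVER, DEMO_COLOR_AUTO, DEMO_COLOR_ALWAYS };
#define DEMO_CONF_COLOR DEMO_COLOR_AUTO

/* Pure helper: map the NO_COLOR value (or NULL) to the default option. */
enum demo_color_opt demo_default_from(const char *no_color)
{
	if (no_color && no_color[0] != '\0')
		return DEMO_COLOR_NEVER;
	return DEMO_CONF_COLOR;
}

/*
 * Default used before command-line options are parsed, so an explicit
 * colour option given later can still override it.
 */
enum demo_color_opt demo_default_color_opt(void)
{
	return demo_default_from(getenv("NO_COLOR"));
}
```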

Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

color: Introduce and use default_color_opt() function

As a preparatory step for supporting the NO_COLOR environment
variable, replace the direct use of CONF_COLOR with a
default_color_opt() function which initially returns CONF_COLOR.

Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: David Ahern <dsahern@kernel.org>

Merge remote-tracking branch 'main/main' into next

Signed-off-by: David Ahern <dsahern@kernel.org>

Merge branch 'rdma-optional-counters' into next

Patrisious Haddad says:

====================

Add optional-counters binding support together with new packets/bytes
counters. Previously, optional-counters were on a per-link basis. This
series allows users to bind optional-counters to a specific counter,
which allows tracking optional-counters over a specific QP group.

The support is added for both binding modes, automatic and manual.
In both cases, the bound optional-counters are those that are currently
configured over the link when trying to bind the QP.

In addition, introduce four new optional-counters:
rdma_tx_bytes, rdma_tx_packets, rdma_rx_bytes, rdma_rx_packets
which, as their names imply, allow tracking RDMA egress and ingress
traffic.

This is exposed to users through the iproute2 package which needs to be
updated as well to provide the support for this feature.

Example commands:
- rdma stat set link rocep8s0f0/1 optional-counters
  rdma_tx_bytes,rdma_rx_packets
        Enables rdma_tx_bytes and rdma_rx_packets optional-counters over
        the link.

- rdma stat qp set link rocep8s0f0/1 auto type on optional-counters on
        Enables automatic link counter binding for QPs of the same type,
        with optional-counter binding support.

- rdma stat qp bind link rocep8s0f0/1 lqpn 134
        Manually bind QP number 134 to all available counters.

- rdma stat qp bind link rocep8s0f0/1 lqpn 134 cntn 4
        Manually bind QP number 134 to counter number 4 depending on its
        configured counters.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>

rdma: Add optional-counter option to rdma stat bind commands

Add a new optional filter named optional-counter to commands:
rdma stat qp set link [link_name] auto

The new filter value can be either on or off, and it must be the last
filter provided in the command. Omitting it is the same as off.

It indicates that when binding counters to a QP we also want the
currently enabled optional-counters on the link to be bound as well.

In addition, adjust the rdma statistic man page to reflect the new
optional-counter changes.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

rdma: update uapi headers

Update rdma_netlink.h file up to kernel commit da3711074f52
("RDMA/core: Add support to optional-counters binding configuration")

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>

tc: nat: ffs should operate on host byte ordered data

In print_nat the mask length is calculated as

len = ffs(sel->mask);
len = len ? 33 - len : 0;

The mask is stored in network byte order; it should be converted
to host byte order before calculating first bit set.
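
A sketch of the corrected calculation, using the same two-step formula with the conversion added. `demo_mask_len()` is a hypothetical helper, not the print_nat code, and it assumes a contiguous IPv4 netmask.

```c
#include <arpa/inet.h>	/* ntohl(), htonl() */
#include <stdint.h>
#include <strings.h>	/* ffs() */

/*
 * Prefix length of a contiguous IPv4 netmask given in network byte order.
 * The mask is converted to host order first so that ffs() finds the
 * lowest set bit of the numeric value regardless of endianness.
 */
int demo_mask_len(uint32_t mask_be)
{
	int len = ffs((int)ntohl(mask_be));

	return len ? 33 - len : 0;
}
```

For 255.255.255.0 the lowest set bit of the host-order value is bit 9 (counting from 1), which gives 33 - 9 = 24.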

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>