git.ipfire.org Git - thirdparty/linux.git/log

can: netlink: refactor CAN_CTRLMODE_TDC_{AUTO,MANUAL} flag reset logic

CAN_CTRLMODE_TDC_AUTO and CAN_CTRLMODE_TDC_MANUAL are mutually
exclusive. This means that whenever the user switches from auto to
manual mode (or vice versa), the other flag which was set previously
needs to be cleared.

Currently, this is handled with a masking operation. It can be done in
a simpler manner by clearing any of the previous TDC flags before
copying netlink attributes. The code becomes easier to understand and
will make it easier to add the new upcoming CAN XL flags which will
have a similar reset logic as the current TDC flags.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-7-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: netlink: add can_validate_databittiming()

Factorise the databittiming validation out of can_validate() and move
it in the new add can_validate_databittiming() function. Also move
can_validate()'s comment because it is specific to CAN FD. This is a
preparation patch for the introduction of CAN XL as this databittiming
validation will be reused later on.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-6-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: netlink: add can_validate_tdc()

Factorise the TDC validation out of can_validate() and move it in the
new can_validate_tdc() function. This is a preparation patch for the
introduction of CAN XL because this TDC validation will be reused
later on.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-5-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: netlink: refactor can_validate_bittiming()

Whenever can_validate_bittiming() is called, it is always preceded by
some boilerplate code which was copy pasted all over the place. Move
that repeated code directly inside can_validate_bittiming().

Finally, the mempcy() is not needed: the nla attributes are four bytes
aligned which is just enough for struct can_bittiming. Add a
static_assert() to document that the alignment is correct and just use
the pointer returned by nla_data() as-is.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-4-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: netlink: document which symbols are FD specific

The CAN XL netlink interface will also have data bitrate and TDC
parameters. The current FD parameters do not have a prefix in their
names to differentiate them.

Because the netlink interface is part of the UAPI, it is unfortunately
not feasible to rename the existing symbols to add an FD_ prefix. The
best alternative is to add a comment for each of the symbols to notify
the reader of which parts are CAN FD specific.

While at it, fix a typo: transiver -> transceiver.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-3-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: make can_get_relative_tdco() FD agnostic and move it to bittiming.h

can_get_relative_tdco() needs to access can_priv->fd making it
specific to CAN FD. Change the function parameter from struct can_priv
to struct data_bittiming_params. This way, the function becomes CAN FD
agnostic and can be reused later on for the CAN XL TDC.

Now that we dropped the dependency on struct can_priv, also move
can_get_relative_tdco() back to bittiming.h where it was meant to
belong to.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-2-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: move struct data_bittiming_params to linux/can/bittiming.h

In commit b803c4a4f788 ("can: dev: add struct data_bittiming_params to
group FD parameters"), struct data_bittiming_params was put into
linux/can/dev.h.

This structure being a collection of bittiming parameters, on second
thought, bittiming.h is actually a better location. This way, users of
struct data_bittiming_params will not have to forcefully include
linux/can/dev.h thus removing some complexity and reducing the risk of
circular dependencies in headers.

Move struct data_bittiming_params from linux/can/dev.h to
linux/can/bittiming.h.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-canxl-netlink-prep-v4-1-e720d28f66fe@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: rework the CAN MTU logic (CAN XL preparation step 2/3)"

Vincent Mailhol <mailhol@kernel.org> says:

The CAN MTU logic is currently broken. can_change_mtu() will update
both the MTU and the CAN_CTRLMODE_FD flag.

Back then, commit bc05a8944a34 ("can: allow to change the device mtu
for CAN FD capable devices") stated that:

  The configuration can be done either with the 'fd { on | off }'
  option in the 'ip' tool from iproute2 or by setting the CAN
  netdevice MTU to CAN_MTU (16) or to CANFD_MTU (72).

Link: https://git.kernel.org/torvalds/c/bc05a8944a34
The problem is that after doing for example:

  $ ip link set can0 mtu 72 bittiming 500000

on a CAN FD interface, we are left with a device on which CAN FD is
enabled but which does not have the FD databittiming parameters
configured.

The same goes on when setting the mtu back to 16:

  ip link set can0 type can bitrate 500000 fd on dbitrate 5000000
  ip link set can0 mtu 16

The device is now in Classical CAN mode but iproute2 is still
reporting the databittiming values (although this time, the issue
seems less critical as it is only a reporting problem).

The only way to resolve the problem and bring the device back to a
coherent state is to call again the netlink interface using the
"fd on" or "fd off" options.

The idea of being able to infer the CAN_CTRLMODE_FD flag from the MTU
value is just incorrect for physical devices. Note that this logic
remains valid on virtual interfaces (vcan and vxcan) because those do
not have control mode flags and thus no conflict occurs.

This series reworks the CAN MTU logic. The goal is to always maintain
a coherent state between the MTU and the control mode flags as listed
in below table:

fd off, xl off fd on, xl off fd any, xl on
  ---------------------------------------------------------------------------
  default mtu CAN_MTU CANFD_MTU CANXL_MTU
  min mtu CAN_MTU CANFD_MTU CANXL_MIN_MTU
  max mtu CAN_MTU CANFD_MTU CANXL_MAX_MTU

In order to switch between one column to another, the user must use
the fd/xl on/off flags. Directly modifying the MTU from one column to
the other is not permitted any more.

The CAN XL is not yet supported at the moment, so the last column is
just given as a reference to better understand what is coming up. This
series will just implement the first two columns.

While doing the rewrite, the logic is adjusted to reuse as much as
possible the net core infrastructure. By populating:

  net_device->min_mtu

and

  net_device->max_mtu

the net core infrastructure will automatically:

  1. validate that the user's inputs are in range.

  2. report those min and max MTU values through the netlink
     interface.

Point 1. will allow us to get rid of the can_change_mtu() in a near
future for all the physical devices and point 2. allows the end user
to see the valid MTU range by doing a:

  $ ip --details link show can0

Finally, because using the net core, it will be possible after the
removal of can_change_mtu() to modify the MTU while the device is up.
As stated previously, the only modifications allowed will be within
the MTU range of a given CAN protocol. So for Classical CAN and CAN
FD, the MTU is fixed to, respectively, CAN_MTU and CANFD_MTU. For the
upcoming CAN XL, the user will be able to change the MTU to anything
between CANXL_MIN_MTU and CANXL_MAX_MTU even if the device is up.

The first patch of this series annotates the read access on
net_device->mtu. This preparation is needed to prevent any race
condition to occur when modifying the MTU while the device is up.

The second patch is another preparation change which moves
can_set_static_ctrlmode() from dev.h to dev.c.

The third patch populates the MTU minimum and maximum value.

The fourth patch is just a clean-up to remove the old
can_change_mtu().

The fourth and last patch comes as a bonus content and modifies the
default MTU of the vcan and vxcan so that CAN XL is on by default.

Note that after this series, the old can_change_mtu() becomes
useless. That function can not yet be removed because some pending
changes from other maintainers' trees still depend on it. It will be
removed in the next development window once all those changes reach
net-next.

Link: https://patch.msgid.link/20250923-can-fix-mtu-v3-0-581bde113f52@kernel.org
[mkl: squashed fixup patch into 3]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: enable CAN XL for virtual CAN devices by default

In commit 97edec3a11cf ("can: enable CAN FD for virtual CAN devices by
default"), vcan and vxcan default MTU was set to CANFD_MTU by default.
The reason was that users were confused on how to activate CAN FD on
virtual interfaces.

Following the introduction of CAN XL, the same logic should be
applied. Set the MTU to CANXL_MTU by default.

The users who really wish to use a Classical CAN only or a CAN FD
virtual device can do respectively:

$ ip link set vcan0 mtu 16

or

$ ip link set vcan0 mtu 72

to force the old behaviour.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-can-fix-mtu-v3-4-581bde113f52@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: populate the minimum and maximum MTU values

By populating:

  net_device->min_mtu

and

  net_device->max_mtu

the net core infrastructure will automatically:

  1. validate that the user's inputs are in range.

  2. report those min and max MTU values through the netlink
     interface.

Add can_set_default_mtu() which sets the default mtu value as well as
the minimum and maximum values. The logic for the default mtu value
remains unchanged:

  - CANFD_MTU if the device has a static CAN_CTRLMODE_FD.

  - CAN_MTU otherwise.

Call can_set_default_mtu() each time the CAN_CTRLMODE_FD is modified.
This will guarantee that the MTU value is always consistent with the
control mode flags.

With this, the checks done in can_change_mtu() become fully redundant
and will be removed in an upcoming change and it is now possible to
confirm the minimum and maximum MTU values on a physical CAN interface
by doing:

  $ ip --details link show can0

The virtual interfaces (vcan and vxcan) are not impacted by this
change.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-can-fix-mtu-v3-3-581bde113f52@kernel.org
[mkl: squashed https://patch.msgid.link/20250924143644.17622-2-mailhol@kernel.org]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: turn can_set_static_ctrlmode() into a non-inline function

can_set_static_ctrlmode() is declared as a static inline. But it is
only called in the probe function of the devices and so does not
really benefit from any kind of optimization.

Transform it into a "normal" function by moving it to

drivers/net/can/dev/dev.c

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-can-fix-mtu-v3-2-581bde113f52@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: annotate mtu accesses with READ_ONCE()

As hinted in commit 501a90c94510 ("inet: protect against too small mtu
values."), net_device->mtu is vulnerable to race conditions if it is
written and read without holding the RTNL.

At the moment, all the writes are done while the interface is down,
either in the devices' probe() function or in can_changelink(). So
there are no such issues yet. But upcoming changes will allow to
modify the MTU while the CAN XL devices are up.

In preparation to the introduction of CAN XL, annotate all the
net_device->mtu accesses which are not yet guarded by the RTNL with a
READ_ONCE().

Note that all the write accesses are already either guarded by the
RTNL or are already annotated and thus need no changes.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250923-can-fix-mtu-v3-1-581bde113f52@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: raw: optimize the sizes of struct uniqframe and struct raw_sock"

Vincent Mailhol <mailhol@kernel.org> says:

A few bytes can be shaved out of can raw's struct uniqframe and struct
raw_sock.

Patch #1 reorders struct uniqframe fields to save 8 bytes.

Patch #2 and #3 modify struct raw_sock to use bitfields and to reorder
its fields to save 24 bytes in total.

Link: https://patch.msgid.link/20250917-can-raw-repack-v2-0-395e8b3a4437@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: raw: reorder struct raw_sock's members to optimise packing

struct raw_sock has several holes. Reorder the fields to save 8 bytes.

Statistics before:

  $ pahole --class_name=raw_sock net/can/raw.o
  struct raw_sock {
   struct sock                sk __attribute__((__aligned__(8))); /*     0   776 */

   /* XXX last struct has 1 bit hole */

   /* --- cacheline 12 boundary (768 bytes) was 8 bytes ago --- */
   int                        ifindex;              /*   776     4 */

   /* XXX 4 bytes hole, try to pack */

   struct net_device *        dev;                  /*   784     8 */
   netdevice_tracker          dev_tracker;          /*   792     0 */
   struct list_head           notifier;             /*   792    16 */
   unsigned int               bound:1;              /*   808: 0  4 */
   unsigned int               loopback:1;           /*   808: 1  4 */
   unsigned int               recv_own_msgs:1;      /*   808: 2  4 */
   unsigned int               fd_frames:1;          /*   808: 3  4 */
   unsigned int               xl_frames:1;          /*   808: 4  4 */
   unsigned int               join_filters:1;       /*   808: 5  4 */

   /* XXX 2 bits hole, try to pack */
   /* Bitfield combined with next fields */

   struct can_raw_vcid_options raw_vcid_opts;       /*   809     4 */

   /* XXX 3 bytes hole, try to pack */

   canid_t                    tx_vcid_shifted;      /*   816     4 */
   canid_t                    rx_vcid_shifted;      /*   820     4 */
   canid_t                    rx_vcid_mask_shifted; /*   824     4 */
   int                        count;                /*   828     4 */
   /* --- cacheline 13 boundary (832 bytes) --- */
   struct can_filter          dfilter;              /*   832     8 */
   struct can_filter *        filter;               /*   840     8 */
   can_err_mask_t             err_mask;             /*   848     4 */

   /* XXX 4 bytes hole, try to pack */

   struct uniqframe *         uniq;                 /*   856     8 */

   /* size: 864, cachelines: 14, members: 20 */
   /* sum members: 852, holes: 3, sum holes: 11 */
   /* sum bitfield members: 6 bits, bit holes: 1, sum bit holes: 2 bits */
   /* member types with bit holes: 1, total: 1 */
   /* forced alignments: 1 */
   /* last cacheline: 32 bytes */
  } __attribute__((__aligned__(8)));

...and after:

  $ pahole --class_name=raw_sock net/can/raw.o
  struct raw_sock {
   struct sock                sk __attribute__((__aligned__(8))); /*     0   776 */

   /* XXX last struct has 1 bit hole */

   /* --- cacheline 12 boundary (768 bytes) was 8 bytes ago --- */
   struct net_device *        dev;                  /*   776     8 */
   netdevice_tracker          dev_tracker;          /*   784     0 */
   struct list_head           notifier;             /*   784    16 */
   int                        ifindex;              /*   800     4 */
   unsigned int               bound:1;              /*   804: 0  4 */
   unsigned int               loopback:1;           /*   804: 1  4 */
   unsigned int               recv_own_msgs:1;      /*   804: 2  4 */
   unsigned int               fd_frames:1;          /*   804: 3  4 */
   unsigned int               xl_frames:1;          /*   804: 4  4 */
   unsigned int               join_filters:1;       /*   804: 5  4 */

   /* XXX 2 bits hole, try to pack */
   /* Bitfield combined with next fields */

   struct can_raw_vcid_options raw_vcid_opts;       /*   805     4 */

   /* XXX 3 bytes hole, try to pack */

   canid_t                    tx_vcid_shifted;      /*   812     4 */
   canid_t                    rx_vcid_shifted;      /*   816     4 */
   canid_t                    rx_vcid_mask_shifted; /*   820     4 */
   can_err_mask_t             err_mask;             /*   824     4 */
   int                        count;                /*   828     4 */
   /* --- cacheline 13 boundary (832 bytes) --- */
   struct can_filter          dfilter;              /*   832     8 */
   struct can_filter *        filter;               /*   840     8 */
   struct uniqframe *         uniq;                 /*   848     8 */

   /* size: 856, cachelines: 14, members: 20 */
   /* sum members: 852, holes: 1, sum holes: 3 */
   /* sum bitfield members: 6 bits, bit holes: 1, sum bit holes: 2 bits */
   /* member types with bit holes: 1, total: 1 */
   /* forced alignments: 1 */
   /* last cacheline: 24 bytes */
  } __attribute__((__aligned__(8)));

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250917-can-raw-repack-v2-3-395e8b3a4437@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: raw: use bitfields to store flags in struct raw_sock

The bound, loopback, recv_own_msgs, fd_frames, xl_frames and
join_filters fields of struct raw_sock just need to store one bit of
information.

Declare all those members as a bitfields of type unsigned int and
width one bit.

Add a temporary variable to raw_setsockopt() and raw_getsockopt() to
make the conversion between the stored bits and the socket interface.

This reduces the size of struct raw_sock by sixteen bytes.

Statistics before:

  $ pahole --class_name=raw_sock net/can/raw.o
  struct raw_sock {
   struct sock                sk __attribute__((__aligned__(8))); /*     0   776 */

   /* XXX last struct has 1 bit hole */

   /* --- cacheline 12 boundary (768 bytes) was 8 bytes ago --- */
   int                        bound;                /*   776     4 */
   int                        ifindex;              /*   780     4 */
   struct net_device *        dev;                  /*   784     8 */
   netdevice_tracker          dev_tracker;          /*   792     0 */
   struct list_head           notifier;             /*   792    16 */
   int                        loopback;             /*   808     4 */
   int                        recv_own_msgs;        /*   812     4 */
   int                        fd_frames;            /*   816     4 */
   int                        xl_frames;            /*   820     4 */
   struct can_raw_vcid_options raw_vcid_opts;       /*   824     4 */
   canid_t                    tx_vcid_shifted;      /*   828     4 */
   /* --- cacheline 13 boundary (832 bytes) --- */
   canid_t                    rx_vcid_shifted;      /*   832     4 */
   canid_t                    rx_vcid_mask_shifted; /*   836     4 */
   int                        join_filters;         /*   840     4 */
   int                        count;                /*   844     4 */
   struct can_filter          dfilter;              /*   848     8 */
   struct can_filter *        filter;               /*   856     8 */
   can_err_mask_t             err_mask;             /*   864     4 */

   /* XXX 4 bytes hole, try to pack */

   struct uniqframe *         uniq;                 /*   872     8 */

   /* size: 880, cachelines: 14, members: 20 */
   /* sum members: 876, holes: 1, sum holes: 4 */
   /* member types with bit holes: 1, total: 1 */
   /* forced alignments: 1 */
   /* last cacheline: 48 bytes */
  } __attribute__((__aligned__(8)));

...and after:

  $ pahole --class_name=raw_sock net/can/raw.o
  struct raw_sock {
   struct sock                sk __attribute__((__aligned__(8))); /*     0   776 */

   /* XXX last struct has 1 bit hole */

   /* --- cacheline 12 boundary (768 bytes) was 8 bytes ago --- */
   int                        ifindex;              /*   776     4 */

   /* XXX 4 bytes hole, try to pack */

   struct net_device *        dev;                  /*   784     8 */
   netdevice_tracker          dev_tracker;          /*   792     0 */
   struct list_head           notifier;             /*   792    16 */
   unsigned int               bound:1;              /*   808: 0  4 */
   unsigned int               loopback:1;           /*   808: 1  4 */
   unsigned int               recv_own_msgs:1;      /*   808: 2  4 */
   unsigned int               fd_frames:1;          /*   808: 3  4 */
   unsigned int               xl_frames:1;          /*   808: 4  4 */
   unsigned int               join_filters:1;       /*   808: 5  4 */

   /* XXX 2 bits hole, try to pack */
   /* Bitfield combined with next fields */

   struct can_raw_vcid_options raw_vcid_opts;       /*   809     4 */

   /* XXX 3 bytes hole, try to pack */

   canid_t                    tx_vcid_shifted;      /*   816     4 */
   canid_t                    rx_vcid_shifted;      /*   820     4 */
   canid_t                    rx_vcid_mask_shifted; /*   824     4 */
   int                        count;                /*   828     4 */
   /* --- cacheline 13 boundary (832 bytes) --- */
   struct can_filter          dfilter;              /*   832     8 */
   struct can_filter *        filter;               /*   840     8 */
   can_err_mask_t             err_mask;             /*   848     4 */

   /* XXX 4 bytes hole, try to pack */

   struct uniqframe *         uniq;                 /*   856     8 */

   /* size: 864, cachelines: 14, members: 20 */
   /* sum members: 852, holes: 3, sum holes: 11 */
   /* sum bitfield members: 6 bits, bit holes: 1, sum bit holes: 2 bits */
   /* member types with bit holes: 1, total: 1 */
   /* forced alignments: 1 */
   /* last cacheline: 32 bytes */
  } __attribute__((__aligned__(8)));

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250917-can-raw-repack-v2-2-395e8b3a4437@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: raw: reorder struct uniqframe's members to optimise packing

struct uniqframe has one hole. Reorder the fields to save 8 bytes.

Statistics before:

  $ pahole --class_name=uniqframe net/can/raw.o
  struct uniqframe {
   int                        skbcnt;               /*     0     4 */

   /* XXX 4 bytes hole, try to pack */

   const struct sk_buff  *    skb;                  /*     8     8 */
   unsigned int               join_rx_count;        /*    16     4 */

   /* size: 24, cachelines: 1, members: 3 */
   /* sum members: 16, holes: 1, sum holes: 4 */
   /* padding: 4 */
   /* last cacheline: 24 bytes */
  };

...and after:

  $ pahole --class_name=uniqframe net/can/raw.o
  struct uniqframe {
   const struct sk_buff  *    skb;                  /*     0     8 */
   int                        skbcnt;               /*     8     4 */
   unsigned int               join_rx_count;        /*    12     4 */

   /* size: 16, cachelines: 1, members: 3 */
   /* last cacheline: 16 bytes */
  };

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250917-can-raw-repack-v2-1-395e8b3a4437@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: esd_usb: Fixes and improvements"

Stefan Mätje <stefan.maetje@esd.eu> says:

The first patch makes some error messages also print the error
code to achieve a higher significance. Removes also a duplicate
message and makes the register / unregister messages symmetric.

The second patch avoids emitting any error messages during the
disconnect of CAN-USB devices or the driver unload.

Changes in v2:
- Second patch:
  - Convert all occurrences of error status prints to use
    "ERR_PTR(err)" instead of printing the decimal value
    of "err".
  - Rename retval to err in esd_usb_read_bulk_callback() to
    make the naming of error status variables consistent
    with all other functions.

Link: https://patch.msgid.link/20250821143422.3567029-1-stefan.maetje@esd.eu
[mkl: only take paches 4+5, which are new features, adjust cover letter accordingly]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: esd_usb: Avoid errors triggered from USB disconnect

The USB stack calls during disconnect the esd_usb_disconnect() callback.
esd_usb_disconnect() calls netdev_unregister() for each network which
in turn calls the net_device_ops::ndo_stop callback esd_usb_close() if
the net device is up.

The esd_usb_close() callback tries to disable all CAN Ids and to reset
the CAN controller of the device sending appropriate control messages.

Sending these messages in .disconnect() is moot and always fails because
either the device is gone or the USB communication is already torn down
by the USB stack in the course of a rmmod operation.

Move the code that sends these control messages to a new function
esd_usb_stop() which is approximately the counterpart of
esd_usb_start() to make code structure less convoluted.

Then change esd_usb_close() not to send the control messages at all if
the ndo_stop() callback is executed from the USB .disconnect()
callback. Add a new flag in_usb_disconnect to the struct esd_usb
device structure to mark this condition which is checked by
esd_usb_close() whether to skip the send operations in esd_usb_start().

Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
Link: https://patch.msgid.link/20250821143422.3567029-6-stefan.maetje@esd.eu
[mkl: minor change patch description to imperative language]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: esd_usb: Rework display of error messages

- esd_usb_open(): Get rid of duplicate "couldn't start device: %d\n"
  message already printed from esd_usb_start().
- Fix duplicate printout of network device name when network device
  is registered. Add an unregister message for the network device
  as counterpart to the register message.
- Add the printout of error codes together with the error messages
  in esd_usb_close() and some in esd_usb_probe(). The additional error
  codes should lead to a better understanding what is really going
  wrong.
- Convert all occurrences of error status prints to use "ERR_PTR(err)"
  instead of printing the decimal value of "err".
- Rename retval to err in esd_usb_read_bulk_callback() to make the
  naming of error status variables consistent with all other functions.

Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
Link: https://patch.msgid.link/20250821143422.3567029-5-stefan.maetje@esd.eu
[mkl: minor change patch description to imperative language]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: rcar_can: Miscellaneous cleanups and improvements"

Geert Uytterhoeven <geert+renesas@glider.be> says:

This patch series contains miscellaneous cleanups and improvements for
the R-Car CAN driver. I deliberately sent this as a separate series
from "[PATCH] can: rcar_can: Fix s2ram with PSCI"[1], to avoid
blocking the latter. However, this series (in particular [PATCH 3/9])
does depend on it.

Changes compared to v1[2]:
- Convert new Runtime PM error messages to %pe,
- New patches 10 and 11.

[1] https://lore.kernel.org/699b2f7fcb60b31b6f976a37f08ce99c5ffccb31.1755165227.git.geert+renesas@glider.be
[2] https://lore.kernel.org/cover.1755172404.git.geert+renesas@glider.be

Link: https://patch.msgid.link/cover.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Convert to %pe

Replace numerical error codes by mnemotechnic error codes, to improve
the user experience in case of errors.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/adb2dc49c78b45191de410f645a5e423d341f94e.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Do not print alloc_candev() failures

If alloc_candev() failed due to out-of-memory, the core memory
allocation code has already printed an error message.
If alloc_candev() failed for a different reason, alloc_netdev_mqs() has
already printed an error message.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/2d6ad4be211a35492570fd7219ca7a89b384bfad.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Mailbox bitfield conversion

Convert CAN Mailbox Register field accesses to use the FIELD_PREP() and
FIELD_GET() bitfield access macro.

This gets rid of explicit shifts, and keeps a clear separation between
hardware register layouts and offical CAN definitions.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/c75c7d6ed5929c4becf7c9178cec04a0731e8ab1.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: BCR bitfield conversion

Convert CAN Bit Configuration Register field accesses to use the
FIELD_PREP() bitfield access macro. While at it, fix the misspelling of
BRP.

This gets rid of custom function-like field preparation macros.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/01cfaedba2be22515ba8700893ea7f113df959c0.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: TFCR bitfield conversion

Convert CAN Transmit FIFO Control Register field accesses to use the
FIELD_GET() bitfield access macro.

This gets rid of an explicit shift.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/a8b1dc6f1249a01af9b691ca59e2e5cc2dba6d44.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: CTLR bitfield conversion

Convert CAN Control Register field accesses to use the FIELD_PREP()
bitfield access macro. Add a few more comments and definitions while at
it.

This gets rid of explicit (and sometimes confusing) shifts.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/077640e31949dc3c9d128a08ade94c9e9cd25672.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Convert to GENMASK()

Use the GENMASK() macro instead of open-coding the same operation.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/3f947f0f91a8857a2cbce74807e42258e9f209ca.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Convert to BIT()

Use the BIT() macro instead of open-coding the same operation.
Add a few more comments while at it.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/78fb16beb74975f6f6140ec9abb48beb94fb0afa.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Convert to Runtime PM

The R-Car CAN module is part of a Clock Domain on all supported SoCs.
Hence convert its driver from explicit clock management to Runtime PM.

While at it, use %pe to format error codes.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/68bfa5480a79c17c6ceec4fb073f33872e7ff5d0.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Add helper variable dev to rcar_can_probe()

rcar_can_probe() has many users of "pdev->dev". Introduce a shorthand
to simplify the code.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/baf34c8bef5625ae73c830dbb3c617eb8f7adddd.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_can: Consistently use ndev for net_device pointers

Most net_device pointers are named "ndev", but some are called "dev".
Increase uniformity by always using "ndev".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/aac66fb5b5e1d6787121cf2ec36b551b41d4b32e.1755857536.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: rcar_canfd: R-Car CANFD Improvements"

Biju <biju.das.au@gmail.com> says:

From: Biju Das <biju.das.jz@bp.renesas.com>

The calculation formula for nominal bit rate of classical CAN is same as
that of nominal bit rate of CANFD on the RZ/G3E SoC and R-Car Gen4
compared to other SoCs. Update the nominal bit rate constants.

Apart from this, for replacing function-like macros, introduced
rcar_canfd_compute_{nominal,data}_bit_rate_cfg().

v2->v3:
* Replaced "shared_bittiming"->"shared_can_regs" as it is same for RZ/G3E
   and R-Car Gen4.
* Updated commit header and description for patch#1.
* Added Rb tag from Geert for patch #2,#3 and #4.
* Dropped _MASK suffix from RCANFD_CFG_* macros.
* Dropped _MASK suffix from RCANFD_NCFG_NBRP_MASK macro.
* Dropped _MASK suffix from the macro RCANFD_DCFG_DBRP_MASK.
* Followed the order as used in struct can_bittiming{_const} for easy
   maintenance.
v1->v2:
* Dropped patch#2 as it is accepted.
* Moved patch#4 to patch#2.
* Updated commit header and description for patch#2.
* Kept RCANFD_CFG* macro definitions to give a meaning to the magic
   number using GENMASK macro and used FIELD_PREP to extract value.
* Split patch#3 for computing nominal  and data bit rate config separate.
* Updated rcar_canfd_compute_nominal_bit_rate_cfg() to handle
   nominal bit rate configuration for both classical CAN and CANFD.
* Replaced RCANFD_NCFG_NBRP->RCANFD_NCFG_NBRP_MASK and used FIELD_PREP to
   extract value.
* Replaced RCANFD_DCFG_DBRP->RCANFD_DCFG_DBRP_MASK and used FIELD_PREP to
   extract value.

Link: https://patch.msgid.link/20250908120940.147196-1-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Simplify data bit rate config

Introduce rcar_canfd_compute_data_bit_rate_cfg() for simplifying data bit
rate configuration by replacing function-like macros.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20250908120940.147196-5-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Simplify nominal bit rate config

Introduce rcar_canfd_compute_nominal_bit_rate_cfg() for simplifying
nominal bit rate configuration by replacing function-like macros.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20250908120940.147196-4-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Update RCANFD_CFG_* macros

Update RCANFD_CFG_* macros to give a meaning to the magic number using
GENMASK macro and extract the values using FIELD_PREP macro.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20250908120940.147196-3-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Update bit rate constants for RZ/G3E and R-Car Gen4

The calculation formula for nominal bit rate of classical CAN is the same as
that of nominal bit rate of CANFD on the RZ/G3E and R-Car Gen4 SoCs
compared to other SoCs. Update nominal bit rate constants.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250908120940.147196-2-biju.das.jz@bp.renesas.com
[mkl: slightly improve wording of commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak: Modification of references to email accounts being deleted

With the upcoming deletion of @peak-system.com accounts and following
the acquisition of PEAK-System and its brand by HMS-Networks, this fix
aims to migrate all address references to @hms-networks.com, as well
as to map my personal committer addresses to author addresses, while
taking the opportunity to correct the accent on the first ‘e’ of my
first name.

Signed-off-by: Stéphane Grosjean <stephane.grosjean@hms-networks.com>
Link: https://patch.msgid.link/20250912081820.86314-1-stephane.grosjean@free.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: sort includes by alphabetical order

Includes are out of order in

drivers/net/can/dev/dev.c

Sort them by alphabetical order.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250830152107.694201-2-mailhol@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

MAINTAINERS: update Vincent Mailhol's email address

Now that I have received my kernel.org account, I am changing my email
address from mailhol.vincent@wanadoo.fr to mailhol@kernel.org. The
wanadoo.fr address was my first email which I created when I was a kid
and has a special meaning to me, but it is restricted to a maximum of
50 messages per hour which starts to be problematic on threads where
many people are CC-ed.

Update all the MAINTAINERS entries accordingly and map the old address
to the new one.

I remain reachable from my old address. The different copyright
notices mentioning my old address are kept as-is for the moment. I
will update those one at a time only if I need to touch those files.

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250826105255.35501-2-mailhol@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: use us_to_ktime() where appropriate

The tx_coalesce_usecs_irq are more suitable for using the
us_to_ktime(). This can make the code more concise and
enhance readability.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250825090904.248927-1-zhao.xichao@vivo.com
[mkl: remove not needed line break]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: phy: micrel: Update Kconfig help text

This driver by now supports 17 different Microchip (formerly known as
Micrel) chips: KSZ9021, KSZ9031, KSZ9131, KSZ8001, KS8737, KSZ8021,
KSZ8031, KSZ8041, KSZ8051, KSZ8061, KSZ8081, KSZ8873MLL, KSZ886X,
KSZ9477, LAN8814, LAN8804 and LAN8841.

Support for the VSC8201 was removed in commit 51f932c4870f ("micrel phy
driver - updated(1)")

Update the help text to reflect that, list families instead of models to
ease future maintenance.

Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250911-micrel-kconfig-v2-1-e8f295059050@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH
   error.
2) Support fetching the current bridge ethernet address.
   This allows a more flexible approach to packet redirection
   on bridges without need to use hardcoded addresses. From
   Fernando Fernandez Mancera.
3) Zap a few no-longer needed conditionals from ipvs packet path
   and convert to READ/WRITE_ONCE to avoid KCSAN warnings.
   From Zhang Tengfei.
4) Remove a no-longer-used macro argument in ipset, from Zhen Ni.

* tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_reject: don't reply to icmp error messages
  ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable
  netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support
  netfilter: ipset: Remove unused htable_bits in macro ahash_region
  selftest:net: fixed spelling mistakes
====================

Link: https://patch.msgid.link/20250911143819.14753-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Drop duplicate brcm,bcm7445-switch-v4.0.txt

The brcm,bcm7445-switch-v4.0.txt binding is already covered by
dsa/brcm,sf2.yaml. The listed deprecated properties aren't used anywhere
either.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250909142339.3219200-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: add support for hardware timestamps

Add support for hardware timestamps in (e.g.) the PHY by calling
skb_tx_timestamp() as close as reasonably possible to the point that
the hardware is instructed to send the queued packets.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uwKHe-00000004glk-3nkJ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp_tunnel: use netdev_warn() instead of netdev_WARN()

netdev_WARN() uses WARN/WARN_ON to print a backtrace along with
file and line information. In this case, udp_tunnel_nic_register()
returning an error is just a failed operation, not a kernel bug.

udp_tunnel_nic_register() can fail due to a memory allocation
failure (kzalloc() or udp_tunnel_nic_alloc()).
This is a normal runtime error and not a kernel bug.

Replace netdev_WARN() with netdev_warn() accordingly.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910195031.3784748-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-destroy-tcp-ao-tcp-md5-keys-in-sk_destruct'

Dmitry Safonov says:

====================
tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()

On one side a minor/cosmetic issue, especially nowadays when
TCP-AO/TCP-MD5 signature verification failures aren't logged to dmesg.

Yet, I think worth addressing for two reasons:
- unsigned RST gets ignored by the peer and the connection is alive for
  longer (keep-alive interval)
- netstat counters increase and trace events report that trusted BGP peer
  is sending unsigned/incorrectly signed segments, which can ring alarm
  on monitoring.
====================

Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-0-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Free TCP-AO/TCP-MD5 info/keys without RCU

Now that the destruction of info/keys is delayed until the socket
destructor, it's safe to use kfree() without an RCU callback.
The socket is in TCP_CLOSE state either because it never left it,
or it's already closed and the refcounter is zero. In any way,
no one can discover it anymore, it's safe to release memory
straight away.

Similar thing was possible for twsk already.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-2-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()

Currently there are a couple of minor issues with destroying the keys
tcp_v4_destroy_sock():

1. The socket is yet in TCP bind buckets, making it reachable for
   incoming segments [on another CPU core], potentially available to send
   late FIN/ACK/RST replies.

2. There is at least one code path, where tcp_done() is called before
   sending RST [kudos to Bob for investigation]. This is a case of
   a server, that finished sending its data and just called close().

   The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
   __tcp_close())

   tcp_v4_do_rcv()/tcp_v6_do_rcv()
     tcp_rcv_state_process()            /* LINUX_MIB_TCPABORTONDATA */
       tcp_reset()
         tcp_done_with_error()
           tcp_done()
             inet_csk_destroy_sock()    /* Destroys AO/MD5 keys */
     /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
   tcp_v4_send_reset()                  /* Sends an unsigned RST segment */

   tcpdump:
> 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
>     1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0

  Note the signed RSTs later in the dump - those are sent by the server
  when the fin-wait socket gets removed from hash buckets, by
  the listener socket.

Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). As shutdown'ed
socket can yet send non-data replies, they should be signed in order for
the peer to process them. Now it also matches how AO/MD5 gets destructed
for TIME-WAIT sockets (in tcp_twsk_destructor()).

This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
open problem: once RST get sent and socket gets actually destructed,
there is no information on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after non-graceful reset.
Luckily, BGP are known to use keep alive(s).

While the issue is quite minor/cosmetic, these days monitoring network
counters is a common practice and getting invalid signed segments from
a trusted BGP peer can get customers worried.

Investigated-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-1-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bridge-allow-keeping-local-fdb-entries-only-on-vlan-0'

Petr Machata says:

====================
bridge: Allow keeping local FDB entries only on VLAN 0

The bridge FDB contains one local entry per port per VLAN, for the MAC of
the port in question, and likewise for the bridge itself. This allows
bridge to locally receive and punt "up" any packets whose destination MAC
address matches that of one of the bridge interfaces or of the bridge
itself.

The number of these local "service" FDB entries grows linearly with number
of bridge-global VLAN memberships, but that in turn will tend to grow
quadratically with number of ports and per-port VLAN memberships. While
that does not cause issues during forwarding lookups, it does make dumps
impractically slow.

As an example, with 100 interfaces, each on 4K VLANs, a full dump of FDB
that just contains these 400K local entries, takes 6.5s. That's _without_
considering iproute2 formatting overhead, this is just how long it takes to
walk the FDB (repeatedly), serialize it into netlink messages, and parse
the messages back in userspace.

This is to illustrate that with growing number of ports and VLANs, the time
required to dump this repetitive information blows up. Arguably 4K VLANs
per interface is not a very realistic configuration, but then modern
switches can instead have several hundred interfaces, and we have fielded
requests for >1K VLAN memberships per port among customers.

FDB entries are currently all kept on a single linked list, and then
dumping uses this linked list to walk all entries and dump them in order.
When the message buffer is full, the iteration is cut short, and later
restarted. Of course, to restart the iteration, it's first necessary to
walk the already-dumped front part of the list before starting dumping
again. So one possibility is to organize the FDB entries in different
structure more amenable to walk restarts.

One option is to walk directly the hash table. The advantage is that no
auxiliary structure needs to be introduced. With a rough sketch of this
approach, the above scenario gets dumped in not quite 3 s, saving over 50 %
of time. However hash table iteration requires maintaining an active cursor
that must be collected when the dump is aborted. It looks like that would
require changes in the NDO protocol to allow to run this cleanup. Moreover,
on hash table resize the iteration is simply restarted. FDB dumps are
currently not guaranteed to correspond to any one particular state: entries
can be missed, or be duplicated. But with hash table iteration we would get
that plus the much less graceful resize behavior, where swaths of FDB are
duplicated.

Another option is to maintain the FDB entries in a red-black tree. We have
a PoC of this approach on hand, and the above scenario is dumped in about
2.5 s. Still not as snappy as we'd like it, but better than the hash table.
However the savings come at the expense of a more expensive insertion, and
require locking during dumps, which blocks insertion.

The upside of these approaches is that they provide benefits whatever the
FDB contents. But it does not seem like either of these is workable.
However we intend to clean up the RB tree PoC and present it for
consideration later on in case the trade-offs are considered acceptable.

Yet another option might be to use in-kernel FDB filtering, and to filter
the local entries when dumping. Unfortunately, this does not help all that
much either, because the linked-list walk still needs to happen. Also, with
the obvious filtering interface built around ndm_flags / ndm_state
filtering, one can't just exclude pure local entries in one query. One
needs to dump all non-local entries first, and then to get permanent
entries in another run filter local & added_by_user. I.e. one needs to pay
the iteration overhead twice, and then integrate the result in userspace.
To get significant savings, one would need a very specific knob like "dump,
but skip/only include local entries". But if we are adding a local-specific
knobs, maybe let's have an option to just not duplicate them in the first
place.

All this FDB duplication is there merely to make things snappy during
forwarding. But high-radix switches with thousands of VLANs typically do
not process much traffic in the SW datapath at all, but rather offload vast
majority of it. So we could exchange some of the runtime performance for a
neater FDB.

To that end, in this patchset, introduce a new bridge option,
BR_BOOLOPT_FDB_LOCAL_VLAN_0, which when enabled, has local FDB entries
installed only on VLAN 0, instead of duplicating them across all VLANs.
Then to maintain the local termination behavior, on FDB miss, the bridge
does a second lookup on VLAN 0.

Enabling this option changes the bridge behavior in expected ways. Since
the entries are only kept on VLAN 0, FDB get, flush and dump will not
perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects
forwarding on all VLANs.

This patchset is loosely based on a privately circulated patch by Nikolay
Aleksandrov.

The patchset progresses as follows:

- Patch #1 introduces a bridge option to enable the above feature. Then
  patches #2 to #5 gradually patch the bridge to do the right thing when
  the option is enabled. Finally patch #6 adds the UAPI knob and the code
  for when the feature is enabled or disabled.
- Patches #7, #8 and #9 contain fixes and improvements to selftest
  libraries
- Patch #10 contains a new selftest
====================

Link: https://patch.msgid.link/cover.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0

Add a selftest to check the operation of this newly-introduced bridge
option.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/62294f96884ab5d341648eef21243fa099a2dee5.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: lib.sh: Don't defer failed commands

Usually the autodefer helpers in lib.sh are expected to be run in context
where success is the expected outcome. However when using them for feature
detection, failure can legitimately occur. But the failed command still
schedules a cleanup, which will likely fail again.

Instead, only schedule deferred cleanup when the positive command succeeds.

This way of organizing the cleanup has the added benefit that now the
return code from these functions reflects whether the command passed.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/af10a5bb82ea11ead978cf903550089e006d7e70.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: defer: Introduce DEFER_PAUSE_ON_FAIL

The fact that all cleanup (ideally) goes through the defer framework makes
debugging of these commands a bit tricky. However, this also gives us a
nice point to place a hook along the lines of PAUSE_ON_FAIL. When the
environment variable DEFER_PAUSE_ON_FAIL is set, and a cleanup command
results in non-zero exit status, show a bit of debuginfo and give the user
an opportunity to interrupt the execution altogether.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/2a07d24568ede6c42e4701657fa0b738e490fe59.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: defer: Allow spaces in arguments of deferred commands

Currently the way deferred commands are stored and invoked causes any
whitespace to act as an argument separator when the command is executed.
To make it possible to use spaces in deferred commands, store the commands
quoted, and then eval the string prior to execution.

Fixes: a6e263f125cd ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/6c2523139a6f99103889c9c9fedcdc66a75441f4.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0

The previous patches introduced a new option, BR_BOOLOPT_FDB_LOCAL_VLAN_0.
When enabled, it has local FDB entries installed only on VLAN 0, instead of
duplicating them across all VLANs.

In this patch, add the corresponding UAPI toggle, and the code for turning
the feature on and off.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/ea99bfb10f687fa58091e6e1c2f8acc33f47ca45.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.

Thus when a VLAN is added for a port or the bridge itself, a local FDB
entry with the corresponding address should not be added when in the VLAN-0
mode.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/bb13ba01d58ed6d5d700e012c519d38ee6806d22.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
bridge itself should not be created per-VLAN, but instead only on VLAN 0.
When the bridge address changes, the local FDB entries need to be updated,
which is done in br_fdb_change_mac_address().

Bail out early when in VLAN-0 mode, so that the per-VLAN FDB entries are
not created. The per-VLAN walk is only done afterwards.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/0bd432cf91921ef7c4ed0e129de1d1cd358c716b.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for member
ports should not be created per-VLAN, but instead only on VLAN 0. When the
member port address changes, the local FDB entries need to be updated,
which is done in br_fdb_changeaddr().

Under the VLAN-0 mode, only one local FDB entry will ever be added for a
port's address, and that on VLAN 0. Thus bail out of the delete loop early.
For the same reason, also skip adding the per-VLAN entries.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/0cf9d41836d2a245b0ce07e1a16ee05ca506cbe9.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.

That means that br_handle_frame_finish() needs to make two lookups: the
primary lookup on an appropriate VLAN, and when that misses, a lookup on
VLAN 0.

Have the second lookup only accept local MAC addresses. Turning this into a
generic second-lookup feature is not the goal.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/8087475009dce360fb68d873b1ed9c80827da302.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0

The following patches will gradually introduce the ability of the bridge
to look up local FDB entries on VLAN 0 instead of using the VLAN indicated
by a packet.

In this patch, just introduce the option itself, with which the feature
will be linked.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/ab85e33ef41ed19a3deaef0ff7da26830da30642.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: expose tcp_recvmsg_locked errors

tcp_recvmsg_dmabuf can export the following errors:
- EFAULT when linear copy fails
- ETOOSMALL when cmsg put fails
- ENODEV if one of the frags is readable
- ENOMEM on xarray failures

But they are all ignored and replaced by EFAULT in the caller
(tcp_recvmsg_locked). Expose real error to the userspace to
add more transparency on what specifically fails.

In non-devmem case (skb_copy_datagram_msg) doing `if (!copied)
copied=-EFAULT` is ok because skb_copy_datagram_msg can return only EFAULT.

Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250910162429.4127997-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'wireguard-fixes-for-6-17-rc6'

Jason A. Donenfeld says:

====================
wireguard fixes for 6.17-rc6

Please find three small fixes to wireguard:

1) A general simplification to the way wireguard chooses the next
available cpu, by making use of cpumask_nth(), and covering an edge
case.

2) A cleanup to the selftests kconfig.

3) A fix to the selftests kconfig so that it actually runs again.
====================

Link: https://patch.msgid.link/20250910013644.4153708-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: selftests: select CONFIG_IP_NF_IPTABLES_LEGACY

This is required on recent kernels, where it is now off by default.
While we're here, fix some stray =m's that were supposed to be =y.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config

It's no longer user-selectable (and the default was already "y"), so
let's just drop it.

It was never really relevant to the wireguard selftests either way.

Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()

The function gets number of online CPUS, and uses it to search for
Nth cpu in cpu_online_mask.

If id == num_online_cpus() - 1, and one CPU gets offlined between
calling num_online_cpus() -> cpumask_nth(), there's a chance for
cpumask_nth() to find nothing and return >= nr_cpu_ids.

The caller code in __queue_work() tries to avoid that by checking the
returned CPU against WORK_CPU_UNBOUND, which is NR_CPUS. It's not the
same as '>= nr_cpu_ids'. On a typical Ubuntu desktop, NR_CPUS is 8192,
while nr_cpu_ids is the actual number of possible CPUs, say 8.

The non-existing cpu may later be passed to rcu_dereference() and
corrupt the logic. Fix it by switching from 'if' to 'while'.

Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: queueing: simplify wg_cpumask_next_online()

wg_cpumask_choose_online() opencodes cpumask_nth(). Use it and make the
function significantly simpler. While there, fix opencoded cpu_online()
too.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

geneve: Avoid -Wflex-array-member-not-at-end warning

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration to the end of the corresponding
structure. Notice that `struct ip_tunnel_info` is a flexible
structure, this is a structure that contains a flexible-array
member.

Fix the following warning:

drivers/net/geneve.c:56:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/aMBK78xT2fUnpwE5@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: udp: fix typos in comments

Correct typos in ipv6/udp.c comments:
"execeeds" -> "exceeds"
"tacking care" -> "taking care"
"measureable" -> "measurable"

No functional changes.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250909122611.3711859-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-af_packet-optimize-retire-operation'

Xin Zhao says:

====================
net: af_packet: optimize retire operation

In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
====================

Link: https://patch.msgid.link/20250908104549.204412-1-jackzxcui1989@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: af_packet: Use hrtimer to do the retire operation

In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
In HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.

Delete delete_blk_timer field, because hrtimer_cancel will check and wait
until the timer callback return and ensure never enter callback again.

Simplify the logic related to setting timeout, only update the hrtimer
expire time within the hrtimer callback, no longer update the expire time
in prb_open_block which is called by tpacket_rcv or timer callback.
Reasons why NOT update hrtimer in prb_open_block:
1) It will increase complexity to distinguish the two caller scenario.
2) hrtimer_cancel and hrtimer_start need to be called if you want to update
TMO of an already enqueued hrtimer, leading to complex shutdown logic.

One side effect of NOT update hrtimer when called by tpacket_rcv is that
a newly opened block triggered by tpacket_rcv may be retired earlier than
expected. On the other hand, if timeout is updated in prb_open_block, the
frequent reception of network packets that leads to prb_open_block being
called may cause hrtimer to be removed and enqueued repeatedly.

The retire hrtimer expiration is unconditional and periodic. If there are
numerous packet sockets on the system, please set an appropriate timeout
to avoid frequent enqueueing of hrtimers.

Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/all/20250831100822.1238795-1-jackzxcui1989@163.com/
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
Link: https://patch.msgid.link/20250908104549.204412-3-jackzxcui1989@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: af_packet: remove last_kactive_blk_num field

kactive_blk_num (K) is only incremented on block close.
In timer callback prb_retire_rx_blk_timer_expired, except delete_blk_timer
is true, last_kactive_blk_num (L) is set to match kactive_blk_num (K) in
all cases. L is also set to match K in prb_open_block.
The only case K not equal to L is when scheduled by tpacket_rcv
and K is just incremented on block close but no new block could be opened,
so that it does not call prb_open_block in prb_dispatch_next_block.
This patch modifies the prb_retire_rx_blk_timer_expired function by simply
removing the check for L == K. This patch just provides another checkpoint
to thaw the might-be-frozen block in any case. It doesn't have any effect
because __packet_lookup_frame_in_block() has the same logic and does it
again without this patch when detecting the ring is frozen. The patch only
advances checking the status of the ring.

Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/all/20250831100822.1238795-1-jackzxcui1989@163.com/
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
Link: https://patch.msgid.link/20250908104549.204412-2-jackzxcui1989@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Convert APM XGene MDIO to DT schema

Convert the APM XGene MDIO bus binding to DT schema format. It's a
straight-forward conversion.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250908231016.2070305-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Convert apm,xgene-enet to DT schema

Convert the APM XGene Ethernet binding to DT schema format.

Add the missing apm,xgene2-sgenet and apm,xgene2-xgenet compatibles.
Drop "reg-names" as required. Add support for up to 16 interrupts.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250908231016.2070305-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-simplify-register-layout'

Niklas Söderlund says:

====================
net: ethernet: renesas: rcar_gen4_ptp: Simplify register layout

The daughter driver rcar_gen4_ptp used by both rswitch and rtsn where
upstreamed with support for possible different memory layouts on
different users. With all Gen4 boards upstream no such setup is
documented.

There are other issues related to how the rcar_gen4_ptp driver is shared
between multiple useres that needs to be cleaned up. But that will be a
larger work. So before that get some simple fixes done.

Patch 1/3 and 2/3 removes the support to allow different register
layouts on different SoCs by looking up offsets at runtime with a much
simpler interface. The new interface computes the offsets at compile
time.

While patch 3/3 is a drive-by patch taking a spurs comment and making a
lockdep check of it.

There is no intentional functional change in this series just cleaning
up in preparation of larger works to follow.
====================

Link: https://patch.msgid.link/20250908154426.3062861-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: renesas: rcar_gen4_ptp: Use lockdep to verify internal usage

Instead of a having a comment that the lock must be held when calling
the internal helper add a lockdep check to enforce it.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908154426.3062861-4-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: renesas: rcar_gen4_ptp: Hide register layout

With the support for multiple register layout removed all support
structures can be removed from the header file. Covert to a simpler
structure using defines for the register offsets.

There is no functional change, only switching from looking up offsets at
runtime to compile time.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908154426.3062861-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: renesas: rcar_gen4_ptp: Remove different memory layout

When upstreaming the Gen4 PTP support for R-Car S4 the possibility for
different memory layouts on other Gen4 SoCs was build in. It turns out
this is not needed and instead needlessly makes the driver harder to
read, remove the support code that would have allowed different memory
layouts.

This change only deals with the public functions used by other drivers,
follow up work will clean up the rcar_gen4_ptp internals.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908154426.3062861-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: 8139too: Make 8139TOO_PIO depend on !NO_IOPORT_MAP

When 8139too is probing and 8139TOO_PIO=y it will call pci_iomap_range()
and from there __pci_ioport_map() for the PCI IO space.
If HAS_IOPORT_MAP=n and NO_GENERIC_PCI_IOPORT_MAP=n, like it is on my
m68k config, __pci_ioport_map() becomes NULL, pci_iomap_range() will
always fail and the driver will complain it couldn't map the PIO space
and return an error.

NO_IOPORT_MAP seems to cover the case where what 8139too is trying
to do cannot ever work so make 8139TOO_PIO depend on being it false
and avoid creating an unusable driver.

Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20250907064349.3427600-1-daniel@thingy.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: Replace sleep with slowwait

Replace the sleep in kill_procs with slowwait.

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910025828.38900-2-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: Disable dad for ipv6 in fcnal-test.sh

Constrained test environment; duplicate address detection is not needed
and causes races so disable it.

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910025828.38900-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Plenty of things going on, notably:
- iwlwifi: major cleanups/rework
- brcmfmac: gets AP isolation support
- mac80211: gets more S1G support

* tag 'wireless-next-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (94 commits)
  wifi: mwifiex: fix endianness handling in mwifiex_send_rgpower_table
  wifi: cfg80211: Remove the redundant wiphy_dev
  wifi: mac80211: fix incorrect comment
  wifi: cfg80211: update the time stamps in hidden ssid
  wifi: mac80211: Fix HE capabilities element check
  wifi: mac80211: add tx_handlers_drop statistics to ethtool
  wifi: mac80211: fix reporting of all valid links in sta_set_sinfo()
  wifi: iwlwifi: mld: CHANNEL_SURVEY_NOTIF is always supported
  wifi: iwlwifi: mld: remove support of iwl_esr_mode_notif version 1
  wifi: iwlwifi: mld: remove support from of sta cmd version 1
  wifi: iwlwifi: mld: remove support of roc cmd version 5
  wifi: iwlwifi: mld: remove support of mac cmd ver 2
  wifi: iwlwifi: mld: don't consider phy cmd version 5
  wifi: iwlwifi: implement wowlan status notification API update
  wifi: iwlwifi: fw: Add ASUS to PPAG and TAS list
  wifi: iwlwifi: add kunit tests for nvm parse
  wifi: iwlwifi: api: add a flag to iwl_link_ctx_modify_flags
  wifi: iwlwifi: pcie: move ltr_enabled to the specific transport
  wifi: iwlwifi: pcie: move pm_support to the specific transport
  wifi: iwlwifi: rename iwl_finish_nic_init
  ...
====================

Link: https://patch.msgid.link/20250911100854.20445-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc6).

Conflicts:

net/netfilter/nft_set_pipapo.c
net/netfilter/nft_set_pipapo_avx2.c
c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
84c1da7b38d9 ("netfilter: nft_set_pipapo: use avx2 algorithm for insertions too")

Only trivial adjacent changes (in a doc and a Makefile).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: remove two function stubs

Remove stubs for fixed_phy_set_link_update() and
fixed_phy_change_carrier() because all callers
(actually just one per function) select config
symbol FIXED_PHY.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/8729170d-cf39-48d9-aabc-c9aa4acda070@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from CAN, netfilter and wireless.

  We have an IPv6 routing regression with the relevant fix still a WiP.
  This includes a last-minute revert to avoid more problems.

  Current release - new code bugs:

   - wifi: nl80211: completely disable per-link stats for now

  Previous releases - regressions:

   - dev_ioctl: take ops lock in hwtstamp lower paths

   - netfilter:
       - fix spurious set lookup failures
       - fix lockdep splat due to missing annotation

   - genetlink: fix genl_bind() invoking bind() after -EPERM

   - phy: transfer phy_config_inband() locking responsibility to phylink

   - can: xilinx_can: fix use-after-free of transmitted SKB

   - hsr: fix lock warnings

   - eth:
       - igb: fix NULL pointer dereference in ethtool loopback test
       - i40e: fix Jumbo Frame support after iPXE boot
       - macsec: sync features on RTM_NEWLINK

  Previous releases - always broken:

   - tunnels: reset the GSO metadata before reusing the skb

   - mptcp: make sync_socket_options propagate SOCK_KEEPOPEN

   - can: j1939: implement NETDEV_UNREGISTER notification hanidler

   - wifi: ath12k: fix WMI TLV header misalignment"

* tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
  hsr: hold rcu and dev lock for hsr_get_port_ndev
  hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
  hsr: use rtnl lock when iterating over ports
  wifi: nl80211: completely disable per-link stats for now
  net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
  net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info
  MAINTAINERS: add Phil as netfilter reviewer
  netfilter: nf_tables: restart set lookup on base_seq change
  netfilter: nf_tables: make nft_set_do_lookup available unconditionally
  netfilter: nf_tables: place base_seq in struct net
  netfilter: nft_set_rbtree: continue traversal if element is inactive
  netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
  netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
  can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
  can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
  can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
  can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
  can: j1939: implement NETDEV_UNREGISTER notification handler
  selftests: can: enable CONFIG_CAN_VCAN as a module
  ...

Merge tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

- ptep_modify_prot_start() may be called in a loop, which might lead to
   the preempt_count overflow due to the unnecessary preemption
   disabling. Do not disable preemption to prevent the overflow

- Events of type PERF_TYPE_HARDWARE are not tested for sampling and
   return -EOPNOTSUPP eventually.

   Instead, deny all sampling events by CPUMF counter facility and
   return -ENOENT to allow other PMUs to be tried

- The PAI PMU driver returns -EINVAL if an event out of its range. That
   aborts a search for an alternative PMU driver.

   Instead, return -ENOENT to allow other PMUs to be tried

* tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cpum_cf: Deny all sampling events by counter PMU
  s390/pai: Deny all events not handled by this PMU
  s390/mm: Prevent possible preempt_count overflow

Merge tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix a nasty hibernation regression introduced during the 6.16
  cycle, an issue related to energy model management occurring on Intel
  hybrid systems where some CPUs are offline to start with, and two
  regressions in the amd-pstate driver:

   - Restore a pm_restrict_gfp_mask() call in hibernation_snapshot()
     that was removed incorrectly during the 6.16 development cycle
     (Rafael Wysocki)

   - Introduce a function for registering a perf domain without
     triggering a system-wide CPU capacity update and make the
     intel_pstate driver use it to avoid reocurring unsuccessful
     attempts to update capacities of all CPUs in the system (Rafael
     Wysocki)

   - Fix setting of CPPC.min_perf in the active mode with performance
     governor in the amd-pstate driver to restore its expected behavior
     changed recently (Gautham Shenoy)

   - Avoid mistakenly setting EPP to 0 in the amd-pstate driver after
     system resume as a result of recent code changes (Mario
     Limonciello)"

* tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: Restrict GFP mask in hibernation_snapshot()
  PM: EM: Add function for registering a PD without capacity update
  cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume
  cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor

Merge tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- fix delayed inode tracking in xarray, eviction can race with
   insertion and leave behind a disconnected inode

- on systems with large page (64K) and small block size (4K) fix
   compression read that can return partially filled folio

- slightly relax compression option format for backward compatibility,
   allow to specify level for LZO although there's only one

- fix simple quota accounting of compressed extents

- validate minimum device size in 'device add'

- update maintainers' entry

* tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't allow adding block device of less than 1 MB
  MAINTAINERS: update btrfs entry
  btrfs: fix subvolume deletion lockup caused by inodes xarray race
  btrfs: fix corruption reading compressed range when block size is smaller than page size
  btrfs: accept and ignore compression level for lzo
  btrfs: fix squota compressed stats leak

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:
"A number of fixes accumulated due to summer vacations

   - Fix out-of-bounds dynptr write in bpf_crypto_crypt() kfunc which
     was misidentified as a security issue (Daniel Borkmann)

   - Update the list of BPF selftests maintainers (Eduard Zingerman)

   - Fix selftests warnings with icecc compiler (Ilya Leoshkevich)

   - Disable XDP/cpumap direct return optimization (Jesper Dangaard
     Brouer)

   - Fix unexpected get_helper_proto() result in unusual configuration
     BPF_SYSCALL=y and BPF_EVENTS=n (Jiri Olsa)

   - Allow fallback to interpreter when JIT support is limited (KaFai
     Wan)

   - Fix rqspinlock and choose trylock fallback for NMI waiters. Pick
     the simplest fix. More involved fix is targeted bpf-next (Kumar
     Kartikeya Dwivedi)

   - Fix cleanup when tcp_bpf_send_verdict() fails to allocate
     psock->cork (Kuniyuki Iwashima)

   - Disallow bpf_timer in PREEMPT_RT for now. Proper solution is being
     discussed for bpf-next. (Leon Hwang)

   - Fix XSK cq descriptor production (Maciej Fijalkowski)

   - Tell memcg to use allow_spinning=false path in bpf_timer_init() to
     avoid lockup in cgroup_file_notify() (Peilin Ye)

   - Fix bpf_strnstr() to handle suffix match cases (Rong Tao)"

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Skip timer cases when bpf_timer is not supported
  bpf: Reject bpf_timer for PREEMPT_RT
  tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
  bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
  bpf: Allow fall back to interpreter for programs with stack size <= 512
  rqspinlock: Choose trylock fallback for NMI waiters
  xsk: Fix immature cq descriptor production
  bpf: Update the list of BPF selftests maintainers
  selftests/bpf: Add tests for bpf_strnstr
  selftests/bpf: Fix "expression result unused" warnings with icecc
  bpf: Fix bpf_strnstr() to handle suffix match cases better
  selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer
  bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
  bpf: Check the helper function is valid in get_helper_proto
  bpf, cpumap: Disable page_pool direct xdp_return need larger scope

Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"

This reverts commit 5537a4679403 ("net: usb: asix: ax88772: drop
phylink use in PM to avoid MDIO runtime PM wakeups"), it breaks
operation of asix ethernet usb dongle after system suspend-resume
cycle.

Link: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com/
Fixes: 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/2945b9dbadb8ee1fee058b19554a5cb14f1763c1.1757601118.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netfilter: nf_reject: don't reply to icmp error messages

tcp reject code won't reply to a tcp reset.

But the icmp reject 'netdev' family versions will reply to icmp
dst-unreach errors, unlike icmp_send() and icmp6_send() which are used
by the inet family implementation (and internally by the REJECT target).

Check for the icmp(6) type and do not respond if its an unreachable error.

Without this, something like 'ip protocol icmp reject', when used
in a netdev chain attached to 'lo', cause a packet loop.

Same for two hosts that both use such a rule: each error packet
will be replied to.

Such situation persist until the (bogus) rule is amended to ratelimit or
checks the icmp type before the reject statement.

As the inet versions don't do this make the netdev ones follow along.

Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable

KCSAN reported a data-race on the `ipvs->enable` flag, which is
written in the control path and read concurrently from many other
contexts.

Following a suggestion by Julian, this patch fixes the race by
converting all accesses to use `WRITE_ONCE()/READ_ONCE()`.
This lightweight approach ensures atomic access and acts as a
compiler barrier, preventing unsafe optimizations where the flag
is checked in loops (e.g., in ip_vs_est.c).

Additionally, the `enable` checks in the fast-path hooks
(`ip_vs_in_hook`, `ip_vs_out_hook`, `ip_vs_forward_icmp`) are
removed. These are unnecessary since commit 857ca89711de
("ipvs: register hooks only with services"). The `enable=0`
condition they check for can only occur in two rare and non-fatal
scenarios: 1) after hooks are registered but before the flag is set,
and 2) after hooks are unregistered on cleanup_net. In the worst
case, a single packet might be mishandled (e.g., dropped), which
does not lead to a system crash or data corruption. Adding a check
in the performance-critical fast-path to handle this harmless
condition is not a worthwhile trade-off.

Fixes: 857ca89711de ("ipvs: register hooks only with services")
Reported-by: syzbot+1651b5234028c294c339@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1651b5234028c294c339
Suggested-by: Julian Anastasov <ja@ssi.bg>
Link: https://lore.kernel.org/lvs-devel/2189fc62-e51e-78c9-d1de-d35b8e3657e3@ssi.bg/
Signed-off-by: Zhang Tengfei <zhtfdev@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support

Expose the input bridge interface ethernet address so it can be used to
redirect the packet to the receiving physical device for processing.

Tested with nft command line tool.

table bridge nat {
chain PREROUTING {
type filter hook prerouting priority 0; policy accept;
ether daddr de:ad:00:00:be:ef meta pkttype set host ether daddr set meta ibrhwdr accept
}
}

Joint work with Pablo Neira.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: ipset: Remove unused htable_bits in macro ahash_region

Since the ahash_region() macro was redefined to calculate the region
index solely from HTABLE_REGION_BITS, the htable_bits parameter became
unused.

Remove the unused htable_bits argument and its call sites, simplifying
the code without changing semantics.

Fixes: 8478a729c046 ("netfilter: ipset: fix region locking in hash types")
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>

selftest:net: fixed spelling mistakes

Fixed spelling errors in test_redirect6() error message and
test_port_shadowing() comments

Signed-off-by: Andres Urian Florez <andres.emb.sys@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

Merge branches 'pm-sleep' and 'pm-em'

Merge a hibernation regression fix and an fix related to energy model
management for 6.17-rc6

* pm-sleep:
PM: hibernate: Restrict GFP mask in hibernation_snapshot()

* pm-em:
PM: EM: Add function for registering a PD without capacity update

Merge tag 'wireless-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Some more fixes:
- iwlwifi: fix 130/1030 devices
- ath12k: fix alignment, power save
- virt_wifi: fix crash
- cfg80211: disable per-link stats due
             to buffer size issues

* tag 'wireless-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: nl80211: completely disable per-link stats for now
  wifi: virt_wifi: Fix page fault on connect
  wifi: cfg80211: Fix "no buffer space available" error in nl80211_get_station() for MLO
  wifi: iwlwifi: fix 130/1030 configs
  wifi: ath12k: fix WMI TLV header misalignment
  wifi: ath12k: Fix missing station power save configuration
====================

Link: https://patch.msgid.link/20250911100345.20025-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'ipv4-icmp-fix-source-ip-derivation-in-presence-of-vrfs'

Ido Schimmel says:

====================
ipv4: icmp: Fix source IP derivation in presence of VRFs

Align IPv4 with IPv6 and in the presence of VRFs generate ICMP error
messages with a source IP that is derived from the receiving interface
and not from its VRF master. This is especially important when the error
messages are "Time Exceeded" messages as it means that utilities like
traceroute will show an incorrect packet path.

Patches #1-#2 are preparations.

Patch #3 is the actual change.

Patches #4-#7 make small improvements in the existing traceroute test.

Patch #8 extends the traceroute test with VRF test cases for both IPv4
and IPv6.

Changes since v1 [1]:
* Rebase.

[1] https://lore.kernel.org/netdev/20250901083027.183468-1-idosch@nvidia.com/
====================

Link: https://patch.msgid.link/20250908073238.119240-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: traceroute: Add VRF tests

Create versions of the existing test cases where the routers generating
the ICMP error messages are using VRFs. Check that the source IPs of
these messages do not change in the presence of VRFs.

IPv6 always behaved correctly, but IPv4 fails when reverting "ipv4:
icmp: Fix source IP derivation in presence of VRFs".

Without IPv4 change:

# ./traceroute.sh
TEST: IPv6 traceroute                                               [ OK ]
TEST: IPv6 traceroute with VRF                                      [ OK ]
TEST: IPv4 traceroute                                               [ OK ]
TEST: IPv4 traceroute with VRF                                      [FAIL]
         traceroute did not return 1.0.3.1
$ echo $?
1

The test fails because the ICMP error message is sent with the VRF
device's IP (1.0.4.1):

# traceroute -n -s 1.0.1.3 1.0.2.4
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
  1  1.0.4.1  0.165 ms  0.110 ms  0.103 ms
  2  1.0.2.4  0.098 ms  0.085 ms  0.078 ms
# traceroute -n -s 1.0.3.3 1.0.2.4
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
  1  1.0.4.1  0.201 ms  0.138 ms  0.129 ms
  2  1.0.2.4  0.123 ms  0.105 ms  0.098 ms

With IPv4 change:

# ./traceroute.sh
TEST: IPv6 traceroute                                               [ OK ]
TEST: IPv6 traceroute with VRF                                      [ OK ]
TEST: IPv4 traceroute                                               [ OK ]
TEST: IPv4 traceroute with VRF                                      [ OK ]
$ echo $?
0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-9-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: traceroute: Test traceroute with different source IPs

When generating ICMP error messages, the kernel will prefer a source IP
that is on the same subnet as the destination IP (see
inet_select_addr()). Test this behavior by invoking traceroute with
different source IPs and checking that the ICMP error message is
generated with a source IP in the same subnet.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-8-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: traceroute: Reword comment

Both of the addresses are configured as primary addresses, but the
kernel is expected to choose 10.0.1.1/24 as the source IP of the ICMP
error message since it is on the same subnet as the destination IP of
the message (10.0.1.3/24). Reword the comment to reflect that.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-7-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: traceroute: Use require_command()

Use require_command() so that the test will return SKIP (4) when a
required command is not present.

Before:

# ./traceroute.sh
SKIP: Could not run IPV6 test without traceroute6
SKIP: Could not run IPV4 test without traceroute
$ echo $?
0

After:

# ./traceroute.sh
TEST: traceroute6 not installed [SKIP]
$ echo $?
4

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-6-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>