Vincent Mailhol [Tue, 23 Sep 2025 06:58:44 +0000 (15:58 +0900)]
can: dev: add can_get_ctrlmode_str()
In an effort to give more human readable messages when errors occur
because of conflicting options, it can be useful to convert the CAN
control mode flags into text.
Add a function which converts the first set CAN control mode into a
human readable string. The reason to only convert the first one is to
simplify edge cases: imagine that there are several invalid control
modes, we would just return the first invalid one to the user, thus
not having to handle complex string concatenation. The user can then
solve the first problem, call the netlink interface again and see the
next issue.
People who wish to enumerate all the control modes can still do so by,
for example, using this new function in a for_each_set_bit() loop.
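The idea of reporting only the first set control mode, and of looping
to enumerate all of them, can be sketched in plain C as below. This is
a userspace illustration only: the flag values, the name table and the
demo_* helpers are assumptions made for the example and are not the
kernel's can_get_ctrlmode_str() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values and names; not the kernel's definitions. */
#define DEMO_MODE_LOOPBACK   (UINT32_C(1) << 0)
#define DEMO_MODE_LISTENONLY (UINT32_C(1) << 1)
#define DEMO_MODE_FD         (UINT32_C(1) << 2)

static const char *const demo_mode_names[] = {
	"loopback", "listen-only", "fd",
};
#define DEMO_MODE_COUNT (sizeof(demo_mode_names) / sizeof(demo_mode_names[0]))

/* Return the name of the lowest set bit in mask (hypothetical helper). */
const char *demo_first_mode_str(uint32_t mask)
{
	for (size_t bit = 0; bit < 32; bit++) {
		if (mask & (UINT32_C(1) << bit))
			return bit < DEMO_MODE_COUNT ? demo_mode_names[bit]
						     : "<unknown>";
	}
	return "<none>";
}

/*
 * Enumerate every set bit by feeding the helper one bit at a time,
 * in the spirit of a for_each_set_bit() loop. Returns the number of
 * names written to out.
 */
size_t demo_all_mode_strs(uint32_t mask, const char **out, size_t out_len)
{
	size_t n = 0;

	assert(out != NULL);
	for (size_t bit = 0; bit < 32 && n < out_len; bit++) {
		uint32_t single = mask & (UINT32_C(1) << bit);

		if (single)
			out[n++] = demo_first_mode_str(single);
	}
	return n;
}
```

Returning a single string keeps the error path free of buffer
management; callers that want the full list pay for the loop
themselves.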
Vincent Mailhol [Tue, 23 Sep 2025 06:58:43 +0000 (15:58 +0900)]
can: calc_bittiming: make can_calc_tdco() FD agnostic
can_calc_tdco() uses the CAN_CTRLMODE_FD_TDC_MASK and
CAN_CTRLMODE_TDC_AUTO macros making it specific to CAN FD. Add the tdc
mask to the function parameter list. The value of the tdc auto flag
can then be derived from that mask and stored in a local variable.
This way, the function becomes CAN FD agnostic and can be reused later
on for the CAN XL TDC.
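One way to derive a per-protocol "auto" flag from a TDC mask passed as
a parameter is sketched below. It assumes the mask contains exactly two
bits, with the auto flag on the lower bit and the manual flag on the
higher one; that layout, the constants and the demo_* names are
assumptions of this example and may differ from what the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; the layout (auto below manual) is an assumption. */
#define DEMO_FD_TDC_AUTO   UINT32_C(0x200)
#define DEMO_FD_TDC_MANUAL UINT32_C(0x400)
#define DEMO_FD_TDC_MASK   (DEMO_FD_TDC_AUTO | DEMO_FD_TDC_MANUAL)

/* Isolate the lowest set bit of the mask: the assumed "auto" flag. */
uint32_t demo_tdc_auto_flag(uint32_t tdc_mask)
{
	assert(tdc_mask != 0);
	return tdc_mask & (~tdc_mask + 1);
}

/* Whatever remains in the mask is the assumed "manual" flag. */
uint32_t demo_tdc_manual_flag(uint32_t tdc_mask)
{
	return tdc_mask & ~demo_tdc_auto_flag(tdc_mask);
}

/* Protocol-agnostic test: is the auto mode selected for this mask? */
int demo_tdc_is_auto(uint32_t ctrlmode, uint32_t tdc_mask)
{
	return (ctrlmode & demo_tdc_auto_flag(tdc_mask)) != 0;
}
```

With the mask as a parameter, the same helper works for any pair of
flags that follows the assumed layout, which is what makes the caller
independent of a specific protocol.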
Vincent Mailhol [Tue, 23 Sep 2025 06:58:42 +0000 (15:58 +0900)]
can: netlink: make can_tdc_fill_info() FD agnostic
can_tdc_fill_info() depends on some variables which are specific to CAN
FD. Move these to the function parameters list so that, later on, this
function can be reused for the CAN XL TDC.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:41 +0000 (15:58 +0900)]
can: netlink: add can_bitrate_const_fill_info()
Add can_bitrate_const_fill_info() to factorise the logic when filling
the bitrate constant information for Classical CAN and CAN FD. This
function will be reused later on for CAN XL.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:40 +0000 (15:58 +0900)]
can: netlink: add can_bittiming_const_fill_info()
Add function can_bittiming_const_fill_info() to factorise the logic
when filling the bittiming constant information for Classical CAN and
CAN FD. This function will be reused later on for CAN XL.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:39 +0000 (15:58 +0900)]
can: netlink: add can_bittiming_fill_info()
Add can_bittiming_fill_info() to factorise the logic when filling the
bittiming information for Classical CAN and CAN FD. This function will
be reused later on for CAN XL.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:37 +0000 (15:58 +0900)]
can: netlink: make can_tdc_get_size() FD agnostic
can_tdc_get_size() needs to access can_priv->fd making it specific to
CAN FD. Change the function parameter from struct can_priv to struct
data_bittiming_params.
can_tdc_get_size() also uses the CAN_CTRLMODE_TDC_MANUAL macro making
it specific to CAN FD. Add the tdc mask to the function parameter
list. The value of the tdc manual flag can then be derived from that
mask and stored in a local variable.
This way, the function becomes CAN FD agnostic and can be reused later
on for the CAN XL TDC.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:36 +0000 (15:58 +0900)]
can: netlink: add can_ctrlmode_changelink()
Split the control mode change link logic into a new function:
can_ctrlmode_changelink(). The purpose is to increase code readability
by preventing can_changelink() from becoming too big.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:35 +0000 (15:58 +0900)]
can: netlink: add can_dtb_changelink()
Factorise the databittiming parsing out of can_changelink() and move
it in the new can_dtb_changelink() function. This is a preparation
patch for the introduction of CAN XL because the databittiming
changelink logic will be reused later on.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:34 +0000 (15:58 +0900)]
can: netlink: make can_tdc_changelink() FD agnostic
can_tdc_changelink() needs to access can_priv->fd making it
specific to CAN FD. Change the function parameter from struct can_priv
to struct data_bittiming_params. This way, the function becomes CAN FD
agnostic and can be reused later on for the CAN XL TDC.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:33 +0000 (15:58 +0900)]
can: netlink: remove useless check in can_tdc_changelink()
can_tdc_changelink() returns -EOPNOTSUPP under this condition:
!tdc_const || !can_fd_tdc_is_enabled(priv)
But this function is only called if the data[IFLA_CAN_TDC] parameters
are provided. At this point, can_validate_tdc() already checked that
either of the tdc auto or tdc manual control modes were provided, that
is to say, can_fd_tdc_is_enabled(priv) must be true.
Because the right hand operand of this condition is therefore always
false, remove it.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:32 +0000 (15:58 +0900)]
can: netlink: refactor CAN_CTRLMODE_TDC_{AUTO,MANUAL} flag reset logic
CAN_CTRLMODE_TDC_AUTO and CAN_CTRLMODE_TDC_MANUAL are mutually
exclusive. This means that whenever the user switches from auto to
manual mode (or vice versa), the other flag which was set previously
needs to be cleared.
Currently, this is handled with a masking operation. It can be done in
a simpler manner by clearing any of the previous TDC flags before
copying netlink attributes. The code becomes easier to understand and
will make it easier to add the new upcoming CAN XL flags which will
have a similar reset logic as the current TDC flags.
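The "clear first, then copy" approach can be sketched as follows. The
sketch assumes a request made of a mask (which bits to change) and
flags (their new values); the constants and demo_apply_ctrlmode() are
illustrative and not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values for the example. */
#define DEMO_LOOPBACK   UINT32_C(0x001)
#define DEMO_FD         UINT32_C(0x020)
#define DEMO_TDC_AUTO   UINT32_C(0x200)
#define DEMO_TDC_MANUAL UINT32_C(0x400)
#define DEMO_TDC_MASK   (DEMO_TDC_AUTO | DEMO_TDC_MANUAL)

/*
 * Apply a control mode request. If the request touches any TDC flag,
 * both TDC flags are dropped first, so the mutually exclusive flag
 * that was set previously cannot survive.
 */
uint32_t demo_apply_ctrlmode(uint32_t current, uint32_t flags, uint32_t mask)
{
	/* The two TDC modes must never be requested together. */
	assert((flags & mask & DEMO_TDC_MASK) != DEMO_TDC_MASK);

	if (mask & DEMO_TDC_MASK)
		current &= ~DEMO_TDC_MASK;

	current &= ~mask;         /* clear the bits to be modified */
	current |= flags & mask;  /* copy the requested values */
	return current;
}
```

For example, starting from DEMO_FD | DEMO_TDC_AUTO, a request with
mask = flags = DEMO_TDC_MANUAL yields DEMO_FD | DEMO_TDC_MANUAL: the
auto flag is cleared even though the request did not mention it.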
Vincent Mailhol [Tue, 23 Sep 2025 06:58:31 +0000 (15:58 +0900)]
can: netlink: add can_validate_databittiming()
Factorise the databittiming validation out of can_validate() and move
it in the new can_validate_databittiming() function. Also move
can_validate()'s comment because it is specific to CAN FD. This is a
preparation patch for the introduction of CAN XL as this databittiming
validation will be reused later on.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:30 +0000 (15:58 +0900)]
can: netlink: add can_validate_tdc()
Factorise the TDC validation out of can_validate() and move it in the
new can_validate_tdc() function. This is a preparation patch for the
introduction of CAN XL because this TDC validation will be reused
later on.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:29 +0000 (15:58 +0900)]
can: netlink: refactor can_validate_bittiming()
Whenever can_validate_bittiming() is called, it is always preceded by
some boilerplate code which was copy pasted all over the place. Move
that repeated code directly inside can_validate_bittiming().
Finally, the memcpy() is not needed: the nla attributes are four-byte
aligned, which is just enough for struct can_bittiming. Add a
static_assert() to document that the alignment is correct and just use
the pointer returned by nla_data() as-is.
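The pattern of documenting an alignment requirement with a
static_assert() and then using the payload pointer directly can be
sketched in userspace C11 as below. struct demo_bittiming is a reduced
stand-in for struct can_bittiming, and DEMO_ATTR_ALIGNTO is an assumed
name for the four-byte attribute alignment mentioned above.

```c
#include <assert.h>   /* assert() and C11 static_assert() */
#include <stdint.h>

/* Assumed payload alignment of an attribute, in bytes. */
#define DEMO_ATTR_ALIGNTO 4

/* Reduced stand-in structure made of 32-bit fields only. */
struct demo_bittiming {
	uint32_t bitrate;
	uint32_t sample_point;
	uint32_t tq;
};

/* Compile-time proof that the payload alignment is sufficient. */
static_assert(_Alignof(struct demo_bittiming) <= DEMO_ATTR_ALIGNTO,
	      "attribute payload alignment too small for demo_bittiming");

/* Return a typed view of the payload without copying it. */
const struct demo_bittiming *demo_bittiming_from_payload(const void *payload)
{
	assert((uintptr_t)payload % DEMO_ATTR_ALIGNTO == 0);
	return payload;
}
```

If a wider field (for example a 64-bit one) were ever added to the
structure, the static_assert() would fail at build time instead of
producing a misaligned access at run time.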
Vincent Mailhol [Tue, 23 Sep 2025 06:58:28 +0000 (15:58 +0900)]
can: netlink: document which symbols are FD specific
The CAN XL netlink interface will also have data bitrate and TDC
parameters. The current FD parameters do not have a prefix in their
names to differentiate them.
Because the netlink interface is part of the UAPI, it is unfortunately
not feasible to rename the existing symbols to add an FD_ prefix. The
best alternative is to add a comment for each of the symbols to notify
the reader of which parts are CAN FD specific.
While at it, fix a typo: transiver -> transceiver.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:27 +0000 (15:58 +0900)]
can: dev: make can_get_relative_tdco() FD agnostic and move it to bittiming.h
can_get_relative_tdco() needs to access can_priv->fd making it
specific to CAN FD. Change the function parameter from struct can_priv
to struct data_bittiming_params. This way, the function becomes CAN FD
agnostic and can be reused later on for the CAN XL TDC.
Now that we dropped the dependency on struct can_priv, also move
can_get_relative_tdco() back to bittiming.h where it was meant to
belong.
Vincent Mailhol [Tue, 23 Sep 2025 06:58:26 +0000 (15:58 +0900)]
can: dev: move struct data_bittiming_params to linux/can/bittiming.h
In commit b803c4a4f788 ("can: dev: add struct data_bittiming_params to
group FD parameters"), struct data_bittiming_params was put into
linux/can/dev.h.
This structure being a collection of bittiming parameters, on second
thought, bittiming.h is actually a better location. This way, users of
struct data_bittiming_params will not have to forcefully include
linux/can/dev.h thus removing some complexity and reducing the risk of
circular dependencies in headers.
Move struct data_bittiming_params from linux/can/dev.h to
linux/can/bittiming.h.
Merge patch series "can: rework the CAN MTU logic (CAN XL preparation step 2/3)"
Vincent Mailhol <mailhol@kernel.org> says:
The CAN MTU logic is currently broken. can_change_mtu() will update
both the MTU and the CAN_CTRLMODE_FD flag.
Back then, commit bc05a8944a34 ("can: allow to change the device mtu
for CAN FD capable devices") stated that:
The configuration can be done either with the 'fd { on | off }'
option in the 'ip' tool from iproute2 or by setting the CAN
netdevice MTU to CAN_MTU (16) or to CANFD_MTU (72).
on a CAN FD interface, we are left with a device on which CAN FD is
enabled but which does not have the FD databittiming parameters
configured.
The same goes on when setting the mtu back to 16:
ip link set can0 type can bitrate 500000 fd on dbitrate 5000000
ip link set can0 mtu 16
The device is now in Classical CAN mode but iproute2 is still
reporting the databittiming values (although this time, the issue
seems less critical as it is only a reporting problem).
The only way to resolve the problem and bring the device back to a
coherent state is to call again the netlink interface using the
"fd on" or "fd off" options.
The idea of being able to infer the CAN_CTRLMODE_FD flag from the MTU
value is just incorrect for physical devices. Note that this logic
remains valid on virtual interfaces (vcan and vxcan) because those do
not have control mode flags and thus no conflict occurs.
This series reworks the CAN MTU logic. The goal is to always maintain
a coherent state between the MTU and the control mode flags as listed
in the table below:
               fd off, xl off   fd on, xl off   fd any, xl on
  -------------------------------------------------------------
  default mtu  CAN_MTU          CANFD_MTU       CANXL_MTU
  min mtu      CAN_MTU          CANFD_MTU       CANXL_MIN_MTU
  max mtu      CAN_MTU          CANFD_MTU       CANXL_MAX_MTU
In order to switch from one column to another, the user must use
the fd/xl on/off flags. Directly modifying the MTU from one column to
the other is not permitted any more.
The CAN XL is not yet supported at the moment, so the last column is
just given as a reference to better understand what is coming up. This
series will just implement the first two columns.
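The first two columns of the table can be sketched as a small lookup
plus a range check. The values 16 and 72 are the CAN_MTU and CANFD_MTU
sizes quoted earlier in this cover letter; the demo_* names are
hypothetical, and the CAN XL column is left out on purpose because it
is not implemented by this series.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CAN_MTU   16u  /* Classical CAN */
#define DEMO_CANFD_MTU 72u  /* CAN FD */

struct demo_mtu_range {
	unsigned int def_mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* Default, minimum and maximum MTU for the "xl off" columns. */
struct demo_mtu_range demo_mtu_range_for(bool fd_on)
{
	unsigned int mtu = fd_on ? DEMO_CANFD_MTU : DEMO_CAN_MTU;

	return (struct demo_mtu_range){
		.def_mtu = mtu, .min_mtu = mtu, .max_mtu = mtu,
	};
}

/* Range check of the kind a min/max pair makes possible. */
bool demo_mtu_is_valid(bool fd_on, unsigned int mtu)
{
	struct demo_mtu_range r = demo_mtu_range_for(fd_on);

	assert(r.min_mtu <= r.max_mtu);
	return mtu >= r.min_mtu && mtu <= r.max_mtu;
}
```

Because minimum and maximum are equal in both columns, any attempt to
move from one column to the other through the MTU alone is rejected by
the range check, which matches the rule stated above.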
While doing the rewrite, the logic is adjusted to reuse as much as
possible the net core infrastructure. By populating:
net_device->min_mtu
and
net_device->max_mtu
the net core infrastructure will automatically:
1. validate that the user's inputs are in range.
2. report those min and max MTU values through the netlink
interface.
Point 1. will allow us to get rid of can_change_mtu() in the near
future for all the physical devices and point 2. allows the end user
to see the valid MTU range by doing a:
$ ip --details link show can0
Finally, because the net core is used, it will be possible after the
removal of can_change_mtu() to modify the MTU while the device is up.
As stated previously, the only modifications allowed will be within
the MTU range of a given CAN protocol. So for Classical CAN and CAN
FD, the MTU is fixed to, respectively, CAN_MTU and CANFD_MTU. For the
upcoming CAN XL, the user will be able to change the MTU to anything
between CANXL_MIN_MTU and CANXL_MAX_MTU even if the device is up.
The first patch of this series annotates the read access on
net_device->mtu. This preparation is needed to prevent any race
condition from occurring when modifying the MTU while the device is up.
The second patch is another preparation change which moves
can_set_static_ctrlmode() from dev.h to dev.c.
The third patch populates the MTU minimum and maximum value.
The fourth patch is just a clean-up to remove the old
can_change_mtu().
The fourth and last patch comes as a bonus content and modifies the
default MTU of the vcan and vxcan so that CAN XL is on by default.
Note that after this series, the old can_change_mtu() becomes
useless. That function can not yet be removed because some pending
changes from other maintainers' trees still depend on it. It will be
removed in the next development window once all those changes reach
net-next.
Vincent Mailhol [Tue, 23 Sep 2025 06:37:11 +0000 (15:37 +0900)]
can: enable CAN XL for virtual CAN devices by default
In commit 97edec3a11cf ("can: enable CAN FD for virtual CAN devices by
default"), the vcan and vxcan default MTU was set to CANFD_MTU.
The reason was that users were confused on how to activate CAN FD on
virtual interfaces.
Following the introduction of CAN XL, the same logic should be
applied. Set the MTU to CANXL_MTU by default.
The users who really wish to use a Classical CAN only or a CAN FD
virtual device can do respectively:
Vincent Mailhol [Tue, 23 Sep 2025 06:37:10 +0000 (15:37 +0900)]
can: populate the minimum and maximum MTU values
By populating:
net_device->min_mtu
and
net_device->max_mtu
the net core infrastructure will automatically:
1. validate that the user's inputs are in range.
2. report those min and max MTU values through the netlink
interface.
Add can_set_default_mtu() which sets the default mtu value as well as
the minimum and maximum values. The logic for the default mtu value
remains unchanged:
- CANFD_MTU if the device has a static CAN_CTRLMODE_FD.
- CAN_MTU otherwise.
Call can_set_default_mtu() each time the CAN_CTRLMODE_FD is modified.
This will guarantee that the MTU value is always consistent with the
control mode flags.
With this, the checks done in can_change_mtu() become fully redundant
and will be removed in an upcoming change and it is now possible to
confirm the minimum and maximum MTU values on a physical CAN interface
by doing:
$ ip --details link show can0
The virtual interfaces (vcan and vxcan) are not impacted by this
change.
Vincent Mailhol [Tue, 23 Sep 2025 06:37:09 +0000 (15:37 +0900)]
can: dev: turn can_set_static_ctrlmode() into a non-inline function
can_set_static_ctrlmode() is declared as a static inline. But it is
only called in the probe function of the devices and so does not
really benefit from any kind of optimization.
Transform it into a "normal" function by moving it to
Vincent Mailhol [Tue, 23 Sep 2025 06:37:08 +0000 (15:37 +0900)]
can: annotate mtu accesses with READ_ONCE()
As hinted in commit 501a90c94510 ("inet: protect against too small mtu
values."), net_device->mtu is vulnerable to race conditions if it is
written and read without holding the RTNL.
At the moment, all the writes are done while the interface is down,
either in the devices' probe() function or in can_changelink(). So
there are no such issues yet. But upcoming changes will allow
modifying the MTU while the CAN XL devices are up.
In preparation for the introduction of CAN XL, annotate all the
net_device->mtu accesses which are not yet guarded by the RTNL with a
READ_ONCE().
Note that all the write accesses are already either guarded by the
RTNL or are already annotated and thus need no changes.
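The principle behind such annotations is to read the shared value
exactly once and to use that snapshot afterwards. The sketch below is a
userspace analogue built on C11 relaxed atomics; it is not the kernel's
READ_ONCE() macro, and struct demo_netdev and the demo_* functions are
made up for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal stand-in for a device whose MTU may change concurrently. */
struct demo_netdev {
	_Atomic unsigned int mtu;
};

/* Writer side: publish a new MTU. */
void demo_set_mtu(struct demo_netdev *dev, unsigned int new_mtu)
{
	atomic_store_explicit(&dev->mtu, new_mtu, memory_order_relaxed);
}

/*
 * Reader side: take one snapshot of the MTU and use only the local
 * copy, so a concurrent update cannot be observed half way through.
 */
int demo_frame_fits(struct demo_netdev *dev, unsigned int frame_len)
{
	unsigned int mtu = atomic_load_explicit(&dev->mtu,
						memory_order_relaxed);

	assert(mtu > 0);
	return frame_len <= mtu;
}
```

A reader that loaded dev->mtu twice could see two different values
within the same decision; the single snapshot rules that out.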
Merge patch series "can: esd_usb: Fixes and improvements"
Stefan Mätje <stefan.maetje@esd.eu> says:
The first patch makes some error messages also print the error
code to make them more informative. It also removes a duplicate
message and makes the register / unregister messages symmetric.
The second patch avoids emitting any error messages during the
disconnect of CAN-USB devices or the driver unload.
Changes in v2:
- Second patch:
- Convert all occurrences of error status prints to use
"ERR_PTR(err)" instead of printing the decimal value
of "err".
- Rename retval to err in esd_usb_read_bulk_callback() to
make the naming of error status variables consistent
with all other functions.
Stefan Mätje [Thu, 21 Aug 2025 14:34:22 +0000 (16:34 +0200)]
can: esd_usb: Avoid errors triggered from USB disconnect
During disconnect, the USB stack calls the esd_usb_disconnect() callback.
esd_usb_disconnect() calls unregister_netdev() for each network device, which
in turn calls the net_device_ops::ndo_stop callback esd_usb_close() if
the net device is up.
The esd_usb_close() callback tries to disable all CAN Ids and to reset
the CAN controller of the device sending appropriate control messages.
Sending these messages in .disconnect() is moot and always fails because
either the device is gone or the USB communication is already torn down
by the USB stack in the course of a rmmod operation.
Move the code that sends these control messages to a new function
esd_usb_stop() which is approximately the counterpart of
esd_usb_start() to make code structure less convoluted.
Then change esd_usb_close() not to send the control messages at all if
the ndo_stop() callback is executed from the USB .disconnect()
callback. Add a new flag in_usb_disconnect to the struct esd_usb
device structure to mark this condition. esd_usb_close() checks this
flag to decide whether to skip the send operations in esd_usb_stop().
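The control flow described above can be modelled with a few lines of
userspace C. The flag name in_usb_disconnect comes from the commit
message; struct demo_usb_dev, the message counter and the demo_*
functions are stand-ins invented for this sketch, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_usb_dev {
	bool in_usb_disconnect;       /* set while disconnect is running */
	unsigned int ctrl_msgs_sent;  /* counts simulated control messages */
};

/* Stand-in for the function that sends the "stop" control messages. */
static int demo_stop(struct demo_usb_dev *dev)
{
	dev->ctrl_msgs_sent++;
	return 0;
}

/* Close path: only talk to the device if it can still answer. */
int demo_close(struct demo_usb_dev *dev)
{
	assert(dev != NULL);
	if (!dev->in_usb_disconnect)
		return demo_stop(dev);
	return 0;
}

/* Disconnect path: mark the condition, then run the close path. */
void demo_disconnect(struct demo_usb_dev *dev)
{
	dev->in_usb_disconnect = true;
	demo_close(dev);  /* models the stop callback of an "up" device */
}
```

A regular close sends the messages once; a close reached through the
disconnect path sends nothing, so no error can be reported for a device
that is already gone.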
Stefan Mätje [Thu, 21 Aug 2025 14:34:21 +0000 (16:34 +0200)]
can: esd_usb: Rework display of error messages
- esd_usb_open(): Get rid of duplicate "couldn't start device: %d\n"
message already printed from esd_usb_start().
- Fix duplicate printout of network device name when network device
is registered. Add an unregister message for the network device
as counterpart to the register message.
- Add the printout of error codes together with the error messages
in esd_usb_close() and some in esd_usb_probe(). The additional error
codes should lead to a better understanding what is really going
wrong.
- Convert all occurrences of error status prints to use "ERR_PTR(err)"
instead of printing the decimal value of "err".
- Rename retval to err in esd_usb_read_bulk_callback() to make the
naming of error status variables consistent with all other functions.
This patch series contains miscellaneous cleanups and improvements for
the R-Car CAN driver. I deliberately sent this as a separate series
from "[PATCH] can: rcar_can: Fix s2ram with PSCI"[1], to avoid
blocking the latter. However, this series (in particular [PATCH 3/9])
does depend on it.
Changes compared to v1[2]:
- Convert new Runtime PM error messages to %pe,
- New patches 10 and 11.
can: rcar_can: Do not print alloc_candev() failures
If alloc_candev() failed due to out-of-memory, the core memory
allocation code has already printed an error message.
If alloc_candev() failed for a different reason, alloc_netdev_mqs() has
already printed an error message.
Merge patch series "can: rcar_canfd: R-Car CANFD Improvements"
Biju <biju.das.au@gmail.com> says:
From: Biju Das <biju.das.jz@bp.renesas.com>
On the RZ/G3E and R-Car Gen4 SoCs, unlike on other SoCs, the nominal bit
rate of classical CAN is calculated with the same formula as the nominal
bit rate of CANFD. Update the nominal bit rate constants.
Apart from this, introduce rcar_canfd_compute_{nominal,data}_bit_rate_cfg()
to replace function-like macros.
v2->v3:
* Replaced "shared_bittiming"->"shared_can_regs" as it is same for RZ/G3E
and R-Car Gen4.
* Updated commit header and description for patch#1.
* Added Rb tag from Geert for patch #2,#3 and #4.
* Dropped _MASK suffix from RCANFD_CFG_* macros.
* Dropped _MASK suffix from RCANFD_NCFG_NBRP_MASK macro.
* Dropped _MASK suffix from the macro RCANFD_DCFG_DBRP_MASK.
* Followed the order as used in struct can_bittiming{_const} for easy
maintenance.
v1->v2:
* Dropped patch#2 as it is accepted.
* Moved patch#4 to patch#2.
* Updated commit header and description for patch#2.
* Kept RCANFD_CFG* macro definitions to give a meaning to the magic
number using GENMASK macro and used FIELD_PREP to extract value.
* Split patch#3 for computing nominal and data bit rate config separate.
* Updated rcar_canfd_compute_nominal_bit_rate_cfg() to handle
nominal bit rate configuration for both classical CAN and CANFD.
* Replaced RCANFD_NCFG_NBRP->RCANFD_NCFG_NBRP_MASK and used FIELD_PREP to
extract value.
* Replaced RCANFD_DCFG_DBRP->RCANFD_DCFG_DBRP_MASK and used FIELD_PREP to
extract value.
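For readers unfamiliar with the mask-based style mentioned in these
changelogs: in the kernel, GENMASK() builds a contiguous bit mask and
FIELD_PREP() shifts a value into the field described by such a mask
(FIELD_GET() is the extracting counterpart). The sketch below
re-creates that behaviour with userspace stand-ins; the DEMO_* macros,
the demo_* functions and the register layout are illustrative and do
not reflect the actual RCANFD_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Contiguous 32-bit mask covering bits h..l, both inclusive. */
#define DEMO_GENMASK(h, l) \
	(((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h))))

/* Hypothetical register layout; bit positions are for illustration. */
#define DEMO_CFG_BRP   DEMO_GENMASK(9, 0)
#define DEMO_CFG_TSEG1 DEMO_GENMASK(19, 16)

/* Position of the lowest set bit of a non-zero mask. */
static unsigned int demo_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!((mask >> shift) & 1u))
		shift++;
	return shift;
}

/* Shift val into the field described by mask. */
uint32_t demo_field_prep(uint32_t mask, uint32_t val)
{
	return (val << demo_mask_shift(mask)) & mask;
}

/* Extract the field described by mask from reg. */
uint32_t demo_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> demo_mask_shift(mask);
}
```

A register value is then composed as
demo_field_prep(DEMO_CFG_TSEG1, 5) | demo_field_prep(DEMO_CFG_BRP, 3),
with the mask being the single place where the field position is
written down.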
Biju Das [Mon, 8 Sep 2025 12:09:30 +0000 (13:09 +0100)]
can: rcar_canfd: Update bit rate constants for RZ/G3E and R-Car Gen4
On the RZ/G3E and R-Car Gen4 SoCs, unlike on other SoCs, the nominal bit
rate of classical CAN is calculated with the same formula as the nominal
bit rate of CANFD. Update the nominal bit rate constants.
can: peak: Modification of references to email accounts being deleted
With the upcoming deletion of @peak-system.com accounts and following
the acquisition of PEAK-System and its brand by HMS-Networks, this fix
aims to migrate all address references to @hms-networks.com, as well
as to map my personal committer addresses to author addresses, while
taking the opportunity to correct the accent on the first ‘e’ of my
first name.
Vincent Mailhol [Tue, 26 Aug 2025 10:48:39 +0000 (19:48 +0900)]
MAINTAINERS: update Vincent Mailhol's email address
Now that I have received my kernel.org account, I am changing my email
address from mailhol.vincent@wanadoo.fr to mailhol@kernel.org. The
wanadoo.fr address was my first email which I created when I was a kid
and has a special meaning to me, but it is restricted to a maximum of
50 messages per hour which starts to be problematic on threads where
many people are CC-ed.
Update all the MAINTAINERS entries accordingly and map the old address
to the new one.
I remain reachable from my old address. The different copyright
notices mentioning my old address are kept as-is for the moment. I
will update those one at a time only if I need to touch those files.
Jonas Rebmann [Thu, 11 Sep 2025 08:29:03 +0000 (10:29 +0200)]
net: phy: micrel: Update Kconfig help text
This driver by now supports 17 different Microchip (formerly known as
Micrel) chips: KSZ9021, KSZ9031, KSZ9131, KSZ8001, KS8737, KSZ8021,
KSZ8031, KSZ8041, KSZ8051, KSZ8061, KSZ8081, KSZ8873MLL, KSZ886X,
KSZ9477, LAN8814, LAN8804 and LAN8841.
Support for the VSC8201 was removed in commit 51f932c4870f ("micrel phy
driver - updated(1)")
Update the help text to reflect that, list families instead of models to
ease future maintenance.
Jakub Kicinski [Sat, 13 Sep 2025 00:06:25 +0000 (17:06 -0700)]
Merge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH
error.
2) Support fetching the current bridge ethernet address.
This allows a more flexible approach to packet redirection
on bridges without need to use hardcoded addresses. From
Fernando Fernandez Mancera.
3) Zap a few no-longer needed conditionals from ipvs packet path
and convert to READ/WRITE_ONCE to avoid KCSAN warnings.
From Zhang Tengfei.
4) Remove a no-longer-used macro argument in ipset, from Zhen Ni.
* tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_reject: don't reply to icmp error messages
ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable
netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support
netfilter: ipset: Remove unused htable_bits in macro ahash_region
selftest:net: fixed spelling mistakes
====================
Russell King [Wed, 10 Sep 2025 12:50:46 +0000 (13:50 +0100)]
net: mvneta: add support for hardware timestamps
Add support for hardware timestamps in (e.g.) the PHY by calling
skb_tx_timestamp() as close as reasonably possible to the point that
the hardware is instructed to send the queued packets.
udp_tunnel: use netdev_warn() instead of netdev_WARN()
netdev_WARN() uses WARN/WARN_ON to print a backtrace along with
file and line information. In this case, udp_tunnel_nic_register()
returning an error is just a failed operation, not a kernel bug.
udp_tunnel_nic_register() can fail due to a memory allocation
failure (kzalloc() or udp_tunnel_nic_alloc()).
This is a normal runtime error and not a kernel bug.
Replace netdev_WARN() with netdev_warn() accordingly.
====================
tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
On the one hand, this is a minor/cosmetic issue, especially nowadays when
TCP-AO/TCP-MD5 signature verification failures aren't logged to dmesg.
Yet I think it is worth addressing for two reasons:
- unsigned RST gets ignored by the peer and the connection is alive for
longer (keep-alive interval)
- netstat counters increase and trace events report that trusted BGP peer
is sending unsigned/incorrectly signed segments, which can ring alarm
on monitoring.
====================
Now that the destruction of info/keys is delayed until the socket
destructor, it's safe to use kfree() without an RCU callback.
The socket is in TCP_CLOSE state either because it never left it,
or it's already closed and the refcounter is zero. In any way,
no one can discover it anymore, it's safe to release memory
straight away.
tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()
Currently there are a couple of minor issues with destroying the keys in
tcp_v4_destroy_sock():
1. The socket is still in TCP bind buckets, making it reachable for
incoming segments [on another CPU core], potentially available to send
late FIN/ACK/RST replies.
2. There is at least one code path, where tcp_done() is called before
sending RST [kudos to Bob for investigation]. This is a case of
a server, that finished sending its data and just called close().
The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
__tcp_close())
Note the signed RSTs later in the dump - those are sent by the server
when the fin-wait socket gets removed from hash buckets, by
the listener socket.
Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). As a shut-down
socket can still send non-data replies, they should be signed in order for
the peer to process them. Now it also matches how AO/MD5 gets destructed
for TIME-WAIT sockets (in tcp_twsk_destructor()).
This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
open problem: once the RST is sent and the socket is actually destructed,
there is no information on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after non-graceful reset.
Luckily, BGP implementations are known to use keep-alives.
While the issue is quite minor/cosmetic, these days monitoring network
counters is a common practice and getting invalid signed segments from
a trusted BGP peer can get customers worried.
====================
bridge: Allow keeping local FDB entries only on VLAN 0
The bridge FDB contains one local entry per port per VLAN, for the MAC of
the port in question, and likewise for the bridge itself. This allows
bridge to locally receive and punt "up" any packets whose destination MAC
address matches that of one of the bridge interfaces or of the bridge
itself.
The number of these local "service" FDB entries grows linearly with number
of bridge-global VLAN memberships, but that in turn will tend to grow
quadratically with number of ports and per-port VLAN memberships. While
that does not cause issues during forwarding lookups, it does make dumps
impractically slow.
As an example, with 100 interfaces, each on 4K VLANs, a full dump of FDB
that just contains these 400K local entries, takes 6.5s. That's _without_
considering iproute2 formatting overhead, this is just how long it takes to
walk the FDB (repeatedly), serialize it into netlink messages, and parse
the messages back in userspace.
This is to illustrate that with growing number of ports and VLANs, the time
required to dump this repetitive information blows up. Arguably 4K VLANs
per interface is not a very realistic configuration, but then modern
switches can instead have several hundred interfaces, and we have fielded
requests for >1K VLAN memberships per port among customers.
FDB entries are currently all kept on a single linked list, and then
dumping uses this linked list to walk all entries and dump them in order.
When the message buffer is full, the iteration is cut short, and later
restarted. Of course, to restart the iteration, it's first necessary to
walk the already-dumped front part of the list before starting dumping
again. So one possibility is to organize the FDB entries in a different
structure more amenable to walk restarts.
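The cost of restarting from the head of the list can be made concrete
with a small counting model. The function below is only an
illustration: it assumes every message holds a fixed number of entries
(the batch parameter), which is a simplification, and
demo_dump_visits() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count list nodes visited when dumping `total` entries in batches of
 * `batch`, re-walking the already-dumped prefix on every restart.
 */
size_t demo_dump_visits(size_t total, size_t batch)
{
	size_t visits = 0;
	size_t done = 0;

	assert(batch > 0);
	while (done < total) {
		size_t left = total - done;
		size_t chunk = left < batch ? left : batch;

		visits += done;   /* skip the prefix dumped earlier */
		visits += chunk;  /* entries dumped in this pass */
		done += chunk;
	}
	return visits;
}
```

With 1000 entries and batches of 100, the model visits 5500 nodes
instead of 1000, and the overhead grows quadratically with the number
of entries for a fixed batch size.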
One option is to walk directly the hash table. The advantage is that no
auxiliary structure needs to be introduced. With a rough sketch of this
approach, the above scenario gets dumped in not quite 3 s, saving over 50 %
of time. However hash table iteration requires maintaining an active cursor
that must be collected when the dump is aborted. It looks like that would
require changes in the NDO protocol to allow to run this cleanup. Moreover,
on hash table resize the iteration is simply restarted. FDB dumps are
currently not guaranteed to correspond to any one particular state: entries
can be missed, or be duplicated. But with hash table iteration we would get
that plus the much less graceful resize behavior, where swaths of FDB are
duplicated.
Another option is to maintain the FDB entries in a red-black tree. We have
a PoC of this approach on hand, and the above scenario is dumped in about
2.5 s. Still not as snappy as we'd like it, but better than the hash table.
However the savings come at the expense of a more expensive insertion, and
require locking during dumps, which blocks insertion.
The upside of these approaches is that they provide benefits whatever the
FDB contents. But it does not seem like either of these is workable.
However we intend to clean up the RB tree PoC and present it for
consideration later on in case the trade-offs are considered acceptable.
Yet another option might be to use in-kernel FDB filtering, and to filter
the local entries when dumping. Unfortunately, this does not help all that
much either, because the linked-list walk still needs to happen. Also, with
the obvious filtering interface built around ndm_flags / ndm_state
filtering, one can't just exclude pure local entries in one query. One
needs to dump all non-local entries first, and then to get permanent
entries in another run filter local & added_by_user. I.e. one needs to pay
the iteration overhead twice, and then integrate the result in userspace.
To get significant savings, one would need a very specific knob like "dump,
but skip/only include local entries". But if we are adding a local-specific
knob, maybe let's have an option to just not duplicate them in the first
place.
All this FDB duplication is there merely to make things snappy during
forwarding. But high-radix switches with thousands of VLANs typically do
not process much traffic in the SW datapath at all, but rather offload the vast
majority of it. So we could exchange some of the runtime performance for a
neater FDB.
To that end, in this patchset, introduce a new bridge option,
BR_BOOLOPT_FDB_LOCAL_VLAN_0, which when enabled, has local FDB entries
installed only on VLAN 0, instead of duplicating them across all VLANs.
Then to maintain the local termination behavior, on FDB miss, the bridge
does a second lookup on VLAN 0.
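The two-step lookup can be sketched with a flat table standing in for
the FDB. Everything here is a simplified model: struct demo_fdb_entry,
the linear search and the demo_* functions are invented for the
example, and accepting only local entries from the VLAN 0 fallback is
an assumption of this sketch rather than a statement about the bridge
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	bool is_local;
};

/* Exact-match search on (MAC, VLAN); a linear scan for simplicity. */
static const struct demo_fdb_entry *
demo_fdb_find(const struct demo_fdb_entry *tbl, size_t n,
	      const uint8_t mac[6], uint16_t vid)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].vid == vid && memcmp(tbl[i].mac, mac, 6) == 0)
			return &tbl[i];
	}
	return NULL;
}

/*
 * Look up (MAC, VLAN). On a miss, and only when the option is enabled,
 * retry on VLAN 0 and accept the result if it is a local entry.
 */
const struct demo_fdb_entry *
demo_fdb_lookup(const struct demo_fdb_entry *tbl, size_t n,
		const uint8_t mac[6], uint16_t vid, bool local_vlan_0)
{
	const struct demo_fdb_entry *e;

	assert(tbl != NULL || n == 0);
	e = demo_fdb_find(tbl, n, mac, vid);
	if (!e && local_vlan_0 && vid != 0) {
		e = demo_fdb_find(tbl, n, mac, 0);
		if (e && !e->is_local)
			e = NULL;
	}
	return e;
}
```

The table then needs a single local entry per MAC address instead of
one per VLAN, at the price of a second search on the miss path.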
Enabling this option changes the bridge behavior in expected ways. Since
the entries are only kept on VLAN 0, FDB get, flush and dump will not
perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects
forwarding on all VLANs.
This patchset is loosely based on a privately circulated patch by Nikolay
Aleksandrov.
The patchset progresses as follows:
- Patch #1 introduces a bridge option to enable the above feature. Then
patches #2 to #5 gradually patch the bridge to do the right thing when
the option is enabled. Finally patch #6 adds the UAPI knob and the code
for when the feature is enabled or disabled.
- Patches #7, #8 and #9 contain fixes and improvements to selftest
libraries
- Patch #10 contains a new selftest
====================
Usually the autodefer helpers in lib.sh are expected to be run in a context
where success is the expected outcome. However when using them for feature
detection, failure can legitimately occur. But the failed command still
schedules a cleanup, which will likely fail again.
Instead, only schedule deferred cleanup when the positive command succeeds.
This way of organizing the cleanup has the added benefit that now the
return code from these functions reflects whether the command passed.
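The idea can be sketched as follows. This is a toy model in Python, not the shell code of lib.sh; the names run_with_defer, run_deferred and deferred are hypothetical, and commands are modelled as callables returning an exit status.

```python
# Toy model: schedule a cleanup only when the positive command succeeded.
# All names here are hypothetical; lib.sh itself is written in shell.

deferred = []  # cleanup callables, executed in reverse order of scheduling


def run_with_defer(command, cleanup):
    """Run command; schedule cleanup only on success. Return the status."""
    status = command()
    if status == 0:
        deferred.append(cleanup)
    return status


def run_deferred():
    """Run all scheduled cleanups, most recently scheduled first."""
    while deferred:
        deferred.pop()()
```

Because the status of the positive command is returned unchanged, a caller doing feature detection can simply test the return value, and no cleanup is left behind for a command that never took effect.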
Petr Machata [Thu, 4 Sep 2025 17:07:25 +0000 (19:07 +0200)]
selftests: defer: Introduce DEFER_PAUSE_ON_FAIL
The fact that all cleanup (ideally) goes through the defer framework makes
debugging of these commands a bit tricky. However, this also gives us a
nice point to place a hook along the lines of PAUSE_ON_FAIL. When the
environment variable DEFER_PAUSE_ON_FAIL is set, and a cleanup command
results in non-zero exit status, show a bit of debuginfo and give the user
an opportunity to interrupt the execution altogether.
Petr Machata [Thu, 4 Sep 2025 17:07:24 +0000 (19:07 +0200)]
selftests: defer: Allow spaces in arguments of deferred commands
Currently the way deferred commands are stored and invoked causes any
whitespace to act as an argument separator when the command is executed.
To make it possible to use spaces in deferred commands, store the commands
quoted, and then eval the string prior to execution.
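The effect of storing commands quoted can be illustrated with Python's shlex module, used here as a stand-in for shell quoting and eval. The helper names and the example command are made up for illustration; the selftest library itself does this in shell.

```python
import shlex


def store_naive(args):
    """Join arguments with spaces: argument boundaries are lost."""
    return " ".join(args)


def store_quoted(args):
    """Join arguments with shell quoting: boundaries are preserved."""
    return shlex.join(args)


def load(stored):
    """Split a stored command back into arguments, honouring quotes."""
    return shlex.split(stored)


# A hypothetical deferred command with a space inside one argument.
cmd = ["ip", "link", "set", "dev", "my dev", "down"]
naive_args = load(store_naive(cmd))    # "my dev" falls apart into two words
quoted_args = load(store_quoted(cmd))  # identical to cmd
```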
Petr Machata [Thu, 4 Sep 2025 17:07:23 +0000 (19:07 +0200)]
net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
The previous patches introduced a new option, BR_BOOLOPT_FDB_LOCAL_VLAN_0.
When enabled, it has local FDB entries installed only on VLAN 0, instead of
duplicating them across all VLANs.
In this patch, add the corresponding UAPI toggle, and the code for turning
the feature on and off.
Petr Machata [Thu, 4 Sep 2025 17:07:22 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.
Thus when a VLAN is added for a port or the bridge itself, a local FDB
entry with the corresponding address should not be added when in the VLAN-0
mode.
Petr Machata [Thu, 4 Sep 2025 17:07:21 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs
When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
bridge itself should not be created per-VLAN, but instead only on VLAN 0.
When the bridge address changes, the local FDB entries need to be updated,
which is done in br_fdb_change_mac_address().
Bail out early when in VLAN-0 mode, so that the per-VLAN FDB entries are
not created. The per-VLAN walk is only done afterwards.
Petr Machata [Thu, 4 Sep 2025 17:07:20 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs
When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for member
ports should not be created per-VLAN, but instead only on VLAN 0. When the
member port address changes, the local FDB entries need to be updated,
which is done in br_fdb_changeaddr().
Under the VLAN-0 mode, only one local FDB entry will ever be added for a
port's address, and that on VLAN 0. Thus bail out of the delete loop early.
For the same reason, also skip adding the per-VLAN entries.
Petr Machata [Thu, 4 Sep 2025 17:07:19 +0000 (19:07 +0200)]
net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.
That means that br_handle_frame_finish() needs to make two lookups: the
primary lookup on an appropriate VLAN, and when that misses, a lookup on
VLAN 0.
Have the second lookup only accept local MAC addresses. Turning this into a
generic second-lookup feature is not the goal.
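A minimal sketch of the two-step lookup, as a Python model rather than the bridge code. The FDB is modelled as a dict keyed by (MAC, VLAN); FdbEntry, fdb_lookup and the sample addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FdbEntry:
    port: str
    is_local: bool


def fdb_lookup(fdb, mac, vid, local_vlan_0):
    """Primary lookup on vid; on miss, optionally retry on VLAN 0.

    The VLAN 0 result is only accepted if it is a local entry.
    """
    entry = fdb.get((mac, vid))
    if entry is None and local_vlan_0:
        fallback = fdb.get((mac, 0))
        if fallback is not None and fallback.is_local:
            entry = fallback
    return entry


fdb = {
    ("00:00:5e:00:53:01", 0): FdbEntry("br0", is_local=True),
    ("00:00:5e:00:53:02", 0): FdbEntry("swp1", is_local=False),
}
```

With the option enabled, a frame for 00:00:5e:00:53:01 on VLAN 10 hits the local entry on VLAN 0, whereas the non-local entry for 00:00:5e:00:53:02 on VLAN 0 is not used for VLAN 10 traffic.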
Petr Machata [Thu, 4 Sep 2025 17:07:18 +0000 (19:07 +0200)]
net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
The following patches will gradually introduce the ability of the bridge
to look up local FDB entries on VLAN 0 instead of using the VLAN indicated
by a packet.
In this patch, just introduce the option itself, with which the feature
will be linked.
tcp_recvmsg_dmabuf can export the following errors:
- EFAULT when linear copy fails
- ETOOSMALL when cmsg put fails
- ENODEV if one of the frags is readable
- ENOMEM on xarray failures
But they are all ignored and replaced by EFAULT in the caller
(tcp_recvmsg_locked). Expose real error to the userspace to
add more transparency on what specifically fails.
In non-devmem case (skb_copy_datagram_msg) doing `if (!copied)
copied=-EFAULT` is ok because skb_copy_datagram_msg can return only EFAULT.
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250910162429.4127997-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config
It's no longer user-selectable (and the default was already "y"), so
let's just drop it.
It was never really relevant to the wireguard selftests either way.
Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()
The function gets the number of online CPUs, and uses it to search for
the Nth CPU in cpu_online_mask.
If id == num_online_cpus() - 1, and one CPU gets offlined between
calling num_online_cpus() -> cpumask_nth(), there's a chance for
cpumask_nth() to find nothing and return >= nr_cpu_ids.
The caller code in __queue_work() tries to avoid that by checking the
returned CPU against WORK_CPU_UNBOUND, which is NR_CPUS. It's not the
same as '>= nr_cpu_ids'. On a typical Ubuntu desktop, NR_CPUS is 8192,
while nr_cpu_ids is the actual number of possible CPUs, say 8.
The non-existing cpu may later be passed to rcu_dereference() and
corrupt the logic. Fix it by switching from 'if' to 'while'.
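The race and the effect of retrying can be modelled in a few lines of Python. This is a toy model, not the wireguard code: the Cpus class, choose_once() and choose_retry() are hypothetical, and stale_counts simulates a num_online_cpus() value that was read just before a CPU went offline.

```python
def cpumask_nth(n, mask, nr_cpu_ids):
    """Return the n-th (0-based) CPU set in mask, or nr_cpu_ids if none."""
    cpus = sorted(mask)
    return cpus[n] if n < len(cpus) else nr_cpu_ids


class Cpus:
    """Toy CPU state with an optionally stale online count."""

    def __init__(self, online, nr_cpu_ids):
        self.online = set(online)
        self.nr_cpu_ids = nr_cpu_ids
        self.stale_counts = []  # returned by num_online() before the real count

    def num_online(self):
        if self.stale_counts:
            return self.stale_counts.pop(0)
        return len(self.online)

    def is_valid(self, cpu):
        return cpu < self.nr_cpu_ids and cpu in self.online


def choose_once(cpus, stored_cpu, cpu_id):
    """'if' variant: a single attempt, which may return an invalid CPU."""
    cpu = stored_cpu
    if not cpus.is_valid(cpu):
        cpu = cpumask_nth(cpu_id % cpus.num_online(), cpus.online,
                          cpus.nr_cpu_ids)
    return cpu


def choose_retry(cpus, stored_cpu, cpu_id):
    """'while' variant: retry until the result is a valid online CPU."""
    cpu = stored_cpu
    while not cpus.is_valid(cpu):
        cpu = cpumask_nth(cpu_id % cpus.num_online(), cpus.online,
                          cpus.nr_cpu_ids)
    return cpu
```

For example, with nr_cpu_ids of 8, CPUs 0-6 online and a stale count of 8, asking for id 7 makes choose_once() return 8, which is not a valid CPU, while choose_retry() loops once more with the fresh count and returns an online CPU.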
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wg_cpumask_choose_online() opencodes cpumask_nth(). Use it and make the
function significantly simpler. While there, fix opencoded cpu_online()
too.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Move the conflicting declaration to the end of the corresponding
structure. Notice that `struct ip_tunnel_info` is a flexible
structure, i.e. a structure that contains a flexible-array
member.
Fix the following warning:
drivers/net/geneve.c:56:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/aMBK78xT2fUnpwE5@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
On a HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
====================
net: af_packet: Use hrtimer to do the retire operation
In a system with high real-time requirements, the timeout mechanism of
ordinary timers with jiffies granularity is insufficient to meet the
demands for real-time performance. Meanwhile, the optimization of CPU
usage with af_packet is quite significant. Use hrtimer instead of timer
to help compensate for the shortcomings in real-time performance.
On a HZ=100 or HZ=250 system, the update of TP_STATUS_USER is not real-time
enough, with fluctuations reaching over 8ms (on a system with HZ=250).
This is unacceptable in some high real-time systems that require timely
processing of network packets. By replacing it with hrtimer, if a timeout
of 2ms is set, the update of TP_STATUS_USER can be stabilized to within
3 ms.
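A rough back-of-the-envelope model of the granularity problem, in Python. It only shows that a tick-based timer cannot honour a 2ms request on low-HZ systems; it does not reproduce the measured fluctuation quoted above, and the helper names are hypothetical (timeout_to_ticks() is not the kernel's msecs_to_jiffies() implementation, it merely rounds up in the same direction).

```python
def tick_ms(hz):
    """Length of one jiffy in milliseconds for a given HZ."""
    return 1000 / hz


def timeout_to_ticks(ms, hz):
    """Round a millisecond timeout up to a whole number of ticks."""
    return -((-ms * hz) // 1000)


def effective_timeout_ms(ms, hz):
    """Shortest timeout a tick-based timer can represent for this request."""
    return timeout_to_ticks(ms, hz) * tick_ms(hz)


# Requested timeout of 2ms on three common HZ settings.
for hz in (100, 250, 1000):
    print(hz, tick_ms(hz), effective_timeout_ms(2, hz))
```

At HZ=250 one tick is already 4ms and at HZ=100 it is 10ms, so a 2ms retire timeout is rounded up to at least one full tick before any scheduling jitter is added. An hrtimer is not bound to the tick period.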
Delete the delete_blk_timer field, because hrtimer_cancel will check and wait
until the timer callback returns, ensuring the callback is never entered again.
Simplify the logic related to setting the timeout: only update the hrtimer
expire time within the hrtimer callback, and no longer update the expire time
in prb_open_block, which is called by tpacket_rcv or the timer callback.
Reasons why NOT to update the hrtimer in prb_open_block:
1) It would increase complexity to distinguish the two caller scenarios.
2) hrtimer_cancel and hrtimer_start need to be called if you want to update
the TMO of an already enqueued hrtimer, leading to complex shutdown logic.
One side effect of NOT updating the hrtimer when called by tpacket_rcv is that
a newly opened block triggered by tpacket_rcv may be retired earlier than
expected. On the other hand, if timeout is updated in prb_open_block, the
frequent reception of network packets that leads to prb_open_block being
called may cause hrtimer to be removed and enqueued repeatedly.
The retire hrtimer expiration is unconditional and periodic. If there are
numerous packet sockets on the system, please set an appropriate timeout
to avoid frequent enqueueing of hrtimers.
kactive_blk_num (K) is only incremented on block close.
In the timer callback prb_retire_rx_blk_timer_expired, unless delete_blk_timer
is true, last_kactive_blk_num (L) is set to match kactive_blk_num (K) in
all cases. L is also set to match K in prb_open_block.
The only case where K is not equal to L is when scheduled by tpacket_rcv
and K is just incremented on block close but no new block could be opened,
so that it does not call prb_open_block in prb_dispatch_next_block.
This patch modifies the prb_retire_rx_blk_timer_expired function by simply
removing the check for L == K. This just provides another checkpoint
to thaw the might-be-frozen block in any case. It doesn't have any functional
effect, because __packet_lookup_frame_in_block() has the same logic and
performs the same check, even without this patch, when it detects that the
ring is frozen. The patch only makes the check of the ring status happen
earlier.
The daughter driver rcar_gen4_ptp, used by both rswitch and rtsn, was
upstreamed with support for possibly different memory layouts for
different users. With all Gen4 boards upstream, no such setup is
documented.
There are other issues related to how the rcar_gen4_ptp driver is shared
between multiple users that need to be cleaned up. But that will be a
larger work. So before that get some simple fixes done.
Patches 1/3 and 2/3 replace the support for different register
layouts on different SoCs, which looked up offsets at runtime, with a much
simpler interface. The new interface computes the offsets at compile
time.
Patch 3/3 is a drive-by patch taking a spurious comment and making a
lockdep check of it.
There is no intentional functional change in this series, just cleaning
up in preparation of larger works to follow.
====================
With the support for multiple register layouts removed, all support
structures can be removed from the header file. Convert to a simpler
structure using defines for the register offsets.
There is no functional change, only switching from looking up offsets at
runtime to compile time.
net: ethernet: renesas: rcar_gen4_ptp: Remove different memory layout
When upstreaming the Gen4 PTP support for R-Car S4, the possibility for
different memory layouts on other Gen4 SoCs was built in. It turns out
this is not needed and instead needlessly makes the driver harder to
read. Remove the support code that would have allowed different memory
layouts.
This change only deals with the public functions used by other drivers,
follow up work will clean up the rcar_gen4_ptp internals.
Daniel Palmer [Sun, 7 Sep 2025 06:43:49 +0000 (15:43 +0900)]
eth: 8139too: Make 8139TOO_PIO depend on !NO_IOPORT_MAP
When 8139too is probing and 8139TOO_PIO=y it will call pci_iomap_range()
and from there __pci_ioport_map() for the PCI IO space.
If HAS_IOPORT_MAP=n and NO_GENERIC_PCI_IOPORT_MAP=n, like it is on my
m68k config, __pci_ioport_map() becomes NULL, pci_iomap_range() will
always fail and the driver will complain it couldn't map the PIO space
and return an error.
NO_IOPORT_MAP seems to cover the case where what 8139too is trying
to do cannot ever work, so make 8139TOO_PIO depend on it being false
and avoid creating an unusable driver.
Jakub Kicinski [Fri, 12 Sep 2025 00:50:46 +0000 (17:50 -0700)]
Merge tag 'wireless-next-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Plenty of things going on, notably:
- iwlwifi: major cleanups/rework
- brcmfmac: gets AP isolation support
- mac80211: gets more S1G support
* tag 'wireless-next-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (94 commits)
wifi: mwifiex: fix endianness handling in mwifiex_send_rgpower_table
wifi: cfg80211: Remove the redundant wiphy_dev
wifi: mac80211: fix incorrect comment
wifi: cfg80211: update the time stamps in hidden ssid
wifi: mac80211: Fix HE capabilities element check
wifi: mac80211: add tx_handlers_drop statistics to ethtool
wifi: mac80211: fix reporting of all valid links in sta_set_sinfo()
wifi: iwlwifi: mld: CHANNEL_SURVEY_NOTIF is always supported
wifi: iwlwifi: mld: remove support of iwl_esr_mode_notif version 1
wifi: iwlwifi: mld: remove support from of sta cmd version 1
wifi: iwlwifi: mld: remove support of roc cmd version 5
wifi: iwlwifi: mld: remove support of mac cmd ver 2
wifi: iwlwifi: mld: don't consider phy cmd version 5
wifi: iwlwifi: implement wowlan status notification API update
wifi: iwlwifi: fw: Add ASUS to PPAG and TAS list
wifi: iwlwifi: add kunit tests for nvm parse
wifi: iwlwifi: api: add a flag to iwl_link_ctx_modify_flags
wifi: iwlwifi: pcie: move ltr_enabled to the specific transport
wifi: iwlwifi: pcie: move pm_support to the specific transport
wifi: iwlwifi: rename iwl_finish_nic_init
...
====================
Remove stubs for fixed_phy_set_link_update() and
fixed_phy_change_carrier() because all callers
(actually just one per function) select config
symbol FIXED_PHY.
Merge tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN, netfilter and wireless.
We have an IPv6 routing regression with the relevant fix still a WiP.
This includes a last-minute revert to avoid more problems.
Current release - new code bugs:
- wifi: nl80211: completely disable per-link stats for now
Previous releases - regressions:
- dev_ioctl: take ops lock in hwtstamp lower paths
- netfilter:
- fix spurious set lookup failures
- fix lockdep splat due to missing annotation
- genetlink: fix genl_bind() invoking bind() after -EPERM
- phy: transfer phy_config_inband() locking responsibility to phylink
- can: xilinx_can: fix use-after-free of transmitted SKB
- hsr: fix lock warnings
- eth:
- igb: fix NULL pointer dereference in ethtool loopback test
- i40e: fix Jumbo Frame support after iPXE boot
- macsec: sync features on RTM_NEWLINK
Previous releases - always broken:
- tunnels: reset the GSO metadata before reusing the skb
- mptcp: make sync_socket_options propagate SOCK_KEEPOPEN
* tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
hsr: hold rcu and dev lock for hsr_get_port_ndev
hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
hsr: use rtnl lock when iterating over ports
wifi: nl80211: completely disable per-link stats for now
net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info
MAINTAINERS: add Phil as netfilter reviewer
netfilter: nf_tables: restart set lookup on base_seq change
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: place base_seq in struct net
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
can: j1939: implement NETDEV_UNREGISTER notification handler
selftests: can: enable CONFIG_CAN_VCAN as a module
...
Merge tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- ptep_modify_prot_start() may be called in a loop, which might lead to
the preempt_count overflow due to the unnecessary preemption
disabling. Do not disable preemption to prevent the overflow
- Events of type PERF_TYPE_HARDWARE are not tested for sampling and
return -EOPNOTSUPP eventually.
Instead, deny all sampling events by CPUMF counter facility and
return -ENOENT to allow other PMUs to be tried
- The PAI PMU driver returns -EINVAL if an event is out of its range. That
aborts a search for an alternative PMU driver.
Instead, return -ENOENT to allow other PMUs to be tried
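Why the error code matters can be shown with a small Python model of a core that tries one PMU after another. It is only loosely modelled on the behaviour described above; init_event() and the three toy PMUs are hypothetical and do not correspond to kernel functions.

```python
import errno


def init_event(pmus, event):
    """Try each PMU in turn.

    -ENOENT means "not my event, try the next PMU"; any other result,
    success or error, ends the search.
    """
    for name, event_init in pmus:
        ret = event_init(event)
        if ret == -errno.ENOENT:
            continue
        return name, ret
    return None, -errno.ENOENT


def strict_pmu(event):
    """Old behaviour: reject foreign events with -EINVAL."""
    return 0 if event == "own" else -errno.EINVAL


def polite_pmu(event):
    """New behaviour: decline foreign events with -ENOENT."""
    return 0 if event == "own" else -errno.ENOENT


def other_pmu(event):
    """Another PMU that could handle the 'foreign' event."""
    return 0 if event == "foreign" else -errno.ENOENT
```

With strict_pmu first in the list, a "foreign" event fails with -EINVAL and other_pmu is never asked. With polite_pmu first, the search continues and other_pmu accepts the event.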
* tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_cf: Deny all sampling events by counter PMU
s390/pai: Deny all events not handled by this PMU
s390/mm: Prevent possible preempt_count overflow
Merge tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a nasty hibernation regression introduced during the 6.16
cycle, an issue related to energy model management occurring on Intel
hybrid systems where some CPUs are offline to start with, and two
regressions in the amd-pstate driver:
- Restore a pm_restrict_gfp_mask() call in hibernation_snapshot()
that was removed incorrectly during the 6.16 development cycle
(Rafael Wysocki)
- Introduce a function for registering a perf domain without
triggering a system-wide CPU capacity update and make the
intel_pstate driver use it to avoid reoccurring unsuccessful
attempts to update capacities of all CPUs in the system (Rafael
Wysocki)
- Fix setting of CPPC.min_perf in the active mode with performance
governor in the amd-pstate driver to restore its expected behavior
changed recently (Gautham Shenoy)
- Avoid mistakenly setting EPP to 0 in the amd-pstate driver after
system resume as a result of recent code changes (Mario
Limonciello)"
* tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Restrict GFP mask in hibernation_snapshot()
PM: EM: Add function for registering a PD without capacity update
cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume
cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor
Merge tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix delayed inode tracking in xarray, eviction can race with
insertion and leave behind a disconnected inode
- on systems with large page (64K) and small block size (4K) fix
compression read that can return partially filled folio
- slightly relax compression option format for backward compatibility,
allow to specify level for LZO although there's only one
- fix simple quota accounting of compressed extents
- validate minimum device size in 'device add'
- update maintainers' entry
* tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't allow adding block device of less than 1 MB
MAINTAINERS: update btrfs entry
btrfs: fix subvolume deletion lockup caused by inodes xarray race
btrfs: fix corruption reading compressed range when block size is smaller than page size
btrfs: accept and ignore compression level for lzo
btrfs: fix squota compressed stats leak
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
"A number of fixes accumulated due to summer vacations
- Fix out-of-bounds dynptr write in bpf_crypto_crypt() kfunc which
was misidentified as a security issue (Daniel Borkmann)
- Update the list of BPF selftests maintainers (Eduard Zingerman)
- Fix selftests warnings with icecc compiler (Ilya Leoshkevich)
- Disable XDP/cpumap direct return optimization (Jesper Dangaard
Brouer)
- Fix unexpected get_helper_proto() result in unusual configuration
BPF_SYSCALL=y and BPF_EVENTS=n (Jiri Olsa)
- Allow fallback to interpreter when JIT support is limited (KaFai
Wan)
- Fix rqspinlock and choose trylock fallback for NMI waiters. Pick
the simplest fix. More involved fix is targeted bpf-next (Kumar
Kartikeya Dwivedi)
- Fix cleanup when tcp_bpf_send_verdict() fails to allocate
psock->cork (Kuniyuki Iwashima)
- Disallow bpf_timer in PREEMPT_RT for now. Proper solution is being
discussed for bpf-next. (Leon Hwang)
- Fix XSK cq descriptor production (Maciej Fijalkowski)
- Tell memcg to use allow_spinning=false path in bpf_timer_init() to
avoid lockup in cgroup_file_notify() (Peilin Ye)
- Fix bpf_strnstr() to handle suffix match cases (Rong Tao)"
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Skip timer cases when bpf_timer is not supported
bpf: Reject bpf_timer for PREEMPT_RT
tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
bpf: Allow fall back to interpreter for programs with stack size <= 512
rqspinlock: Choose trylock fallback for NMI waiters
xsk: Fix immature cq descriptor production
bpf: Update the list of BPF selftests maintainers
selftests/bpf: Add tests for bpf_strnstr
selftests/bpf: Fix "expression result unused" warnings with icecc
bpf: Fix bpf_strnstr() to handle suffix match cases better
selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer
bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
bpf: Check the helper function is valid in get_helper_proto
bpf, cpumap: Disable page_pool direct xdp_return need larger scope
Paolo Abeni [Thu, 11 Sep 2025 14:33:31 +0000 (16:33 +0200)]
Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
This reverts commit 5537a4679403 ("net: usb: asix: ax88772: drop
phylink use in PM to avoid MDIO runtime PM wakeups"), it breaks
operation of asix ethernet usb dongle after system suspend-resume
cycle.