]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
2 weeks agoarm64: dts: rockchip: Enable USB 2.0 ports on ArmSoM Sige1
Jonas Karlman [Fri, 29 May 2026 19:03:54 +0000 (21:03 +0200)] 
arm64: dts: rockchip: Enable USB 2.0 ports on ArmSoM Sige1

The ArmSoM Sige1 has two USB 2.0 Type-A HOST ports behind an onboard
USB hub, and one USB 2.0 Type-C OTG port.

Add support for using the USB 2.0 ports on ArmSoM Sige1.

The onboard USB hub handles OHCI so only the EHCI controller is enabled.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
[added phy-supply for otg port]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20260529190355.4148175-5-heiko@sntech.de
2 weeks agoarm64: dts: rockchip: Enable USB ports on Radxa ROCK 2A/2F
Jonas Karlman [Fri, 29 May 2026 19:03:53 +0000 (21:03 +0200)] 
arm64: dts: rockchip: Enable USB ports on Radxa ROCK 2A/2F

The ROCK 2A has three USB 2.0 Type-A HOST ports behind an onboard
USB hub, and one USB 3.0 Type-A port.

And the ROCK 2F has two USB 2.0 Type-A HOST ports behind an onboard
USB hub, and one USB 2.0 Type-C OTG port.

Add support for using the USB ports on Radxa ROCK 2A/2F.

The onboard USB hub handles OHCI so only the EHCI controller is enabled.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20260529190355.4148175-4-heiko@sntech.de
2 weeks agoarm64: dts: rockchip: Enable USB 2.0 ports on Radxa E20C
Jonas Karlman [Fri, 29 May 2026 19:03:52 +0000 (21:03 +0200)] 
arm64: dts: rockchip: Enable USB 2.0 ports on Radxa E20C

The Radxa E20C has one USB2.0 Type-A HOST port and one USB2.0 Type-C port.

The Type-C port is conneced to a FE1.1s_QFN USB hub on the board, with its
ports being connected to the XHCI usb controller and an usb-uart bridge.

This also means, the XHCI controller can only be used in device-mode.

Add support for using the USB 2.0 ports on Radxa E20C.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
[set xhci to peripheral and add comment about the outward-facing hub]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20260529190355.4148175-3-heiko@sntech.de
2 weeks agoarm64: dts: rockchip: Add USB nodes for RK3528
Jonas Karlman [Fri, 29 May 2026 19:03:51 +0000 (21:03 +0200)] 
arm64: dts: rockchip: Add USB nodes for RK3528

Rockchip RK3528 has one USB 3.0 DWC3 controller and oneUSB 2.0 EHCI/OHCI
controller and uses an Innosilicon-USB2PHY for USB 2.0. The DWC3
controller additionally uses the Naneng Combo PHY for USB3.

Add device tree nodes to describe these USB controllers along with the
USB 2.0 PHYs.

[moved snps,dis_u2_susphy_quirk here from individual boards,
 describe both usb2+3 default phy connections, usb2 boards can override]

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20260529190355.4148175-2-heiko@sntech.de
2 weeks agonet: rds: clear i_sends on setup unwind
Yuqi Xu [Fri, 29 May 2026 13:01:44 +0000 (21:01 +0800)] 
net: rds: clear i_sends on setup unwind

The RDS IB connection teardown path is written so it can run during
partial startup and on repeated shutdown attempts. It uses NULL
pointers to distinguish resources that are still owned from resources
that have already been released.

When rds_ib_setup_qp() fails after allocating i_sends but before
allocating i_recvs, the sends_out path frees i_sends without clearing
the pointer. A later shutdown pass can still treat that stale pointer
as a live send ring allocation.

Clear i_sends after vfree() in the error unwind path so the existing
shutdown logic continues to use the correct ownership state.

Fixes: 3b12f73a5c29 ("rds: ib: add error handle")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yuqi Xu <xuyq21@lenovo.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/5a0f7624bb9845a7b67d26166a150b59e7f394ce.1779632468.git.xuyq21@lenovo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agofutex/requeue: Prevent NULL pointer dereference in remove_waiter() on self-deadlock
Ji'an Zhou [Tue, 2 Jun 2026 09:12:04 +0000 (09:12 +0000)] 
futex/requeue: Prevent NULL pointer dereference in remove_waiter() on self-deadlock

When FUTEX_CMP_REQUEUE_PI requeues a non-top waiter that already owns the
target PI futex, task_blocks_on_rt_mutex() returns -EDEADLK before setting
waiter->task.

The subsequent remove_waiter() in rt_mutex_start_proxy_lock() dereferences
the NULL waiter->task, causing a kernel crash.

Add a self-deadlock check for non-top waiters before calling
rt_mutex_start_proxy_lock(), analogous to the top-waiter check in
futex_lock_pi_atomic().

Fixes: 3bfdc63936dd4773109b7b8c280c0f3b5ae7d349 ("rtmutex: Use waiter::task instead of current in remove_waiter()")
Signed-off-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
2 weeks agoMerge branch 'net-airoha-preliminary-patches-to-support-multiple-net_devices-connecte...
Jakub Kicinski [Tue, 2 Jun 2026 20:24:29 +0000 (13:24 -0700)] 
Merge branch 'net-airoha-preliminary-patches-to-support-multiple-net_devices-connected-to-the-same-gdm-port'

Lorenzo Bianconi says:

====================
net: airoha: Preliminary patches to support multiple net_devices connected to the same GDM port

EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
manages the traffic in a TDM manner. As a result multiple net_devices can
connect to the same GDM{3,4} port and there is a theoretical "1:n"
relation between GDM ports and net_devices.

           ┌─────────────────────────────────┐
           │                                 │    ┌──────┐
           │                         P1 GDM1 ├────►MT7530│
           │                                 │    └──────┘
           │                                 │      ETH0 (DSA conduit)
           │                                 │
           │              PSE/FE             │
           │                                 │
           │                                 │
           │                                 │    ┌─────┐
           │                         P0 CDM1 ├────►QDMA0│
           │  P4                     P9 GDM4 │    └─────┘
           └──┬─────────────────────────┬────┘
              │                         │
           ┌──▼──┐                 ┌────▼────┐
           │ PPE │                 │   ARB   │
           └─────┘                 └─┬─────┬─┘
                                     │     │
                                  ┌──▼──┐┌─▼───┐
                                  │ ETH ││ USB │
                                  └─────┘└─────┘
                                   ETH1   ETH2

This is a preliminary series to introduce support for multiple net_devices
connected to the same Frame Engine (FE) GDM port (GDM3 or GDM4) via an
external hw arbiter.
====================

Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-0-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Rename airoha_set_gdm2_loopback in airoha_enable_gdm2_loopback
Lorenzo Bianconi [Wed, 27 May 2026 10:21:20 +0000 (12:21 +0200)] 
net: airoha: Rename airoha_set_gdm2_loopback in airoha_enable_gdm2_loopback

This is a preliminary patch in order to allow the user to select if the
configured device will be used as hw lan or wan.
Please not this patch does not introduce any logical changes, just
cosmetic ones.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-6-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Move {cpu,fwd}_tx_packets in airoha_gdm_dev struct
Lorenzo Bianconi [Wed, 27 May 2026 10:21:19 +0000 (12:21 +0200)] 
net: airoha: Move {cpu,fwd}_tx_packets in airoha_gdm_dev struct

Since now multiple net_devices connected to different QDMA blocks can
share the same GDM port, cpu_tx_packets and fwd_tx_packets fields can
be overwritten with the value from a different QDMA block. In order to
fix the issue move cpu_tx_packets and fwd_tx_packets fields from
airoha_gdm_port struct to airoha_gdm_dev one.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-5-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Move qos_sq_bmap in airoha_gdm_dev struct
Lorenzo Bianconi [Wed, 27 May 2026 10:21:18 +0000 (12:21 +0200)] 
net: airoha: Move qos_sq_bmap in airoha_gdm_dev struct

Since now multiple net_devices connected to different QDMA blocks can
share the same GDM port, qos_sq_bmap field can be overwritten with the
configuration obtained from a net_device connected to a different QDMA
block. In order to fix the issue move qos_sq_bmap field from
airoha_gdm_port struct to airoha_gdm_dev one.
Add qos_channel_map bitmap in airoha_qdma struct to track if a shared
QDMA channel is already in use by another net_device.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-4-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Rely on airoha_gdm_dev pointer in airoha_is_lan_gdm_port()
Lorenzo Bianconi [Wed, 27 May 2026 10:21:17 +0000 (12:21 +0200)] 
net: airoha: Rely on airoha_gdm_dev pointer in airoha_is_lan_gdm_port()

Rename airoha_is_lan_gdm_port in airoha_is_lan_gdm_dev. Moreover, rely
on airoha_gdm_dev pointer in airoha_is_lan_gdm_dev() instead of
airoha_gdm_port one.
This is a preliminary patch to support multiple net_devices connected to
the same GDM{3,4} port via an external hw arbiter.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-3-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Move airoha_qdma pointer in airoha_gdm_dev struct
Lorenzo Bianconi [Wed, 27 May 2026 10:21:16 +0000 (12:21 +0200)] 
net: airoha: Move airoha_qdma pointer in airoha_gdm_dev struct

Move airoha_qdma pointer from airoha_gdm_port struct to airoha_gdm_dev
one since the QDMA block used depends on the particular net_device
WAN/LAN configuration and in the current codebase net_device pointer is
associated to airoha_gdm_dev struct.
This is a preliminary patch to support multiple net_devices connected
to the same GDM{3,4} port via an external hw arbiter.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-2-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Introduce airoha_gdm_dev struct
Lorenzo Bianconi [Wed, 27 May 2026 10:21:15 +0000 (12:21 +0200)] 
net: airoha: Introduce airoha_gdm_dev struct

EN7581 and AN7583 SoCs support connecting multiple external SerDes to GDM3
or GDM4 ports via a hw arbiter that manages the traffic in a TDM manner.
As a result multiple net_devices can connect to the same GDM{3,4} port
and there is a theoretical "1:n" relation between GDM port and
net_devices.
Introduce airoha_gdm_dev struct to collect net_device related info (e.g.
net_device and external phy pointer). Please note this is just a
preliminary patch and we are still supporting a single net_device for
each GDM port. Subsequent patches will add support for multiple net_devices
connected to the same GDM port.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260527-airoha-eth-multi-serdes-preliminary-v1-1-ec6ed73ef7fc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodoc/netlink: rt-link: fix binary attributes marked as strings
Remy D. Farley [Fri, 29 May 2026 12:14:44 +0000 (12:14 +0000)] 
doc/netlink: rt-link: fix binary attributes marked as strings

These link-attrs attributes were previously marked as strings:

- wireless - struct iw_event
- protinfo - a nest of ifla6-attrs or linkinfo-brport-attrs
- cost, priority - unused

Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Link: https://patch.msgid.link/20260529121355.1564817-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'netdevsim-psp-fix-issues-with-stats-collection'
Jakub Kicinski [Tue, 2 Jun 2026 19:57:50 +0000 (12:57 -0700)] 
Merge branch 'netdevsim-psp-fix-issues-with-stats-collection'

Daniel Zahka says:

====================
netdevsim: psp: fix issues with stats collection

It has come to my attention via a sashiko review of my net-next series
for aes-gcm in netdevsim [1] that there were preexisting issues with
netdevsim's implementation of psp statistics.

API usage issues:
1. not calling u64_stats_init() on the u64_stats_sync object during
   init
2. not serializing usage of the writer side API during stats update

Logical Bugs:
1. We were incrementing rx stats on the sending devices stats
   counters.

Fix the first set of issues by removing the u64_stats_t api entirely,
and keep track of stats with atomics. Fix the second issue by charging
events to the right netdevsim object.

[1]: https://sashiko.dev/#/patchset/20260508-nsim-psp-crypto-v1-0-4b50ed09b794%40gmail.com

  TAP version 13
  1..28
  ok 1 psp.data_basic_send_v0_ip4
  ok 2 psp.data_basic_send_v0_ip6
  ok 3 psp.data_basic_send_v1_ip4
  ok 4 psp.data_basic_send_v1_ip6
  ok 5 psp.data_basic_send_v2_ip4
  ok 6 psp.data_basic_send_v2_ip6
  ok 7 psp.data_basic_send_v3_ip4
  ok 8 psp.data_basic_send_v3_ip6
  ok 9 psp.data_mss_adjust_ip4
  ok 10 psp.data_mss_adjust_ip6
  ok 11 psp.dev_list_devices
  ok 12 psp.dev_get_device
  ok 13 psp.dev_get_device_bad
  ok 14 psp.dev_rotate
  ok 15 psp.dev_rotate_spi
  ok 16 psp.assoc_basic
  ok 17 psp.assoc_bad_dev
  ok 18 psp.assoc_sk_only_conn
  ok 19 psp.assoc_sk_only_mismatch
  ok 20 psp.assoc_sk_only_mismatch_tx
  ok 21 psp.assoc_sk_only_unconn
  ok 22 psp.assoc_version_mismatch
  ok 23 psp.assoc_twice
  ok 24 psp.data_send_bad_key
  ok 25 psp.data_send_disconnect
  ok 26 psp.data_stale_key
  ok 27 psp.removal_device_rx
  ok 28 psp.removal_device_bi
  # Totals: pass:28 fail:0 xfail:0 xpass:0 skip:0 error:0

Dump stats on both devs tx on one should match rx on other:
local dev:
id=5 ifindex=2 stats={'dev-id': 5, 'key-rotations': 0,
'stale-events': 0, 'rx-packets': 1226, 'rx-bytes': 39244,
'rx-auth-fail': 0, 'rx-error': 0, 'rx-bad': 0, 'tx-packets': 1931,
'tx-bytes': 2478908, 'tx-error': 0}

remote dev:
id=3 ifindex=2 stats={'dev-id': 3, 'key-rotations': 0, 'stale-events':
0, 'rx-packets': 1931, 'rx-bytes': 2478908, 'rx-auth-fail': 0,
'rx-error': 0, 'rx-bad': 0, 'tx-packets': 1226, 'tx-bytes': 39244,
'tx-error': 0}
====================

Link: https://patch.msgid.link/20260529-fix-psp-stats-v2-0-3a194eacf18e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetdevsim: psp: use atomic64 for psp stats counters
Daniel Zahka [Fri, 29 May 2026 13:15:28 +0000 (06:15 -0700)] 
netdevsim: psp: use atomic64 for psp stats counters

The existing u64_stats_t-based psp counters had two preexisting api
usage bugs: u64_stats_init() was never called on the syncp object, and
the writer side of the u64_stats_update_begin()/end() api was not
serialized. Switch the counters to atomic64_t instead. Atomics need
no initialization and are inherently safe against concurrent writers,
eliminating both bugs at once.

Use atomic64_t rather than atomic_long_t so byte counters don't wrap
at 4 GiB on 32-bit builds.

Fixes: 178f0763c5f3 ("netdevsim: implement psp device stats")
Cc: <stable+noautosel@kernel.org> # netdevsim is a test harness, it's never loaded on production systems
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260529-fix-psp-stats-v2-2-3a194eacf18e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetdevsim: psp: update rx stats on the peer netdevsim
Daniel Zahka [Fri, 29 May 2026 13:15:27 +0000 (06:15 -0700)] 
netdevsim: psp: update rx stats on the peer netdevsim

nsim_do_psp() handles both tx and rx psp processing in the sending
device's nsim_start_xmit() path. The existing code has a logical bug,
where we erroneously increment rx_bytes and rx_packets on the sending
devices stats, instead of the peer device.

Additionally, compute psp_len after psp_dev_encapsulate() and before
psp_dev_rcv(), which modifies the header region of the skb. The
existing calculation was actually correct, because psp_dev_rcv()
leaves skb_inner_transport_header pointing at the tcp header, but this
is fragile and confusing as there is no actual inner transport header
after psp_dev_rcv has removed udp encapsulation.

Fixes: 178f0763c5f3 ("netdevsim: implement psp device stats")
Cc: <stable+noautosel@kernel.org> # netdevsim is a test harness, it's never loaded on production systems
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260529-fix-psp-stats-v2-1-3a194eacf18e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetdevsim: fib: fix use-after-free of FIB data via debugfs
Zijing Yin [Fri, 29 May 2026 13:57:17 +0000 (06:57 -0700)] 
netdevsim: fib: fix use-after-free of FIB data via debugfs

Writing to the netdevsim debugfs file
"netdevsim/netdevsimN/fib/nexthop_bucket_activity" enters
nsim_nexthop_bucket_activity_write(), which looks up a nexthop in
data->nexthop_ht under rtnl_lock(). If a network namespace teardown,
devlink reload or device deletion runs concurrently, nsim_fib_destroy()
frees that rhashtable (and the surrounding nsim_fib_data) while the
write is still in flight, leading to a slab-use-after-free:

  BUG: KASAN: slab-use-after-free in nsim_nexthop_bucket_activity_write+0xb9e/0xdf0
  Read of size 4 at addr ff1100001a379808 by task syz.0.11967/27894

  CPU: 0 UID: 0 PID: 27894 Comm: syz.0.11967 Not tainted 7.1.0-rc4-gf6f1bfc1980a #4
  Call Trace:
   nsim_nexthop_bucket_activity_write+0xb9e/0xdf0
   full_proxy_write+0x135/0x1a0
   vfs_write+0x2e2/0x1040
   ksys_write+0x146/0x270
   __x64_sys_write+0x76/0xb0
   do_syscall_64+0xb9/0x5b0
   entry_SYSCALL_64_after_hwframe+0x74/0x7c

  Allocated by task 15957:
   rhashtable_init_noprof+0x3ec/0x860
   nsim_fib_create+0x371/0xca0
   nsim_drv_probe+0xd60/0x15c0
   ...
   new_device_store+0x425/0x7f0

  Freed by task 24:
   rhashtable_free_and_destroy+0x10d/0x620
   nsim_fib_destroy+0xc9/0x1c0
   nsim_dev_reload_destroy+0x1e7/0x530
   nsim_dev_reload_down+0x6b/0xd0
   devlink_reload+0x1b5/0x770
   devlink_pernet_pre_exit+0x25d/0x3a0
   ops_undo_list+0x1b7/0xb90
   cleanup_net+0x47f/0x8a0

  The buggy address belongs to the object at ff1100001a379800
   which belongs to the cache kmalloc-1k of size 1024

The freed 1k object is the bucket table of data->nexthop_ht. Shortly
after, the dangling table is dereferenced again and the machine also
takes a GPF in __rht_bucket_nested() from the same call site.

The root cause is a lifetime mismatch: the debugfs files reference
nsim_fib_data (the writer dereferences data->nexthop_ht), but the
interface is not bracketed around the lifetime of that data.
nsim_fib_destroy() freed both rhashtables and only removed the debugfs
directory afterwards, and nsim_fib_create() created the debugfs files
before the rhashtables were initialized and, on the error path, freed
them before removing the files. debugfs keeps the file itself alive
across a ->write() via debugfs_file_get()/debugfs_file_put()
(fs/debugfs/file.c), but it does not keep data->nexthop_ht alive, so the
in-flight writer dereferenced freed memory. rtnl_lock() in the writer
does not help, because the teardown path does not take rtnl around
rhashtable_free_and_destroy().

Fix it by bracketing the debugfs interface around the data it exposes,
keeping nsim_fib_create() and nsim_fib_destroy() symmetric:

 - In nsim_fib_destroy(), tear down the debugfs files before the data
   structures they reference. debugfs_remove_recursive() drops the
   initial active-user reference and then waits for every in-flight
   ->write() to drop its reference before returning, and rejects new
   opens (__debugfs_file_removed(), fs/debugfs/inode.c). Once it returns,
   no debugfs accessor can reach the FIB data, so the rhashtables and
   nsim_fib_data can be destroyed safely. This also covers the bool knobs
   in the same directory, which store pointers into the same
   nsim_fib_data, and the final kfree(data).

 - In nsim_fib_create(), create the debugfs files after the rhashtables
   and notifiers are set up. This closes the same race on the
   error-unwind path, where a concurrent writer could otherwise observe a
   half-constructed instance or a table that the unwind has already
   freed. (With only the destroy-side change, a writer racing the create
   window instead dereferences an uninitialized data->nexthop_ht.)

This is reproducible by racing, in a loop, writes to
/sys/kernel/debug/netdevsim/netdevsimN/fib/nexthop_bucket_activity
against a teardown of the same netdevsim instance -- a devlink reload
("devlink dev reload netdevsim/netdevsimN"), destroying the network
namespace it lives in, or "echo N > /sys/bus/netdevsim/del_device". It
was found with syzkaller; a syzkaller reproducer is available. A
standalone C reproducer does not trigger it reliably because the race
needs the netns-teardown/reload path.

Cc: <stable+noautosel@kernel.org> # netdevsim is a test harness, it's never loaded on production systems
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260529135718.1804031-1-yzjaurora@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodt-bindings: extcon: document Samsung S2M series PMIC extcon device
Kaustabh Chakraborty [Fri, 15 May 2026 21:38:34 +0000 (03:08 +0530)] 
dt-bindings: extcon: document Samsung S2M series PMIC extcon device

Certain Samsung S2M series PMICs have a MUIC device which reports
various cable states by measuring the ID-GND resistance with an internal
ADC. Document the devicetree schema for this device.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://patch.msgid.link/20260516-s2mu005-pmic-v7-2-73f9702fb461@disroot.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2 weeks agovdso/treewide: Drop GENERIC_TIME_VSYSCALL
Thomas Weißschuh [Tue, 19 May 2026 06:26:17 +0000 (08:26 +0200)] 
vdso/treewide: Drop GENERIC_TIME_VSYSCALL

This Kconfig symbol is not used anymore, remove it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-3-5c2a5905d5f5@linutronix.de
2 weeks agovdso/vsyscall: Gate update_vsyscall() behind CONFIG_GENERIC_GETTIMEOFDAY
Thomas Weißschuh [Tue, 19 May 2026 06:26:16 +0000 (08:26 +0200)] 
vdso/vsyscall: Gate update_vsyscall() behind CONFIG_GENERIC_GETTIMEOFDAY

Both the compilation of kernel/time/vsyscall.c, which contains the real
definition of update_vsyscall() and the other vDSO definitions in
timekeeper_internal.h use CONFIG_GENERIC_GETTIMEOFDAY and not
CONFIG_GENERIC_TIME_VSYSCALL.

Align the code to use a single Kconfig symbol.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-2-5c2a5905d5f5@linutronix.de
2 weeks agoriscv: vdso: Drop CONFIG_GENERIC_TIME_VSYSCALL guard around syscall fallbacks
Thomas Weißschuh [Tue, 19 May 2026 06:26:15 +0000 (08:26 +0200)] 
riscv: vdso: Drop CONFIG_GENERIC_TIME_VSYSCALL guard around syscall fallbacks

The syscall definitions can be built just fine for 32-bit systems.
Also the guard does not cover __arch_get_hw_counter() which is always
used together with those system call fallbacks. Also this header is
unused when no vDSO is built anyways.

Drop the ifdeffery. The logic will be simpler to understand. Furthermore
this prepares the complete removal of CONFIG_GENERIC_TIME_VSYSCALL.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-1-5c2a5905d5f5@linutronix.de
2 weeks agovdso/datastore: Mark vdso_k_*_data pointers as __ro_after_init
Thomas Weißschuh [Wed, 13 May 2026 06:32:46 +0000 (08:32 +0200)] 
vdso/datastore: Mark vdso_k_*_data pointers as __ro_after_init

These pointers are only modified once in vdso_setup_data_pages(),
during the init phase. Make them read-only after that.

Drop __refdata as that would conflict with __ro_after_init.
Modpost does accept the reference from a __ro_after_init symbol to
an __init one.

Fixes: 05988dba1179 ("vdso/datastore: Allocate data pages dynamically")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260513-vdso-ro-after-init-v1-1-4b51f74015a4@linutronix.de
2 weeks agoARM: dts: freescale: add bootph-all to i.MX7ULP watchdog nodes
Alice Guo [Tue, 19 May 2026 10:55:16 +0000 (18:55 +0800)] 
ARM: dts: freescale: add bootph-all to i.MX7ULP watchdog nodes

Add the bootph-all property to ULP watchdog nodes for i.MX7ULP, ensuring
the watchdog is available during all boot phases.

Signed-off-by: Alice Guo <alice.guo@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
2 weeks agorust/drm: Introduce DeviceContext
Lyude Paul [Thu, 7 May 2026 21:59:19 +0000 (17:59 -0400)] 
rust/drm: Introduce DeviceContext

One of the tricky things about DRM bindings in Rust is the fact that
initialization of a DRM device is a multi-step process. It's quite normal
for a device driver to start making use of its DRM device for tasks like
creating GEM objects before userspace registration happens. This is an
issue in rust though, since prior to userspace registration the device is
only partly initialized. This means there's a plethora of DRM device
operations we can't yet expose without opening up the door to UB if the DRM
device in question isn't yet registered.

Additionally, this isn't something we can reliably check at runtime. And
even if we could, performing an operation which requires the device be
registered when the device isn't actually registered is a programmer bug,
meaning there's no real way to gracefully handle such a mistake at runtime.
And even if that wasn't the case, it would be horrendously annoying and
noisy to have to check if a device is registered constantly throughout a
driver.

In order to solve this, we first take inspiration from
`kernel::device::DeviceContext` and introduce `kernel::drm::DeviceContext`.
This provides us with a ZST type that we can generalize over to represent
contexts where a device is known to have been registered with userspace at
some point in time (`Registered`), along with contexts where we can't make
such a guarantee (`Uninit`).

It's important to note we intentionally do not provide a `DeviceContext`
which represents an unregistered device. This is because there's no
reasonable way to guarantee that a device with long-living references to
itself will not be registered eventually with userspace. Instead, we
provide a new-type for this: `UnregisteredDevice` which can
provide a guarantee that the `Device` has never been registered with
userspace. To ensure this, we modify `Registration` so that creating a new
`Registration` requires passing ownership of an `UnregisteredDevice`.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Link: https://patch.msgid.link/20260507220044.3204919-2-lyude@redhat.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2 weeks agotimers/migration: Turn tmigr_hierarchy level_list into a flexible array
Rosen Penev [Fri, 22 May 2026 23:16:18 +0000 (16:16 -0700)] 
timers/migration: Turn tmigr_hierarchy level_list into a flexible array

The level_list array is allocated separately right after the parent
struct. The size of the array is already known.

Move level_list to the struct tail as a flexible array member and fold the
two allocations into a single kzalloc_flex().

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:Opus-4.7
Link: https://patch.msgid.link/20260522231618.41622-1-rosenp@gmail.com
2 weeks agotimers/migration: Deactivate per-capacity hierarchies under nohz_full
Frederic Weisbecker [Tue, 19 May 2026 22:09:26 +0000 (00:09 +0200)] 
timers/migration: Deactivate per-capacity hierarchies under nohz_full

NOHZ_FULL CPUs global timers are guaranteed to be handled by the timekeeper
CPU, which never stops its tick and therefore remains active in the
hierarchy.

But since the introduction of per-capacity hierarchies, this guarantee is
broken because the timekeeper may not belong to the same hierarchy as all
the NOHZ_FULL CPUs.

Fix it with simply turning off capacity awareness when NOHZ_FULL is
running and force a single hierarchy. NOHZ_FULL is not exactly optimized
powerwise anyway.

Fixes: 098cbaad8e57 ("timers/migration: Split per-capacity hierarchies")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519220926.63437-3-frederic@kernel.org
2 weeks agotimers/migration: Fix hotplug migrator selection target on asymetric capacity machines
Frederic Weisbecker [Tue, 19 May 2026 22:09:25 +0000 (00:09 +0200)] 
timers/migration: Fix hotplug migrator selection target on asymetric capacity machines

When a top-level migrator is deactivated, either at CPU down hotplug time
or when a CPU is domain isolated, a new migrator is elected among the
available CPUs and woken up to take over the migration duty.

However that election must happen at the scope of a given hierarchy and not
globally, which the introduction of per-capacity hierarchies failed to
handle.

As a result a given hierarchy may end up without migrator to handle global
timers.

Fix it by making sure that the new migrator belongs to the same hierarchy
as the outgoing CPU.

Fixes: 098cbaad8e57 ("timers/migration: Split per-capacity hierarchies")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519220926.63437-2-frederic@kernel.org
2 weeks agosched/cputime: Handle dyntick-idle steal time correctly
Frederic Weisbecker [Fri, 8 May 2026 13:16:47 +0000 (15:16 +0200)] 
sched/cputime: Handle dyntick-idle steal time correctly

The dyntick-idle steal time is currently accounted when the tick restarts
but the stolen idle time is not subtracted from the idle time that was
already accounted. This is to avoid observing the idle time going backward
as the dyntick-idle cputime accessors can't reliably know in advance the
stolen idle time.

In order to maintain a forward progressing idle cputime while subtracting
idle steal time from it, keep track of the previously accounted idle stolen
time and substract it from _later_ idle cputime accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-16-frederic@kernel.org
2 weeks agosched/cputime: Handle idle irqtime gracefully
Frederic Weisbecker [Fri, 8 May 2026 13:16:46 +0000 (15:16 +0200)] 
sched/cputime: Handle idle irqtime gracefully

The dyntick-idle cputime accounting always assumes that interrupt time
accounting is enabled and consequently stops elapsing the idle time during
dyntick-idle interrupts.

This doesn't mix up well with disabled interrupt time accounting because
then idle interrupts become a cputime blind-spot. Also this feature is
disabled on most configurations and the overhead of pausing dyntick-idle
accounting while in idle interrupts could then be avoided.

Fix the situation with conditionally pausing dyntick-idle accounting during
idle interrupts only iff either native vtime (which does interrupt time
accounting) or generic interrupt time accounting are enabled.

Also make sure that the accumulated interrupt time is not accidentally
substracted from later accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-15-frederic@kernel.org
2 weeks agosched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
Frederic Weisbecker [Fri, 8 May 2026 13:16:45 +0000 (15:16 +0200)] 
sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case

The last reason why get_cpu_idle/iowait_time_us() may return -1 now is if
the config doesn't support nohz.

The ad-hoc replacement solution by cpufreq is to compute jiffies minus the
whole busy cputime. Although the intention should provide a coherent low
resolution estimation of the idle and iowait time, the implementation is
buggy because jiffies don't start at 0.

Just provide instead a real get_cpu_[idle|iowait]_time_us() offcase.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-14-frederic@kernel.org
2 weeks agotick/sched: Consolidate idle time fetching APIs
Frederic Weisbecker [Fri, 8 May 2026 13:16:44 +0000 (15:16 +0200)] 
tick/sched: Consolidate idle time fetching APIs

Fetching the idle cputime is available through a variety of accessors all
over the place depending on the different accounting flavours and needs:

  - idle vtime generic accounting can be accessed by kcpustat_field(),
    kcpustat_cpu_fetch(), get_idle/iowait_time() and
    get_cpu_idle/iowait_time_us()

  - dynticks-idle accounting can only be accessed by get_idle/iowait_time()
    or get_cpu_idle/iowait_time_us()

  - CONFIG_NO_HZ_COMMON=n idle accounting can be accessed by kcpustat_field()
    kcpustat_cpu_fetch(), or get_idle/iowait_time() but not by
    get_cpu_idle/iowait_time_us()

Moreover get_idle/iowait_time() relies on get_cpu_idle/iowait_time_us()
with a non-sensical conversion to microseconds and back to nanoseconds on
the way.

Start consolidating the APIs with removing get_idle/iowait_time() and make
kcpustat_field() and kcpustat_cpu_fetch() work for all cases.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-13-frederic@kernel.org
2 weeks agotick/sched: Account tickless idle cputime only when tick is stopped
Frederic Weisbecker [Fri, 8 May 2026 13:16:43 +0000 (15:16 +0200)] 
tick/sched: Account tickless idle cputime only when tick is stopped

There is no real point in switching to dyntick-idle cputime accounting mode
if the tick is not actually stopped. This just adds overhead, notably
fetching the GTOD, on each idle exit and each idle IRQ entry for no reason
during short idle trips.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-12-frederic@kernel.org
2 weeks agotick/sched: Remove unused fields
Frederic Weisbecker [Fri, 8 May 2026 13:16:42 +0000 (15:16 +0200)] 
tick/sched: Remove unused fields

Remove fields after the dyntick-idle cputime migration to scheduler code.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-11-frederic@kernel.org
2 weeks agotick/sched: Move dyntick-idle cputime accounting to cputime code
Frederic Weisbecker [Fri, 8 May 2026 13:16:41 +0000 (15:16 +0200)] 
tick/sched: Move dyntick-idle cputime accounting to cputime code

Although the dynticks-idle cputime accounting is necessarily tied to the
tick subsystem, the actual related accounting code has no business residing
there and should be part of the scheduler cputime code.

Move away the relevant pieces and state machine to where they belong.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-10-frederic@kernel.org
2 weeks agotick/sched: Remove nohz disabled special case in cputime fetch
Frederic Weisbecker [Fri, 8 May 2026 13:16:40 +0000 (15:16 +0200)] 
tick/sched: Remove nohz disabled special case in cputime fetch

Even when nohz is not runtime enabled, the dynticks idle cputime accounting
can run and the common idle cputime accessors are still relevant.

Remove the nohz disabled special case accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-9-frederic@kernel.org
2 weeks agotick/sched: Unify idle cputime accounting
Frederic Weisbecker [Fri, 8 May 2026 13:16:39 +0000 (15:16 +0200)] 
tick/sched: Unify idle cputime accounting

The non-vtime dynticks-idle cputime accounting is a big mess that
accumulates within two concurrent statistics, each having their own
shortcomings:

 * The accounting for online CPUs which is based on the delta between
   tick_nohz_start_idle() and tick_nohz_stop_idle().

   Pros:
       - Works when the tick is off

       - Has nsecs granularity

   Cons:
       - Account idle steal time but doesn't substract it from idle
         cputime.

       - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
         the IRQ time is simply ignored when
         CONFIG_IRQ_TIME_ACCOUNTING=n

       - The windows between 1) idle task scheduling and the first call
         to tick_nohz_start_idle() and 2) idle task between the last
         tick_nohz_stop_idle() and the rest of the idle time are
         blindspots wrt. cputime accounting (though mostly insignificant
         amount)

       - Relies on private fields outside of kernel stats, with specific
         accessors.

 * The accounting for offline CPUs which is based on ticks and the
   jiffies delta during which the tick was stopped.

   Pros:
       - Handles steal time correctly

       - Handle CONFIG_IRQ_TIME_ACCOUNTING=y and
         CONFIG_IRQ_TIME_ACCOUNTING=n correctly.

       - Handles the whole idle task

       - Accounts directly to kernel stats, without midlayer accumulator.

    Cons:
       - Doesn't elapse when the tick is off, which doesn't make it
         suitable for online CPUs.

       - Has TICK_NSEC granularity (jiffies)

       - Needs to track the dyntick-idle ticks that were accounted and
         substract them from the total jiffies time spent while the tick
         was stopped. This is an ugly workaround.

Having two different accounting for a single context is not the only
problem: since those accountings are of different natures, it is
possible to observe the global idle time going backward after a CPU goes
offline.

Clean up the situation with introducing a hybrid approach that stays
coherent and works for both online and offline CPUs:

  * Tick based or native vtime accounting operate before the idle loop
    is entered and resume once the idle loop prepares to exit.

  * When the idle loop starts, switch to dynticks-idle accounting as is
    done currently, except that the statistics accumulate directly to the
    relevant kernel stat fields.

  * Private dyntick cputime accounting fields are removed.

  * Works on both online and offline case.

Further improvement will include:

  * Only switch to dynticks-idle cputime accounting when the tick actually
    goes in dynticks mode.

  * Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
    dynticks-idle accounting still elapses while on IRQs.

  * Correctly substract idle steal cputime from idle time

Reported-by: Xin Zhao <jackzxcui1989@163.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-8-frederic@kernel.org
2 weeks agos390/time: Prepare to stop elapsing in dynticks-idle
Frederic Weisbecker [Fri, 8 May 2026 13:16:38 +0000 (15:16 +0200)] 
s390/time: Prepare to stop elapsing in dynticks-idle

Currently the tick subsystem stores the idle cputime accounting in private
fields, allowing cohabitation with architecture idle vtime accounting. The
former is fetched on online CPUs, the latter on offline CPUs.

For consolidation purposes, architecture vtime accounting will continue to
account the cputime but will make a break when the idle tick is
stopped. The dyntick cputime accounting will then be relayed by the tick
subsystem so that the idle cputime is still seen advancing coherently even
when the tick isn't there to flush the idle vtime.

Prepare for that and introduce three new APIs which will be used in
subsequent patches:

  - vtime_dynticks_start() is deemed to be called when idle enters in
    dyntick mode. The idle cputime that elapsed so far is accumulated
    and accounted. Also idle time accounting is ignored.

  - vtime_dynticks_stop() is deemed to be called when idle exits from
    dyntick mode. The vtime entry clocks are fast-forward to current time
    so that idle accounting restarts elapsing from now. Also idle time
    accounting is resumed.

  - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
    fast-forward the clock to current time so that the IRQ time is still
    accounted by vtime while nohz cputime is paused.

Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
accounting twice the idle cputime, along with nohz accounting.

Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-7-frederic@kernel.org
2 weeks agopowerpc/time: Prepare to stop elapsing in dynticks-idle
Frederic Weisbecker [Fri, 8 May 2026 13:16:37 +0000 (15:16 +0200)] 
powerpc/time: Prepare to stop elapsing in dynticks-idle

Currently the tick subsystem stores the idle cputime accounting in
private fields, allowing cohabitation with architecture idle vtime
accounting. The former is fetched on online CPUs, the latter on offline
CPUs.

For consolidation purpose, architecture vtime accounting will continue
to account the cputime but will make a break when the idle tick is
stopped. The dyntick cputime accounting will then be relayed by the tick
subsystem so that the idle cputime is still seen advancing coherently
even when the tick isn't there to flush the idle vtime.

Prepare for that and introduce three new APIs which will be used in
subsequent patches:

  - vtime_dynticks_start() is deemed to be called when idle enters in
    dyntick mode. The idle cputime that elapsed so far is accumulated.

  - vtime_dynticks_stop() is deemed to be called when idle exits from
    dyntick mode. The vtime entry clocks are fast-forward to current time
    so that idle accounting restarts elapsing from now.

  - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
    fast-forward the clock to current time so that the IRQ time is still
    accounted by vtime while nohz cputime is paused.

Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
accounting twice the idle cputime, along with nohz accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-6-frederic@kernel.org
2 weeks agosched/cputime: Correctly support generic vtime idle time
Frederic Weisbecker [Fri, 8 May 2026 13:16:36 +0000 (15:16 +0200)] 
sched/cputime: Correctly support generic vtime idle time

Currently whether generic vtime is running or not, the idle cputime is
fetched from the nohz accounting.

However generic vtime already does its own idle cputime accounting. Only
the kernel stat accessors are not plugged to support it.

Read the idle generic vtime cputime when it's running, this will allow to
later more clearly split nohz and vtime cputime accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-5-frederic@kernel.org
2 weeks agosched/cputime: Remove superfluous and error prone kcpustat_field() parameter
Frederic Weisbecker [Fri, 8 May 2026 13:16:35 +0000 (15:16 +0200)] 
sched/cputime: Remove superfluous and error prone kcpustat_field() parameter

The first parameter to kcpustat_field() is a pointer to the cpu kcpustat to
be fetched from. This parameter is error prone because a copy to a kcpustat
could be passed by accident instead of the original one. Also the kcpustat
structure can already be retrieved with the help of the mandatory CPU
argument.

Remove the needless parameter.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-4-frederic@kernel.org
2 weeks agosched/idle: Handle offlining first in idle loop
Frederic Weisbecker [Fri, 8 May 2026 13:16:34 +0000 (15:16 +0200)] 
sched/idle: Handle offlining first in idle loop

Offline handling happens from within the inner idle loop, after the
beginning of dyntick cputime accounting, nohz idle load balancing and
TIF_NEED_RESCHED polling.

This is not necessary and even buggy because:

  * There is no dyntick handling to do. And calling tick_nohz_idle_enter()
    messes up with the struct tick_sched reset that was performed on
    tick_sched_timer_dying().

  * There is no nohz idle balancing to do.

  * Polling on TIF_RESCHED is irrelevant at this stage, there are no more
    tasks allowed to run.

  * No need to check if need_resched() before offline handling since
    stop_machine is done and all per-cpu kthread should be done with
    their job.

Therefore move the offline handling at the beginning of the idle loop.
This will also ease the idle cputime unification later by not elapsing
idle time while offline through the call to:

   tick_nohz_idle_enter() -> tick_nohz_start_idle()

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-3-frederic@kernel.org
2 weeks agotick/sched: Fix TOCTOU in nohz idle time fetch
Frederic Weisbecker [Fri, 8 May 2026 13:16:33 +0000 (15:16 +0200)] 
tick/sched: Fix TOCTOU in nohz idle time fetch

When the nohz idle time is fetched, the current clock timestamp is taken
outside the seqcount, which can result in a race as reported by Sashiko:

    get_cpu_sleep_time_us()                 tick_nohz_start_idle()
    -----------------------                 ---------------------
    now = ktime_get()
                                            write_seqcount_begin(idle_sleeptime_seq);
                                            idle_entrytime = ktime_get()
                                            tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE);
                                            write_seqcount_end(&ts->idle_sleeptime_seq);
    read_seqcount_begin(idle_sleeptime_seq)
    delta = now - idle_entrytime);
    //!! But now < idle_entrytime
    idle = *sleeptime +  delta;
    read_seqcount_retry(&ts->idle_sleeptime_seq, seq)

Here the read side fetches the timestamp before the write side and its
update. As a result the time delta computed on the read side is negative
(ktime_t is signed) and breaks the cputime monotonicity guarantee.

This could possibly be fixed with reading the current clock timestamp
inside the seqcount but the reader overhead might then increase. Also
simply checking that the current timestamp is above the idle entry time
is enough to prevent any issue of the like.

Fixes: 620a30fa0bd1 ("timers/nohz: Protect idle/iowait sleep time under seqcount")
Reported-by: Sashiko
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260508131647.43868-2-frederic@kernel.org
2 weeks agonet: garp: fix unsigned integer underflow in garp_pdu_parse_attr
Yizhou Zhao [Wed, 27 May 2026 08:31:58 +0000 (16:31 +0800)] 
net: garp: fix unsigned integer underflow in garp_pdu_parse_attr

The receive-side GARP attribute parser computes dlen with reversed
operands:

        dlen = sizeof(*ga) - ga->len;

ga->len is the on-wire attribute length and includes the GARP attribute
header. For normal attributes with data, ga->len is larger than
sizeof(*ga), so the subtraction underflows in unsigned arithmetic.

The resulting value is later passed to garp_attr_lookup(), whose length
argument is u8. After truncation, the parsed data length usually no
longer matches the length stored for locally registered attributes, so
received Join/Leave events are ignored. This breaks the GARP receive path
for common attributes, such as GVRP VLAN registration attributes.

Compute the data length as the attribute length minus the header length.

Fixes: eca9ebac651f ("net: Add GARP applicant-only participant")
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260527083200.42861-1-zhaoyz24@mails.tsinghua.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: tso: add new tests for ip6tnl, ipip, and sit tunnels
Daniel Zahka [Fri, 29 May 2026 12:45:55 +0000 (05:45 -0700)] 
selftests: drv-net: tso: add new tests for ip6tnl, ipip, and sit tunnels

Add new tunnel test cases for ip6tnl, ipip, and sit. ip6tnl supports
ipv[46] as inner l3 header, and the other two tunnels only support a
single inner l3 type.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260529-tso-tunnels-v1-1-3771ee9eaaa9@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotime: Fix off-by-one in settimeofday() usec validation
Naveen Kumar Chaudhary [Tue, 2 Jun 2026 18:07:37 +0000 (23:37 +0530)] 
time: Fix off-by-one in settimeofday() usec validation

The validation check uses '>' instead of '>=' when comparing tv_usec
against USEC_PER_SEC, allowing the value 1000000 through. After
conversion to nanoseconds (*= 1000), this produces tv_nsec ==
NSEC_PER_SEC, violating the timespec invariant that tv_nsec must be
less than NSEC_PER_SEC.

Use '>=' to reject tv_usec values that are not in the valid range of
0 to 999999.

Fixes: 5e0fb1b57bea ("y2038: time: avoid timespec usage in settimeofday()")
Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/4rikk44zew3s6577dugmx4jyblz7o5c57niuap6ct3td5yfm6w@gh7pcumg7qor
2 weeks agoclockevents: Fix duplicate type specifier in stub function parameter
Naveen Kumar Chaudhary [Mon, 1 Jun 2026 03:57:46 +0000 (09:27 +0530)] 
clockevents: Fix duplicate type specifier in stub function parameter

The stub for arch_inlined_clockevent_set_next_coupled() has 'u64 u64
cycles' in its parameter list. Since u64 is a typedef, the compiler
parses the second 'u64' as the parameter name, making 'cycles' an
unused token. Remove the duplicate so the parameter is correctly named.

Fixes: 89f951a1e8ad ("clockevents: Provide support for clocksource coupled comparators")
Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/7tostpvxzdn6tobmyow63a5rweatls5kux3scqp2vzhe7mv6uq@ecr746b4hyhf
2 weeks agoACPI: button: Switch over to devres-based resource management
Rafael J. Wysocki [Mon, 1 Jun 2026 17:12:58 +0000 (19:12 +0200)] 
ACPI: button: Switch over to devres-based resource management

Switch over the ACPI button driver to devres-based resource management
by making the following changes:

 * Use devm_kzalloc() for allocating button object memory.

 * Use devm_input_allocate_device() for allocating the input class
   device object.

 * Turn acpi_lid_remove_fs() into a devm cleanup action added
   by devm_acpi_lid_add_fs() which is a new wrapper around
   acpi_lid_add_fs().

 * Add devm_acpi_button_init_wakeup() for initializing the wakeup source
   and make it add a custom devm action that will automatically remove
   the wakeup source registered by it.

 * Turn acpi_button_remove_event_handler() into a devm cleanup action
   added by devm_acpi_button_add_event_handler() which is a new wrapper
   around acpi_button_add_event_handler().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2283436.Mh6RI2rZIc@rafael.j.wysocki
[ rjw: Rebased and removed unnecessary input device parent assignment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2 weeks agoACPI: button: Reorganize installing and removing event handlers
Rafael J. Wysocki [Mon, 1 Jun 2026 17:12:19 +0000 (19:12 +0200)] 
ACPI: button: Reorganize installing and removing event handlers

To facilitate subsequent changes, move the code installing and
removing button event handlers into two separate functions called
acpi_button_add_event_handler() and acpi_button_remove_event_handler(),
respectively, and rearrange it to reduce code duplication.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2714170.Lt9SDvczpP@rafael.j.wysocki
2 weeks agoACPI: button: Use string literals for generating netlink messages
Rafael J. Wysocki [Mon, 1 Jun 2026 17:07:57 +0000 (19:07 +0200)] 
ACPI: button: Use string literals for generating netlink messages

Instead of storing strings that never change later under
acpi_device_class(device) and using them for generating netlink
messages, use pointers to string literals with the same content.

This also allows the clearing of the acpi_device_class(device)
area during driver removal and in the probe rollback path to be
dropped.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2070791.usQuhbGJ8B@rafael.j.wysocki
2 weeks agoACPI: button: Clean up adding and removing lid procfs interface
Rafael J. Wysocki [Mon, 1 Jun 2026 17:07:14 +0000 (19:07 +0200)] 
ACPI: button: Clean up adding and removing lid procfs interface

The procfs interface is only used with lid devices which only becomes
clear after looking into the function bodies of acpi_button_add_fs()
and acpi_button_remove_fs().  Moreover, the only error code returned
by the former of these functions is -ENODEV, so the ret local variable
in it is redundant, and the return type of the latter one can be changed
to void.

Accordingly, rename these functions to acpi_button_add_fs() and
acpi_button_remove_fs(), respectively, move the button->type checks
against ACPI_BUTTON_TYPE_LID from them to their callers, and make
code simplifications as per the above.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1869050.VLH7GnMWUR@rafael.j.wysocki
2 weeks agoACPI: button: Merge two switch () statements in acpi_button_probe()
Rafael J. Wysocki [Mon, 1 Jun 2026 17:05:31 +0000 (19:05 +0200)] 
ACPI: button: Merge two switch () statements in acpi_button_probe()

Two switch () statements in acpi_button_probe() operate on the same
value and the statements between them can be reordered with respect
to the second one, so merge them.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3352815.5fSG56mABF@rafael.j.wysocki
2 weeks agoACPI: button: Drop redundant variable from acpi_button_probe()
Rafael J. Wysocki [Mon, 1 Jun 2026 17:04:04 +0000 (19:04 +0200)] 
ACPI: button: Drop redundant variable from acpi_button_probe()

Local char pointer called "name" in acpi_button_probe() is redundant
because its value can be assigned directly to input->name and the
latter can be used in the only other place where "name" is read, so
get rid of it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3706239.iIbC2pHGDl@rafael.j.wysocki
2 weeks agoACPI: button: Rework device verification during probe
Rafael J. Wysocki [Mon, 1 Jun 2026 17:03:24 +0000 (19:03 +0200)] 
ACPI: button: Rework device verification during probe

Instead of manually comparing the primary ID of the device (retuned
by _HID) with each of the device IDs supported by the driver, use
acpi_match_acpi_device() (which includes the ACPI companion device
pointer check against NULL) and store the ACPI button type as
driver_data in button_device_ids[], which allows a multi-branch
conditional statement to be replaced with a switch () one.  However,
to continue preventing successful probing of devices that only have
one of the supported device IDs in their _CID lists, compare the
matched device ID with the primary ID of the device and return an
error if they don't match.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/7960518.EvYhyI6sBW@rafael.j.wysocki
[ rjw: Fixed button memory leak on probe failure ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2 weeks agontsync: Honour caller's time namespace for absolute MONOTONIC timeouts
Maoyi Xie [Thu, 28 May 2026 06:33:11 +0000 (14:33 +0800)] 
ntsync: Honour caller's time namespace for absolute MONOTONIC timeouts

ntsync_schedule() takes the absolute timeout from userspace and hands it to
schedule_hrtimeout_range_clock() with HRTIMER_MODE_ABS. For the default
CLOCK_MONOTONIC path, it does not call timens_ktime_to_host() first.

A process inside a CLOCK_MONOTONIC time namespace computes the absolute
timeout in its own clock view. The kernel reads the same value against the
host clock. The two differ by the namespace offset.  The timeout then fires
too early or too late.

Other users of absolute timeouts run the ktime through
timens_ktime_to_host() before starting the hrtimer. ntsync was added later
and missed that step.

/dev/ntsync is mode 0666. Any user inside a time namespace that can
open it is affected. The visible effect is wrong timeout behaviour
for Wine in a container that sets a CLOCK_MONOTONIC offset.

Reproducer: unshare --user --time, set the monotonic offset to -10s,
issue NTSYNC_IOC_WAIT_ANY with a 100 ms absolute MONOTONIC timeout.
The baseline run elapses about 100 ms. The run inside the namespace
elapses about 0 ms.

Apply timens_ktime_to_host() to the parsed timeout when the caller
did not set NTSYNC_WAIT_REALTIME. The helper does nothing in the
initial time namespace, so the fast path is unchanged.

Fixes: b4a7b5fe3f51 ("ntsync: Introduce NTSYNC_IOC_WAIT_ANY.")
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://patch.msgid.link/20260528063311.3300393-3-maoyixie.tju@gmail.com
2 weeks agotime/namespace: Export init_time_ns and do_timens_ktime_to_host()
Maoyi Xie [Thu, 28 May 2026 06:33:10 +0000 (14:33 +0800)] 
time/namespace: Export init_time_ns and do_timens_ktime_to_host()

timens_ktime_to_host() in compares the current time namespace against
init_time_ns for the fast path. It calls do_timens_ktime_to_host() for the
offset case. Both symbols are needed at link time by any caller of the
inline.

All current callers are builtin, but ntsync can be built as module, which
prevents it from using it.

Export both with EXPORT_SYMBOL_GPL.

Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260528063311.3300393-2-maoyixie.tju@gmail.com
2 weeks agohsr: Remove WARN_ONCE() in hsr_addr_is_self().
Kuniyuki Iwashima [Sat, 30 May 2026 06:42:58 +0000 (06:42 +0000)] 
hsr: Remove WARN_ONCE() in hsr_addr_is_self().

syzbot reported the warning [0] in hsr_addr_is_self(),
whose assumption is simply wrong.

hsr->self_node is cleared in hsr_del_self_node(), which
is called from hsr_dellink().

Since dev->rtnl_link_ops->dellink() is called before
unregister_netdevice_many(), there is a window when
user can find the device but without hsr->self_node.

Let's remove WARN_ONCE() in hsr_addr_is_self().

[0]:
HSR: No self node
WARNING: net/hsr/hsr_framereg.c:39 at hsr_addr_is_self+0x211/0x3f0 net/hsr/hsr_framereg.c:39, CPU#0: syz.4.16848/17220
Modules linked in:
CPU: 0 UID: 0 PID: 17220 Comm: syz.4.16848 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:hsr_addr_is_self+0x211/0x3f0 net/hsr/hsr_framereg.c:39
Code: 33 2f 41 0f b7 dd 89 ee 09 de 31 ff e8 c8 b4 c6 f6 09 dd 74 54 e8 0f b0 c6 f6 31 ed eb 53 e8 06 b0 c6 f6 48 8d 3d 2f 50 9c 04 <67> 48 0f b9 3a 31 ed eb 42 e8 c1 13 1f 00 89 c5 31 ff 89 c6 e8 96
RSP: 0018:ffffc900041c70e0 EFLAGS: 00010283
RAX: ffffffff8afdc6ca RBX: ffffffff8afdc4e6 RCX: 0000000000080000
RDX: ffffc90010493000 RSI: 0000000000000948 RDI: ffffffff8f9a1700
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc900041c71e8 R11: fffff52000838e3f R12: dffffc0000000000
R13: ffff888041f9e3c0 R14: ffff888086ee3802 R15: 0000000000000000
FS:  00007f6fe985d6c0(0000) GS:ffff888126176000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f80bd437dac CR3: 0000000025096000 CR4: 00000000003526f0
DR0: ffffffffffffffff DR1: 00000000000001f8 DR2: 0000000000000002
DR3: ffffffffefffff15 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 check_local_dest net/hsr/hsr_forward.c:592 [inline]
 fill_frame_info net/hsr/hsr_forward.c:728 [inline]
 hsr_forward_skb+0xa11/0x2a80 net/hsr/hsr_forward.c:739
 hsr_dev_xmit+0x253/0x370 net/hsr/hsr_device.c:236
 __netdev_start_xmit include/linux/netdevice.h:5368 [inline]
 netdev_start_xmit include/linux/netdevice.h:5377 [inline]
 xmit_one net/core/dev.c:3888 [inline]
 dev_hard_start_xmit+0x2df/0x860 net/core/dev.c:3904
 __dev_queue_xmit+0x1428/0x3900 net/core/dev.c:4870
 neigh_output include/net/neighbour.h:556 [inline]
 ip_finish_output2+0xcec/0x10b0 net/ipv4/ip_output.c:237
 ip_send_skb net/ipv4/ip_output.c:1510 [inline]
 ip_push_pending_frames+0x8b/0x110 net/ipv4/ip_output.c:1530
 raw_sendmsg+0x1547/0x1a50 net/ipv4/raw.c:659
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x7da/0x9c0 net/socket.c:2698
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
 __sys_sendmsg net/socket.c:2784 [inline]
 __do_sys_sendmsg net/socket.c:2789 [inline]
 __se_sys_sendmsg net/socket.c:2787 [inline]
 __x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2787
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6feb62ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6fe985d028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f6feb8a6090 RCX: 00007f6feb62ce59
RDX: 0000000000000000 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f6feb6c2d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6feb8a6128 R14: 00007f6feb8a6090 R15: 00007ffcf01cc488
 </TASK>

Fixes: f266a683a480 ("net/hsr: Better frame dispatch")
Reported-by: syzbot+652670cf249077eb498b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a1a861e.b111c304.35cd64.0016.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260530064300.340793-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobpf: Silence unused-but-set-variable warning in bpf_for_each_reg_in_vstate_mask
Amery Hung [Tue, 2 Jun 2026 17:52:04 +0000 (10:52 -0700)] 
bpf: Silence unused-but-set-variable warning in bpf_for_each_reg_in_vstate_mask

The macro requires callers to pass a stack variable, but not all
callbacks use it. Add (void)__stack to suppress the clang W=1 warning.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260602175204.624401-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoMerge tag 'batadv-next-pullrequest-20260601' of https://git.open-mesh.org/batadv
Jakub Kicinski [Tue, 2 Jun 2026 19:00:48 +0000 (12:00 -0700)] 
Merge tag 'batadv-next-pullrequest-20260601' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
This batman-adv cleanup patchset includes the following patches, all by
Sven Eckelmann:

 - drop batman-adv specific version

 - MAINTAINERS housekeeping for batman-adv (two patches)

 - add missing includes

 - use atomic_xchg() for gw.reselect check

 - extract netdev wifi detection information object

 - replace inappropriate atomic access with (READ|WRITE)_ONCE
   (six patches)

 - tt: replace open-coded overflow check with helper

 - tvlv: avoid unnecessary OGM buffer reallocations

 - use neigh_node's orig_node only as id

* tag 'batadv-next-pullrequest-20260601' of https://git.open-mesh.org/batadv:
  batman-adv: use neigh_node's orig_node only as id
  batman-adv: tvlv: avoid unnecessary OGM buffer reallocations
  batman-adv: tt: replace open-coded overflow check with helper
  batman-adv: replace non-atomic last_ttvn with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic packet_size_max with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic mesh state with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic vlan config fields with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic hardif config fields with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic meshif config fields with (READ|WRITE)_ONCE
  batman-adv: extract netdev wifi detection information object
  batman-adv: use atomic_xchg() for gw.reselect check
  batman-adv: add missing includes
  MAINTAINERS: Don't send batman-adv patches to netdev
  MAINTAINERS: Rename batman-adv T(ree)
  batman-adv: drop batman-adv specific version
====================

Link: https://patch.msgid.link/20260601123629.707089-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Tue, 2 Jun 2026 18:57:21 +0000 (11:57 -0700)] 
Merge tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix splat with PREEMPT_RCU because smp_processor_id() in nfqueue,
   from Fernando Fernandez Mancera.

2) Fix possible use of pointer to old IPVS scheduler after RCU grace
   period when editing service, from Julian Anastasov.

3) Fix possible forever RCU walk over rt->fib6_siblings in nft_fib6,
   if rt is unlinked mid-iteration, apparently same issue happens in
   the fib6 core. From Jiayuan Chen.

4) Add mutex to guard refcount in synproxy infrastructure, since
   concurrent hook {un}registration can happen.
   From Fernando Fernandez Mancera.

5) Bail out if IRC conntrack helper fails to parse a command, do not
   try parsing using other command handlers, from Florian Westphal.
   This fixes a possible out-of-bound read.

6) Possible use-after-free in nft_tunnel by releasing template dst
   after all references has been dropped, from Tristan Madani.

7) Ignore conntrack template in nft_ct, from Jiayuan Chen.

8) Missing skb_ensure_writable() in ebt_snat, Yiming Qian.

9) Remove multi-register byteorder support, this allows for kernel
   stack info leak, from Florian Westphal.

* tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_byteorder: remove multi-register support
  netfilter: bridge: make ebt_snat ARP rewrite writable
  netfilter: nft_ct: bail out on template ct in get eval
  netfilter: nft_tunnel: fix use-after-free on object destroy
  netfilter: conntrack_irc: fix possible out-of-bounds read
  netfilter: synproxy: add mutex to guard hook reference counting
  netfilter: nft_fib_ipv6: bail out of sibling walk if rt got unlinked
  ipvs: clear the svc scheduler ptr early on edit
  netfilter: xt_NFQUEUE: prefer raw_smp_processor_id
====================

Link: https://patch.msgid.link/20260601115923.433946-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: Add preempt_{disable,enable}_nested() in reqsk_queue_hash_req().
Kuniyuki Iwashima [Mon, 1 Jun 2026 18:20:55 +0000 (18:20 +0000)] 
tcp: Add preempt_{disable,enable}_nested() in reqsk_queue_hash_req().

syzbot reported a weird reqsk->rsk_refcnt underflow in
__inet_csk_reqsk_queue_drop().

The captured reqsk_put() in __inet_csk_reqsk_queue_drop()
is called only when it successfully removes reqsk from ehash.

Moreover, reqsk_timer_handler() calls another reqsk_put()
after that.

This indicates that the reqsk was missing both refcnts for
ehash and the timer itself.

Since all the syzbot reports had PREEMPT_RT enabled, the only
possible scenario is that reqsk_queue_hash_req() is preempted
after mod_timer() and before refcount_set(), and then the timer
triggered after 1s aborts the reqsk due to its listener's close().

Let's wrap mod_timer() and refcount_set() with
preempt_disable_nested() and preempt_enable_nested().

Note that inet_ehash_insert() holds the normal spin_lock()
(mutex in PREEMPT_RT), so it must be called outside of
preempt_disable_nested(), but this is fine.

The lookup path just ignores 0 sk_refcnt entries in ehash
and tries to create another reqsk, but this will fail at
inet_ehash_insert().

[0]:
refcount_t: underflow; use-after-free.
WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xb2/0x110 lib/refcount.c:28, CPU#0: ktimers/0/16
Modules linked in:
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:refcount_warn_saturate+0xb2/0x110 lib/refcount.c:28
Code: e4 7d d1 0a 67 48 0f b9 3a eb 4a e8 38 3d 23 fd 48 8d 3d e1 7d d1 0a 67 48 0f b9 3a eb 37 e8 25 3d 23 fd 48 8d 3d de 7d d1 0a <67> 48 0f b9 3a eb 24 e8 12 3d 23 fd 48 8d 3d db 7d d1 0a 67 48 0f
RSP: 0000:ffffc90000157948 EFLAGS: 00010246
RAX: ffffffff84a1301b RBX: 0000000000000003 RCX: ffff88801ca98000
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff8f72ae00
RBP: ffffffff99ae3b01 R08: ffff88801ca98000 R09: 0000000000000005
R10: 0000000000000100 R11: 0000000000000004 R12: ffff8880425ef568
R13: ffff8880425ef4f8 R14: ffff8880425ef578 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888126386000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b46710e9c CR3: 000000000dbb6000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __refcount_sub_and_test include/linux/refcount.h:400 [inline]
 __refcount_dec_and_test include/linux/refcount.h:432 [inline]
 refcount_dec_and_test include/linux/refcount.h:450 [inline]
 reqsk_put include/net/request_sock.h:136 [inline]
 __inet_csk_reqsk_queue_drop+0x3ce/0x440 net/ipv4/inet_connection_sock.c:1007
 reqsk_timer_handler+0x651/0xdf0 net/ipv4/inet_connection_sock.c:1137
 call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
 expire_timers kernel/time/timer.c:1799 [inline]
 __run_timers kernel/time/timer.c:2374 [inline]
 __run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386
 run_timer_base kernel/time/timer.c:2395 [inline]
 run_timer_softirq+0x67/0x170 kernel/time/timer.c:2403
 handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 run_ktimerd+0x69/0x100 kernel/softirq.c:1151
 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.")
Reported-by: syzbot+e809069bc15f26300526@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a1a7bcf.0a9e871e.332604.000b.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260601182101.3183993-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Annotate sk->sk_write_space() for UDP SOCKMAP.
Kuniyuki Iwashima [Fri, 29 May 2026 19:39:23 +0000 (19:39 +0000)] 
net: Annotate sk->sk_write_space() for UDP SOCKMAP.

UDP TX skb->destructor() is sock_wfree(), and UDP holds lock_sock()
only for UDP_CORK / MSG_MORE sendmsg().

Otherwise, sk->sk_write_space() may be read locklessly while SOCKMAP
rewrites sk->sk_write_space().

Let's use WRITE_ONCE() and READ_ONCE() for sk->sk_write_space().

Note that the write side is annotated by commit 2ef2b20cf4e0
("net: annotate data-races around sk->sk_{data_ready,write_space}").

Fixes: 7b98cd42b049 ("bpf: sockmap: Add UDP support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260529193941.3897256-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoaf_unix: Remove sock->state assignment.
Kuniyuki Iwashima [Fri, 29 May 2026 19:18:02 +0000 (19:18 +0000)] 
af_unix: Remove sock->state assignment.

Both struct socket and struct sock have a variable to
manage its state, sock->state and sk->sk_state.

When both are used, the former typically manages syscall
state and the latter manages the actual connection state.

AF_UNIX only uses sk->sk_state.

Let's remove unnecessary assignemnts for sock->state.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260529191829.3864438-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: net: add socat syslog for PPPoL2TP
Qingfang Deng [Fri, 29 May 2026 02:11:42 +0000 (10:11 +0800)] 
selftests: net: add socat syslog for PPPoL2TP

As done in pppoe.sh, start socat as the syslog listener. In case the
test fails, dump its log to see what's going on.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Link: https://patch.msgid.link/20260529021146.5739-1-qingfang.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'net-phy-dp83822-add-optional-external-phy-clock'
Jakub Kicinski [Tue, 2 Jun 2026 18:38:03 +0000 (11:38 -0700)] 
Merge branch 'net-phy-dp83822-add-optional-external-phy-clock'

Stefan Wahren says:

====================
net: phy: dp83822: Add optional external PHY clock

This small series implement support for external PHY clock for the
dp83822 driver.
====================

Link: https://patch.msgid.link/20260528184642.33424-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: phy: dp83822: Add optional external PHY clock
Stefan Wahren [Thu, 28 May 2026 18:46:42 +0000 (20:46 +0200)] 
net: phy: dp83822: Add optional external PHY clock

In some cases, the PHY can use an external ref clock source instead of a
crystal.

Add an optional clock in the PHY node to make sure that the clock source
is enabled, if specified, before probing.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260528184642.33424-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: phy: dp83822: Improve readability in dp8382x_probe
Stefan Wahren [Thu, 28 May 2026 18:46:41 +0000 (20:46 +0200)] 
net: phy: dp83822: Improve readability in dp8382x_probe

Introduce a local pointer for device so devm_kzalloc() fit into
a single line. Also this makes following changes easier to read.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260528184642.33424-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agopcnet32: stop holding device spin lock during napi_complete_done
Oscar Maes [Thu, 28 May 2026 14:03:20 +0000 (16:03 +0200)] 
pcnet32: stop holding device spin lock during napi_complete_done

napi_complete_done may call gro_flush_normal (though not currently, as GRO
is unsupported at the moment), which may result in packet TX. This will
eventually result in calling pcnet32_start_xmit - resulting in a deadlock
while trying to re-acquire the already locked spin lock.

It is safe to split the spinlock block into two, because the hardware
registers are still protected from concurrent access, and the two blocks
perform unrelated operations that don't need to happen atomically.

Fixes: 5b2ec6f2be51 ("pcnet32: use napi_complete_done()")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260528140320.5556-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agocgroup: Migrate tasks to the root css when a controller is rebound
Tejun Heo [Mon, 1 Jun 2026 18:56:04 +0000 (08:56 -1000)] 
cgroup: Migrate tasks to the root css when a controller is rebound

cgroup_apply_control_disable() defers kill_css_finish() while a css is
still populated, relying on css_update_populated() to fire the deferred
kill once the populated count reaches zero.

This deadlocks when a controller is rebound out of a hierarchy. Mounting
an implicit_on_dfl controller such as perf_event as a v1 hierarchy steals
it off the default hierarchy, and rebind_subsystems() kills its
per-cgroup csses while they are still populated. The migration run in the
same step keeps the old css for a controller no longer in the hierarchy's
mask, so no task is migrated off the dying csses. Their populated count
never reaches zero, the deferred kill_css_finish() never fires, and the
next cgroup_lock_and_drain_offline() hangs forever under cgroup_mutex.

That migration is already a no-op pass over the rebound subtree. Add
cgroup_rebind_ss_mask so find_existing_css_set() resolves the leaving
controllers to the root css. Their tasks are migrated there, the
per-cgroup csses depopulate, and cgroup_apply_control_disable() kills
them synchronously. The deferral stays correct for the rmdir and
controller-disable paths it was meant for.

Fixes: 1dffd95575eb ("cgroup: Defer kill_css_finish() in cgroup_apply_control_disable()")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/41cd159c-54e5-45e0-81df-eaf36a6c028e@sirena.org.uk/
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/4e986b4ed7e16547805d54b6e67d09120bc4d2f2.camel@web.de/
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks agotcp: change bpf_skops_hdr_opt_len() signature
Eric Dumazet [Mon, 1 Jun 2026 09:38:19 +0000 (09:38 +0000)] 
tcp: change bpf_skops_hdr_opt_len() signature

Some compilers do not inline bpf_skops_hdr_opt_len() from
tcp_established_options(), forcing an expensive stack canary
when CONFIG_STACKPROTECTOR_STRONG=y.

Change bpf_skops_hdr_opt_len() to return @remaining by value
to remove this stack canary from TCP fast path.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 1/1 up/down: 10/-59 (-49)
Function                                     old     new   delta
bpf_skops_hdr_opt_len                        297     307     +10
tcp_established_options                      574     515     -59
Total: Before=31456795, After=31456746, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260601093819.469626-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: dsa: b53: hide legacy gpiolib usage on non-mips
Arnd Bergmann [Mon, 1 Jun 2026 16:56:42 +0000 (18:56 +0200)] 
net: dsa: b53: hide legacy gpiolib usage on non-mips

The MIPS bcm53xx platform still uses the legacy gpiolib interfaces based
on gpio numbers, but other platforms do not.

Hide these interfaces inside of the existing #ifdef block and use the
modern interfaces in the common parts of the driver to allow building
it when the gpio_set_value() is left out of the kernel.

Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260601165716.648230-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Tue, 2 Jun 2026 17:54:11 +0000 (10:54 -0700)] 
Merge tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "Following the previous set of fixes, this addresses another
  significant number of small issues found in firmware drivers (tee,
  optee, qcomtee, qcom ice, exynos acpm) drivers through various tools.

  This is about error handling, resource leaks, concurrency and a
  use-after-free bug.

  The fixes for the Qualcomm ICE driver also introduce interface changes
  in the UFS and MMC drivers using it.

  Outside of firmware drivers, there are a few fixes across the tree:

   - Minor driver code mistakes in the Atmel EBI memory controller, the
     i.MX soc ID driver and socfpga boot logic

   - A defconfig change to avoid a boot time regression on multiple
     qualcomm boards

   - Device tree fixes for qualcomm, at91 and gemini, addressing mostly
     minor configuration mistakes"

* tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
  firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
  firmware: samsung: acpm: Fix missing LKMM barriers in sequence allocator
  firmware: samsung: acpm: Fix false timeouts and Use-After-Free in polling
  ARM: dts: gemini: Fix partition offsets
  ARM: socfpga: Fix OF node refcount leak in SMP setup
  soc: qcom: ice: Fix the error code when 'qcom,ice' property is not found
  arm64: dts: qcom: eliza: Add power-domain and iface clk for ice node
  arm64: dts: qcom: milos: Add power-domain and iface clk for ice node
  tee: qcomtee: add missing va_end in early return qcomtee_object_user_init()
  tee: fix params_from_user() error path in tee_ioctl_supp_recv
  tee: shm: fix shm leak in register_shm_helper()
  tee: fix tee_ioctl_object_invoke_arg padding
  arm64: defconfig: Enable PCI M.2 power sequencing driver
  scsi: ufs: ufs-qcom: Remove NULL check from devm_of_qcom_ice_get()
  mmc: sdhci-msm: Remove NULL check from devm_of_qcom_ice_get()
  soc: qcom: ice: Return proper error codes from devm_of_qcom_ice_get() instead of NULL
  soc: qcom: ice: Return -ENODEV if the ICE platform device is not found
  soc: qcom: ice: Fix race between qcom_ice_probe() and of_qcom_ice_get()
  ARM: dts: microchip: sam9x7: fix GMAC clock configuration
  firmware: samsung: acpm: Fix mailbox channel leak on probe error
  ...

2 weeks agoALSA: seq: oss: Reject reads that cannot fit the next event
Cássio Gabriel [Tue, 2 Jun 2026 11:18:39 +0000 (08:18 -0300)] 
ALSA: seq: oss: Reject reads that cannot fit the next event

snd_seq_oss_read() checks whether the next queued OSS sequencer event
fits in the remaining userspace buffer before removing it from the read
queue.

The check is inverted. It currently stops when the event is smaller than
the remaining buffer, so a normal 4-byte event is not copied for an
8-byte read buffer. Conversely, an 8-byte event can be copied for a
smaller read count.

Break only when the remaining userspace buffer is smaller than the next
event, and report -EINVAL if no complete event has been copied. This
prevents an undersized read from looking like end-of-file while leaving
the event queued for a later read with a large enough buffer.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260602-alsa-seq-oss-read-size-check-v1-1-10e59b1742e0@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 weeks agoRDMA/efa: Validate SQ ring size against max LLQ size
Yonatan Nachum [Tue, 26 May 2026 08:15:36 +0000 (08:15 +0000)] 
RDMA/efa: Validate SQ ring size against max LLQ size

Validate the SQ ring size against the device's max LLQ size. This
ensures that when using 128-byte WQEs, userspace cannot exceed the queue
limits.

On create QP, userspace provides the SQ ring size (depth x WQE size)
which is validated against the max LLQ size.

Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation")
Link: https://patch.msgid.link/r/20260526081536.1203553-1-ynachum@amazon.com
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2 weeks agoALSA: seq: Restore created port information after insertion
Cássio Gabriel [Tue, 2 Jun 2026 10:55:46 +0000 (07:55 -0300)] 
ALSA: seq: Restore created port information after insertion

Commit 2ee646353cd5 ("ALSA: seq: Register kernel port with full
information") split sequencer port creation from list insertion so a
port can be filled before it becomes visible.

However, snd_seq_ioctl_create_port() still copies port->addr back to the
ioctl argument before snd_seq_insert_port() assigns the final port
number. A successful SNDRV_SEQ_IOCTL_CREATE_PORT without
SNDRV_SEQ_PORT_FLG_GIVEN_PORT can therefore report port -1 to userspace.

Move the ioctl address copy after successful insertion, and keep the
default "port-%d" name assignment from overwriting a caller-provided port
name. This restores the observable behavior from before the split while
keeping the port populated before publication.

Fixes: 2ee646353cd5 ("ALSA: seq: Register kernel port with full information")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260602-alsa-seq-create-port-info-fix-v1-1-eec0280131e9@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 weeks agoMerge branches 'rcutorture.2026.05.24' and 'misc.2026.05.24' into rcu-merge.2026...
Uladzislau Rezki (Sony) [Tue, 2 Jun 2026 17:45:08 +0000 (19:45 +0200)] 
Merge branches 'rcutorture.2026.05.24' and 'misc.2026.05.24' into rcu-merge.2026.05.24

rcutorture.2026.05.24: Torture-test updates
misc.2026.05.24: Miscellaneous RCU updates

2 weeks agorcu/nocb: reduce stack usage in nocb_gp_wait()
Arnd Bergmann [Tue, 19 May 2026 19:01:28 +0000 (21:01 +0200)] 
rcu/nocb: reduce stack usage in nocb_gp_wait()

When CONFIG_UBSAN_ALIGNMENT is enabled, the stack usage of nocb_gp_wait()
grows above typical warning limits:

In file included from kernel/rcu/tree.c:4930:
kernel/rcu/tree_nocb.h: In function 'rcu_nocb_gp_kthread':
kernel/rcu/tree_nocb.h:866:1: error: the frame size of 1968 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

Apparently, the problem is passing rcu_data from a 'void *' pointer,
which gcc assumes may be misaligned. When the function is not inlined
into rcu_nocb_gp_kthread(), that is no longer visible to gcc.

Add a 'noinline_for_stack' annotation that leads to skipping a lot of
the alignment sanitizer checks and keeps the stack usage 60% lower here.

Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2 weeks agoARM: dts: imx7: add nvmem-layout
Alexander Feilke [Wed, 27 May 2026 09:37:17 +0000 (11:37 +0200)] 
ARM: dts: imx7: add nvmem-layout

TQMa7 has board-information located in EEPROM at offset 0x20.
Add necessary nodes and properties for nvmem cell.

Signed-off-by: Alexander Feilke <Alexander.Feilke@ew.tq-group.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
2 weeks agocpufreq/amd-pstate: Fix setting EPP in performance mode
Mario Limonciello (AMD) [Sat, 30 May 2026 15:04:34 +0000 (17:04 +0200)] 
cpufreq/amd-pstate: Fix setting EPP in performance mode

EPP 0 is the only supported value in the performance policy.
commit 798c47593cca ("cpufreq/amd-pstate: Add support for platform profile
class") changed this while adding platform profile support to the
dynamic EPP feature, but this actually wasn't necessary since platform
profile writes disable manual EPP writes.

Restore allowing writing EPP of 0 when in performance mode.

Reviewed-by: Marco Scardovi <scardracs@disroot.org>
Tested-by: Marco Scardovi <scardracs@disroot.org>
Reported-by: Stuart Meckle <stuartmeckle@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221473
Closes: https://gitlab.freedesktop.org/upower/power-profiles-daemon/-/work_items/190
Fixes: 798c47593cca ("cpufreq/amd-pstate: Add support for platform profile class")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2 weeks agoKVM: s390: Remove ptep_zap_softleaf_entry()
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:56 +0000 (16:23 +0200)] 
KVM: s390: Remove ptep_zap_softleaf_entry()

Migration entries do not need to be removed.

The swap subsystem has been (and still is being) heavily reworked. The
current implementation of ptep_zap_softleaf_entry() has been slowly
modified and is now wrong, since it unconditionally calls
swap_put_entries_direct() for both swap and migration entries.

Remove ptep_zap_softleaf_entry() altogether, merge the path for proper
swap entries directly in the only caller, and ignore migration entries.

Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-11-imbrenda@linux.ibm.com>

2 weeks agoKVM: s390: Fix possible reference leak in fault-in code
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:55 +0000 (16:23 +0200)] 
KVM: s390: Fix possible reference leak in fault-in code

If kvm_s390_new_mmu_cache() fails, kvm_s390_faultin_gfn() returns
without releasing the faulted page.

Fix this by moving the allocation of the memory cache outside of the
loop. There is no reason to check at every iteration.

Opportunistically fix a comment.

Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-10-imbrenda@linux.ibm.com>

2 weeks agoKVM: s390: Prevent memslots outside the ASCE range
Claudio Imbrenda [Tue, 2 Jun 2026 14:23:54 +0000 (16:23 +0200)] 
KVM: s390: Prevent memslots outside the ASCE range

With KVM_S390_VM_MEM_LIMIT_SIZE, userspace can set the highest address
allowed for the VM. Creating a memslot that lies over the maximum
address does not make sense and is only a potential source of bugs.

Prevent creation of memslots over the maximum address, and prevent the
maximum address from being reduced below the end of existing memslots.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-9-imbrenda@linux.ibm.com>

2 weeks agoio_uring/bpf-ops: restrict ctx access to BPF
Pavel Begunkov [Tue, 2 Jun 2026 10:08:25 +0000 (11:08 +0100)] 
io_uring/bpf-ops: restrict ctx access to BPF

BPF programs should have no need in looking into struct io_ring_ctx, if
anything, most of such cases would be anti patterns like looking up ring
indices directly via the context.

Replace it with a new empty structure, which is just an alias to struct
io_ring_ctx. It'll create a new BTF type and fail verification if a BPF
program tries to access it (beyond the first byte). It'll also give more
flexibility for the future, and otherwise it can be made aligned with
io_ring_ctx as before with struct groups if ever needed or extended in a
different way.

Fixes: d0e437b76bd3c ("io_uring/bpf-ops: implement loop_step with BPF struct_ops")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/5f6ca3649e9e0bae8667db4357e28dd00cd07901.1780394491.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agodrm/gem/shmem: Introduce __drm_gem_shmem_free_sgt_locked()
Lyude Paul [Fri, 29 May 2026 18:34:03 +0000 (14:34 -0400)] 
drm/gem/shmem: Introduce __drm_gem_shmem_free_sgt_locked()

One of the complications of trying to use the shmem helpers to create a
scatterlist for shmem objects is that we need to be able to provide a
guarantee that the driver cannot be unbound for the lifetime of the
scatterlist.

The easiest way of handling this seems to be just hooking up an unmap
operation to devres the first time we create a scatterlist, which allows us
to still take advantage of gem shmem facilities without breaking that
guarantee. To allow for this, we extract __drm_gem_shmem_free_sgt_locked()
- which allows a caller (e.g. the rust bindings) to manually unmap the sgt
for a gem object as needed.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260529183702.677677-6-lyude@redhat.com
2 weeks agorust: drm: gem: s/device::Device/Device/ for shmem.rs
Lyude Paul [Tue, 28 Apr 2026 19:03:41 +0000 (15:03 -0400)] 
rust: drm: gem: s/device::Device/Device/ for shmem.rs

We're about to start explicitly mentioning kernel devices as well in this
file, so this makes it easier to differentiate the two by allowing us to
import `device` as `kernel::device`.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260428190605.3355690-2-lyude@redhat.com
2 weeks agoblock/partitions/acorn: use min in {riscix,linux}_partition
Thorsten Blum [Tue, 2 Jun 2026 16:07:57 +0000 (18:07 +0200)] 
block/partitions/acorn: use min in {riscix,linux}_partition

Use min() to replace the open-coded implementations and to simplify
riscix_partition() and linux_partition().

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260602160757.973736-3-thorsten.blum@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 weeks agoMerge branch 'more-gen_loader-fixes-2'
Alexei Starovoitov [Tue, 2 Jun 2026 16:46:52 +0000 (09:46 -0700)] 
Merge branch 'more-gen_loader-fixes-2'

Daniel Borkmann says:

====================
More gen_loader fixes #2

Another small follow-up from the sashiko findings about signed loaders.
In particular, closing the gap to reject exclusive maps in iterators.
====================

Link: https://patch.msgid.link/20260602133052.423725-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoselftests/bpf: Test that exclusive maps are rejected as iter targets
Daniel Borkmann [Tue, 2 Jun 2026 13:30:52 +0000 (15:30 +0200)] 
selftests/bpf: Test that exclusive maps are rejected as iter targets

Add a subtest to map_excl that creates an exclusive map and verifies a
bpf_map_elem iterator cannot be attached to it, which would otherwise
let an unrelated program read and overwrite the map's contents through
the iterator's writable value buffer.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t map_excl
  [...]
  ./test_progs -t map_excl
  [    1.704382] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.706068] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #215/1   map_excl/map_excl_allowed:OK
  #215/2   map_excl/map_excl_denied:OK
  #215/3   map_excl/map_excl_no_map_in_map:OK
  #215/4   map_excl/map_excl_no_map_iter:OK
  #215     map_excl:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-5-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoselftests/bpf: Keep verifier_map_ptr exercising ops pointer access
Daniel Borkmann [Tue, 2 Jun 2026 13:30:51 +0000 (15:30 +0200)] 
selftests/bpf: Keep verifier_map_ptr exercising ops pointer access

sashiko complained that 38498c0ebacd ("selftests/bpf: Adjust verifier_map_ptr
for the map's excl field") would slightly decrease the test coverage given
before the test was against the verifier rejecting the ops pointer. Recover
the old test with the right offsets and add the existing one as an additional
test case.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_map_ptr
  [    1.672932] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #637/1   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected:OK
  #637/2   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected @unpriv:OK
  #637/3   verifier_map_ptr/bpf_map_ptr: write rejected:OK
  #637/4   verifier_map_ptr/bpf_map_ptr: write rejected @unpriv:OK
  #637/5   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected:OK
  #637/6   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected @unpriv:OK
  #637/7   verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected:OK
  #637/8   verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected @unpriv:OK
  #637/9   verifier_map_ptr/bpf_map_ptr: read ops field accepted:OK
  #637/10  verifier_map_ptr/bpf_map_ptr: read ops field accepted @unpriv:OK
  #637/11  verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r:OK
  #637/12  verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r @unpriv:OK
  #637/13  verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr:OK
  #637/14  verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr @unpriv:OK
  #637     verifier_map_ptr:OK
  [...]
  Summary: 2/20 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agolibbpf: Guard add_data() against size overflow
Daniel Borkmann [Tue, 2 Jun 2026 13:30:50 +0000 (15:30 +0200)] 
libbpf: Guard add_data() against size overflow

add_data() computes size8 = roundup(size, 8) and then hands size8 to
realloc_data_buf() before doing memcpy(gen->data_cur, data, size) with
the original size. A wrapped size8 passes through the realloc_data_buf()
INT32_MAX check. Harden this against overflow, though not realistic to
happen in practice.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Reject exclusive maps for bpf_map_elem iterators
Daniel Borkmann [Tue, 2 Jun 2026 13:30:49 +0000 (15:30 +0200)] 
bpf: Reject exclusive maps for bpf_map_elem iterators

Exclusive maps (aka excl_prog_hash) are meant to be reachable only
from the single program whose hash matches. This is enforced by
check_map_prog_compatibility() when the map is referenced from a
program such as signed BPF loaders.

A bpf_map_elem iterator, however, binds its target map at attach
time in bpf_iter_attach_map() instead of referencing it from the
program, so the exclusivity check is never reached. On top of that,
the iterator exposes the map value as a writable buffer.

Fixes: baefdbdf6812 ("bpf: Implement exclusive map creation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoMerge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 2 Jun 2026 15:59:35 +0000 (08:59 -0700)] 
Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
 "13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3
  address post-7.1 issues or aren't considered suitable for backporting.

  There's a three-patch series "userfaultfd: verify VMA state across
  UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things.
  The rest are singletons - please see the individual changelogs for
  details"

* tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  userfaultfd: remove redundant check in vm_uffd_ops()
  userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
  userfaultfd: verify VMA state across UFFDIO_COPY retry
  mm/huge_memory: update file PMD counter before folio_put()
  mm/huge_memory: update file PUD counter before folio_put()
  mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
  mm/damon/ops-common: call folio_test_lru() after folio_get()
  mm/cma: fix reserved page leak on activation failure
  mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
  mm/hugetlb: restore reservation on error in hugetlb folio copy paths
  mm/cma_debug: fix invalid accesses for inactive CMA areas
  memcg: use round-robin victim selection in refill_stock
  mm/hugetlb: avoid false positive lockdep assertion

2 weeks agodt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali
Krzysztof Kozlowski [Wed, 20 May 2026 11:09:14 +0000 (13:09 +0200)] 
dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali

Previous commit 75949eb02653 ("dt-bindings: arm-smmu: Constrain clocks
for newer Qualcomm variants") duplicated constraints for
qcom,sm6350-smmu-500 and qcom,sm6375-smmu-500 - these are already part
of previous "if:" block.

It also missed enforcing one clock for qcom,kaanapali-smmu-500 in GPU
case and missed simultaneously added Shikra and Hawi.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2 weeks agodt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC
Shawn Guo [Tue, 19 May 2026 01:39:50 +0000 (09:39 +0800)] 
dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC

Document Applications Processor Subsystem (APSS) SMMU on Qualcomm
Nord SoC.

Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2 weeks agoaccel/amdxdna: Preserve user address when PASID is disabled
Lizhi Hou [Tue, 2 Jun 2026 04:06:24 +0000 (21:06 -0700)] 
accel/amdxdna: Preserve user address when PASID is disabled

When PASID is not used, the buffer user address is set to
AMDXDNA_INVALID_ADDR. As a result, heap buffer user address validation
fails even though the original userspace address is available.

Preserve the userspace address regardless of PASID usage so heap buffer
address validation works correctly.

Fixes: dbc8fd7a03cb ("accel/amdxdna: Add expandable device heap support")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260602040624.2206774-1-lizhi.hou@amd.com
2 weeks agoKVM: x86: Take PIC lock on KVM_GET_IRQCHIP path
Carlos López [Fri, 29 May 2026 14:00:14 +0000 (16:00 +0200)] 
KVM: x86: Take PIC lock on KVM_GET_IRQCHIP path

When userspace issues the KVM_SET_IRQCHIP ioctl to set the state of
the PIC, kvm_vm_ioctl_set_irqchip() grabs @kvm->arch.vpic->lock before
updating the state. However, the KVM_GET_IRQCHIP ioctl to retrieve the
same PIC state does not grab such lock, potentially causing torn reads
for userspace.

Fix this by grabbing the lock on the read path.

This issue goes all the way back. The bug was introduced with the
addition of PIC ioctl code itself in 6ceb9d791eee ("KVM: Add get/
set irqchip ioctls for in-kernel PIC live migration support"). Later,
894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU
ioctl paths") added the locking for kvm_vm_ioctl_set_irqchip(), but
missed kvm_vm_ioctl_get_irqchip().

Fixes: 6ceb9d791eee ("KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support")
Fixes: 894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths")
Reported-by: Claude Code:claude-opus-4.6
Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260529140013.14925-2-clopez@suse.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoarm64: mm: Unmap kernel data/bss entirely from the linear map
Ard Biesheuvel [Fri, 29 May 2026 15:02:06 +0000 (17:02 +0200)] 
arm64: mm: Unmap kernel data/bss entirely from the linear map

The linear aliases of the kernel text and rodata are also mapped
read-only in the linear map. Given that the contents of these regions
are mostly identical to the version in the loadable image, mapping them
read-only and leaving their contents visible is a reasonable hardening
measure.

Data and bss, however, are now also mapped read-only but the contents of
these regions are more likely to contain data that we'd rather not leak.
So let's unmap these entirely in the linear map when the kernel is
running normally.

When going into hibernation or waking up from it, these regions need to
be mapped, so map the region initially, and toggle the valid bit so
map/unmap the region as needed.

Doing so is required because pages covering the kernel image are marked
as PageReserved, and therefore disregarded for snapshotting by the
hibernate logic unless they are mapped.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2 weeks agoarm64: mm: Map the kernel data/bss read-only in the linear map
Ard Biesheuvel [Fri, 29 May 2026 15:02:05 +0000 (17:02 +0200)] 
arm64: mm: Map the kernel data/bss read-only in the linear map

On systems where the bootloader adheres to the original arm64 boot
protocol, the placement of the kernel in the physical address space is
highly predictable, and this makes the placement of its linear alias in
the kernel virtual address space equally predictable, given the lack of
randomization of the linear map.

The linear aliases of the kernel text and rodata regions are already
mapped read-only, but the kernel data and bss are mapped read-write in
this region. This is not needed, so map them read-only as well.

Note that the statically allocated kernel page tables do need to be
modifiable via the linear map, so leave these mapped read-write.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2 weeks agomm: Make empty_zero_page[] const
Ard Biesheuvel [Fri, 29 May 2026 15:02:04 +0000 (17:02 +0200)] 
mm: Make empty_zero_page[] const

The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.

So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2 weeks agosh: Drop cache flush of the zero page at boot
Ard Biesheuvel [Fri, 29 May 2026 15:02:03 +0000 (17:02 +0200)] 
sh: Drop cache flush of the zero page at boot

SuperH performs cache maintenance on the zero page during boot,
presumably because before commit

  6215d9f4470f ("arch, mm: consolidate empty_zero_page")

the zero page did double duty as a boot params region, and was cleared
separately, as it was not part of BSS. The memset() in question was
dropped by that commit, but the __flush_wback_region() call remained.

As empty_zero_page[] has been moved to BSS, it can be treated as any
other BSS memory, and so the cache flush can be dropped.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>