git.ipfire.org Git - thirdparty/kernel/linux.git/log

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2026-04-01

We've added 2 non-merge commits during the last 2 day(s) which contain
a total of 3 files changed, 139 insertions(+), 23 deletions(-).

The main changes are:

1) skb_dst_drop(skb) when bpf prog does a encap or decap,
   from Jakub Kicinski

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Test that dst is cleared on same-protocol encap
  net: Clear the dst when performing encap / decap
====================

Link: https://patch.msgid.link/20260401233956.4133413-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bpf: sockmap: Fix use-after-free of sk->sk_socket in sk_psock_verdict_data_ready().

syzbot reported use-after-free of AF_UNIX socket's sk->sk_socket
in sk_psock_verdict_data_ready(). [0]

In unix_stream_sendmsg(), the peer socket's ->sk_data_ready() is
called after dropping its unix_state_lock().

Although the sender socket holds the peer's refcount, it does not
prevent the peer's sock_orphan(), and the peer's sk_socket might
be freed after one RCU grace period.

Let's fetch the peer's sk->sk_socket and sk->sk_socket->ops under
RCU in sk_psock_verdict_data_ready().

[0]:
BUG: KASAN: slab-use-after-free in sk_psock_verdict_data_ready+0xec/0x590 net/core/skmsg.c:1278
Read of size 8 at addr ffff8880594da860 by task syz.4.1842/11013

CPU: 1 UID: 0 PID: 11013 Comm: syz.4.1842 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xba/0x230 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
sk_psock_verdict_data_ready+0xec/0x590 net/core/skmsg.c:1278
unix_stream_sendmsg+0x8a3/0xe80 net/unix/af_unix.c:2482
sock_sendmsg_nosec net/socket.c:721 [inline]
__sock_sendmsg net/socket.c:736 [inline]
____sys_sendmsg+0x972/0x9f0 net/socket.c:2585
___sys_sendmsg+0x2a5/0x360 net/socket.c:2639
__sys_sendmsg net/socket.c:2671 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2674
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7facf899c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007facf9827028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007facf8c15fa0 RCX: 00007facf899c819
RDX: 0000000000000000 RSI: 0000200000000500 RDI: 0000000000000004
RBP: 00007facf8a32c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007facf8c16038 R14: 00007facf8c15fa0 R15: 00007ffd41b01c78
</TASK>

Allocated by task 11013:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
unpoison_slab_object mm/kasan/common.c:340 [inline]
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4538 [inline]
slab_alloc_node mm/slub.c:4866 [inline]
kmem_cache_alloc_lru_noprof+0x2b8/0x640 mm/slub.c:4885
sock_alloc_inode+0x28/0xc0 net/socket.c:316
alloc_inode+0x6a/0x1b0 fs/inode.c:347
new_inode_pseudo include/linux/fs.h:3003 [inline]
sock_alloc net/socket.c:631 [inline]
__sock_create+0x12d/0x9d0 net/socket.c:1562
sock_create net/socket.c:1656 [inline]
__sys_socketpair+0x1c4/0x560 net/socket.c:1803
__do_sys_socketpair net/socket.c:1856 [inline]
__se_sys_socketpair net/socket.c:1853 [inline]
__x64_sys_socketpair+0x9b/0xb0 net/socket.c:1853
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 15:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2685 [inline]
slab_free mm/slub.c:6165 [inline]
kmem_cache_free+0x187/0x630 mm/slub.c:6295
rcu_do_batch kernel/rcu/tree.c:2617 [inline]
rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
handle_softirqs+0x22a/0x870 kernel/softirq.c:622
run_ksoftirqd+0x36/0x60 kernel/softirq.c:1063
smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Closes: https://lore.kernel.org/bpf/69cc6b9f.a70a0220.128fd0.004b.GAE@google.com/
Reported-by: syzbot+2184232f07e3677fbaef@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260401005418.2452999-1-kuniyu@google.com

net/mlx5: Move command entry freeing outside of spinlock

Move the kfree() call outside the critical section to reduce lock
holding time. This aligns with the general principle of minimizing
work under locks.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260331122604.1933-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ppp: dead code cleanup in Kconfig

There is already an 'if PPP' condition wrapping several config options
e.g. PPP_MPPE and PPPOE, making the 'depends on PPP' statement for each of
these a duplicate dependency (dead code).

I propose leaving the outer 'if PPP...endif' and removing the individual
'depends on PPP' statement from each option.

This dead code was found by kconfirm, a static analysis tool for Kconfig.

Signed-off-by: Julian Braha <julianbraha@gmail.com>
Reviewed-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260330213258.13982-1-julianbraha@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: move ip6_dst_hoplimit() to net/ipv6/ip6_output.c

Move ip6_dst_hoplimit() to net/ipv6/ip6_output.c so that compiler
can (auto)inline it from ip6_xmit().

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-11 (-11)
Function old new delta
ip6_xmit 1684 1673 -11
Total: Before=29655407, After=29655396, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260331174722.4128061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: ib: reject FRMR registration before IB connection is established

rds_ib_get_mr() extracts the rds_ib_connection from conn->c_transport_data
and passes it to rds_ib_reg_frmr() for FRWR memory registration. On a
fresh outgoing connection, ic is allocated in rds_ib_conn_alloc() with
i_cm_id = NULL because the connection worker has not yet called
rds_ib_conn_path_connect() to create the rdma_cm_id. When sendmsg() with
RDS_CMSG_RDMA_MAP is called on such a connection, the sendmsg path parses
the control message before any connection establishment, allowing
rds_ib_post_reg_frmr() to dereference ic->i_cm_id->qp and crash the
kernel.

The existing guard in rds_ib_reg_frmr() only checks for !ic (added in
commit 9e630bcb7701), which does not catch this case since ic is allocated
early and is always non-NULL once the connection object exists.

KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:rds_ib_post_reg_frmr+0x50e/0x920
Call Trace:
  rds_ib_post_reg_frmr (net/rds/ib_frmr.c:167)
  rds_ib_map_frmr (net/rds/ib_frmr.c:252)
  rds_ib_reg_frmr (net/rds/ib_frmr.c:430)
  rds_ib_get_mr (net/rds/ib_rdma.c:615)
  __rds_rdma_map (net/rds/rdma.c:295)
  rds_cmsg_rdma_map (net/rds/rdma.c:860)
  rds_sendmsg (net/rds/send.c:1363)
  ____sys_sendmsg
  do_syscall_64

Add a check in rds_ib_get_mr() that verifies ic, i_cm_id, and qp are all
non-NULL before proceeding with FRMR registration, mirroring the guard
already present in rds_ib_post_inv(). Return -ENODEV when the connection
is not ready, which the existing error handling in rds_cmsg_send() converts
to -EAGAIN for userspace retry and triggers rds_conn_connect_if_down() to
start the connection worker.

Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260330163237.2752440-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix data race in fib6_metric_set() using cmpxchg

fib6_metric_set() may be called concurrently from softirq context without
holding the FIB table lock. A typical path is:

  ndisc_router_discovery()
    spin_unlock_bh(&table->tb6_lock)        <- lock released
    fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call

When two CPUs process Router Advertisement packets for the same router
simultaneously, they can both arrive at fib6_metric_set() with the same
fib6_info pointer whose fib6_metrics still points to dst_default_metrics.

  if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
      struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
      refcount_set(&p->refcnt, 1);
      f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
  }

The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
to it anywhere in memory, producing a kmemleak report:

  unreferenced object 0xff1100025aca1400 (size 96):
    comm "softirq", pid 0, jiffies 4299271239
    backtrace:
      kmalloc_trace+0x28a/0x380
      fib6_metric_set+0xcd/0x180
      ndisc_router_discovery+0x12dc/0x24b0
      icmpv6_rcv+0xc16/0x1360

Fix this by:
- Set val for p->metrics before published via cmpxchg() so the metrics
   value is ready before the pointer becomes visible to other CPUs.
- Replace the plain pointer store with cmpxchg() and free the allocation
   safely when competition failed.
- Add READ_ONCE()/WRITE_ONCE() for metrics[] setting in the non-default
   metrics path to prevent compiler-based data races.

Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
Reported-by: Fei Liu <feliu@redhat.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260331-b4-fib6_metric_set-kmemleak-v3-1-88d27f4d8825@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'amd-drm-fixes-7.0-2026-04-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-7.0-2026-04-01:

amdgpu:
- UserQ fixes
- PASID handling fix
- S4 fix for smu11 chips
- Misc small fixes

amdkfd:
- Non-4K page fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260401174731.3576021-1-alexander.deucher@amd.com

bitmap: introduce bitmap_weighted_xor()

The function helps to XOR bitmaps and calculate Hamming weight of
the result in one pass.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Yury Norov <ynorov@nvidia.com>

i2c: imx: zero-initialize dma_slave_config for eDMA

commit 66d88e16f204 ("dmaengine: fsl-edma: read/write multiple registers
in cyclic transactions") causes fsl_edma_fill_tcd() to read
dst_port_window_size and src_port_window_size when building transfer
control descriptors.

Initialize the structure so unset fields are explicitly zero.

Fixes: 66d88e16f204 ("dmaengine: fsl-edma: read/write multiple registers in cyclic transactions")
Signed-off-by: Anthony Pighin <anthony.pighin@nokia.com>
Cc: <stable@vger.kernel.org> # v6.14+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260331182632.888110-1-anthony.pighin@nokia.com

fs/resctrl: Add "*" shorthand to set io_alloc CBM for all domains

Configuring the io_alloc_cbm interface requires an explicit domain ID for each
cache domain. On systems with high core counts and numerous cache clusters,
this requirement becomes cumbersome for automation and management tasks that
aim to apply a uniform policy.

Introduce a wildcard domain ID selector "*" for the io_alloc_cbm interface.
This enables users to set the same Capacity Bitmask (CBM) across all cache
domains in a single operation.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://patch.msgid.link/20260325001159.447075-3-atomlin@atomlin.com

fs/resctrl: Report invalid domain ID when parsing io_alloc_cbm

The last_cmd_status file is intended to report details about the most recent
resctrl filesystem operation, specifically to aid in diagnosing failures.

However, when parsing io_alloc_cbm, if a user provides a domain ID that does
not exist in the resource, the operation fails with -EINVAL without updating
last_cmd_status. This results in inconsistent behaviour where the system call
returns an error, but last_cmd_status misleadingly reports "ok", leaving the
user unaware that the failure was caused by an invalid domain ID.

Write an error message to last_cmd_status when the target domain ID cannot
be found.

Fixes: 28fa2cce7a83 ("fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://patch.msgid.link/20260325001159.447075-2-atomlin@atomlin.com

Merge tag 'qcom-arm64-fixes-for-7.0-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

More Qualcomm Arm64 DeviceTree fixes for v7.0

The shuffling of reset and wake GPIO properties across various Hamoa
devices left things in an incomplete state, fix this.

Add the missing "ranges" property to the QCM2290 MDSS DeviceTree binding
example, to fix the validation warning that was introduced by the
previous fix.

* tag 'qcom-arm64-fixes-for-7.0-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  arm64: dts: qcom: hamoa: Fix incomplete Root Port property migration
  dt-bindings: display/msm: qcm2290-mdss: Fix missing ranges in example
  arm64: dts: qcom: agatti: Fix IOMMU DT properties
  dt-bindings: media: venus: Fix iommus property
  dt-bindings: display: msm: qcm2290-mdss: Fix iommus property
  arm64: dts: qcom: monaco: Reserve full Gunyah metadata region
  arm64: dts: qcom: monaco: Fix UART10 pinconf
  arm64: dts: qcom: qcm6490-idp: Fix WCD9370 reset GPIO polarity
  arm64: dts: qcom: hamoa/x1: fix idle exit latency

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'sunxi-fixes-for-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes

Allwinner fixes for 7.0

Just one fix to make the r-spi SPI controller use the mcu-dma DMA
controller for DMA instead of the main DMA controller.

* tag 'sunxi-fixes-for-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: sun55i: Fix r-spi DMA

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'renesas-fixes-for-v7.0-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/fixes

Renesas fixes for v7.0 (take two)

- Fix TFA BL31 memory corruption on Sparrow Hawk.

* tag 'renesas-fixes-for-v7.0-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
arm64: dts: renesas: sparrow-hawk: Reserve first 128 MiB of DRAM

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Bluetooth: hci_sync: fix stack buffer overflow in hci_le_big_create_sync

hci_le_big_create_sync() uses DEFINE_FLEX to allocate a
struct hci_cp_le_big_create_sync on the stack with room for 0x11 (17)
BIS entries.  However, conn->num_bis can hold up to HCI_MAX_ISO_BIS (31)
entries — validated against ISO_MAX_NUM_BIS (0x1f) in the caller
hci_conn_big_create_sync().  When conn->num_bis is between 18 and 31,
the memcpy that copies conn->bis into cp->bis writes up to 14 bytes
past the stack buffer, corrupting adjacent stack memory.

This is trivially reproducible: binding an ISO socket with
bc_num_bis = ISO_MAX_NUM_BIS (31) and calling listen() will
eventually trigger hci_le_big_create_sync() from the HCI command
sync worker, causing a KASAN-detectable stack-out-of-bounds write:

  BUG: KASAN: stack-out-of-bounds in hci_le_big_create_sync+0x256/0x3b0
  Write of size 31 at addr ffffc90000487b48 by task kworker/u9:0/71

Fix this by changing the DEFINE_FLEX count from the incorrect 0x11 to
HCI_MAX_ISO_BIS, which matches the maximum number of BIS entries that
conn->bis can actually carry.

Fixes: 42ecf1947135 ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending")
Cc: stable@vger.kernel.org
Signed-off-by: hkbinbin <hkbinbinbin@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: SMP: derive legacy responder STK authentication from MITM state

The legacy responder path in smp_random() currently labels the stored
STK as authenticated whenever pending_sec_level is BT_SECURITY_HIGH.
That reflects what the local service requested, not what the pairing
flow actually achieved.

For Just Works/Confirm legacy pairing, SMP_FLAG_MITM_AUTH stays clear
and the resulting STK should remain unauthenticated even if the local
side requested HIGH security. Use the established MITM state when
storing the responder STK so the key metadata matches the pairing result.

This also keeps the legacy path aligned with the Secure Connections code,
which already treats JUST_WORKS/JUST_CFM as unauthenticated.

Fixes: fff3490f4781 ("Bluetooth: Fix setting correct authentication information for SMP STK")
Cc: stable@vger.kernel.org
Signed-off-by: Oleh Konko <security@1seal.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: SMP: force responder MITM requirements before building the pairing response

smp_cmd_pairing_req() currently builds the pairing response from the
initiator auth_req before enforcing the local BT_SECURITY_HIGH
requirement. If the initiator omits SMP_AUTH_MITM, the response can
also omit it even though the local side still requires MITM.

tk_request() then sees an auth value without SMP_AUTH_MITM and may
select JUST_CFM, making method selection inconsistent with the pairing
policy the responder already enforces.

When the local side requires HIGH security, first verify that MITM can
be achieved from the IO capabilities and then force SMP_AUTH_MITM in the
response in both rsp.auth_req and auth. This keeps the responder auth bits
and later method selection aligned.

Fixes: 2b64d153a0cc ("Bluetooth: Add MITM mechanism to LE-SMP")
Cc: stable@vger.kernel.org
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Oleh Konko <security@1seal.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: MGMT: validate mesh send advertising payload length

mesh_send() currently bounds MGMT_OP_MESH_SEND by total command
length, but it never verifies that the bytes supplied for the
flexible adv_data[] array actually match the embedded adv_data_len
field. MGMT_MESH_SEND_SIZE only covers the fixed header, so a
truncated command can still pass the existing 20..50 byte range
check and later drive the async mesh send path past the end of the
queued command buffer.

Keep rejecting zero-length and oversized advertising payloads, but
validate adv_data_len explicitly and require the command length to
exactly match the flexible array size before queueing the request.

Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh")
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_event: fix potential UAF in hci_le_remote_conn_param_req_evt

hci_conn lookup and field access must be covered by hdev lock in
hci_le_remote_conn_param_req_evt, otherwise it's possible it is freed
concurrently.

Extend the hci_dev_lock critical section to cover all conn usage.

Fixes: 95118dd4edfec ("Bluetooth: hci_event: Use of a function table to handle LE subevents")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_conn: fix potential UAF in set_cig_params_sync

hci_conn lookup and field access must be covered by hdev lock in
set_cig_params_sync, otherwise it's possible it is freed concurrently.

Take hdev lock to prevent hci_conn from being deleted or modified
concurrently. Just RCU lock is not suitable here, as we also want to
avoid "tearing" in the configuration.

Fixes: a091289218202 ("Bluetooth: hci_conn: Fix hci_le_set_cig_params")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: MGMT: validate LTK enc_size on load

Load Long Term Keys stores the user-provided enc_size and later uses
it to size fixed-size stack operations when replying to LE LTK
requests. An enc_size larger than the 16-byte key buffer can therefore
overflow the reply stack buffer.

Reject oversized enc_size values while validating the management LTK
record so invalid keys never reach the stored key state.

Fixes: 346af67b8d11 ("Bluetooth: Add MGMT handlers for dealing with SMP LTK's")
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_h4: Fix race during initialization

Commit 5df5dafc171b ("Bluetooth: hci_uart: Fix another race during
initialization") fixed a race for hci commands sent during initialization.
However, there is still a race that happens if an hci event from one of
these commands is received before HCI_UART_REGISTERED has been set at
the end of hci_uart_register_dev(). The event will be ignored which
causes the command to fail with a timeout in the log:

"Bluetooth: hci0: command 0x1003 tx timeout"

This is because the hci event receive path (hci_uart_tty_receive ->
h4_recv) requires HCI_UART_REGISTERED to be set in h4_recv(), while the
hci command transmit path (hci_uart_send_frame -> h4_enqueue) only
requires HCI_UART_PROTO_INIT to be set in hci_uart_send_frame().

The check for HCI_UART_REGISTERED was originally added in commit
c2578202919a ("Bluetooth: Fix H4 crash from incoming UART packets")
to fix a crash caused by hu->hdev being null dereferenced. That can no
longer happen: once HCI_UART_PROTO_INIT is set in hci_uart_register_dev()
all pointers (hu, hu->priv and hu->hdev) are valid, and
hci_uart_tty_receive() already calls h4_recv() on HCI_UART_PROTO_INIT
or HCI_UART_PROTO_READY.

Remove the check for HCI_UART_REGISTERED in h4_recv() to fix the race
condition.

Fixes: 5df5dafc171b ("Bluetooth: hci_uart: Fix another race during initialization")
Signed-off-by: Jonathan Rissanen <jonathan.rissanen@axis.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_sync: Fix UAF in le_read_features_complete

This fixes the following backtrace caused by hci_conn being freed
before le_read_features_complete but after
hci_le_read_remote_features_sync so hci_conn_del -> hci_cmd_sync_dequeue
is not able to prevent it:

==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
BUG: KASAN: slab-use-after-free in atomic_dec_and_test include/linux/atomic/atomic-instrumented.h:1383 [inline]
BUG: KASAN: slab-use-after-free in hci_conn_drop include/net/bluetooth/hci_core.h:1688 [inline]
BUG: KASAN: slab-use-after-free in le_read_features_complete+0x5b/0x340 net/bluetooth/hci_sync.c:7344
Write of size 4 at addr ffff8880796b0010 by task kworker/u9:0/52

CPU: 0 UID: 0 PID: 52 Comm: kworker/u9:0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xcd/0x630 mm/kasan/report.c:482
kasan_report+0xe0/0x110 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:194 [inline]
kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:200
instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
atomic_dec_and_test include/linux/atomic/atomic-instrumented.h:1383 [inline]
hci_conn_drop include/net/bluetooth/hci_core.h:1688 [inline]
le_read_features_complete+0x5b/0x340 net/bluetooth/hci_sync.c:7344
hci_cmd_sync_work+0x1ff/0x430 net/bluetooth/hci_sync.c:334
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>

Allocated by task 5932:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
kasan_save_track+0x14/0x30 mm/kasan/common.c:77
poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:417
kmalloc_noprof include/linux/slab.h:957 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
__hci_conn_add+0xf8/0x1c70 net/bluetooth/hci_conn.c:963
hci_conn_add_unset+0x76/0x100 net/bluetooth/hci_conn.c:1084
le_conn_complete_evt+0x639/0x1f20 net/bluetooth/hci_event.c:5714
hci_le_enh_conn_complete_evt+0x23d/0x380 net/bluetooth/hci_event.c:5861
hci_le_meta_evt+0x357/0x5e0 net/bluetooth/hci_event.c:7408
hci_event_func net/bluetooth/hci_event.c:7716 [inline]
hci_event_packet+0x685/0x11c0 net/bluetooth/hci_event.c:7773
hci_rx_work+0x2c9/0xeb0 net/bluetooth/hci_core.c:4076
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

Freed by task 5932:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
kasan_save_track+0x14/0x30 mm/kasan/common.c:77
__kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:587
kasan_save_free_info mm/kasan/kasan.h:406 [inline]
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2540 [inline]
slab_free mm/slub.c:6663 [inline]
kfree+0x2f8/0x6e0 mm/slub.c:6871
device_release+0xa4/0x240 drivers/base/core.c:2565
kobject_cleanup lib/kobject.c:689 [inline]
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x1e7/0x590 lib/kobject.c:737
put_device drivers/base/core.c:3797 [inline]
device_unregister+0x2f/0xc0 drivers/base/core.c:3920
hci_conn_del_sysfs+0xb4/0x180 net/bluetooth/hci_sysfs.c:79
hci_conn_cleanup net/bluetooth/hci_conn.c:173 [inline]
hci_conn_del+0x657/0x1180 net/bluetooth/hci_conn.c:1234
hci_disconn_complete_evt+0x410/0xa00 net/bluetooth/hci_event.c:3451
hci_event_func net/bluetooth/hci_event.c:7719 [inline]
hci_event_packet+0xa10/0x11c0 net/bluetooth/hci_event.c:7773
hci_rx_work+0x2c9/0xeb0 net/bluetooth/hci_core.c:4076
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

The buggy address belongs to the object at ffff8880796b0000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 16 bytes inside of
freed 8192-byte region [ffff8880796b0000, ffff8880796b2000)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x796b0
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000040 ffff88813ff27280 0000000000000000 0000000000000001
raw: 0000000000000000 0000000000020002 00000000f5000000 0000000000000000
head: 00fff00000000040 ffff88813ff27280 0000000000000000 0000000000000001
head: 0000000000000000 0000000000020002 00000000f5000000 0000000000000000
head: 00fff00000000003 ffffea0001e5ac01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5657, tgid 5657 (dhcpcd-run-hook), ts 79819636908, free_ts 79814310558
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1af/0x220 mm/page_alloc.c:1845
prep_new_page mm/page_alloc.c:1853 [inline]
get_page_from_freelist+0xd0b/0x31a0 mm/page_alloc.c:3879
__alloc_frozen_pages_noprof+0x25f/0x2440 mm/page_alloc.c:5183
alloc_pages_mpol+0x1fb/0x550 mm/mempolicy.c:2416
alloc_slab_page mm/slub.c:3075 [inline]
allocate_slab mm/slub.c:3248 [inline]
new_slab+0x2c3/0x430 mm/slub.c:3302
___slab_alloc+0xe18/0x1c90 mm/slub.c:4651
__slab_alloc.constprop.0+0x63/0x110 mm/slub.c:4774
__slab_alloc_node mm/slub.c:4850 [inline]
slab_alloc_node mm/slub.c:5246 [inline]
__kmalloc_cache_noprof+0x477/0x800 mm/slub.c:5766
kmalloc_noprof include/linux/slab.h:957 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
tomoyo_print_bprm security/tomoyo/audit.c:26 [inline]
tomoyo_init_log+0xc8a/0x2140 security/tomoyo/audit.c:264
tomoyo_supervisor+0x302/0x13b0 security/tomoyo/common.c:2198
tomoyo_audit_env_log security/tomoyo/environ.c:36 [inline]
tomoyo_env_perm+0x191/0x200 security/tomoyo/environ.c:63
tomoyo_environ security/tomoyo/domain.c:672 [inline]
tomoyo_find_next_domain+0xec1/0x20b0 security/tomoyo/domain.c:888
tomoyo_bprm_check_security security/tomoyo/tomoyo.c:102 [inline]
tomoyo_bprm_check_security+0x12d/0x1d0 security/tomoyo/tomoyo.c:92
security_bprm_check+0x1b9/0x1e0 security/security.c:794
search_binary_handler fs/exec.c:1659 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve fs/exec.c:1753 [inline]
bprm_execve+0x81e/0x1620 fs/exec.c:1729
do_execveat_common.isra.0+0x4a5/0x610 fs/exec.c:1859
page last free pid 5657 tgid 5657 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1394 [inline]
__free_frozen_pages+0x7df/0x1160 mm/page_alloc.c:2901
discard_slab mm/slub.c:3346 [inline]
__put_partials+0x130/0x170 mm/slub.c:3886
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x4c/0xf0 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x195/0x1e0 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x69/0x90 mm/kasan/common.c:352
kasan_slab_alloc include/linux/kasan.h:252 [inline]
slab_post_alloc_hook mm/slub.c:4948 [inline]
slab_alloc_node mm/slub.c:5258 [inline]
__kmalloc_cache_noprof+0x274/0x800 mm/slub.c:5766
kmalloc_noprof include/linux/slab.h:957 [inline]
tomoyo_print_header security/tomoyo/audit.c:156 [inline]
tomoyo_init_log+0x197/0x2140 security/tomoyo/audit.c:255
tomoyo_supervisor+0x302/0x13b0 security/tomoyo/common.c:2198
tomoyo_audit_env_log security/tomoyo/environ.c:36 [inline]
tomoyo_env_perm+0x191/0x200 security/tomoyo/environ.c:63
tomoyo_environ security/tomoyo/domain.c:672 [inline]
tomoyo_find_next_domain+0xec1/0x20b0 security/tomoyo/domain.c:888
tomoyo_bprm_check_security security/tomoyo/tomoyo.c:102 [inline]
tomoyo_bprm_check_security+0x12d/0x1d0 security/tomoyo/tomoyo.c:92
security_bprm_check+0x1b9/0x1e0 security/security.c:794
search_binary_handler fs/exec.c:1659 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve fs/exec.c:1753 [inline]
bprm_execve+0x81e/0x1620 fs/exec.c:1729
do_execveat_common.isra.0+0x4a5/0x610 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x8e/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94

Memory state around the buggy address:
ffff8880796aff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880796aff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880796b0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880796b0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880796b0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Fixes: a106e50be74b ("Bluetooth: HCI: Add support for LL Extended Feature Set")
Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
Tested-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=87badbb9094e008e0685
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Pauli Virtanen <pav@iki.fi>

Bluetooth: hci_sync: fix leaks when hci_cmd_sync_queue_once fails

When hci_cmd_sync_queue_once() returns with error, the destroy callback
will not be called.

Fix leaking references / memory on these failures.

Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_sync: hci_cmd_sync_queue_once() return -EEXIST if exists

hci_cmd_sync_queue_once() needs to indicate whether a queue item was
added, so caller can know if callbacks are called, so it can avoid
leaking resources.

Change the function to return -EEXIST if queue item already exists.

Modify all callsites to handle that.

Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_event: move wake reason storage into validated event handlers

hci_store_wake_reason() is called from hci_event_packet() immediately
after stripping the HCI event header but before hci_event_func()
enforces the per-event minimum payload length from hci_ev_table.
This means a short HCI event frame can reach bacpy() before any bounds
check runs.

Rather than duplicating skb parsing and per-event length checks inside
hci_store_wake_reason(), move wake-address storage into the individual
event handlers after their existing event-length validation has
succeeded. Convert hci_store_wake_reason() into a small helper that only
stores an already-validated bdaddr while the caller holds hci_dev_lock().
Use the same helper after hci_event_func() with a NULL address to
preserve the existing unexpected-wake fallback semantics when no
validated event handler records a wake address.

Annotate the helper with __must_hold(&hdev->lock) and add
lockdep_assert_held(&hdev->lock) so future call paths keep the lock
contract explicit.

Call the helper from hci_conn_request_evt(), hci_conn_complete_evt(),
hci_sync_conn_complete_evt(), le_conn_complete_evt(),
hci_le_adv_report_evt(), hci_le_ext_adv_report_evt(),
hci_le_direct_adv_report_evt(), hci_le_pa_sync_established_evt(), and
hci_le_past_received_evt().

Fixes: 2f20216c1d6f ("Bluetooth: Emit controller suspend and resume events")
Cc: stable@vger.kernel.org
Signed-off-by: Oleh Konko <security@1seal.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: SCO: fix race conditions in sco_sock_connect()

sco_sock_connect() checks sk_state and sk_type without holding
the socket lock. Two concurrent connect() syscalls on the same
socket can both pass the check and enter sco_connect(), leading
to use-after-free.

The buggy scenario involves three participants and was confirmed
with additional logging instrumentation:

  Thread A (connect):    HCI disconnect:      Thread B (connect):

  sco_sock_connect(sk)                        sco_sock_connect(sk)
  sk_state==BT_OPEN                           sk_state==BT_OPEN
  (pass, no lock)                             (pass, no lock)
  sco_connect(sk):                            sco_connect(sk):
    hci_dev_lock                                hci_dev_lock
    hci_connect_sco                               <- blocked
      -> hcon1
    sco_conn_add->conn1
    lock_sock(sk)
    sco_chan_add:
      conn1->sk = sk
      sk->conn = conn1
    sk_state=BT_CONNECT
    release_sock
    hci_dev_unlock
                           hci_dev_lock
                           sco_conn_del:
                             lock_sock(sk)
                             sco_chan_del:
                               sk->conn=NULL
                               conn1->sk=NULL
                               sk_state=
                                 BT_CLOSED
                               SOCK_ZAPPED
                             release_sock
                           hci_dev_unlock
                                                  (unblocked)
                                                  hci_connect_sco
                                                    -> hcon2
                                                  sco_conn_add
                                                    -> conn2
                                                  lock_sock(sk)
                                                  sco_chan_add:
                                                    sk->conn=conn2
                                                  sk_state=
                                                    BT_CONNECT
                                                  // zombie sk!
                                                  release_sock
                                                  hci_dev_unlock

Thread B revives a BT_CLOSED + SOCK_ZAPPED socket back to
BT_CONNECT. Subsequent cleanup triggers double sock_put() and
use-after-free. Meanwhile conn1 is leaked as it was orphaned
when sco_conn_del() cleared the association.

Fix this by:
- Moving lock_sock() before the sk_state/sk_type checks in
  sco_sock_connect() to serialize concurrent connect attempts
- Fixing the sk_type != SOCK_SEQPACKET check to actually
  return the error instead of just assigning it
- Adding a state re-check in sco_connect() after lock_sock()
  to catch state changes during the window between the locks
- Adding sco_pi(sk)->conn check in sco_chan_add() to prevent
  double-attach of a socket to multiple connections
- Adding hci_conn_drop() on sco_chan_add failure to prevent
  HCI connection leaks

Fixes: 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm")
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: hci_sync: call destroy in hci_cmd_sync_run if immediate

hci_cmd_sync_run() may run the work immediately if called from existing
sync work (otherwise it queues a new sync work). In this case it fails
to call the destroy() function.

On immediate run, make it behave same way as if item was queued
successfully: call destroy, and return 0.

The only callsite is hci_abort_conn() via hci_cmd_sync_run_once(), and
this changes its return value. However, its return value is not used
except as the return value for hci_disconnect(), and nothing uses the
return value of hci_disconnect(). Hence there should be no behavior
change anywhere.

Fixes: c898f6d7b093b ("Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_once")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

mips: mm: Allocate tlb_vpn array atomically

Found by DEBUG_ATOMIC_SLEEP:

  BUG: sleeping function called from invalid context at /include/linux/sched/mm.h:306
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  no locks held by swapper/1/0.
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff801477fc>] copy_process+0x75c/0x1b68
  softirqs last  enabled at (0): [<ffffffff801477fc>] copy_process+0x75c/0x1b68
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.119-d79e757675ec-fct #1
  Stack : 800000000290bad8 0000000000000000 0000000000000008 800000000290bae8
          800000000290bae8 800000000290bc78 0000000000000000 0000000000000000
          ffffffff80c80000 0000000000000001 ffffffff80d8dee8 ffffffff810d09c0
          784bb2a7ec10647d 0000000000000010 ffffffff80a6fd60 8000000001d8a9c0
          0000000000000000 0000000000000000 ffffffff80d90000 0000000000000000
          ffffffff80c9e0e8 0000000007ffffff 0000000000000cc0 0000000000000400
          ffffffffffffffff 0000000000000001 0000000000000002 ffffffffc0149ed8
          fffffffffffffffe 8000000002908000 800000000290bae0 ffffffff80a81b74
          ffffffff80129fb0 0000000000000000 0000000000000000 0000000000000000
          0000000000000000 0000000000000000 ffffffff80129fd0 0000000000000000
          ...
  Call Trace:
  [<ffffffff80129fd0>] show_stack+0x60/0x158
  [<ffffffff80a7f894>] dump_stack_lvl+0x88/0xbc
  [<ffffffff8018d3c8>] __might_resched+0x268/0x288
  [<ffffffff803648b0>] __kmem_cache_alloc_node+0x2e0/0x330
  [<ffffffff80302788>] __kmalloc+0x58/0xd0
  [<ffffffff80a81b74>] r4k_tlb_uniquify+0x7c/0x428
  [<ffffffff80143e8c>] tlb_init+0x7c/0x110
  [<ffffffff8012bdb4>] per_cpu_trap_init+0x16c/0x1d0
  [<ffffffff80133258>] start_secondary+0x28/0x128

Fixes: 231ac951faba ("MIPS: mm: kmalloc tlb_vpn array to avoid stack overflow")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works

In unplug_oldest_pwq(), the first inactive work item on the
pool_workqueue is activated correctly. However, if multiple inactive
works exist on the same pool_workqueue, subsequent works fail to
activate because wq_node_nr_active.pending_pwqs is empty — the list
insertion is skipped when the pool_workqueue is plugged.

Fix this by checking for additional inactive works in
unplug_oldest_pwq() and updating wq_node_nr_active.pending_pwqs
accordingly.

Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues")
Cc: stable@vger.kernel.org
Cc: Carlos Santa <carlos.santa@intel.com>
Cc: Ryan Neph <ryanneph@google.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>

lib/crypto: arm64: Assume a little-endian kernel

Since support for big-endian arm64 kernels was removed, the CPU_LE()
macro now unconditionally emits the code it is passed, and the CPU_BE()
macro now unconditionally discards the code it is passed.

Simplify the assembly code in lib/crypto/arm64/ accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401003331.144065-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

arm64: fpsimd: Remove obsolete cond_yield macro

All invocations of the cond_yield macro have been removed, so remove the
macro definition as well.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/sha3: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the SHA-3 code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/sha512: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the SHA-512 code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/sha256: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the SHA-256 code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/sha1: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the SHA-1 code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/poly1305: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the Poly1305 code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/gf128hash: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the GHASH and POLYVAL code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/chacha: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the ChaCha code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: arm64/aes: Remove obsolete chunking logic

Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.

Simplify the AES-CBC-MAC code accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

MIPS: mm: Rewrite TLB uniquification for the hidden bit feature

Before the introduction of the EHINV feature, which lets software mark
TLB entries invalid, certain older implementations of the MIPS ISA were
equipped with an analogous bit, as a vendor extension, which however is
hidden from software and only ever set at reset, and then any software
write clears it, making the intended TLB entry valid.

This feature makes it unsafe to read a TLB entry with TLBR, modify the
page mask, and write the entry back with TLBWI, because this operation
will implicitly clear the hidden bit and this may create a duplicate
entry, as with the presence of the hidden bit there is no guarantee all
the entries across the TLB are unique each.

Usually the firmware has already uniquified TLB entries before handing
control over, in which case we only need to guarantee at bootstrap no
clash will happen with the VPN2 values chosen in local_flush_tlb_all().

However with systems such as Mikrotik RB532 we get handed the TLB as at
reset, with the hidden bit set across the entries and possibly duplicate
entries present. This then causes a machine check exception when page
sizes are reset in r4k_tlb_uniquify() and prevents the system from
booting.

Rewrite the algorithm used in r4k_tlb_uniquify() then such as to avoid
the reuse of ASID/VPN values across the TLB. Get rid of global entries
first as they may be blocking the entire address space, e.g. 16 256MiB
pages will exhaust the whole address space of a 32-bit CPU and a single
big page can exhaust the 32-bit compatibility space on a 64-bit CPU.

Details of the algorithm chosen are given across the code itself.

Fixes: 9f048fa48740 ("MIPS: mm: Prevent a TLB shutdown on initial uniquification")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v6.18+
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: mm: Suppress TLB uniquification on EHINV hardware

Hardware that supports the EHINV feature, mandatory for R6 ISA and FTLB
implementation, lets software mark TLB entries invalid, which eliminates
the need to ensure no duplicate matching entries are ever created. This
feature is already used by local_flush_tlb_all(), via the UNIQUE_ENTRYHI
macro, making the preceding call to r4k_tlb_uniquify() superfluous.

The next change will also modify uniquification code such that it'll
become incompatible with the FTLB and MMID features, as well as MIPSr6
CPUs that do not implement 4KiB pages.

Therefore prevent r4k_tlb_uniquify() from being used on EHINV hardware,
as denoted by `cpu_has_tlbinv'.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: Always record SEGBITS in cpu_data.vmbits

With a 32-bit kernel running on 64-bit MIPS hardware the hardcoded value
of `cpu_vmbits' only records the size of compatibility useg and does not
reflect the size of native xuseg or the complete range of values allowed
in the VPN2 field of TLB entries.

An upcoming change will need the actual VPN2 value range permitted even
in 32-bit kernel configurations, so always include the `vmbits' member
in `struct cpuinfo_mips' and probe for SEGBITS when running on 64-bit
hardware and resorting to the currently hardcoded value of 31 on 32-bit
processors. No functional change for users of `cpu_vmbits'.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: Fix the GCC version check for `__multi3' workaround

It was only GCC 10 that fixed a MIPS64r6 code generation issue with a
`__multi3' libcall inefficiently produced to perform 64-bit widening
multiplication while suitable machine instructions exist to do such a
calculation. The fix went in with GCC commit 48b2123f6336 ("re PR
target/82981 (unnecessary __multi3 call for mips64r6 linux kernel)").

Adjust our code accordingly, removing build failures such as:

mips64-linux-ld: lib/math/div64.o: in function `mul_u64_add_u64_div_u64':
div64.c:(.text+0x84): undefined reference to `__multi3'

with the GCC versions affected.

Fixes: ebabcf17bcd7 ("MIPS: Implement __multi3 for GCC7 MIPS64r6 builds")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601140146.hMLODc6v-lkp@intel.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v4.15+
Reviewed-by: David Laight <david.laight.linux@gmail.com.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: SiByte: Bring back cache initialisation

Bring back cache initialisation for Broadcom SiByte SB1 cores, which has
been removed causing the kernel to hang at bootstrap right after:

Dentry cache hash table entries: 524288 (order: 8, 4194304 bytes, linear)
Inode-cache hash table entries: 262144 (order: 7, 2097152 bytes, linear)

The cause of the problem is R4k cache handlers are also used by Broadcom
SiByte SB1 cores, however with a different cache error exception handler
and therefore not using CPU_R4K_CACHE_TLB:

obj-$(CONFIG_CPU_R4K_CACHE_TLB) += c-r4k.o cex-gen.o tlb-r4k.o
obj-$(CONFIG_CPU_SB1) += c-r4k.o cerr-sb1.o cex-sb1.o tlb-r4k.o

(from arch/mips/mm/Makefile).

Fixes: bbe4f634f48c ("mips: fix r3k_cache_init build regression")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

mips: ralink: update CPU clock index

Update CPU clock index to match the clock driver changes.

Fixes: d34db686a3d7 ("clk: ralink: mtmips: fix clocks probe order in oldest ralink SoCs")
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Signed-off-by: Shiji Yang <yangshiji66@outlook.com>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: Loongson64: env: Check UARTs passed by LEFI cautiously

Some firmware does not set nr_uarts properly and passes empty items.
Iterate at most min(system->nr_uarts, MAX_UARTS) items to prevent
out-of-bounds access, and ignore UARTs with addr 0 silently.

Meanwhile, our DT only works with UPIO_MEM but theoretically firmware
may pass other IO types, so explicitly check against that.

Tested on Loongson-LS3A4000-7A1000-NUC-SE.

Fixes: 3989ed418483 ("MIPS: Loongson64: env: Fixup serial clock-frequency when using LEFI")
Cc: stable@vger.kernel.org
Reviewed-by: Yao Zi <me@ziyao.cc>
Signed-off-by: Rong Zhang <rongrong@oss.cipunited.com>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

sched: update task_struct->comm comment

Since commit 3a3f61ce5e0b ("exec: Make sure task->comm is always
NUL-terminated"), __set_task_comm() is unlocked and no longer uses
strscpy_pad() - update the stale comment accordingly.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20260401152039.724811-4-thorsten.blum@linux.dev
Signed-off-by: Kees Cook <kees@kernel.org>

exec: use strnlen() in __set_task_comm

Use strnlen() to limit source string scanning to 'TASK_COMM_LEN - 1'
bytes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260401152039.724811-3-thorsten.blum@linux.dev
Signed-off-by: Kees Cook <kees@kernel.org>

x86/CPU/AMD: Print AGESA string from DMI additional information entry

Type 40 entries (Additional Information) are summarized in section 7.41 as
part of the SMBIOS specification. Generally, these entries aren't interesting
to save.

However on some AMD Zen systems, the AGESA version is stored here. This is
useful to save to the kernel message logs for debugging. It can be used to
cross-reference issues.

Implement an iterator for the Additional Information entries. Use this to find
and print the AGESA string. Do so in AMD code, since the use case is
AMD-specific.

[ bp: Match only "AGESA". ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: "Mario Limonciello (AMD)" <superm1@kernel.org>
Signed-off-by: "Mario Limonciello (AMD)" <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Link: https://patch.msgid.link/20260307141024.819807-6-superm1@kernel.org

firmware: dmi: Add pr_fmt() for dmi_scan.c

Several prints inconsistently use DMI: or dmi: or nothing. To make it clear
which prints come from dmi_scan.c, add a pr_fmt() macro.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Link: https://patch.msgid.link/20260307141024.819807-5-superm1@kernel.org

firmware: dmi: Adjust dmi_decode() to use enums

dmi_decode() has hardcoded values with comments for each DMI entry type. The
same information is already in dmi.h though, so drop the comments and use the
definitions instead.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://patch.msgid.link/20260307141024.819807-3-superm1@kernel.org

firmware: dmi: Correct an indexing error in dmi.h

The entries later in enum dmi_entry_type don't match the SMBIOS
specification¹.

The entry for type 33: `64-Bit Memory Error Information` is not present and
thus the index for all later entries is incorrect.

Add it.

Also, add missing entry types 43-46, while at it.

¹ Search for "System Management BIOS (SMBIOS) Reference Specification"

[ bp: Drop the flaky SMBIOS spec URL. ]

Fixes: 93c890dbe5287 ("firmware: Add DMI entry types to the headers")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://patch.msgid.link/20260307141024.819807-2-superm1@kernel.org

arm64: mm: Use generic enum pgtable_level

enum pgtable_type was introduced for arm64 by commit c64f46ee1377
("arm64: mm: use enum to identify pgtable level instead of
*_SHIFT"). In the meantime, the generic enum pgtable_level got
introduced by commit b22cc9a9c7ff ("mm/rmap: convert "enum
rmap_level" to "enum pgtable_level"").

Let's switch to the generic enum pgtable_level. The only difference
is that it also includes PGD level; __pgd_pgtable_alloc() isn't
expected to create PGD tables so we add a VM_WARN_ON() for that
case.

Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64/mm: Reject memory removal that splits a kernel leaf mapping

Linear and vmemmap mappings that get torn down during a memory hot remove
operation might contain leaf level entries on any page table level. If the
requested memory range's linear or vmemmap mappings falls within such leaf
entries, new mappings need to be created for the remaining memory mapped on
the leaf entry earlier, following standard break before make aka BBM rules.
But kernel cannot tolerate BBM and hence remapping to fine grained leaves
would not be possible on systems without BBML2_NOABORT.

Currently memory hot remove operation does not perform such restructuring,
and so removing memory ranges that could split a kernel leaf level mapping
need to be rejected.

While memory_hotplug.c does appear to permit hot removing arbitrary ranges
of memory, the higher layers that drive memory_hotplug (e.g. ACPI, virtio,
...) all appear to treat memory as fixed size devices. So it is impossible
to hot unplug a different amount than was previously hot plugged, and hence
we should never see a rejection in practice, but adding the check makes us
robust against a future change.

Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64/mm: Enable batched TLB flush in unmap_hotplug_range()

During a memory hot remove operation, both linear and vmemmap mappings for
the memory range being removed, get unmapped via unmap_hotplug_range() but
mapped pages get freed only for vmemmap mapping. This is just a sequential
operation where each table entry gets cleared, followed by a leaf specific
TLB flush, and then followed by memory free operation when applicable.

This approach was simple and uniform both for vmemmap and linear mappings.
But linear mapping might contain CONT marked block memory where it becomes
necessary to first clear out all entire in the range before a TLB flush.
This is as per the architecture requirement. Hence batch all TLB flushes
during the table tear down walk and finally do it in unmap_hotplug_range().

Prior to this fix, it was hypothetically possible for a speculative access
to a higher address in the contiguous block to fill the TLB with shattered
entries for the entire contiguous range after a lower address had already
been cleared and invalidated. Due to the table entries being shattered, the
subsequent TLB invalidation for the higher address would not then clear the
TLB entries for the lower address, meaning stale TLB entries could persist.

Besides it also helps in improving the performance via TLBI range operation
along with reduced synchronization instructions. The time spent executing
unmap_hotplug_range() improved 97% measured over a 2GB memory hot removal
in KVM guest.

This scheme is not applicable during vmemmap mapping tear down where memory
needs to be freed and hence a TLB flush is required after clearing out page
table entry.

Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

hrtimer: Fix incorrect #endif comment for BITS_PER_LONG check

The #endif comment says "BITS_PER_LONG >= 64", but the corresponding #if
guard is "BITS_PER_LONG < 64".

The comment was originally correct when the block had a three-way
#if/#else/#endif structure, where the #else branch provided a 64-bit inline
version. Commit 79bf2bb335b8 ("[PATCH] tick-management: dyntick / highres
functionality") removed the #else branch but did not update the #endif
comment, leaving it inconsistent with the remaining #if condition.

Fix the comment to match the preprocessor guard.

Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260331074811.26147-1-zhanxusheng@xiaomi.com

irqchip/renesas-rzg2l: Add NMI support

The RZ/G2L SoC has an NMI interrupt. Add support for the NMI interrupt.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260401114504.332825-1-biju.das.jz@bp.renesas.com

powerpc/powernv/iommu: iommu incorrectly bypass DMA APIs

In a PowerNV environment, for devices that supports DMA mask less than
64 bit but larger than 32 bits, iommu is incorrectly bypassing DMA
APIs while allocating and mapping buffers for DMA operations.

Devices are failing with ENOMEN during probe with the following messages

amdgpu 0000:01:00.0: [drm] Detected VRAM RAM=4096M, BAR=4096M
amdgpu 0000:01:00.0: [drm] RAM width 128bits GDDR5
amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0
amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
amdgpu 0000:01:00.0:  4096M of VRAM memory ready
amdgpu 0000:01:00.0:  32570M of GTT memory ready.
amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
amdgpu 0000:01:00.0: [drm] Debug VRAM access will use slowpath MM access
amdgpu 0000:01:00.0: [drm] GART: num cpu pages 4096, num gpu pages 65536
amdgpu 0000:01:00.0: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000).
amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
amdgpu 0000:01:00.0: (-12) create WB bo failed
amdgpu 0000:01:00.0: amdgpu_device_wb_init failed -12
amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
amdgpu 0000:01:00.0: Fatal error during GPU init
amdgpu 0000:01:00.0: finishing device.
amdgpu 0000:01:00.0: probe with driver amdgpu failed with error -12
amdgpu 0000:01:00.0:  ttm finalized

Fixes: 1471c517cf7d ("powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory")
Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reported-by: Dan Horák <dan@danny.cz>
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5039
Tested-by: Dan Horak <dan@danny.cz>
Closes: https://lore.kernel.org/linuxppc-dev/20260313142351.609bc4c3efe1184f64ca5f44@danny.cz/
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Closes: https://lore.kernel.org/linuxppc-dev/20260313142351.609bc4c3efe1184f64ca5f44@danny.cz/
[Maddy: Fixed tags]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260331223022.47488-1-gbatra@linux.ibm.com

io_uring/zcrx: don't use mark0 for allocating xarray

XA_MARK_0 is not compatible with xarray allocating entries, use
XA_MARK_1.

Fixes: fda90d43f4fac ("io_uring/zcrx: return back two step unregistration")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/f232cfd3c466047d333b474dd2bddd246b6ebb82.1774780198.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: cast id to u64 before shifting in io_allocate_rbuf_ring()

Smatch warns:
io_uring/zcrx.c:393 io_allocate_rbuf_ring() warn: should 'id << 16' be a 64 bit type?

The expression 'id << IORING_OFF_PBUF_SHIFT' is evaluated using 32-bit
arithmetic because id is a u32. This may overflow before being promoted
to the 64-bit mmap_offset.

Cast id to u64 before shifting to ensure the shift is performed in
64-bit arithmetic.

Signed-off-by: Anas Iqbal <mohd.abd.6602@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/52400e1b343691416bef3ed3ae287fb1a88d407f.1774780198.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: reject REG_NODEV with large rx_buf_size

The copy fallback path doesn't care about the actual niov size and only
uses first PAGE_SIZE bytes, and any additional space will be wasted.
Since ZCRX_REG_NODEV solely relies on the copy path, it doesn't make
sense to support non-standard rx_buf_len. Reject it for now, and
re-enable once improved.

Fixes: c11728021d5cd ("io_uring/zcrx: implement device-less mode for zcrx")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/3e7652d9c27f8ac5d2b141e3af47971f2771fb05.1774780198.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/cancel: validate opcode for IORING_ASYNC_CANCEL_OP

io_async_cancel_prep() reads the opcode selector from sqe->len and
stores it in cancel->opcode, which is an 8-bit field. Since sqe->len
is a 32-bit value, values larger than U8_MAX are implicitly truncated.

This can cause unintended opcode matches when the truncated value
corresponds to a valid io_uring opcode. For example, submitting a value
such as 0x10b will be truncated to 0x0b (IORING_OP_TIMEOUT), allowing a
cancel request to match operations it did not intend to target.
Validate the opcode value before assigning it to the 8-bit field and
reject values outside the valid io_uring opcode range.

Signed-off-by: Amir Mohammad Jahangirzad <a.jahangirzad@gmail.com>
Link: https://patch.msgid.link/20260331232113.615972-1-a.jahangirzad@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rsrc: use io_cache_free() to free node

Replace kfree(node) with io_cache_free() in io_buffer_register_bvec()
to match all other error paths that free nodes allocated via
io_rsrc_node_alloc(). The node is allocated through io_cache_alloc()
internally, so it should be returned to the cache via io_cache_free()
for proper object reuse.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Link: https://patch.msgid.link/20260331104509.7055-1-liu.yun@linux.dev
[axboe: remove fixes tag, it's not a fix, it's a cleanup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: rename zcrx [un]register functions

Drop "ifqs" from function names, as it refers to an interface queue and
there might be none once a device-less mode is introduced.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/657874acd117ec30fa6f45d9d844471c753b5a0f.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: check ctrl op payload struct sizes

Add a build check that ctrl payloads are of the same size and don't grow
struct zcrx_ctrl.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/af66caf9776d18e9ff880ab828eb159a6a03caf5.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: cache fallback availability in zcrx ctx

Store a flag in struct io_zcrx_ifq telling if the backing memory is
normal page or dmabuf based. It was looking it up from the area, however
it logically allocates from the zcrx ctx and not a particular area, and
once we add more than one area it'll become a mess.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/65e75408a7758fe7e60fae89b7a8d5ae4857f515.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: warn on a repeated area append

We only support a single area, no path should be able to call
io_zcrx_append_area() twice. Warn if that happens instead of just
returning an error.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/28eb67fb8c48445584d7c247a36e1ad8800f0c8b.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: consolidate dma syncing

Split refilling into two steps, first allocate niovs, and then do DMA
sync for them. This way dma synchronisation code can be better
optimised. E.g. we don't need to call dma_dev_need_sync() for each every
niov, and maybe we can coalesce sync for adjacent netmems in the future
as well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/19f2d50baa62ff2e0c6cd56dd7c394cab728c567.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: netmem array as refiling format

Instead of peeking into page pool allocation cache directly or via
net_mp_netmem_place_in_cache(), pass a netmem array around. It's a
better intermediate format, e.g. you can have it on stack and reuse the
refilling code and decouples it from page pools a bit more.

It still points into the page pool directly, there will be no additional
copies. As the next step, we can change the callback prototype to take
the netmem array from page pool.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/9d8549adb7ef6672daf2d8a52858ce5926279a82.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: warn on alloc with non-empty pp cache

Page pool ensures the cache is empty before asking to refill it. Warn if
the assumption is violated.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/9c9792d6e65f3780d57ff83b6334d341ed9a5f29.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: move count check into zcrx_get_free_niov

Instead of relying on the caller of __io_zcrx_get_free_niov() to check
that there are free niovs available (i.e. free_count > 0), move the
check into the function and return NULL if can't allocate. It
consolidates the free count checks, and it'll be easier to extend the
niov free list allocator in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/6df04a6b3a6170f86d4345da9864f238311163f9.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: use guards for locking

Convert last several places using manual locking to guards to simplify
the code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/eb4667cfaf88c559700f6399da9e434889f5b04a.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: add a struct for refill queue

Add a new structure that keeps the refill queue state. It's cleaner and
will be useful once we introduce multiple refill queues.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/4ce200da1ff0309c377293b949200f95f80be9ae.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: use better name for RQ region

Rename "region" to "rq_region" to highlight that it's a refill queue
region.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/ac815790d2477a15826aecaa3d94f2a94ef507e6.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: implement device-less mode for zcrx

Allow creating a zcrx instance without attaching it to a net device.
All data will be copied through the fallback path. The user is also
expected to use ZCRX_CTRL_FLUSH_RQ to handle overflows as it normally
should even with a netdev, but it becomes even more relevant as there
will likely be no one to automatically pick up buffers.

Apart from that, it follows the zcrx uapi for the I/O path, and is
useful for testing, experimentation, and potentially for the copy
receive path in the future if improved.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/674f8ad679c5a0bc79d538352b3042cf0999596e.1774261953.git.asml.silence@gmail.com
[axboe: fix spelling error in uapi header and commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: extract netdev+area init into a helper

In preparation to following patches, add a function that is responsibly
for looking up a netdev, creating an area, DMA mapping it and opening a
queue.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/88cb6f746ecb496a9030756125419df273d0b003.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: always dma map in advance

zcrx was originally establisihing dma mappings at a late stage when it
was being bound to a page pool. Dma-buf couldn't work this way, so it's
initialised during area creation.

It's messy having them do it at different spots, just move everything to
the area creation time.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/334092a2cbdd4aabd7c025050aa99f05ace89bb5.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: fully clean area on error in io_import_umem()

When accounting fails, io_import_umem() sets the page array, etc. and
returns an error expecting that the error handling code will take care
of the rest. To make the next patch simpler, only return a fully
initialised areas from the function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/3a602b7fb347dbd4da6797ac49b52ea5dedb856d.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: return back two step unregistration

There are reports where io_uring instance removal takes too long and an
ifq reallocation by another zcrx instance fails. Split zcrx destruction
into two steps similarly how it was before, first close the queue early
but maintain zcrx alive, and then when all inflight requests are
completed, drop the main zcrx reference. For extra protection, mark
terminated zcrx instances in xarray and warn if we double put them.

Cc: stable@vger.kernel.org # 6.19+
Link: https://github.com/axboe/liburing/issues/1550
Reported-by: Youngmin Choi <youngminchoi94@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/0ce21f0565ab4358668922a28a8a36922dfebf76.1774261953.git.asml.silence@gmail.com
[axboe: NULL ifq before break inside scoped guard]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

tools/nolibc/printf: Support negative variable width and precision

For (eg) "%*.*s" treat a negative field width as a request to left align
the output (the same as the '-' flag), and a negative precision to
request the default precision.

Set the default precision to -1 (not INT_MAX) and add explicit checks
to the string handling for negative values (makes the tet unsigned).

For numeric output check for 'precision >= 0' instead of testing
_NOLIBC_PF_FLAGS_CONTAIN(flags, '.').
This needs an inverted test, some extra goto and removes an indentation.
The changed conditionals fix printf("%0-#o", 0) - but '0' and '-' shouldn't
both be specified.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20260323112247.3196-1-david.laight.linux@gmail.com
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>

timens: Use task_lock guard in timens_get*()

Simplify the logic in timens_get*() by converting the task_lock
usage to a guard().

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-4-936e91c9dd30@linutronix.de

timens: Use mutex guard in proc_timens_set_offset()

Simplify the logic in proc_timens_set_offset() by converting the mutex
usage to a guard().

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-3-936e91c9dd30@linutronix.de

timens: Simplify some calls to put_time_ns()

Use the new __free() based cleanup helpers to simplify some functions.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-2-936e91c9dd30@linutronix.de

timens: Add a __free() wrapper for put_time_ns()

The wrapper will be used to simplify cleanups of 'struct time_namespace'.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-1-936e91c9dd30@linutronix.de

hwmon: (asus-ec-sensors) Fix T_Sensor for PRIME X670E-PRO WIFI

On the Asus PRIME X670E-PRO WIFI, the driver reports a constant value of
zero for T_Sensor. On this board, the register for T_Sensor is at a
different address, as found by experimentation and confirmed by
comparison to an independent temperature reading.

* sensor disconnected: -62.0°C
* ambient temperature: +22.0°C
* held between fingers: +30.0°C

Introduce SENSOR_TEMP_T_SENSOR_ALT1 to support the PRIME X670E-PRO WIFI
without causing a regression for other 600-series boards

Fixes: e0444758dd1b ("hwmon: (asus-ec-sensors) add PRIME X670E-PRO WIFI")
Signed-off-by: Corey Hickey <bugfood-c@fatooh.org>
Link: https://lore.kernel.org/r/20260331215414.368785-1-bugfood-ml@fatooh.org
[groeck: Fixed typo, updated Fixes: reference]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

io_uring/bpf_filters: retain COW'ed settings on parse failures

If io_parse_restrictions() fails, it ends up clearing any restrictions
currently set. The intent is only to clear whatever it already applied,
but it ends up clearing everything, including whatever settings may have
been applied in a copy-on-write fashion already. Ensure that those are
retained.

Link: https://lore.kernel.org/io-uring/CAK8a0jzF-zaO5ZmdOrmfuxrhXuKg5m5+RDuO7tNvtj=kUYbW7Q@mail.gmail.com/
Reported-by: antonius <bluedragonsec2023@gmail.com>
Fixes: ed82f35b926b ("io_uring: allow registration of per-task restrictions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: protect remaining lockless ctx->rings accesses with RCU

Commit 96189080265e addressed one case of ctx->rings being potentially
accessed while a resize is happening on the ring, but there are still
a few others that need handling. Add a helper for retrieving the
rings associated with an io_uring context, and add some sanity checking
to that to catch bad uses. ->rings_rcu is always valid, as long as it's
used within RCU read lock. Any use of ->rings_rcu or ->rings inside
either ->uring_lock or ->completion_lock is sane as well.

Do the minimum fix for the current kernel, but set it up such that this
basic infra can be extended for later kernels to make this harder to
mess up in the future.

Thanks to Junxi Qian for finding and debugging this issue.

Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reviewed-by: Junxi Qian <qjx1298677004@gmail.com>
Tested-by: Junxi Qian <qjx1298677004@gmail.com>
Link: https://lore.kernel.org/io-uring/20260330172348.89416-1-qjx1298677004@gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

arm64: Use static call trampolines when kCFI is enabled

Implement arm64 support for the 'unoptimized' static call variety, which
routes all calls through a trampoline that performs a tail call to the
chosen function, and wire it up for use when kCFI is enabled. This works
around an issue with kCFI and generic static calls, where the prototypes
of default handlers such as __static_call_nop() and __static_call_ret0()
don't match the expected prototype of the call site, resulting in kCFI
false positives [0].

Since static call targets may be located in modules loaded out of direct
branching range, this needs an ADRP/LDR pair to load the branch target
into R16 and a branch-to-register (BR) instruction to perform an
indirect call.

Unlike on x86, there is no pressing need on arm64 to avoid indirect
calls at all cost, but hiding it from the compiler as is done here does
have some benefits:
- the literal is located in .rodata, which gives us the same robustness
advantage that code patching does;
- no D-cache pollution from fetching hash values from .text sections.

From an execution speed PoV, this is unlikely to make any difference at
all.

Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will McVicker <willmcvicker@google.com>
Reported-by: Carlos Llamas <cmllamas@google.com>
Closes: https://lore.kernel.org/all/20260311225822.1565895-1-cmllamas@google.com/ [0]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

genirq/affinity: Remove cpus_read_lock() while reading cpu_possible_mask

cpu_possible_mask is set early during boot based on information from the
firmware. After that it remains read only and is never changed. Therefore
there is no need to acquire the CPU-hotplug lock while reading it.

Remove cpus_read_*() while accessing cpu_possible_mask.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260401121334.xeMOSC1v@linutronix.de

cpufreq: governor: fix double free in cpufreq_dbs_governor_init() error path

When kobject_init_and_add() fails, cpufreq_dbs_governor_init() calls
kobject_put(&dbs_data->attr_set.kobj).

The kobject release callback cpufreq_dbs_data_release() calls
gov->exit(dbs_data) and kfree(dbs_data), but the current error path
then calls gov->exit(dbs_data) and kfree(dbs_data) again, causing a
double free.

Keep the direct kfree(dbs_data) for the gov->init() failure path, but
after kobject_init_and_add() has been called, let kobject_put() handle
the cleanup through cpufreq_dbs_data_release().

Fixes: 4ebe36c94aed ("cpufreq: Fix kobject memleak")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260401024535.1395801-1-lgs201920130244@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: intel_rapl: Consolidate PL4 and PMU support flags into rapl_defaults

Currently, PL4 and MSR-based RAPL PMU support are detected using
separate CPU ID tables (pl4_support_ids and pmu_support_ids) in the
MSR driver probe path. This creates a maintenance burden since adding
a new CPU requires updates in two places: the rapl_ids table and one
or both of these capability tables.

Consolidate PL4 and PMU capability information directly into
struct rapl_defaults by adding msr_pl4_support and msr_pmu_support
flags. This allows per-CPU capability to be expressed in a single
place alongside other per-CPU defaults, eliminating the duplicate
CPU ID tables entirely.

No functional changes are intended.

Co-developed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-8-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: intel_rapl: Move MSR primitives to MSR driver

MSR-specific RAPL primitives differ from those used by TPMI and MMIO
interfaces. Keeping them in the common driver requires
interface-specific handling logic and makes the common layer
unnecessarily complex.

Move the MSR primitive definitions and associated bitmasks into the
MSR interface driver. This change includes:

1. Move MSR-specific bitmask definitions to RAPL MSR driver.
2. Add MSR-local struct rapl_primitive_info instance and assign it to
priv->rpi during MSR probe.
3. Remove the primitive assignment logic from rapl_config() in the
common driver.

No functional changes are intended.

Co-developed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-7-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

thermal: intel: int340x: processor: Move MMIO primitives to MMIO driver

MMIO-specific primitives differ from those used by the TPMI interface.
The MSR and MMIO interfaces shared the same primitives in the common
driver, but MMIO does not require many MSR-specific entries (like PSYS).
Keeping these in the common driver does not add any value and requires
interface-specific handling logic that makes the common layer
unnecessarily complex.

Move the MMIO primitive definitions and associated bitmasks into the
MMIO interface driver. This change includes:

1. Add MMIO-local struct rapl_primitive_info instance without
MSR-specific entries and assign it to priv->rpi during MMIO
initialization.
2. Remove the RAPL MMIO case from rapl_config() in the common driver.

No functional changes are intended.

Co-developed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-6-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: intel_rapl: Move TPMI primitives to TPMI driver

TPMI-specific RAPL primitives differ from those used by MSR and MMIO
interfaces. Keeping them in the common RAPL driver requires
interface-specific handling logic and makes the common layer
unnecessarily complex.

Move the TPMI primitive definitions and associated bitmasks into the
TPMI interface driver. This change includes:

1. Move TPMI-specific bitmask definitions from intel_rapl_common.c to
intel_rapl_tpmi.c.
2. Add TPMI-local struct rapl_primitive_info instance and assign it to
priv->rpi during TPMI probe.
3. Remove the RAPL TPMI related definitions from the common driver.

No functional changes are intended.

Co-developed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-5-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: intel_rapl: Move primitive info to header for interface drivers

RAPL primitive information varies across different RAPL interfaces
(MSR, TPMI, MMIO). Keeping them in the common code adds no benefit, but
requires interface-specific handling logic and makes the common layer
unnecessarily complex.

Move the primitive info infrastructure to the shared header to allow
interface drivers to configure RAPL primitives. Specific changes:

1. Move struct rapl_primitive_info, enum unit_type, and
    PRIMITIVE_INFO_INIT macro to intel_rapl.h.
2. Change the @rpi field in struct rapl_if_priv from void * to
    struct rapl_primitive_info * to improve type safety and eliminate
    unnecessary casts.

No functional changes. This is a preparatory refactoring to allow
interface drivers to supply their own RAPL primitive settings.

Co-developed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-4-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: intel_rapl: Remove unused macro definitions

Remove the following unused macro definitions from the RAPL common
driver:

* DOMAIN_STATE_INACTIVE and DOMAIN_STATE_POWER_LIMIT_SET
* IOSF_CPU_POWER_BUDGET_CTL_BYT and IOSF_CPU_POWER_BUDGET_CTL_TNG
* MAX_PRIM_NAME

No functional changes.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-3-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: intel_rapl: Move MSR default settings into MSR interface driver

MSR-specific RAPL defaults differ from those used by the TPMI interface.
The MMIO and MSR interfaces shared the same rapl_defaults pointer in the
common driver, but MMIO does not require the CPU-specific variations
needed by MSR. Keeping these in the common driver adds unnecessary
complexity and MSR-specific initialization.

Move MSR defaults and CPU matching into the MSR interface driver.

Moves
-----
  * Move rapl_check_unit_atom(), set_floor_freq_atom(), and
    rapl_compute_time_window_atom() into intel_rapl_msr.c.
  * Move MSR unit-field GENMASK definitions and local constants.
  * Move all MSR-related rapl_defaults tables and the CPU-ID matching
    logic (rapl_ids[]) into the MSR driver.
  * Move iosf_mbi dependencies (floor-frequency control and related MBI
    register definitions) as they are MSR-platform specific.

Modifications
-------------
  * Replace the common driver's platform-device manual alloc/add sequence
    with platform_device_register_data() in the MSR driver to pass
    matching rapl_defaults as platform_data.
  * Update MSR driver probe to assign pdev->dev.platform_data to
    priv->defaults.
  * Update Atom helper functions to use rp->lead_cpu directly for MSR
    reads/writes instead of the generic get_rid().
  * Update Atom floor frequency logic to access defaults via the
    package private data pointer.
  * Convert MSR device creation from fs_initcall() to module_init().
    This preserves existing enumeration behavior as the driver was
    already using module_init().
  * Since rapl_ids need to exist after boot, remove __initconst
    specifier.

No functional changes are expected.

Co-developed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20260331211950.3329932-2-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: clean up dead dependencies on CPU_IDLE in Kconfig

The Kconfig in the parent directory already has the first 'if CPU_IDLE'
gating the inclusion of this Kconfig, meaning that the 'depends on
CPUIDLE' statements in these config options are effectively dead code.

Leave the 'if CPU_IDLE...endif' condition, and remove the individual
'depends on' statements in Kconfig.mips and Kconfig.powerpc

This dead code was found by kconfirm, a static analysis tool for
Kconfig.

Signed-off-by: Julian Braha <julianbraha@gmail.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20260331074920.41269-1-julianbraha@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>