Tejun Heo [Sun, 10 May 2026 20:08:16 +0000 (10:08 -1000)]
sched_ext: Cleanups in preparation for the SCX_TASK_INIT_BEGIN/DEAD work
Cleanups in preparation for the state-machine work that follows:
- Convert three sub-sched call sites that open-code
rcu_assign_pointer(p->scx.sched, ...) to scx_set_task_sched().
- Move scx_get_task_state()/scx_set_task_state() above the SCX task iter
section so scx_task_iter_next_locked() can use them without a forward
declaration.
No functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
net: xgene: fix mdio_np leak in xgene_mdiobus_register()
The for_each_child_of_node() loop captures mdio_np via break,
holding the refcount. of_mdiobus_register() does not consume the
reference, so it leaks on success.
Jakub Kicinski [Sun, 10 May 2026 17:07:37 +0000 (10:07 -0700)]
Merge tag 'batadv-net-pullrequest-20260508' of https://git.open-mesh.org/batadv
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix integer overflow on buff_pos, by Lyes Bourennani
- fix invalid tp_meter access during teardown, by Jiexun Wang (2 patches)
- stop caching unowned originator pointers in BAT IV, by Jiexun Wang
- tp_meter: fix tp_num leak on kmalloc failure, by Sven Eckelmann
- fix BLA refcounting issues, by Sven Eckelmann (3 patches)
* tag 'batadv-net-pullrequest-20260508' of https://git.open-mesh.org/batadv:
batman-adv: bla: put backbone reference on failed claim hash insert
batman-adv: bla: only purge non-released claims
batman-adv: bla: prevent use-after-free when deleting claims
batman-adv: tp_meter: fix tp_num leak on kmalloc failure
batman-adv: stop caching unowned originator pointers in BAT IV
batman-adv: stop tp_meter sessions during mesh teardown
batman-adv: reject new tp_meter sessions during teardown
batman-adv: fix integer overflow on buff_pos
====================
net: ena: PHC: Fix potential use-after-free in get_timestamp
Move the phc->active check and resp pointer assignment to after
acquiring the spinlock. Previously, phc->active was checked without
holding the lock, and resp was cached from ena_dev->phc.virt_addr
before the lock was acquired.
If ena_com_phc_destroy() runs between the lockless active check and
the lock acquisition, it sets active=false, releases the lock, frees
the DMA memory, and sets virt_addr=NULL. The get_timestamp path would
then read a NULL virt_addr and dereference it.
With both the active check and the pointer read under the lock,
destroy cannot free the memory while get_timestamp is using it.
Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260508062126.7273-1-akiyano@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
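The locking rule described in the ENA PHC fix can be sketched in userspace as follows. This is only a toy model, not the driver code: struct phc_model, the atomic_flag spinlock and every function name here are hypothetical stand-ins. It shows the active check and the pointer read both happening while the lock is held.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the PHC state; not the real ENA structures. */
struct phc_model {
	atomic_flag lock;	/* toy spinlock */
	bool active;
	uint64_t *virt_addr;	/* stands in for the DMA response buffer */
};

static void phc_lock(struct phc_model *phc)
{
	while (atomic_flag_test_and_set_explicit(&phc->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void phc_unlock(struct phc_model *phc)
{
	atomic_flag_clear_explicit(&phc->lock, memory_order_release);
}

static void phc_init(struct phc_model *phc, uint64_t *buf)
{
	atomic_flag_clear(&phc->lock);
	phc->active = true;
	phc->virt_addr = buf;
}

/* Returns 0 and stores the timestamp, or -1 if the PHC is inactive. */
static int phc_get_timestamp(struct phc_model *phc, uint64_t *ts)
{
	int ret = -1;

	phc_lock(phc);
	/* Both the check and the pointer read happen under the lock. */
	if (phc->active) {
		uint64_t *resp = phc->virt_addr;

		*ts = *resp;
		ret = 0;
	}
	phc_unlock(phc);
	return ret;
}

/* Mirrors the order in the commit message: clear active under the
 * lock, drop the lock, then release the buffer and clear the pointer. */
static void phc_destroy(struct phc_model *phc)
{
	phc_lock(phc);
	phc->active = false;
	phc_unlock(phc);
	phc->virt_addr = NULL;	/* the driver frees the DMA memory here */
}
```

In this model a reader that takes the lock after phc_destroy() has cleared active returns -1 and never touches virt_addr.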
Chuck Lever [Sun, 19 Apr 2026 18:52:59 +0000 (14:52 -0400)]
NFSD: Fix infinite loop in layout state revocation
find_one_sb_stid() skips stids whose sc_status is non-zero, but the
SC_TYPE_LAYOUT case in nfsd4_revoke_states() never sets sc_status
before calling nfsd4_close_layout(). The retry loop therefore finds
the same layout stid on every iteration, hanging the revoker
indefinitely.
Fixes: 1e33e1414bec ("nfsd: allow layout state to be admin-revoked.")
Reported-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
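The retry-loop hang described above can be modelled with a few lines of C. This is a simplified sketch, not the nfsd code: struct stid_model, find_one() and revoke_all() are hypothetical, and the max_iter cap exists only so the unmarked case terminates in the demo.

```c
#include <stddef.h>

/* Hypothetical stand-in for a stid; only the status field matters. */
struct stid_model {
	int sc_status;
};

/* Return the first entry whose status is still zero, mimicking the
 * "skip stids whose sc_status is non-zero" rule. */
static struct stid_model *find_one(struct stid_model *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].sc_status == 0)
			return &tbl[i];
	return NULL;
}

/* Revoke loop. When mark is zero the status is never set, so the same
 * entry is found again and only the max_iter cap stops the loop. */
static size_t revoke_all(struct stid_model *tbl, size_t n, int mark,
			 size_t max_iter)
{
	size_t iters = 0;
	struct stid_model *s;

	while (iters < max_iter && (s = find_one(tbl, n)) != NULL) {
		if (mark)
			s->sc_status = 1;	/* mark before "closing" */
		iters++;
	}
	return iters;
}
```

With three entries, the marking variant finishes in three iterations, while the unmarked variant runs until the cap is reached.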
Jeff Layton [Sat, 11 Apr 2026 21:12:16 +0000 (17:12 -0400)]
sunrpc: start cache request seqno at 1 to fix netlink GET_REQS
sunrpc_cache_requests_snapshot() filters requests with
crq->seqno <= min_seqno. The min_seqno for the first netlink
dump call is cb->args[0] which is 0. Since next_seqno was
initialized to 0, the very first cache request got seqno=0
and was silently skipped by the snapshot (0 <= 0 is true).
This caused netlink-based GET_REQS to return 0 pending requests
even when a request was queued, preventing mountd from resolving
cache entries (particularly expkey/nfsd.fh). The unresolved
CACHE_PENDING state blocked all further notifications for the
entry, leading to permanent NFS4ERR_DELAY hangs.
Start next_seqno at 1 so all requests have seqno >= 1 and pass
the snapshot filter when min_seqno is 0.
Fixes: facc4e3c8042 ("sunrpc: split cache_detail queue into request and reader lists")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
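The seqno arithmetic in the sunrpc fix is easy to check in isolation. The sketch below assumes only what the message states (requests with seqno <= min_seqno are filtered out, and the first dump uses min_seqno 0); snapshot_includes() and alloc_seqno() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* A request is skipped when seqno <= min_seqno, so it is included
 * only when seqno > min_seqno. */
static bool snapshot_includes(uint64_t seqno, uint64_t min_seqno)
{
	return seqno > min_seqno;
}

/* Hypothetical allocator: hand out the current value, then advance. */
static uint64_t alloc_seqno(uint64_t *next_seqno)
{
	return (*next_seqno)++;
}
```

Starting the counter at 0 gives the first request seqno 0, which the filter drops when min_seqno is 0; starting at 1 lets it through.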
nfsd: update mtime/ctime on COPY in presence of delegated attributes
When delegated attributes are given on open, the file is opened with
NOCMTIME and modifying operations do not update mtime/ctime, so as not to
get out of sync with the client's delegated view. However, for a COPY
operation, the server should update its view of mtime/ctime and reflect
that in any GETATTR queries.
Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd: update mtime/ctime on CLONE in presence of delegated attributes
When delegated attributes are given on open, the file is opened with
NOCMTIME and modifying operations do not update mtime/ctime, so as not to
get out of sync with the client's delegated view. However, for a CLONE
operation, the server should update its view of mtime/ctime and reflect
that in any GETATTR queries.
Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Scott Mayhew [Tue, 7 Apr 2026 22:08:57 +0000 (18:08 -0400)]
nfsd: fix file change detection in CB_GETATTR
RFC 8881, section 10.4.3 doesn't say anything about caching the file
size in the delegation record, nor does it say anything about comparing
a cached file size with the size reported by the client in the
CB_GETATTR reply for the purpose of determining if the client holds
modified data for the file.
What section 10.4.3 of RFC 8881 does say is that the server should
compare the *current* file size with the size reported by the client
holding the delegation in the CB_GETATTR reply, and if they differ to
treat it as a modification regardless of the change attribute retrieved
via the CB_GETATTR.
Doing otherwise would cause the server to believe the client holding the
delegation has a modified version of the file, even if the client
flushed the modifications to the server prior to the CB_GETATTR. This
would have the added side effect of subsequent CB_GETATTRs causing
updates to the mtime, ctime, and change attribute even if the client
holding the delegation makes no further updates to the file.
Modify nfsd4_deleg_getattr_conflict() to obtain the current file size
via i_size_read(). Retain the ncf_cur_fsize field, since it's a
convenient way to return the file size back to nfsd4_encode_fattr4(),
but don't use it for the purpose of detecting file changes. Remove the
unnecessary initialization of ncf_cur_fsize in nfs4_open_delegation().
Also, if we recall the delegation (because the client didn't respond to
the CB_GETATTR), then skip the logic that checks the nfs4_cb_fattr
fields.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Cc: stable@vger.kernel.org
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
tools/ynl: add missing uapi header deps in Makefile.deps
ethtool.h includes linux/typelimits.h which is a relatively new header
not yet shipped in most distro kernel-header packages. Without the
explicit entry, the build silently falls through to -idirafter.
dev_energymodel.h is a new YNL family whose uapi header is not in
system paths at all and was missing a CFLAGS entry entirely.
Hyunwoo Kim [Fri, 8 May 2026 08:53:09 +0000 (17:53 +0900)]
rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true. An skb
that is not cloned but still carries externally-owned paged fragments
(e.g. SKBFL_SHARED_FRAG set by splice() into a UDP socket via
__ip_append_data, or a chained skb_has_frag_list()) falls through to
the in-place decryption path, which binds the frag pages directly into
the AEAD/skcipher SGL via skb_to_sgvec().
Extend the gate to also unshare when skb_has_frag_list() or
skb_has_shared_frag() is true. This catches the splice-loopback vector
and other externally-shared frag sources while preserving the
zero-copy fast path for skbs whose frags are kernel-private (e.g. NIC
page_pool RX, GRO). The OOM/trace handling already in place is reused.
Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
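The widened unshare gate in the rxrpc fix boils down to a predicate over three skb properties. The sketch below models those properties as plain booleans; struct skb_model and both function names are hypothetical and do not correspond to kernel helpers.

```c
#include <stdbool.h>

/* Hypothetical flags standing in for skb_cloned(),
 * skb_has_frag_list() and skb_has_shared_frag(). */
struct skb_model {
	bool cloned;
	bool has_frag_list;
	bool has_shared_frag;
};

/* Old gate: copy to a private skb only when the skb is cloned. */
static bool needs_unshare_old(const struct skb_model *skb)
{
	return skb->cloned;
}

/* New gate: also copy when fragments may be owned by someone else. */
static bool needs_unshare_new(const struct skb_model *skb)
{
	return skb->cloned || skb->has_frag_list || skb->has_shared_frag;
}
```

An skb with kernel-private frags (all three flags false) still takes the zero-copy path under both gates; a non-cloned skb with a shared frag is only caught by the new one.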
Linus Torvalds [Sun, 10 May 2026 15:10:47 +0000 (08:10 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk driver fixes from Stephen Boyd:
- Mark the DDR bus clk critical in the SpaceMiT driver so that
boot doesn't fail
- Fix boot on Mobile EyeQ by creating the auxiliary device for
the ethernet PHY
- Plug an OF node leak in Rockchip rk808 clk driver
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: rk808: fix OF node reference imbalance
MAINTAINERS: add myself as a reviewer for the clk subsystem
reset: eyeq: drop device_set_of_node_from_dev() done by parent
clk: eyeq: add EyeQ5 children auxiliary device for generic PHYs
clk: eyeq: use the auxiliary device creation helper
clk: spacemit: k3: mark top_dclk as CLK_IS_CRITICAL
Commit 81af9e40e2e4 ("phy: qcom: qmp-ufs: Fix SM8650 PCS table for Gear 4")
moved QPHY_V6_PCS_UFS_PLL_CNTL register configuration from the shared
sm8650_ufsphy_g5_pcs table to the SM8650-specific sm8650_ufsphy_pcs base
table to fix Gear 4 operation on SM8650.
However, this change inadvertently broke kaanapali and SM8750 SoCs
which also rely on the shared sm8650_ufsphy_g5_pcs table for Gear 5
configuration but use their own sm8750_ufsphy_pcs base table. After the
change, kaanapali PHYs are left without the required PLL_CNTL = 0x33
setting, causing the PHY PLL to remain at its hardware reset default
value, preventing PLL lock and resulting in DME_LINKSTARTUP timeouts.
Fix this by adding the missing QPHY_V6_PCS_UFS_PLL_CNTL = 0x33 entry
to the sm8750_ufsphy_pcs table, mirroring what the original commit
already did for sm8650_ufsphy_pcs.
Cc: stable@vger.kernel.org # v6.19.12
Fixes: 81af9e40e2e4 ("phy: qcom: qmp-ufs: Fix SM8650 PCS table for Gear 4")
Signed-off-by: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260415104851.2763238-1-nitin.rawat@oss.qualcomm.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
phy: exynos5-usbdrd: fix USB 2.0 HS PHY tuning values for Exynos7870
The existing PHYPARAM0 tuning values for Exynos7870 are incorrect,
causing the USB 2.0 PHY to fail high-speed negotiation and fall back
to full-speed (12Mbps) operation.
Fix TXVREFTUNE (transmitter voltage reference) from 14 to 3,
TXRESTUNE (transmitter impedance) from 3 to 2, and SQRXTUNE
(squelch threshold) from 6 to 5. Also explicitly set
TXPREEMPPULSETUNE to 0, which was previously missing from the
tuning table despite being included in the register mask.
All values are derived from the vendor kernel for the Samsung
Galaxy A6 (SM-A600FN), as no public hardware documentation is
available for the Exynos7870 USB DRD PHY. With these corrections,
the PHY successfully negotiates high-speed (480Mbps) operation.
The existing code reads a single hs_term_range_adj value from bit field
[10:7] of FUSE_SKU_CALIB_0 and applies it to all USB2 pads uniformly.
However, on SoCs that support per-pad termination, each pad has its own
hs_term_range_adj field: pad 0 in FUSE_SKU_CALIB_0[10:7], and pads 1-3
in FUSE_USB_CALIB_EXT_0 at bit offsets [8:5], [12:9], and [16:13]
respectively.
Fix the calibration by reading per-pad values from the appropriate fuse
registers. For SoCs that do not support per-pad termination, replicate
pad 0's value to all pads to maintain existing behavior.
Add a has_per_pad_term flag to the SoC data to indicate whether per-pad
termination values are available in FUSE_USB_CALIB_EXT_0.
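The per-pad fuse layout described above can be sketched as plain bitfield extraction. This is an illustration under the stated bit positions only: fuse_field(), read_hs_term_range_adj() and NUM_PADS are hypothetical names, and the register values are passed in as integers rather than read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PADS 4	/* pads 0-3, as listed in the message */

/* Extract bits [hi:lo] from a 32-bit register value. */
static uint32_t fuse_field(uint32_t reg, unsigned int hi, unsigned int lo)
{
	return (reg >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Fill out[] with one hs_term_range_adj value per pad. */
static void read_hs_term_range_adj(uint32_t sku_calib_0,
				   uint32_t usb_calib_ext_0,
				   bool has_per_pad_term,
				   uint32_t out[NUM_PADS])
{
	/* Pad 0 always comes from FUSE_SKU_CALIB_0[10:7]. */
	out[0] = fuse_field(sku_calib_0, 10, 7);

	for (unsigned int pad = 1; pad < NUM_PADS; pad++) {
		/* Pads 1-3 sit at [8:5], [12:9] and [16:13]. */
		unsigned int lo = 5 + 4 * (pad - 1);

		out[pad] = has_per_pad_term ?
			fuse_field(usb_calib_ext_0, lo + 3, lo) : out[0];
	}
}
```

When has_per_pad_term is false, every pad receives pad 0's value, which matches the previous uniform behavior.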
The mvebu_a3700_utmi_phy_power_off() function tries to modify the
USB2_PHY_CTRL register by using the IO address of the PHY IP block along
with the readl/writel IO accessors. However, the register exists in the
USB miscellaneous register space, and as such it must be accessed via
regmap, as is done in the mvebu_a3700_utmi_phy_power_on() function.
Change the code to use regmap_update_bits() for modifying the register
to fix this.
DaeMyung Kang [Sun, 10 May 2026 02:13:11 +0000 (11:13 +0900)]
ntfs: fix empty_buf and ra lifetime bugs in ntfs_empty_logfile()
ntfs_empty_logfile() has three related allocator bugs around the
@empty_buf and @ra buffers it uses inside the per-cluster loop.
When the loop encounters a runlist entry with LCN_RL_NOT_MAPPED, the
function kvfrees @empty_buf and goes to map_vcn to remap. @empty_buf
is not cleared. If ntfs_map_runlist_nolock() fails on re-entry,
control jumps to the err label which kvfrees @empty_buf a second time.
In the same branch, @ra is left allocated. When the remap succeeds
the function falls through the @empty_buf re-allocation and the @ra
re-allocation, overwriting the previous @ra pointer and leaking it.
The success path frees @empty_buf with kfree() instead of kvfree().
kvzalloc() may fall back to vmalloc(), in which case kfree() does not
correctly release the memory.
A KASAN-enabled QEMU harness mirroring this control flow reports
"BUG: KASAN: double-free" when the second ntfs_map_runlist_nolock()
fails.
Clear both @empty_buf and @ra after the in-loop releases so the err
path is a no-op when the buffers have already been freed and so the
remap-success path does not leak the previous @ra. Switch the success
path to kvfree() to match the @empty_buf allocator.
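The pointer-clearing pattern in the ntfs_empty_logfile() fix can be shown with a userspace model that uses malloc()/free() in place of the kernel allocators. Everything here is a hypothetical sketch: struct bufs, release_bufs() and empty_logfile_model() only mirror the control flow described above, with remap_ok standing in for the result of the second ntfs_map_runlist_nolock() call.

```c
#include <stdlib.h>

/* Hypothetical holder for the two buffers named in the message. */
struct bufs {
	void *empty_buf;
	void *ra;
};

/* Free both buffers and clear the pointers so a later call is a
 * harmless no-op (free(NULL) does nothing). */
static void release_bufs(struct bufs *b)
{
	free(b->empty_buf);
	b->empty_buf = NULL;
	free(b->ra);
	b->ra = NULL;
}

/* Returns 0 on success, -1 on failure. */
static int empty_logfile_model(int remap_ok)
{
	struct bufs b = { malloc(64), malloc(64) };

	if (!b.empty_buf || !b.ra)
		goto err;

	/* In-loop release before remapping an unmapped run. */
	release_bufs(&b);
	if (!remap_ok)
		goto err;	/* err path must not free twice */

	/* Remap succeeded: allocate fresh buffers, nothing leaks
	 * because the old pointers were already released. */
	b.empty_buf = malloc(64);
	b.ra = malloc(64);
	if (!b.empty_buf || !b.ra)
		goto err;

	release_bufs(&b);
	return 0;
err:
	release_bufs(&b);
	return -1;
}
```

Without the NULL assignments in release_bufs(), the remap-failure path would free the same pointers a second time at the err label.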
Hans de Goede [Sat, 25 Apr 2026 12:33:51 +0000 (14:33 +0200)]
clk: qcom: x1e80100-dispcc: Stop disp_cc_mdss_mdp_clk_src from getting parked
Parking disp_cc_mdss_mdp_clk_src at 19.2MHz causes the EFI GOP framebuffer
to stop functioning. The EFI GOP framebuffer should keep working until
the msm display driver loads, to help with boot debugging and to ensure
display output when the msm module is not in the initramfs.
Switch disp_cc_mdss_mdp_clk_src over to clk_rcg2_shared_no_init_park_ops
to keep the EFI GOP working after binding the x1e80100-dispcc driver.
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Fixes: 01a0a6cc8cfd ("clk: qcom: Park shared RCGs upon registration")
Link: https://lore.kernel.org/r/20260425123351.6292-1-johannes.goede@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Linus Torvalds [Sun, 10 May 2026 01:42:54 +0000 (18:42 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix sk_local_storage diag dump via netlink (Amery Hung)
- Fix off-by-one in arena direct-value access (Junyoung Jang)
- Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)
- Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)
- Reject TX-only AF_XDP sockets (Linpu Yu)
- Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)
- Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
(Weiming Shi)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix off-by-one boundary validation in arena direct-value access
xskmap: reject TX-only AF_XDP sockets
bpf: Don't run arg-tracking analysis twice on main subprog
bpf: Free reuseport cBPF prog after RCU grace period.
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
bpf: tcp: Fix type confusion in bpf_tcp_sock().
tools/headers: Regenerate stddef.h to fix BPF selftests
bpf: Fix sk_local_storage diag dumping uninitialized special fields
bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
bpf: Reject TCP_NODELAY in bpf-tcp-cc
bpf: Reject TCP_NODELAY in TCP header option callbacks
Junyoung Jang [Sun, 26 Apr 2026 17:25:05 +0000 (02:25 +0900)]
bpf: Fix off-by-one boundary validation in arena direct-value access
BPF_MAP_TYPE_ARENA accepts BPF_PSEUDO_MAP_VALUE offsets at exactly
the end of the arena mapping (off == arena_size). The boundary check
in arena_map_direct_value_addr() uses `>` instead of `>=`, which
incorrectly allows a one-past-end pointer to be accepted.
Change the condition to `>=` to correctly reject offsets that fall
outside the valid arena user_vm range.
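The difference between the two comparisons can be checked directly. The helpers below are hypothetical and model only the rejection condition, not arena_map_direct_value_addr() itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old condition: an offset equal to the arena size slips through. */
static bool off_rejected_old(uint64_t off, uint64_t arena_size)
{
	return off > arena_size;
}

/* New condition: valid offsets are 0 .. arena_size - 1. */
static bool off_rejected_new(uint64_t off, uint64_t arena_size)
{
	return off >= arena_size;
}
```

For a 4096-byte arena, offset 4096 is accepted by the old check and rejected by the new one, while offset 4095 is accepted by both.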
Linpu Yu [Fri, 8 May 2026 14:43:43 +0000 (22:43 +0800)]
xskmap: reject TX-only AF_XDP sockets
XSKMAP entries are used as redirect targets for incoming XDP frames.
A TX-only AF_XDP socket lacks an Rx ring and cannot handle redirected
traffic, but xsk_map_update_elem() currently allows such sockets to
be inserted into the map.
Redirecting packets to such a socket on the veth generic-XDP path
causes a kernel crash in xsk_generic_rcv().
This became possible after xsk_is_setup_for_bpf_map() was removed from
the XSKMAP update path, which allowed bound TX-only sockets to be
inserted into the map.
Reject TX-only sockets during XSKMAP updates to avoid the crash.
They remain fully operational for pure Tx purposes outside XSKMAP.
Fixes: 968be23ceaca ("xsk: Fix possible segfault at xskmap entry insertion")
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yifan Wu <yifanwucs@gmail.com>
Signed-off-by: Linpu Yu <linpu5433@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20260508144344.694-1-linpu5433@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon [Thu, 7 May 2026 18:22:06 +0000 (20:22 +0200)]
bpf: Don't run arg-tracking analysis twice on main subprog
Because subprog 0, the main subprog, is considered a global function,
we end up running the arg-tracking dataflow analysis twice on it. That
results in slightly longer verification but mostly in more verbose
verifier logs. This patch fixes it by keeping only the iteration over
global subprogs.
When running over all of Cilium's programs with BPF_LOG_LEVEL2, this
reduces verbosity by ~20% on average.
Linus Torvalds [Sat, 9 May 2026 18:47:39 +0000 (11:47 -0700)]
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull fsverity fix from Eric Biggers:
"Fix a regression in overlayfs caused by an fsverity API change"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
ovl: fix verity lazy-load guard broken by fsverity_active() semantic change
DaeMyung Kang [Sat, 9 May 2026 06:12:37 +0000 (15:12 +0900)]
ntfs: validate attribute name bounds before returning it
ntfs_attr_find() validates a named attribute before comparing it with the
requested name, but that check is currently after the AT_UNUSED handling.
When callers enumerate attributes with AT_UNUSED, ntfs_attr_find() can
return a malformed named attribute before checking whether name_offset
and name_length stay within the attribute record.
Some enumeration callers use the returned attribute name pointer
directly. For example, one path passes (attr + name_offset, name_length)
to ntfs_attr_iget(), where the name can later be copied according to
name_length. A malformed on-disk name_offset/name_length pair should not
be exposed to those callers.
Move the existing name bounds validation before returning attributes
during AT_UNUSED enumeration, and write it as an offset/remaining-size
check so the subtraction cannot underflow. Extract the converted values
into local variables (name_offset, attr_len, name_size) to make the
intent explicit and avoid repeating the endian conversions inside the
bounds check. This keeps matching attributes on the same checked path
while also covering attribute enumeration.
A small userspace ASAN model with attr length=32, name_offset=124 and
name_length=8 reproduces a heap-buffer-overflow read in the old
enumeration path. With this change the same malformed attribute is
rejected before the name pointer is returned to the caller.
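An offset/remaining-size check of the kind described above might look like the following. This is a sketch, not the ntfs_attr_find() code: attr_name_in_bounds() is a hypothetical helper, the values are assumed to be already converted to CPU endianness, and the 2-byte character size reflects the 16-bit characters NTFS uses for attribute names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the name lies entirely inside the attribute.
 * The offset is compared first, so the subtraction cannot wrap. */
static bool attr_name_in_bounds(uint32_t attr_len, uint16_t name_offset,
				uint8_t name_length)
{
	/* name_length counts 16-bit characters (assumption noted above) */
	uint32_t name_size = (uint32_t)name_length * 2u;

	if (name_offset > attr_len)
		return false;
	return name_size <= attr_len - name_offset;
}
```

With the values from the ASAN model (attr length 32, name_offset 124, name_length 8), the first comparison already rejects the attribute.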
DaeMyung Kang [Sat, 9 May 2026 06:12:36 +0000 (15:12 +0900)]
ntfs: fix MFT bitmap scan 2^32 boundary check
NTFS MFT record numbers are limited to the 32-bit range, and
ntfs_mft_record_layout() rejects mft_no >= 2^32. The free-MFT-record
bitmap scan in ntfs_mft_bitmap_find_and_alloc_free_rec_nolock() also
guards against this overflow but uses a strict greater than comparison,
allowing record number 2^32 itself through this earlier check.
Every other 2^32 boundary check in fs/ntfs/mft.c uses '>=', so the
strict greater than here is both a real off-by-one and an internal
inconsistency. A model with ll == 2^32 confirms the current check
accepts the value while the corrected check rejects it.
Use '>=' so the boundary matches the layout-time rejection and the
surrounding bitmap-scan checks.
DaeMyung Kang [Sat, 9 May 2026 06:12:35 +0000 (15:12 +0900)]
ntfs: validate MFT attrs_offset against bytes_in_use
ntfs_mft_record_check() verifies that attrs_offset is aligned and that
the resulting pointer stays within the allocated MFT record buffer, but
it does not check that the first attribute header starts within the
bytes_in_use area.
A malformed record with attrs_offset greater than bytes_in_use can pass
this check as long as attrs_offset is still within bytes_allocated. The
attribute parser then computes the remaining record space by subtracting
the attribute pointer from bytes_in_use. Because that value is unsigned,
the subtraction can underflow and allow bytes after bytes_in_use to be
interpreted as an attribute.
Reject records where attrs_offset is outside bytes_in_use or where the
used area does not even contain the four-byte attribute type/AT_END
terminator at attrs_offset.
A small userspace model with attrs_offset=128 and bytes_in_use=64 shows
the current check accepts the record and the parser space calculation
underflows to 0xffffffc0. With this change the same malformed record is
rejected before the attribute walker is entered.
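The unsigned underflow quoted above is straightforward to reproduce. Both helpers below are hypothetical: remaining_space_unchecked() mirrors the parser's subtraction, and attrs_offset_valid() is one way to express the added rejection, including the four-byte type/AT_END requirement.

```c
#include <stdbool.h>
#include <stdint.h>

/* The parser's space calculation without any prior validation;
 * unsigned arithmetic wraps modulo 2^32. */
static uint32_t remaining_space_unchecked(uint32_t bytes_in_use,
					  uint32_t attrs_offset)
{
	return bytes_in_use - attrs_offset;
}

/* Accept only records whose used area contains at least the
 * four-byte attribute type (or AT_END) at attrs_offset. */
static bool attrs_offset_valid(uint32_t bytes_in_use, uint32_t attrs_offset)
{
	return attrs_offset <= bytes_in_use &&
	       bytes_in_use - attrs_offset >= 4;
}
```

For attrs_offset=128 and bytes_in_use=64 the unchecked subtraction yields 0xffffffc0, and the validation rejects the record.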
Linus Torvalds [Sat, 9 May 2026 15:32:50 +0000 (08:32 -0700)]
Merge tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ads7871: Fix endianness bug in 16-bit register reads
- lm75: Fix configuration register writes and AS6200/TMP112 setup and
alarm handling
- lm63: Fix TOCTOU problems
- corsair-psu: Close HID device on probe errors
- ltc2992: Fix overflow and threshold range
- Documentation: fix link to ideapad-laptop.c file
- Remove stale CONFIG_SENSORS_SBRMI Makefile reference
* tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ads7871) Fix endianness bug in 16-bit register reads
hwmon: (lm75) Fix configuration register writes.
hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
hwmon: (lm63) Add locking to avoid TOCTOU
hwmon: (corsair-psu) Close HID device on probe errors
hwmon: Remove stale CONFIG_SENSORS_SBRMI Makefile reference
Documentation: hwmon: fix link to ideapad-laptop.c file
hwmon: (ltc2992) Fix u32 overflow in power read path
hwmon: (ltc2992) Clamp threshold writes to hardware range
Linus Torvalds [Sat, 9 May 2026 15:10:07 +0000 (08:10 -0700)]
Merge tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- sanitize more input parameters in the core (found by syzkaller)
- usual set of driver fixes (proper completion handling, applying
quirks, correct workqueue selection...)
- ID additions to simplify dependency handling
- new email address for Peter Rosin
* tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: smbus: reject oversized block transfers in the common path
MAINTAINERS: Update mail for Peter Rosin
i2c: stub: Reject I2C block transfers with invalid length
i2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT
i2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl
i2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids
dt-bindings: i2c: apple,i2c: Add t8122 compatible
i2c: stm32f7: reinit_completion() per transfer not per msg
dt-bindings: i2c: amlogic: Add compatible for T7 SOC
i2c: testunit: Replace system_long_wq with system_dfl_long_wq
Linus Torvalds [Sat, 9 May 2026 15:03:21 +0000 (08:03 -0700)]
Merge tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix KASAN sanitization flag for core_$(BITS).o
- Fixes for handling offset values in pseries htmdump
- Fix interrupt mask in cpm1_gpiochip_add16()
- ps3/pasemi fixes to drop redundant result assignment
- Fixes in papr-hvpipe code path
- powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP),
Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar
Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani
Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote.
* tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
powerpc/pasemi: Drop redundant res assignment
powerpc/ps3: Drop redundant result assignment
powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang
arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files
powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
pseries/papr-hvpipe: Fix the usage of copy_to_user()
pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
pseries/papr-hvpipe: Fix race with interrupt handler
powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
powerpc/pseries/htmdump: Fix the offset value used in htm status dump
powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
...
Linus Torvalds [Sat, 9 May 2026 03:28:45 +0000 (20:28 -0700)]
Merge tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix memory map enumeration bug in the Xen e820 parsing code (Juergen
Gross)
- Re-enable e820 BIOS fallback if e820 table is empty (David Gow)
* tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()
Zhan Xusheng [Fri, 8 May 2026 07:29:34 +0000 (15:29 +0800)]
ntfs: fix missing kstrdup() error check in ntfs_write_volume_label()
ntfs_write_volume_label() does not check the return value of
kstrdup(). If the allocation fails, vol->volume_label is set to
NULL while the function returns success. A subsequent
FS_IOC_GETFSLABEL then returns an empty string even though the
on-disk label was updated correctly.
Fix by allocating the new label before taking vol_ni->mrec_lock and
updating any on-disk metadata, so an -ENOMEM from kstrdup() leaves
both the in-memory and on-disk labels untouched and consistent. On
success the preallocated copy replaces the old vol->volume_label.
Also move mark_inode_dirty_sync() into the success path so that it
is not called when no metadata was actually modified.
Fixes: 6251f0b0de7d ("ntfs: update super block operations")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
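The allocate-before-modify ordering in the volume label fix can be sketched in userspace. This is a model under assumptions, not the ntfs code: struct vol_model, write_label() and the injectable dup callback are hypothetical, and the on_disk array merely stands in for the on-disk metadata update.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical volume state: cached label plus "on-disk" label. */
struct vol_model {
	char *volume_label;
	char on_disk[32];
};

/* Duplicate the label first; touch nothing if that fails. The dup
 * callback is injectable so an allocation failure can be simulated. */
static int write_label(struct vol_model *vol, const char *label,
		       char *(*dup)(const char *))
{
	char *copy = dup(label);

	if (!copy)
		return -1;	/* both labels are left untouched */

	strncpy(vol->on_disk, label, sizeof(vol->on_disk) - 1);
	vol->on_disk[sizeof(vol->on_disk) - 1] = '\0';

	free(vol->volume_label);
	vol->volume_label = copy;
	return 0;
}

/* Working duplicator. */
static char *dup_ok(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Duplicator that always fails, to model -ENOMEM. */
static char *dup_fail(const char *s)
{
	(void)s;
	return NULL;
}
```

After a failed call, the cached label and the on-disk label still agree on the previous value.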
Linus Torvalds [Sat, 9 May 2026 02:42:10 +0000 (19:42 -0700)]
Merge tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix spurious failures in rseq self-tests (Mark Brown)
- Fix rseq::cpu_id_start ABI regression due to TCMalloc's creative
use of the supposedly read-only field
The fix is to introduce a new ABI variant based on a new (larger)
rseq area registration size, to keep the TCMalloc use of rseq
backwards compatible on new kernels (Thomas Gleixner)
- Fix wakeup_preempt_fair() for not waking up task (Vincent Guittot)
- Fix s64 mult overflow in vruntime_eligible() (Zhan Xusheng)
* tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix wakeup_preempt_fair() for not waking up task
sched/fair: Fix overflow in vruntime_eligible()
selftests/rseq: Expand for optimized RSEQ ABI v2
rseq: Reenable performance optimizations conditionally
rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
selftests/rseq: Validate legacy behavior
selftests/rseq: Make registration flexible for legacy and optimized mode
selftests/rseq: Skip tests if time slice extensions are not available
rseq: Revert to historical performance killing behaviour
rseq: Don't advertise time slice extensions if disabled
rseq: Protect rseq_reset() against interrupts
rseq: Set rseq::cpu_id_start to 0 on unregistration
selftests/rseq: Don't run tests with runner scripts outside of the scripts
Linus Torvalds [Sat, 9 May 2026 02:39:18 +0000 (19:39 -0700)]
Merge tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fixes from Ingo Molnar:
- Fix deadlock in the perf_mmap() failure path (Peter Zijlstra)
- Intel ACR (Auto Counter Reload) fixes (Dapeng Mi):
- Fix validation and configuration of ACR masks
- Fix ACR rescheduling bug causing stale masks
- Disable the PMI on ACR-enabled hardware
- Enable ACR on Panther Cover uarch too
* tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Enable auto counter reload for DMR
perf/x86/intel: Disable PMI for self-reloaded ACR events
perf/x86/intel: Always reprogram ACR events to prevent stale masks
perf/x86/intel: Improve validation and configuration of ACR masks
perf/core: Fix deadlock in perf_mmap() failure path
Holger Brunck [Thu, 7 May 2026 15:53:32 +0000 (17:53 +0200)]
net: wan: fsl_ucc_hdlc: free tx_skbuff in uhdlc_memclean
When the device is removed all allocated resources should be freed.
In uhdlc_memclean the netdev transmit queue was already stopped. But at
this point we may have pending skb in the transmit queue which must be
freed. Therefore iterate over the tx_skbuff pointers and free all
pending skb. The issue was discovered by sashiko.
Tested on a ls1043a board running HDLC in bus mode on kernel 6.12.
Jakub Kicinski [Sat, 9 May 2026 01:28:26 +0000 (18:28 -0700)]
Merge tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Allow initial x_tables table replacement without emitting an audit
log message. Delay the register message until after hooks are wired up
to avoid unnecessary unregister logs during error unwinding.
2) Fix a NULL dereference by allocating hook ops before adding the
table to the per-netns list. Use `synchronize_rcu()` during error
unwinding to ensure the table stops processing packets before
teardown. Defer audit log register message until all operations
succeed.
3) Refactor xtables to use a single `xt_unregister_table_pre_exit`
function. Eliminate code duplication by centralizing table
unregistration logic within the xtables core. ebtables cannot be
changed due to incompatibility.
4) Unregister xtables templates before module removal. This prevents
a race condition where userspace instantiates a new table after the
pernet unreg removed the current table.
5) Add `xtables_unregister_table_exit` to fully unregister netfilter
tables during module removal. Unlink the table from dying lists,
then free hook operations.
6) Implement a two-stage removal scheme for ebtables following the
x_tables pattern. Assign table->ops while holding the ebt mutex to
prevent exposing partially-filled structures.
7) Fix ebtables module initialization race. Register the template last
in table initialization functions. Prevent table instantiation before
pernet operations are available.
8) Fix a race condition in x_tables module initialization. Ensure
pernet ops are fully set up before exposing the table to userspace.
9) Fix a race condition in ebtables module initialization, similar to
previous patch.
10) Restore propagation of helper to expected connection, this is a
fix-for-recent-fix.
11) Validate that the expectation tuple and mask netlink attributes are
present when adding expectation via nfqueue, this fixes a possible
null-ptr-deref.
12) Fix possible rare memleak in the SIP helper in case helper has been
detached from conntrack entry, from Li Xiasong.
13) Fix refcount leak in nft_ct when creating custom expectation, also
from Li Xiasong.
Patches 1-9 from Florian Westphal.
* tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: fix missing expect put in obj eval
netfilter: nf_conntrack_sip: get helper before allocating expectation
netfilter: ctnetlink: check tuple and mask in expectations created via nfqueue
netfilter: nf_conntrack_expect: restore helper propagation via expectation
netfilter: bridge: eb_tables: close module init race
netfilter: x_tables: close dangling table module init race
netfilter: ebtables: close dangling table module init race
netfilter: ebtables: move to two-stage removal scheme
netfilter: x_tables: add and use xtables_unregister_table_exit
netfilter: x_tables: unregister the templates first
netfilter: x_tables: add and use xt_unregister_table_pre_exit
netfilter: x_tables: allocate hook ops while under mutex
netfilter: x_tables: allow initial table replace without emitting audit log message
====================
Ben Morris [Fri, 8 May 2026 00:14:55 +0000 (17:14 -0700)]
sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL
The SCTP_SENDALL path in sctp_sendmsg() iterates ep->asocs with
list_for_each_entry_safe(), which caches the next entry in @tmp before
the loop body runs. The body calls sctp_sendmsg_to_asoc(), which may
drop the socket lock inside sctp_wait_for_sndbuf().
While the lock is dropped, another thread can SCTP_SOCKOPT_PEELOFF the
association cached in @tmp, migrating it to a new endpoint via
sctp_sock_migrate() (list_del_init() + list_add_tail() to
newep->asocs), and optionally close the new socket which frees the
association via kfree_rcu(). The cached @tmp can also be freed by a
network ABORT for that association, processed in softirq while the
lock is dropped.
sctp_wait_for_sndbuf() revalidates @asoc (the current entry) on re-lock
via the "sk != asoc->base.sk" and "asoc->base.dead" checks, but nothing
revalidates @tmp. After a successful return, the iterator advances to
the stale @tmp, yielding either a use-after-free (if the peeled socket
was closed) or a list-walk onto the new endpoint's list head (type
confusion of &newep->asocs as a struct sctp_association *).
Both are reachable from CapEff=0; the type-confusion path gives
controlled indirect call via the outqueue.sched->init_sid pointer.
Fix by re-deriving @tmp from @asoc after sctp_sendmsg_to_asoc()
returns. @asoc is known to still be on ep->asocs at that point: the
only callers that list_del an association from ep->asocs are
sctp_association_free() (which sets asoc->base.dead) and
sctp_assoc_migrate() (which changes asoc->base.sk), and
sctp_wait_for_sndbuf() checks both under the lock before any
successful return; a tripped check propagates as err < 0 and the loop
bails before the re-derive.
The SCTP_ABORT path in sctp_sendmsg_check_sflags() returns 0 and the
loop hits 'continue' before sctp_sendmsg_to_asoc() is ever called, so
the @tmp cached by list_for_each_entry_safe() still covers the
lock-held free that ba59fb027307 ("sctp: walk the list of asoc
safely") was added for.
Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg") Cc: stable@vger.kernel.org Signed-off-by: Ben Morris <bmorris@anthropic.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260508001455.3137-1-joycathacker@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
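The cursor re-derivation described above can be sketched in userspace C. This is only an illustration under stated assumptions: the ex_* list helpers are hypothetical stand-ins for the kernel's list API, and "another thread unlinks the cached next entry while the lock is dropped" is simulated by a direct call inside the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular list; the head is a node as well. */
struct ex_node {
	struct ex_node *prev, *next;
	int id;
	int visited;
};

void ex_list_init(struct ex_node *head)
{
	head->prev = head->next = head;
}

void ex_list_add_tail(struct ex_node *n, struct ex_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and point it at itself, in the spirit of list_del_init(). */
void ex_list_del_init(struct ex_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/*
 * Walk the list with a cached next pointer (tmp). When the node with
 * trigger_id is processed, victim is unlinked to simulate a peeloff
 * that happens while the lock is dropped. Returns the visit count.
 */
int ex_walk(struct ex_node *head, int trigger_id, struct ex_node *victim)
{
	struct ex_node *cur, *tmp;
	int visited = 0;

	for (cur = head->next, tmp = cur->next; cur != head;
	     cur = tmp, tmp = cur->next) {
		cur->visited = 1;
		visited++;
		if (cur->id == trigger_id) {
			/* Precondition: the current entry stays on the list. */
			assert(victim != cur);
			ex_list_del_init(victim);
		}
		/* Re-derive the cursor from the still-valid current entry. */
		tmp = cur->next;
	}
	return visited;
}
```

Without the final assignment in the loop body, the iterator would advance to the stale cached entry, which in this toy list points at itself and would never reach the head again.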
net: ti: icssm-prueth: fix eth_ports_node leak in probe
The error path on of_property_read_u32() failure inside
icssm_prueth_probe() returns without putting eth_ports_node,
which was acquired before the for_each_child_of_node() loop.
Myeonghun Pak [Wed, 6 May 2026 12:43:11 +0000 (21:43 +0900)]
net: lan966x: avoid unregistering netdev on register failure
lan966x_probe_port() stores the newly allocated net_device in the
port before calling register_netdev(). If register_netdev() fails,
the probe error path calls lan966x_cleanup_ports(), which sees
port->dev and calls unregister_netdev() for a device that was never
registered.
Destroy the phylink instance created for this port and clear port->dev
before returning the registration error. The common cleanup path now skips
ports without port->dev before reaching the registered netdev cleanup, so
it only handles ports that reached the registered-netdev lifetime.
This also avoids treating an uninitialized FDMA netdev and the failed port
as a NULL == NULL match in the common cleanup path.
Fixes: d28d6d2e37d1 ("net: lan966x: add port module support") Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Link: https://patch.msgid.link/20260506124331.31945-1-mhun512@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 8 May 2026 23:08:58 +0000 (16:08 -0700)]
Merge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Don't fallback to bus reset after failed slot reset; a bus reset
isn't safe if the .reset_slot() callback is implemented (Keith Busch)
- Update saved_config_space upon resource assignment to fix passthrough
regressions when x86 pcibios_assign_resources() updates BARs (Lukas
Wunner)
- Initialize a temporary pci_dev->dev in sysfs 'new_id' attribute to
fix a lockdep regression after driver_override was moved from PCI to
device core (Samiullah Khawaja)
- Update MAINTAINERS email addresses (Marek Vasut, Hans Zhang)
- Add MAINTAINERS reviewer for PCIe Cadence IP (Aksh Garg)
* tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
MAINTAINERS: Update Marek Vasut email for PCIe R-Car
PCI: Initialize temporary device in new_id_store()
PCI: Update saved_config_space upon resource assignment
PCI: Don't fallback to bus reset after failed slot reset
====================
Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)
Matt Vollrath fixes two issues with the i40e driver probe routine, ensuring
that PTP is properly cleaned up if the probe fails.
Emil corrects the initialization of the read_dev_clk_lock spinlock in
idpf_ptp_init, ensuring it is initialized prior to when the
ptp_schedule_worker() is called.
Greg KH fixes a double free and use-after-free in the idpf auxiliary device
error paths.
Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
correcting the assignment and preventing submission of invalid data to the
firmware.
Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
is held over the entire operation.
Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
properly offset by the base_rclk_idx value. This ensures that the correct
pin index is used to look up recovered clock state. He additionally adds
bounds checking to prevent attempting to access pins outside of the pin
state array.
Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
the header guard, to avoid duplicate macro definitions should the ice_dpll.h
header be included multiple times.
====================
Ivan Vecera [Wed, 6 May 2026 21:48:17 +0000 (14:48 -0700)]
ice: dpll: fix misplaced header macros
The CGU register definitions (ICE_CGU_R10, ICE_CGU_R11 and related field
masks) were placed after the #endif of the _ICE_DPLL_H_ include guard,
leaving them unprotected. Move them inside the guard.
Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-8-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 6 May 2026 21:48:16 +0000 (14:48 -0700)]
ice: dpll: fix rclk pin state get for E810
The refactoring of ice_dpll_rclk_state_on_pin_get() to use
ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was
correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks
E810 devices where base_rclk_idx is non-zero, causing the wrong hardware
index to be used for pin state lookup and incorrect recovered clock state
to be reported via the DPLL subsystem. E825C is unaffected as its
base_rclk_idx is 0.
While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after
the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}()
to prevent out-of-bounds access on the pin state array.
Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-7-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
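The index adjustment and bounds check described above might look roughly like the following sketch. The function name, the EX_RCLK_NUM_MAX value and the error code are assumptions made for illustration and are not taken from the ice driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical size of the pin state array. */
#define EX_RCLK_NUM_MAX 4

static_assert(EX_RCLK_NUM_MAX > 0, "pin state array must not be empty");

/*
 * Translate a parent pin index into an index into the pin state array.
 * Returns the hardware index, or -EINVAL if it falls outside the array.
 */
int ex_rclk_hw_idx(unsigned int parent_idx, unsigned int base_rclk_idx)
{
	unsigned int hw_idx;

	/* Guard the subtraction against unsigned wrap-around. */
	if (parent_idx < base_rclk_idx)
		return -EINVAL;

	hw_idx = parent_idx - base_rclk_idx;
	if (hw_idx >= EX_RCLK_NUM_MAX)
		return -EINVAL;

	return (int)hw_idx;
}
```

With a base of 0 the subtraction is a no-op, which matches the note that devices with a zero base_rclk_idx were unaffected.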
Bart Van Assche [Wed, 6 May 2026 21:48:15 +0000 (14:48 -0700)]
ice: fix locking in ice_dcb_rebuild()
Move the mutex_lock() call up so that DCB settings cannot change after
the first ice_query_port_ets() call. The second ice_query_port_ets()
call in ice_dcb_rebuild() is already protected by pf->tc_mutex.
This also fixes a bug in an error path: previously, taking the first
"goto dcb_error" in the function jumped over mutex_lock() straight to
mutex_unlock().
This bug has been detected by the clang thread-safety analyzer.
Cc: intel-wired-lan@lists.osuosl.org Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-6-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
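The lock placement can be illustrated with a simplified sketch. The toy ex_mutex below only counts lock depth so that an unbalanced unlock trips an assertion; the function shape and the single shared exit label are assumptions, not the actual ice_dcb_rebuild() code.

```c
#include <assert.h>

/* Toy lock: depth must stay 0 (unlocked) or 1 (locked). */
struct ex_mutex {
	int depth;
};

void ex_lock(struct ex_mutex *m)
{
	assert(m->depth == 0);
	m->depth++;
}

void ex_unlock(struct ex_mutex *m)
{
	assert(m->depth == 1);	/* unlocking without a prior lock is a bug */
	m->depth--;
}

/*
 * The two flags simulate failures of the first and second query.
 * Returns 0 on success, -1 on error. The lock is balanced on every path
 * because it is taken before the first "goto dcb_error" can happen.
 */
int ex_rebuild(struct ex_mutex *m, int first_query_fails,
	       int second_query_fails)
{
	int ret = 0;

	ex_lock(m);
	if (first_query_fails) {
		ret = -1;
		goto dcb_error;
	}
	/* Settings cannot change here: the lock is already held. */
	if (second_query_fails) {
		ret = -1;
		goto dcb_error;
	}

dcb_error:
	ex_unlock(m);
	return ret;
}
```

Had ex_lock() been placed after the first check, the first goto would reach ex_unlock() with depth 0 and the assertion would fire, which mirrors the imbalance the analyzer reported.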
Marcin Szycik [Wed, 6 May 2026 21:48:14 +0000 (14:48 -0700)]
ice: fix setting RSS VSI hash for E830
ice_set_rss_hfunc() performs a VSI update, in which it sets hashing
function, leaving other VSI options unchanged. However, ::q_opt_flags is
mistakenly set to the value of another field, instead of its original
value, probably due to a typo. What happens next is hardware-dependent:
On E810, only the first bit is meaningful (see
ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different
state than before VSI update.
On E830, some of the remaining bits are not reserved. Setting them
to some unrelated values can cause the firmware to reject the update
because of invalid settings, or worse - succeed.
Reproducer:
sudo ethtool -X $PF1 equal 8
Output in dmesg:
Failed to configure RSS hash for VSI 6, error -5
Fixes: 352e9bf23813 ("ice: enable symmetric-xor RSS for Toeplitz hash function") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-5-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
idpf: fix double free and use-after-free in aux device error paths
When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init. The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev. The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().
Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.
Commit 65637c3a1811 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.
Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: stable@kernel.org Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy") Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emil Tantilov [Wed, 6 May 2026 21:48:12 +0000 (14:48 -0700)]
idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()
In idpf_ptp_init(), read_dev_clk_lock is initialized after
ptp_schedule_worker() has already been called (and after
idpf_ptp_settime64() could reach the lock). The PTP aux worker
fires immediately upon scheduling and can call into
idpf_ptp_read_src_clk_reg_direct(), which takes
spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering
the lockdep "non-static key" warning:
Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock
is not touched before it's been initialized.
Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock") Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-3-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matt Vollrath [Wed, 6 May 2026 21:48:10 +0000 (14:48 -0700)]
i40e: Cleanup PTP registration on probe failure
Fix two conditions which would leak PTP registration on probe failure:
1. i40e_setup_pf_switch can encounter an error in
i40e_setup_pf_filter_control, call i40e_ptp_init, then return
non-zero, sending i40e_probe to err_vsis.
2. i40e_setup_misc_vector can return non-zero, sending i40e_probe to
err_vsis.
Both of these conditions have been present since PTP was introduced in
this driver.
Found with coccinelle.
Fixes: beb0dff1251db ("i40e: enable PTP") Signed-off-by: Matt Vollrath <tactii@gmail.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-1-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mohsin Bashir [Wed, 6 May 2026 23:37:45 +0000 (16:37 -0700)]
net: shaper: Reject reparenting of existing nodes
When an existing node-scope shaper is moved to a different parent
via the group operation, the framework fails to update the leaves
count on both the old and new parent shapers. Only newly created
nodes (handle.id == NET_SHAPER_ID_UNSPEC) trigger the parent
leaves increment at line 1039.
This causes the parent's leaves counter to diverge from the
actual number of children in the xarray. When the node is later
deleted, pre_del_node() allocates an array sized by the stale
leaves count, but the xarray iteration finds more children than
expected, hitting the WARN_ON_ONCE guard and returning -EINVAL.
Rather than adding reparenting support with complex leaves count
bookkeeping, reject group calls that attempt to change an existing
node's parent. Updates to an existing node's rate or leaves under
the same parent remain permitted. We expect that for any modification
of the topology the user should always create new groups and let the
kernel garbage collect the leaf-less nodes.
Alice Ryhl [Wed, 6 May 2026 20:07:13 +0000 (20:07 +0000)]
genetlink: free the skb on 'group >= family->n_mcgrps'
These methods generally consume ownership of the provided skb, so even
if an error path is encountered, the skb is freed. This is because the
very first thing they do after some initial setup is to unconditionally
consume the skb via consume_skb(skb). Any subsequent errors lead to the
core netlink layer freeing the skb.
However, there is one check that occurs before ownership is passed,
which is the check for the group index. So if this error condition is
encountered, then the skb is leaked. This error condition is generally
considered a violation of the netlink API, so it's not expected to occur
under normal circumstances. For the same reason, no callers check for
this error condition, and no callers need to be adjusted. However, we
should still follow the same ownership semantics of the rest of the
function. Thus, free the skb in this codepath.
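The ownership rule can be sketched with a plain heap buffer standing in for the skb. All ex_* names are hypothetical, and the live-buffer counter exists only so a leak would be observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Number of buffers allocated and not yet freed. */
int ex_live_bufs;

void *ex_buf_alloc(void)
{
	void *b = malloc(16);

	if (b)
		ex_live_bufs++;
	return b;
}

void ex_buf_free(void *b)
{
	if (!b)
		return;
	assert(ex_live_bufs > 0);
	free(b);
	ex_live_bufs--;
}

/*
 * Always takes ownership of buf, including on the early error path
 * that runs before the buffer is handed to the delivery code.
 */
int ex_multicast(void *buf, unsigned int group, unsigned int n_groups)
{
	if (group >= n_groups) {
		ex_buf_free(buf);	/* do not leak on the early return */
		return -EINVAL;
	}

	/* ... delivery would happen here, then the reference is dropped ... */
	ex_buf_free(buf);
	return 0;
}
```

Callers can then treat the buffer as gone after the call regardless of the return value, which is the same contract the other paths of the function already provide.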
Ilya Maximets [Thu, 7 May 2026 12:04:26 +0000 (14:04 +0200)]
net: nsh: fix incorrect header length macros
NSH header length is a 6-bit field that encodes the total length of
the header in 4-byte words. So the maximum length is 0b111111 * 4,
which is 252 and not 256. The maximum context length is the same
number minus the length of the base header (8), so 244.
These macros are used to validate push_nsh() action in openvswitch.
Miscalculation here doesn't cause any real issues. In the worst case
the oversized context is truncated while building the header, so we'll
construct and send a broken packet, which is not a big problem, as any
receiver should validate the fields. No invalid memory accesses will
happen during the header push. But we should fix the macros to reject
the incorrect actions in the first place.
Use previously defined values and calculate the lengths instead
of defining numbers directly, so it's easier to understand where they
come from and harder to make a mistake.
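The arithmetic above can be written down as derived constants. The macro names below are illustrative assumptions and are not claimed to match the kernel header.

```c
#include <assert.h>

/* Hypothetical names; only the arithmetic follows the commit message. */
#define EX_NSH_LEN_BITS		6	/* width of the length field */
#define EX_NSH_LEN_FIELD_MAX	((1u << EX_NSH_LEN_BITS) - 1)	/* 0b111111 */
#define EX_NSH_WORD_BYTES	4u	/* length is counted in 4-byte words */
#define EX_NSH_BASE_HDR_LEN	8u
#define EX_NSH_HDR_MAX_LEN	(EX_NSH_LEN_FIELD_MAX * EX_NSH_WORD_BYTES)
#define EX_NSH_CTX_MAX_LEN	(EX_NSH_HDR_MAX_LEN - EX_NSH_BASE_HDR_LEN)

static_assert(EX_NSH_HDR_MAX_LEN == 252, "63 words of 4 bytes");
static_assert(EX_NSH_CTX_MAX_LEN == 244, "maximum minus the base header");

/* Returns 1 if a context of ctx_len bytes fits into the header, else 0. */
int ex_nsh_ctx_len_ok(unsigned int ctx_len)
{
	return ctx_len <= EX_NSH_CTX_MAX_LEN;
}
```

Deriving both limits from the field width means a context of 248 bytes, which would have passed a check against 256 - 8, is rejected.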
Quan Sun [Thu, 7 May 2026 13:17:38 +0000 (21:17 +0800)]
net: ethtool: fix NULL pointer dereference in phy_reply_size
In phy_prepare_data(), several strings such as 'name', 'drvname',
'upstream_sfp_name', and 'downstream_sfp_name' are allocated using
kstrdup(). However, these allocations were not checked for failure.
If kstrdup() fails for 'name', it returns NULL while the function
continues. This leads to a kernel NULL pointer dereference and panic
later in phy_reply_size() when it unconditionally calls strlen() on
the NULL pointer.
While other strings like 'upstream_sfp_name' might be checked before
access in certain code paths, failing to handle these allocations
consistently can lead to incomplete data reporting or hidden bugs.
Fix this by adding proper NULL checks for all kstrdup() calls in
phy_prepare_data() and implement a centralized error handling path
using goto labels to ensure all previously allocated resources are
freed on failure.
Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260507131738.1173835-1-2022090917019@std.uestc.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
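The centralized error handling described above follows a common pattern, sketched here in userspace C with strdup() in place of kstrdup(). The struct, field and function names are hypothetical, and the injectable duplicator exists only so an allocation failure can be simulated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ex_phy_data {
	char *name;
	char *drvname;
	char *sfp_name;
};

/* Duplicator hook so a failing allocation can be simulated. */
typedef char *(*ex_dup_fn)(const char *);

/* Returns 0 on success, or -1 with nothing left allocated. */
int ex_prepare(struct ex_phy_data *d, ex_dup_fn dup,
	       const char *name, const char *drv, const char *sfp)
{
	memset(d, 0, sizeof(*d));

	d->name = dup(name);
	if (!d->name)
		goto err;
	d->drvname = dup(drv);
	if (!d->drvname)
		goto err_free_name;
	d->sfp_name = dup(sfp);
	if (!d->sfp_name)
		goto err_free_drvname;
	return 0;

err_free_drvname:
	free(d->drvname);
	d->drvname = NULL;
err_free_name:
	free(d->name);
	d->name = NULL;
err:
	return -1;
}

/* Only valid after ex_prepare() returned 0, so no pointer is NULL. */
size_t ex_reply_size(const struct ex_phy_data *d)
{
	assert(d->name && d->drvname && d->sfp_name);
	return strlen(d->name) + strlen(d->drvname) + strlen(d->sfp_name);
}

/* Failure injection: the second call returns NULL. */
int ex_dup_calls;

char *ex_dup_fail_second(const char *s)
{
	return ++ex_dup_calls == 2 ? NULL : strdup(s);
}
```

Because every allocation is checked where it happens, the size computation never sees a NULL string, and the labels unwind in reverse order of allocation.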
Nicolas Ferre [Thu, 7 May 2026 12:04:44 +0000 (14:04 +0200)]
MAINTAINERS: change maintainers for macb Ethernet driver
I would like to hand over the macb maintenance to Théo, as I'm unable to
keep up with the recent flow of patches for this driver. After speaking
with Claudiu, he indicated that he is in the same position as me.
To help with this work, Conor has agreed to act as a reviewer.
I was given responsibility for this driver years ago, and I'm glad to
see it continue with talented developers.
Dragos Tatulea [Wed, 6 May 2026 09:08:08 +0000 (09:08 +0000)]
net: napi: Avoid gro timer misfiring at end of busypoll
When in irq deferral mode (defer-hard-irqs > 0), a short enough
gro-flush timeout can trigger before NAPI_STATE_SCHED is cleared if the
last poll in busy_poll_stop() takes too long. This can have the effect
of leaving the queue stuck with interrupts disabled and no timer armed
which results in a tx timeout if there is no subsequent busypoll cycle.
To prevent this, defer arming the gro-flush timer until after the last poll.
Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling") Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260506090808.820559-2-dtatulea@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ipv6: flowlabel: per-netns budget for unprivileged callers
From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
This series fixes the cross-tenant DoS in net/ipv6/ip6_flowlabel.c.
v1 through v6 were single-patch postings, each in its own thread.
v6 review pointed out that the existing fl_size read in
mem_check() and the corresponding write in fl_intern() are not in
the same critical section. v7 split the work into 2 patches.
Patch 1/2 is a prerequisite. It moves spin_lock_bh(&ip6_fl_lock)
and the matching unlock from fl_intern() into its only caller
ipv6_flowlabel_get(), so the mem_check() call runs under the same
critical section as the fl_intern() insert. With all writers and
the read of fl_size under the lock, fl_size is converted from
atomic_t to plain int. This is independent of the per-netns
budget. It also makes 2/2 backportable without conflicts.
Patch 2/2 is the v6 patch, rebased on 1/2.
- flowlabel_count is plain int rather than atomic_t, since the
previous patch put all writers and readers under ip6_fl_lock.
- In ip6_fl_gc(), fl_free() is now placed below the fl_size
and flowlabel_count decrements, removing the v6 cache of
fl->fl_net.
- In ip6_fl_purge(), fl_free() stays in its original position.
The function argument net is used for flowlabel_count.
- mem_check() uses spaces around the / operator on all four
expressions, addressing the checkpatch note in v6 review.
CAP_NET_ADMIN against init_user_ns still bypasses both caps.
Reproducer (KASAN VM, 4 cores, qemu): unprivileged netns A holds
3072 flowlabels via 100 procs. Fresh unprivileged netns B then
allocates 32 flowlabels (the FL_MAX_PER_SOCK ceiling for one
socket), the same as a clean baseline. Without the per-netns
ceiling, netns A could push fl_size past FL_MAX_SIZE - FL_MAX_SIZE
/ 4 and netns B would see allocations denied.
====================
Maoyi Xie [Wed, 6 May 2026 08:24:16 +0000 (16:24 +0800)]
ipv6: flowlabel: enforce per-netns limit for unprivileged callers
fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.
Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.
Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.
mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.
CAP_NET_ADMIN against init_user_ns still bypasses both caps.
The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
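The budget arithmetic can be checked with a small sketch. It is deliberately simplified: the per-socket limit and the hard FL_MAX_SIZE ceiling are left out, the ex_* names are hypothetical, and the exact comparison operators are assumptions chosen so that one netns can hold 3072 entries, as in the reproducer.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_FL_MAX_SIZE		8192
/* Global budget available to callers without CAP_NET_ADMIN. */
#define EX_FL_UNPRIV_GLOBAL	(EX_FL_MAX_SIZE - EX_FL_MAX_SIZE / 4)
/* Per-netns ceiling: half of the global unprivileged budget. */
#define EX_FL_UNPRIV_PER_NETNS	(EX_FL_UNPRIV_GLOBAL / 2)

static_assert(EX_FL_UNPRIV_GLOBAL == 6144, "8192 - 2048");
static_assert(EX_FL_UNPRIV_PER_NETNS == 3072, "6144 / 2");

/*
 * Returns true if a new allocation should be denied.
 * fl_size is the global count, netns_count the count of the caller's
 * netns; both are assumed to be read under the same lock.
 */
bool ex_budget_exceeded(int fl_size, int netns_count, bool net_admin)
{
	if (net_admin)
		return false;	/* privileged callers bypass both caps */

	return fl_size > EX_FL_UNPRIV_GLOBAL ||
	       netns_count >= EX_FL_UNPRIV_PER_NETNS;
}
```

With these numbers a single netns stops at 3072 entries while the global unprivileged budget is 6144, so at least two netns are needed to exhaust it.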
Maoyi Xie [Wed, 6 May 2026 08:24:15 +0000 (16:24 +0800)]
ipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern
mem_check() in net/ipv6/ip6_flowlabel.c reads fl_size without
holding ip6_fl_lock. fl_intern() takes the lock immediately
afterwards. The two checks therefore race against concurrent
fl_intern, ip6_fl_gc and ip6_fl_purge writers, which makes the
mem_check budget check approximate.
Move spin_lock_bh(&ip6_fl_lock) and the matching unlock from
fl_intern() into its only caller ipv6_flowlabel_get(). The
mem_check() call now runs under the same critical section as the
fl_intern() insert, so the budget check is exact.
With all writers and the read of fl_size under ip6_fl_lock,
convert fl_size from atomic_t to plain int. The four sites that
update or read fl_size are fl_intern (insert path), ip6_fl_gc
(garbage collector, the !sched check and the per-entry decrement),
ip6_fl_purge (per-netns purge), and mem_check (budget check), and
all four now run under ip6_fl_lock.
This is a prerequisite for adding a per-netns budget alongside
fl_size. The follow-up patch adds netns_ipv6::flowlabel_count and
folds it into mem_check().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506082416.2259567-2-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MAINTAINERS: Add self for the 3c509 network driver
It appears there's a need for a maintainer for the 3Com EtherLink III
family of Ethernet network adapters. There is documentation available
and the driver is very mature so the task ought to be of little hassle,
so I think I should be able to squeeze in any issues to be addressed.
When TCP socket migration fails at inet_ehash_insert() in
reqsk_timer_handler(), we jump to the no_ownership: label
and free the new reqsk immediately with __reqsk_free().
Thus, we must stop the new reqsk's timer before jumping to the
label, but the timer might be missed since the cited commit,
resulting in UAF.
As we are in the original reqsk's timer context, we can safely
call timer_delete_sync() for the new reqsk.
Let's pass false to __inet_csk_reqsk_queue_drop() to stop
the new reqsk's timer.
PCI: Initialize temporary device in new_id_store()
When setting new_id of a PCI device driver using sysfs, a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override(). That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").
The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.
Initialize the temporary pci_dev to fix this.
Repro:
Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
register_lock_class+0x77e/0x790
lock_acquire+0xbf/0x2e0
pci_match_device+0x24/0x180
new_id_store+0x189/0x1d0
kernfs_fop_write_iter+0x14f/0x210
vfs_write+0x263/0x5e0
ksys_write+0x79/0xf0
do_syscall_64+0x117/0xf80
Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure") Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver") Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com
PCI: Update saved_config_space upon resource assignment
Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):
ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
ddbridge 0000:05:00.0: cannot read registers
ddbridge 0000:05:00.0: fail
BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window. The kernel corrects the BAR assignment:
pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned
Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:
No other architecture performs such a late BAR correction.
Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.
Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction. Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:
But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7. Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.
Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well. The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.
Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.
To robustly fix the issue, always update saved_config_space upon resource
assignment.
Reported-by: Bernd Schumacher <bernd@bschu.de> Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/ Reported-by: Alexandre N. <an.tech@mailo.com> Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/ Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Bernd Schumacher <bernd@bschu.de> Tested-by: Alexandre N. <an.tech@mailo.com> Cc: stable@vger.kernel.org # v6.12+ Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
bpf: Free reuseport cBPF prog after RCU grace period.
Eulgyu Kim reported the splat below with a repro. [0]
The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.
The reuseport prog is freed by sk_reuseport_prog_free().
For an "e"BPF prog, bpf_prog_put() is called and destruction proceeds
through multiple stages, while a cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().
If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.
Let's defer freeing the reuseport cBPF prog until after one RCU
grace period.
Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().
Linus Torvalds [Fri, 8 May 2026 20:18:13 +0000 (13:18 -0700)]
Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Fix for ublk not doing an actual issue from the task_work fallback
path. Any request hitting that should be canceled automatically
- Fix for uring_cmd prep side handling, for the block side uring_cmd
discard handling
- Fix for missing validation of the io and physical block size shifts
- Fix for a use-after-free in ublk's cancel command handling
* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
ublk: fix use-after-free in ublk_cancel_cmd()
ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
block: only read from sqe on initial invocation of blkdev_uring_cmd()
ublk: don't issue uring_cmd from fallback task work
Linus Torvalds [Fri, 8 May 2026 20:12:48 +0000 (13:12 -0700)]
Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Ensure that the absolute timeouts for both the command side and the
waiting side honor the caller's time namespace
- Ensure tracked NAPI entries are cleared at unregistration time, as
the NAPI polling loop checks the list state rather than the general
NAPI state. This can lead to NAPI polling even after unregistration
has been done. If unregistered, all NAPI polling should be disabled
- Fix for eventfd recursive invocation handling
* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
io_uring/eventfd: reset deferred signal state
io_uring/napi: clear tracked NAPI entries on unregister
Fixes: 21fb59ab4b976 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn") Suggested-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but a RAW socket can bypass it:
socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
Let's use sk_is_tcp().
Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.
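
The difference between the two checks can be sketched with a small userspace model. The names below (model_sock, model_is_tcp_by_protocol, model_is_tcp) are hypothetical stand-ins, not the kernel definitions, and the shape of the stricter check is an assumption: it also looks at the address family and the socket type, so a SOCK_RAW socket created with IPPROTO_TCP no longer passes.

```c
#include <stdbool.h>
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <sys/socket.h>		/* AF_INET, AF_INET6, SOCK_STREAM, SOCK_RAW */

/* Hypothetical userspace stand-in for the few socket fields involved. */
struct model_sock {
	int family;
	int type;
	int protocol;
};

/* Protocol-only check: a RAW socket opened with IPPROTO_TCP passes. */
static bool model_is_tcp_by_protocol(const struct model_sock *sk)
{
	return sk->protocol == IPPROTO_TCP;
}

/* Stricter check (assumed shape): family, type and protocol must all match. */
static bool model_is_tcp(const struct model_sock *sk)
{
	return (sk->family == AF_INET || sk->family == AF_INET6) &&
	       sk->type == SOCK_STREAM &&
	       sk->protocol == IPPROTO_TCP;
}
```

In this model, a socket described as { AF_INET, SOCK_RAW, IPPROTO_TCP } is accepted by the protocol-only check and rejected by the stricter one.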
Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but a RAW socket can bypass it:
socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.
Breno Leitao [Fri, 8 May 2026 16:22:03 +0000 (09:22 -0700)]
workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path
For WQ_UNBOUND workqueues, alloc_and_link_pwqs() allocates wq->cpu_pwq
via alloc_percpu() and then calls apply_workqueue_attrs_locked(). On
failure it returns the error directly, bypassing the enomem: label
which holds the only free_percpu(wq->cpu_pwq) in this function.
The caller's error path kfree()s wq without touching wq->cpu_pwq,
leaking one percpu pointer table (nr_cpu_ids * sizeof(void *) bytes) per
failed call.
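
The leak pattern described above can be sketched in userspace C. Everything here is a hypothetical model (model_wq, model_apply_attrs, model_alloc_leaky, model_alloc_fixed), with calloc()/free() standing in for alloc_percpu()/free_percpu(). The fixed variant shows one common way to close such a leak, routing the late failure through a cleanup label; it is not necessarily the shape of the actual patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the workqueue; only the leaked field is modelled. */
struct model_wq {
	void **cpu_pwq;		/* stands in for the percpu pointer table */
};

/* Hypothetical second step that can fail after the table is allocated. */
static int model_apply_attrs(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Leaky shape: a failure of the second step returns without freeing. */
static int model_alloc_leaky(struct model_wq *wq, size_t ncpus, bool fail)
{
	wq->cpu_pwq = calloc(ncpus, sizeof(void *));
	if (!wq->cpu_pwq)
		return -ENOMEM;
	return model_apply_attrs(fail);
}

/* Fixed shape: every failure after the allocation goes through one label. */
static int model_alloc_fixed(struct model_wq *wq, size_t ncpus, bool fail)
{
	int ret;

	wq->cpu_pwq = calloc(ncpus, sizeof(void *));
	if (!wq->cpu_pwq)
		return -ENOMEM;

	ret = model_apply_attrs(fail);
	if (ret)
		goto err_free;
	return 0;

err_free:
	free(wq->cpu_pwq);
	wq->cpu_pwq = NULL;
	return ret;
}
```

After a failed call, the leaky variant still holds the table while the caller has no reason to free it; the fixed variant leaves the pointer NULL.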
Breno Leitao [Thu, 7 May 2026 11:04:46 +0000 (04:04 -0700)]
workqueue: Release PENDING in __queue_work() drain/destroy reject path
The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.
The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.
The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.
Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.
I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arming a delayed_work whose timer fires while
drain_workqueue() is blocked, then calling cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.
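
The ownership contract can be illustrated with a minimal userspace model built on C11 atomics. The names (model_work, model_claim, model_queue) are hypothetical and the model keeps only a single PENDING bit; it is a sketch of the token hand-off, not of the real work->data encoding.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a work item: one PENDING bit and a queued flag. */
struct model_work {
	atomic_flag pending;
	bool queued;
};

/* Claim ownership; returns true only for the caller that set the bit. */
static bool model_claim(struct model_work *w)
{
	return !atomic_flag_test_and_set(&w->pending);
}

/*
 * Queue the work or, if the queue is rejecting new work, release the token.
 * With release_on_reject == false this models the bare return described
 * above: the bit stays set although nobody owns the work any more.
 */
static bool model_queue(struct model_work *w, bool rejecting,
			bool release_on_reject)
{
	if (rejecting) {
		if (release_on_reject)
			atomic_flag_clear(&w->pending);
		return false;
	}
	w->queued = true;	/* ownership handed to the queue */
	return true;
}
```

In this model, a later model_claim() after a rejected queue attempt fails forever when the token was not released, which corresponds to the busy loop in the cancel path; with the release it succeeds on the first try.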
Linus Torvalds [Fri, 8 May 2026 17:24:35 +0000 (10:24 -0700)]
Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix for two ACL issues (security fix to validate dacloffset better
and chmod fix)
- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
symlinks)
- Two Kerberos fixes including an important one when AES-256 encryption
chosen
- Fix open_cached_dir problem when directory leases disabled
* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: validate dacloffset before building DACL pointers
smb/client: fix out-of-bounds read in smb2_compound_op()
smb/client: fix out-of-bounds read in symlink_data()
smb: client: Zero-pad short GSS session keys per MS-SMB2
smb: client: Use FullSessionKey for AES-256 encryption key derivation
smb: client: use kzalloc to zero-initialize security descriptor buffer
cifs: abort open_cached_dir if we don't request leases
Linus Torvalds [Fri, 8 May 2026 17:14:51 +0000 (10:14 -0700)]
Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's two main series here, fixing issues that came up in the
Microchip QSPI and Freescale i.MX drivers. Both of those could result
in some quite noticeable issues if they were encountered in production.
We also have one minor documentation fix in the ch341 driver"
* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: ch341: correct company name in MODULE_DESCRIPTION
spi: microchip-core-qspi: remove some inline markings
spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
spi: microchip-core-qspi: control built-in cs manually
spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()
The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)
Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com
Linus Torvalds [Fri, 8 May 2026 15:23:06 +0000 (08:23 -0700)]
Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
usual but then a large amount of fixes all over.
xe:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status
- Fix EAGAIN sign in pf_migration_consume
- Fix MMIO access using PF view instead of VF view during migration
- Exclude indirect ring state page from ADS engine state size
* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
drm: Set old handle to NULL before prime swap in change_handle
drm/bochs: Drop manual put on probe error path
drm/xe/guc: Exclude indirect ring state page from ADS engine state size
drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
drm/exynos: remove bridge when component_add fails
drm/amdgpu: nuke amdgpu_userq_fence_slab v2
drm/amdgpu/userq: fix access to stale wptr mapping
drm/amdkfd: Check if there are kfd processes using adev by kfd_processes_count
drm/amdgpu: zero-initialize GART table on allocation
drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
drm/radeon: add missing revision check for CI
drm/amdgpu/pm: align Hawaii mclk workaround with radeon
drm/amdgpu/pm: add missing revision check for CI
drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
drm/amdkfd: Make all TLB-flushes heavy-weight
drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
drm/panel: feiyang-fy07024di26a30d: return display-on error
...
Thomas Hellström [Tue, 28 Apr 2026 09:44:42 +0000 (11:44 +0200)]
drm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure
When ttm_tt_swapout() fails, the current code calls
ttm_resource_add_bulk_move() followed by ttm_resource_move_to_lru_tail()
to restore the resource's bulk_move membership.
However, ttm_resource_move_to_lru_tail() places the resource at the tail
of the LRU list which, relative to the walk cursor's hitch node (placed
immediately after the resource when it was yielded), puts the resource
*in front of* the hitch. The next list_for_each_entry_continue() from
the hitch finds the same resource again, causing an infinite loop.
Fix by deferring del_bulk_move to the success path only.
On the success path, TTM_TT_FLAG_SWAPPED has just been set by
ttm_tt_swapout() but the resource is still tracked in the bulk_move range,
so ttm_resource_del_bulk_move()'s !ttm_resource_unevictable() guard would
incorrectly skip the removal. Introduce
ttm_resource_del_bulk_move_unevictable() which bypasses that guard.
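
The infinite walk can be reproduced with a toy circular list and a cursor ("hitch") node. This is a userspace sketch with hypothetical names (model_node, model_walk and the small list helpers); every visited item is treated as a failed swapout, and a step cap replaces the real unbounded loop so the model terminates.

```c
#include <stdbool.h>

/* Hypothetical list node; "visits" counts how often the walk yields it. */
struct model_node {
	struct model_node *prev, *next;
	int visits;
};

static void model_list_init(struct model_node *head)
{
	head->prev = head->next = head;
}

static void model_list_del(struct model_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void model_list_add_after(struct model_node *n, struct model_node *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

static void model_list_add_tail(struct model_node *n, struct model_node *head)
{
	model_list_add_after(n, head->prev);
}

/*
 * Walk the list with a hitch node as cursor. Each yielded item "fails".
 * If move_to_tail_on_failure is set, the failed item is moved to the tail,
 * which puts it after the hitch again. Returns the number of items yielded,
 * capped at max_steps.
 */
static int model_walk(struct model_node *head, struct model_node *hitch,
		      bool move_to_tail_on_failure, int max_steps)
{
	int steps = 0;

	model_list_add_after(hitch, head);	/* cursor starts at the front */
	while (steps < max_steps) {
		struct model_node *cur = hitch->next;

		if (cur == head)
			break;			/* reached the end of the list */

		/* Yield cur: the hitch is placed immediately after it. */
		model_list_del(hitch);
		model_list_add_after(hitch, cur);
		cur->visits++;
		steps++;

		if (move_to_tail_on_failure) {
			model_list_del(cur);
			model_list_add_tail(cur, head);
		}
	}
	model_list_del(hitch);
	return steps;
}
```

With two items on the list, leaving failed items in place ends the walk after two steps, whereas moving them to the tail keeps yielding the same items until the cap is hit.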
Reported-by: Jatin Kataria <jkataria@netflix.com> Fixes: fc5d96670eb2 ("drm/ttm: Move swapped objects off the manager's LRU list") Cc: Christian König <christian.koenig@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Cc: <stable@vger.kernel.org> # v6.13+ Assisted-by: GitHub_Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Boqun Feng <boqun@kernel.org> Link: https://patch.msgid.link/20260428094442.16985-1-thomas.hellstrom@linux.intel.com
Linus Torvalds [Fri, 8 May 2026 15:16:07 +0000 (08:16 -0700)]
Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"Core:
- Cache-flushing fix for non-x86 platforms
AMD-Vi:
- Security fix when SEV-SNP is enabled
- Operator precedence fix in DTE setting"
* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/amd: Fix precedence order in set_dte_passthrough()
iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19
Zqiang [Fri, 8 May 2026 11:50:45 +0000 (19:50 +0800)]
sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work
On PREEMPT_RT kernels, scx_disable_irq_workfn() is called from per-CPU
irq_work kthread context. As a result, when scx_dump_state() is called
from scx_disable_irq_workfn() to output current->comm/pid, it always
reports the comm/pid of the irq_work kthread. Use IRQ_WORK_INIT_HARD()
to initialize sch->disable_irq_work so that scx_disable_irq_workfn()
is called from hardirq context.
Mark Rutland [Fri, 8 May 2026 14:20:23 +0000 (15:20 +0100)]
arm64/entry: Fix arm64-specific rseq brokenness
Mathias Stearn reports that since v6.19, there are two big issues
affecting rseq:
(1) On arm64 specifically, rseq critical sections aren't aborted when
they should be.
(2) The 'cpu_id_start' field is no longer written by the kernel in all
cases it used to be, including some cases where TCMalloc depends on
the kernel clobbering the field.
This patch fixes issue #1. This patch DOES NOT fix issue #2, which will
need to be addressed by other patches.
The arm64-specific brokenness is a result of commits:
2fc0e4b4126c ("rseq: Record interrupt from user space")
39a167560a61 ("rseq: Optimize event setting")
The first commit failed to add a call to rseq_note_user_irq_entry() on
arm64. Thus arm64 never sets rseq_event::user_irq to record that it may
be necessary to abort an active rseq critical section upon return to
userspace. On its own, this commit had no functional impact as the value
of rseq_event::user_irq was not consumed.
The second commit relied upon rseq_event::user_irq to determine whether
or not to bother to perform rseq work when returning to userspace. As
rseq_event::user_irq wasn't set on arm64, this work would be skipped,
and consequently an active rseq critical section would not be aborted.
Fix this by giving arm64 syscall-specific entry/exit paths, and
performing the relevant logic in syscall and non-syscall paths,
including calling rseq_note_user_irq_entry() for non-syscall entry.
Currently arm64 cannot use syscall_enter_from_user_mode(),
syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to
ordering constraints with exception masking, and risk of ABI breakage
for syscall tracing/audit/etc. For the moment the entry/exit logic is
left as arm64-specific, directly using enter_from_user_mode() and
exit_to_user_mode(), but mirroring the generic code.
I intend to follow up with refactoring/cleanup, as we did for kernel
mode entry paths in commit:
041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()")
... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly.
David Woodhouse [Tue, 28 Apr 2026 20:59:52 +0000 (21:59 +0100)]
x86/kexec: Push kjump return address even for non-kjump kexec
The version of purgatory code shipped by kexec-tools attempts to look above
the top of its stack to find a return address for a kjump, even in a non-kjump
kexec.
After the commit in Fixes: the word above the stack might not be there,
leading to a fault (which is at least now caught by my exception-handling code
in kexec).
That commit fixed things for the actual kjump path, but no longer
"gratuitously" pushes the unused return address to the stack in the non-kjump
path. Put that *back* in the non-kjump path, to prevent purgatory from
crashing when trying to access it.
Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context") Reported-by: Rohan Kakulawaram <rohanka@google.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Rohan Kakulawaram <rohanka@google.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org
The buffer is then handed to ntfs_attr_add() and persisted as the
SECURITY_DESCRIPTOR attribute of the new MFT record. The descriptor
covers a relative security descriptor header, two SIDs (owner and
group), an ACL header, and a single ACE, but several fields inside
those structures are never written before the buffer is committed
to disk:
- struct security_descriptor_relative
@alignment (1 byte)
@sacl (4 bytes; SE_SACL_PRESENT is not set
but the offset still reaches disk)
- struct ntfs_sid (3 instances: owner, group, ACE.sid)
identifier_authority.value[0..4] (5 bytes per SID, 15 total
- only value[5] is set)
That is 23 bytes of uninitialised slab memory persisted to disk for
every new file or directory the legacy ntfs driver creates. The
"+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0],
which the existing code does explicitly write to zero, so it is
not part of the leak.
Anything later able to read the SECURITY_DESCRIPTOR attribute - the
same NTFS volume mounted on Windows or by another NTFS reader, an
offline forensics tool, an unprivileged user that ends up with read
access to the volume - can recover those bytes. The leak persists
for the lifetime of the file on disk, not just the lifetime of the
kernel that wrote it.
Switch the allocation to kzalloc() so every byte the on-disk
descriptor covers is zero before the explicit initialisations run.
While there, replace the bare "return -1" allocation-failure path
with a proper -ENOMEM so the error reaches userspace as a meaningful
errno instead of an unrelated -EPERM.
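
The effect of zeroing the allocation can be sketched in userspace. The structure below is a hypothetical, simplified header and not the real NTFS layout; calloc() stands in for kzalloc(), and a 0xAA fill simulates stale heap contents so the leaked bytes can be counted deterministically.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified on-disk header; not the real NTFS layout. */
struct model_sd {
	uint8_t revision;
	uint8_t alignment;	/* never written by model_fill() */
	uint16_t control;
	uint32_t owner;
	uint32_t group;
	uint32_t sacl;		/* never written by model_fill() */
	uint32_t dacl;
};

/* Writes only some fields, like an initialiser that skips "unused" ones. */
static void model_fill(struct model_sd *sd)
{
	sd->revision = 1;
	sd->control = 0x8004;
	sd->owner = 0x14;
	sd->group = 0x24;
	sd->dacl = 0x34;
}

/* Build a descriptor; zeroed == 0 simulates reused memory via a 0xAA fill. */
static struct model_sd *model_build(int zeroed)
{
	struct model_sd *sd;

	sd = zeroed ? calloc(1, sizeof(*sd)) : malloc(sizeof(*sd));
	if (!sd)
		return NULL;
	if (!zeroed)
		memset(sd, 0xAA, sizeof(*sd));
	model_fill(sd);
	return sd;
}

/* Count bytes that still carry the simulated stale pattern. */
static size_t model_stale_bytes(const struct model_sd *sd)
{
	const unsigned char *p = (const unsigned char *)sd;
	size_t i, n = 0;

	for (i = 0; i < sizeof(*sd); i++)
		if (p[i] == 0xAA)
			n++;
	return n;
}
```

In this model the zeroed build carries no stale bytes, while the non-zeroed build keeps the pattern in exactly the fields that model_fill() skips.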
Found by inspection while auditing fs/ntfs new-inode paths.
Fixes: af0db57d4293 ("ntfs: update inode operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
DaeMyung Kang [Thu, 7 May 2026 02:18:31 +0000 (11:18 +0900)]
ntfs: fix out-of-bounds write in ntfs_index_walk_down()
ntfs_index_walk_down() used to update the index traversal depth
directly before writing parent_pos[] and parent_vcn[]. A malformed
directory index with too many child-node levels can therefore advance
pindex past MAX_PARENT_VCN and write past the fixed arrays in struct
ntfs_index_context, corrupting context state used by later index
traversal.
Use ntfs_icx_parent_inc() for walk-down transitions so the existing
depth limit is enforced before the arrays are updated. Make the helper
check the limit before incrementing pindex so failed callers do not
leave the context at an out-of-range depth.
This is reachable by iterating a crafted NTFS directory after the volume
has been mounted, including read-only mounts. The reproducer uses
getdents64() on an index root that points to an excessively deep chain
of child index blocks.
A crafted directory index with a chain of child-node entries reproduced
UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and
subsequent KASAN reports in ntfs_index_walk_up(). With this change, the
same image is rejected with "Index is over 32 level deep" and no KASAN
or UBSAN report is emitted.
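
The check-before-increment ordering can be sketched as follows. This is a userspace model with hypothetical names (model_icx, model_parent_inc, model_walk_down); the limit of 32 is taken from the error message quoted above, and the -EINVAL return value is an arbitrary choice for the sketch.

```c
#include <errno.h>

#define MODEL_MAX_DEPTH 32	/* assumed limit, taken from the message above */

/* Hypothetical stand-in for the index context's traversal state. */
struct model_icx {
	int pindex;
	int parent_pos[MODEL_MAX_DEPTH];
	long long parent_vcn[MODEL_MAX_DEPTH];
};

/* Check the limit first so a failed call leaves pindex unchanged. */
static int model_parent_inc(struct model_icx *icx)
{
	if (icx->pindex + 1 >= MODEL_MAX_DEPTH)
		return -EINVAL;
	icx->pindex++;
	return 0;
}

/* Descend one level; the arrays are only written after the depth check. */
static int model_walk_down(struct model_icx *icx, long long child_vcn)
{
	int err = model_parent_inc(icx);

	if (err)
		return err;
	icx->parent_pos[icx->pindex] = 0;
	icx->parent_vcn[icx->pindex] = child_vcn;
	return 0;
}
```

However many levels a crafted chain presents, pindex in this model never leaves the bounds of the two arrays, and a rejected descent does not change it.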
DaeMyung Kang [Wed, 6 May 2026 09:24:48 +0000 (18:24 +0900)]
ntfs: fix out-of-bounds write in ntfs_rl_collapse_range() merge path
ntfs_rl_collapse_range() merges the run on the left of the collapsed
region with the run on its right when they are contiguous. The contiguous
check chooses a clamped index when @new_1st_cnt is 0:
i = new_1st_cnt == 0 ? 1 : new_1st_cnt;
if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) {
When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes
before the kvcalloc() runlist buffer. The path is reachable through
fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an
attribute whose first run after the collapsed region and the following
run are holes. In that case ntfs_rle_lcn_contiguous() returns true
because both checked entries are LCN_HOLE, so the merge path is entered
with @new_1st_cnt still 0. Such consecutive holes do not occur on a
well-formed runlist (NTFS keeps runlists coalesced in memory), so this
OOB path is only reachable from a crafted volume.
A normal runlist has no element to the left of vcn 0, so the left/right
merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be
positive before checking or performing the merge. This skips the merge
entirely in that case instead of clamping the merge target.
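
The difference between clamping the index and requiring a left neighbour can be sketched in userspace. The names (model_rle, model_contiguous, model_should_merge, model_should_merge_clamped) are hypothetical, the value used for a hole is an assumption, and the contiguity rule is simplified to what the text above describes.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LCN_HOLE (-1LL)	/* assumed marker value for a hole */

/* Hypothetical, simplified runlist element. */
struct model_rle {
	long long vcn;
	long long lcn;
	long long length;
};

/* Two holes count as contiguous; real runs must be adjacent on disk. */
static bool model_contiguous(const struct model_rle *l,
			     const struct model_rle *r)
{
	if (l->lcn == MODEL_LCN_HOLE && r->lcn == MODEL_LCN_HOLE)
		return true;
	if (l->lcn < 0 || r->lcn < 0)
		return false;
	return l->lcn + l->length == r->lcn;
}

/* Clamped check from the text: may say "merge" although no left run exists. */
static bool model_should_merge_clamped(const struct model_rle *rl,
				       size_t new_1st_cnt)
{
	size_t i = new_1st_cnt == 0 ? 1 : new_1st_cnt;

	return model_contiguous(&rl[i - 1], &rl[i]);
}

/* Guarded check: with no left neighbour the merge is skipped entirely. */
static bool model_should_merge(const struct model_rle *rl, size_t new_1st_cnt)
{
	return new_1st_cnt > 0 &&
	       model_contiguous(&rl[new_1st_cnt - 1], &rl[new_1st_cnt]);
}
```

For a runlist that starts with two hole entries, the clamped check reports a merge at count 0 while the guarded check does not, so a merge body that indexes new_1st_cnt - 1 is never reached in that case.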
The out-of-bounds write can corrupt an adjacent slab object. On a
non-KASAN kernel, it is reachable after a crafted NTFS volume has been
mounted read-write with the legacy fs/ntfs driver, by a local user that
has write access to the crafted file.
Fixes: 11ccc9107dc4 ("ntfs: update runlist handling and cluster allocator") Suggested-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>