Eric Dumazet [Tue, 20 Jan 2026 16:17:44 +0000 (16:17 +0000)]
bonding: provide a net pointer to __skb_flow_dissect()
After 3cbf4ffba5ee ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
Taehee Yoo [Tue, 20 Jan 2026 13:39:30 +0000 (13:39 +0000)]
selftests: net: amt: wait longer for connection before sending packets
Both send_mcast4() and send_mcast6() use sleep 2 to wait for the tunnel
connection between the gateway and the relay, and for the listener
socket to be created in the LISTENER namespace.
However, tests sometimes fail because packets are sent before the
connection is fully established.
Increase the waiting time to make the tests more reliable, and use
wait_local_port_listen() to explicitly wait for the listener socket.
Andrey Vatoropin [Tue, 20 Jan 2026 11:37:47 +0000 (11:37 +0000)]
be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
When the parameter pmac_id_valid argument of be_cmd_get_mac_from_list() is
set to false, the driver may request the PMAC_ID from the firmware of the
network card, and this function will store that PMAC_ID at the provided
address pmac_id. This is the contract of this function.
However, there is a location within the driver where both
pmac_id_valid == false and pmac_id == NULL are being passed. This could
result in dereferencing a NULL pointer.
To resolve this issue, it is necessary to pass the address of a stub
variable to the function.
This change lead to MHI WWAN device can't connect to internet.
I found a netwrok issue with kernel 6.19-rc4, but network works
well with kernel 6.18-rc1. After checking, this commit is the
root cause.
Before appliing this serial changes on MHI WWAN network, we shall
revert this change in case of v6.19 being impacted.
* Implement missing DCB connectors in uconn.c previously defined in conn.h.
* Replace kernel WARN_ON macro with printk message to more gracefully signify
an unknown connector was encountered.
With this patch, unknown connectors are explicitly marked with value 0
(DCB_CONNECTOR_VGA) to match the tested current behavior. Although 0xff
(DCB_CONNECTOR_NONE) may be more suitable, I don't want to introduce a
breaking change.
Alex Ramírez [Sat, 13 Dec 2025 00:53:26 +0000 (19:53 -0500)]
drm/nouveau: add missing DCB connector types
* Add missing DCB connectors in conn.h as per the NVIDIA DCB specification.
A lot of connector logic was rewritten for Linux v6.5; some display connector types
went unaccounted-for which caused kernel warnings on devices with the now-unsupported
DCB connectors. This patch adds all of the DCB connectors as defined by NVIDIA to the
dcb_connector_type enum to bring back support for these connectors to the new logic.
Alex Deucher [Fri, 16 Jan 2026 02:45:43 +0000 (21:45 -0500)]
drm/amdgpu: fix type for wptr in ring backup
Needs to be a u64.
Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 56fff1941abd3ca3b6f394979614ca7972552f7f)
Timur Kristóf [Mon, 19 Jan 2026 20:36:24 +0000 (21:36 +0100)]
drm/amd/pm: Workaround SI powertune issue on Radeon 430 (v2)
Radeon 430 and 520 are OEM GPUs from 2016~2017
They have the same device id: 0x6611 and revision: 0x87
On the Radeon 430, powertune is buggy and throttles the GPU,
never allowing it to reach its maximum SCLK. Work around this
bug by raising the TDP limits we program to the SMC from
24W (specified by the VBIOS on Radeon 430) to 32W.
Disabling powertune entirely is not a viable workaround,
because it causes the Radeon 520 to heat up above 100 C,
which I prefer to avoid.
Additionally, revise the maximum SCLK limit. Considering the
above issue, these GPUs never reached a high SCLK on Linux,
and the workarounds were added before the GPUs were released,
so the workaround likely didn't target these specifically.
Use 780 MHz (the maximum SCLK according to the VBIOS on the
Radeon 430). Note that the Radeon 520 VBIOS has a higher
maximum SCLK: 905 MHz, but in practice it doesn't seem to
perform better with the higher clock, only heats up more.
v2:
Move the workaround to si_populate_smc_tdp_limits.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 966d70f1e160bdfdecaf7ff2b3f22ad088516e9f)
Timur Kristóf [Mon, 19 Jan 2026 20:36:23 +0000 (21:36 +0100)]
drm/amd/pm: Don't clear SI SMC table when setting power limit
There is no reason to clear the SMC table.
We also don't need to recalculate the power limit then.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e214d626253f5b180db10dedab161b7caa41f5e9)
Timur Kristóf [Mon, 19 Jan 2026 20:36:22 +0000 (21:36 +0100)]
drm/amd/pm: Fix si_dpm mmCG_THERMAL_INT setting
Use WREG32 to write mmCG_THERMAL_INT.
This is a direct access register.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2555f4e4a741d31e0496572a8ab4f55941b4e30e)
Sasha Levin [Wed, 21 Jan 2026 16:25:32 +0000 (11:25 -0500)]
objtool: Fix libopcodes linking with static libraries
Commit 436326bc525d ("objtool: fix build failure due to missing libopcodes
check") tests for libopcodes using an empty main(), which passes even when
static libraries lack their dependencies. This causes undefined reference
errors (xmalloc, bfd_get_bits, etc.) when linking against static libopcodes
without its required libbfd and libiberty.
Fix by testing with an actual libopcodes symbol and trying increasingly
complete library combinations until one succeeds.
Fixes: 436326bc525d ("objtool: fix build failure due to missing libopcodes check") Reported-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Link: https://patch.msgid.link/20260121162532.1596238-1-sashal@kernel.org
Linus Torvalds [Wed, 21 Jan 2026 17:34:45 +0000 (09:34 -0800)]
Merge tag 'soc-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"The main changes are devicetree updates for qualcomm and rockchips
arm64 platforms, fixing minor mistakes in SoC and board specific
settings:
- GPIO settings for Pinephone Pro buttons
- Register ranges for rk3576 GPU
- Power domains on sc8280xp
- Clocks on qcom talos
- dtc warnings for extraneous properties, nonstandard node names and
undocument identifiers
The Tegra210 platform gets a single revert for a devicetree change
that caused a 6.19 regression.
On 32-bit Arm, we have trivial fixes for Microchip SAMA7 devicetree
files and NPCM Kconfig, as well as Andrew Jeffery being officially
listed as MAINTAINER for NPCM.
A single driver fix is for Qualcomm RPMHD power domains, bringing the
driver up to date with a devicetree change that added additional power
domains to be enabled"
* tag 'soc-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits)
MAINTAINERS: Add Andrew as M: to ARM/NUVOTON NPCM ARCHITECTURE
MAINTAINERS: update email address for Yixun Lan
Revert "arm64: tegra: Add interconnect properties for Tegra210"
arm64: dts: rockchip: Drop unsupported properties
arm64: dts: rockchip: Fix gpio pinctrl node names
arm64: dts: rockchip: Fix pinctrl property typo on rk3326-odroid-go3
arm64: dts: rockchip: Drop "sitronix,st7789v" fallback compatible from rk3568-wolfvision
ARM: dts: microchip: sama7d65: fix size-cells property for i2c3
ARM: dts: microchip: sama7d65: fix the ranges property for flx9
arm: npcm: drop unused Kconfig ERRATA symbol
arm64: dts: rockchip: Fix wrong register range of rk3576 gpu
arm64: dts: rockchip: Configure MCLK for analog sound on NanoPi M5
arm64: dts: rockchip: Fix headphones widget name on NanoPi M5
ARM: dts: microchip: lan966x: Fix the access to the PHYs for pcb8290
arm64: dts: rockchip: remove redundant max-link-speed from nanopi-r4s
arm64: dts: rockchip: remove dangerous max-link-speed from helios64
arm64: dts: rockchip: fix unit-address for RK3588 NPU's core1 and core2's IOMMU
arm64: dts: rockchip: Fix wifi interrupts flag on Sakura Pi RK3308B
arm64: dts: qcom: sm8650: Fix compile warnings in USB controller node
arm64: dts: qcom: sm8550: Fix compile warnings in USB controller node
...
Vincent Guittot [Wed, 21 Jan 2026 16:33:17 +0000 (17:33 +0100)]
sched/fair: Fix pelt clock sync when entering idle
Samuel and Alex reported regressions of the util_avg of RT rq with
commit 17e3e88ed0b6 ("sched/fair: Fix pelt lost idle time detection").
It happens that fair is updating and syncing the pelt clock with task one
when pick_next_task_fair() fails to pick a task but before the prev
scheduling class got a chance to update its pelt signals.
Move update_idle_rq_clock_pelt() in set_next_task_idle() which is called
after prev class has been called.
Fixes: 17e3e88ed0b6 ("sched/fair: Fix pelt lost idle time detection") Closes: https://lore.kernel.org/all/CAG2KctpO6VKS6GN4QWDji0t92_gNBJ7HjjXrE+6H+RwRXt=iLg@mail.gmail.com/ Closes: https://lore.kernel.org/all/8cf19bf0e0054dcfed70e9935029201694f1bb5a.camel@mediatek.com/ Reported-by: Samuel Wu <wusamuel@google.com> Reported-by: Alex Hoh <Alex.Hoh@mediatek.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Samuel Wu <wusamuel@google.com> Tested-by: Alex Hoh <Alex.Hoh@mediatek.com> Link: https://patch.msgid.link/20260121163317.505635-1-vincent.guittot@linaro.org
Linus Torvalds [Wed, 21 Jan 2026 16:42:34 +0000 (08:42 -0800)]
Merge tag 'for-6.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- protect reading super block vs setting block size externally (found
by syzbot)
- make sure no transaction is started in read-only mode even with some
rescue mount option combinations
- fix checksum calculation of backup super blocks when block-group-tree
is enabled
- more extensive mount-time checks of device items that could be left
after device replace and attempting degraded mount
- fix build warning with -Wmaybe-uninitialized on loongarch64-gcc 12
* tag 'for-6.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add extra device item checks at mount
btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREE
btrfs: reject new transactions if the fs is fully read-only
btrfs: sync read disk super and set block size
btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()
Fernand Sieber [Thu, 11 Dec 2025 18:36:04 +0000 (20:36 +0200)]
perf/x86/intel: Do not enable BTS for guests
By default when users program perf to sample branch instructions
(PERF_COUNT_HW_BRANCH_INSTRUCTIONS) with a sample period of 1, perf
interprets this as a special case and enables BTS (Branch Trace Store)
as an optimization to avoid taking an interrupt on every branch.
Since BTS doesn't virtualize, this optimization doesn't make sense when
the request originates from a guest. Add an additional check that
prevents this optimization for virtualized events (exclude_host).
Reported-by: Jan H. Schönherr <jschoenh@amazon.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fernand Sieber <sieberf@amazon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20251211183604.868641-1-sieberf@amazon.com
This occurs when creating a group member event with the flag
PERF_FLAG_FD_OUTPUT. The group leader should be mmap-ed and then mmap-ing
the event triggers the warning.
Since the event has copied the output_event in perf_event_set_output(),
event->rb is set. As a result, perf_mmap_rb() calls
refcount_inc(&event->mmap_count) when event->mmap_count = 0.
Disallow the case when event->mmap_count = 0. This also prevents two
events from updating the same user_page.
Fixes: 448f97fba901 ("perf: Convert mmap() refcounts to refcount_t") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260119184956.801238-1-whrosenb@asu.edu
Ming Lei [Tue, 13 Jan 2026 08:58:02 +0000 (16:58 +0800)]
selftests/ublk: fix garbage output in foreground mode
Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
when running kublk in foreground mode. Without this, _evtfd is
zero-initialized to 0 (stdin), and ublk_send_dev_event() writes
binary data to stdin which appears as garbage on the terminal.
Also fix debug message format string.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 13 Jan 2026 08:58:01 +0000 (16:58 +0800)]
selftests/ublk: fix error handling for starting device
Fix error handling in ublk_start_daemon() when start_dev fails:
1. Call ublk_ctrl_stop_dev() to cancel inflight uring_cmd before
cleanup. Without this, the device deletion may hang waiting for
I/O completion that will never happen.
2. Add fail_start label so that pthread_join() is called on the
error path. This ensures proper thread cleanup when startup fails.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 13 Jan 2026 08:58:00 +0000 (16:58 +0800)]
selftests/ublk: fix IO thread idle check
Include cmd_inflight in ublk_thread_is_done() check. Without this,
the thread may exit before all FETCH commands are completed, which
may cause device deletion to hang.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
block: make the new blkzoned UAPI constants discoverable
The Linux 6.19 merge window added the new BLKREPORTZONESV2 ioctl, and
with it the new BLK_ZONE_REP_CACHED and BLK_ZONE_COND_ACTIVE constants.
The two constants are defined as part of enums, which makes it very
painful for userspace to discover if they are present in the installed
system headers.
Use the #define to the same name trick to make them trivially
discoverable using CPP directives.
Fixes: 0bf0e2e46668 ("block: track zone conditions") Fixes: b30ffcdc0c15 ("block: introduce BLKREPORTZONESV2 ioctl") Reported-by: Andrey Albershteyn <aalbersh@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Seamus Connor [Thu, 15 Jan 2026 02:59:52 +0000 (18:59 -0800)]
ublk: fix ublksrv pid handling for pid namespaces
When ublksrv runs inside a pid namespace, START/END_RECOVERY compared
the stored init-ns tgid against the userspace pid (getpid vnr), so the
check failed and control ops could not proceed. Compare against the
caller’s init-ns tgid and store that value, then translate it back to
the caller’s pid namespace when reporting GET_DEV_INFO so ublk list
shows a sensible pid.
Testing: start/recover in a pid namespace; `ublk list` shows
reasonable pid values in init, child, and sibling namespaces.
Fixes: c2c8089f325e ("ublk: validate ublk server pid") Signed-off-by: Seamus Connor <sconnor@purestorage.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukasz Laguna [Wed, 21 Jan 2026 14:33:04 +0000 (15:33 +0100)]
drm/xe: Update wedged.mode only after successful reset policy change
Previously, the driver's internal wedged.mode state was updated without
verifying whether the corresponding engine reset policy update in GuC
succeeded. This could leave the driver reporting a wedged.mode state
that doesn't match the actual reset behavior programmed in GuC.
With this change, the reset policy is updated first, and the driver's
wedged.mode state is modified only if the policy update succeeds on all
available GTs.
This patch also introduces two functional improvements:
- The policy is sent to GuC only when a change is required. An update
is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG,
because only in that case the reset policy changes. For example,
switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and
XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no
need to send the same value to GuC.
- An inconsistent_reset flag is added to track cases where reset policy
update succeeds only on a subset of GTs. If such inconsistency is
detected, future wedged mode configuration will force a retry of the
reset policy update to restore a consistent state across all GTs.
Matthew Auld [Tue, 20 Jan 2026 11:06:11 +0000 (11:06 +0000)]
drm/xe/migrate: fix job lock assert
We are meant to be checking the user vm for the bind queue, but actually
we are checking the migrate vm. For various reasons this is not
currently firing but this will likely change in the future.
Now that we have the user_vm attached to the bind queue, we can fix this
by directly checking that here.
Fixes: dba89840a920 ("drm/xe: Add GT TLB invalidation jobs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com
(cherry picked from commit 9dd1048bca4fe2aa67c7a286bafb3947537adedb) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Matthew Auld [Tue, 20 Jan 2026 11:06:10 +0000 (11:06 +0000)]
drm/xe/uapi: disallow bind queue sharing
Currently this is very broken if someone attempts to create a bind
queue and share it across multiple VMs. For example currently we assume
it is safe to acquire the user VM lock to protect some of the bind queue
state, but if allow sharing the bind queue with multiple VMs then this
quickly breaks down.
To fix this reject using a bind queue with any VM that is not the same
VM that was originally passed when creating the bind queue. This a uAPI
change, however this was more of an oversight on kernel side that we
didn't reject this, and expectation is that userspace shouldn't be using
bind queues in this way, so in theory this change should go unnoticed.
Based on a patch from Matt Brost.
v2 (Matt B):
- Hold the vm lock over queue create, to ensure it can't be closed as
we attach the user_vm to the queue.
- Make sure we actually check for NULL user_vm in destruction path.
v3:
- Fix error path handling.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com
(cherry picked from commit 9dd08fdecc0c98d6516c2d2d1fa189c1332f8dab) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Just toggling the descriptor's "requested" flag is not enough. We need
to properly request it in order to potentially propagate any
configuration to pinctrl via the .request() callback.
We must not take the reference to the device at this point (the device
is not ready but we're also requesting the device's own descriptor) so
make the _commit() variants of request and free functions available to
GPIO core in order to use them instead of their regular counterparts.
This fixes an audio issue reported on one of the Qualcomm platforms.
Swaraj Gaikwad [Tue, 13 Jan 2026 15:06:39 +0000 (20:36 +0530)]
slab: fix kmalloc_nolock() context check for PREEMPT_RT
On PREEMPT_RT kernels, local_lock becomes a sleeping lock. The current
check in kmalloc_nolock() only verifies we're not in NMI or hard IRQ
context, but misses the case where preemption is disabled.
When a BPF program runs from a tracepoint with preemption disabled
(preempt_count > 0), kmalloc_nolock() proceeds to call
local_lock_irqsave() which attempts to acquire a sleeping lock,
triggering:
BUG: sleeping function called from invalid context
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6128
preempt_count: 2, expected: 0
Fix this by checking !preemptible() on PREEMPT_RT, which directly
expresses the constraint that we cannot take a sleeping lock when
preemption is disabled. This encompasses the previous checks for NMI
and hard IRQ contexts while also catching cases where preemption is
disabled.
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Reported-by: syzbot+b1546ad4a95331b2101e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b1546ad4a95331b2101e Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113150639.48407-1-swarajgaikwad1925@gmail.co Cc: <stable@vger.kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Thomas Gleixner [Wed, 17 Dec 2025 17:21:05 +0000 (18:21 +0100)]
clocksource: Reduce watchdog readout delay limit to prevent false positives
The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.
Assume TSC is the clocksource and HPET is the watchdog and both have a
uncertainty margin of 250us (default). The watchdog readout does:
If the delay between #4 and #6 is within the 750us margin then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable because the intervals are calculated against the
previous value:
which is obviously larger than the allowed 500us margin and results in
marking TSC unstable.
Fix this by using the same margin as the interval comparison. If the delay
between two watchdog reads is larger than that, then the readout was either
disturbed by interconnect congestion, NMIs or SMIs.
Takashi Iwai [Wed, 21 Jan 2026 08:20:20 +0000 (09:20 +0100)]
ALSA: usb-audio: Use the right limit for PCM OOB check
The recent fix commit for addressing the OOB access of PCM URB data
buffer caused a regression on Behringer UMC2020HD device, resulting in
choppy sound. The fix used ep->max_urb_frames for the upper limit
check, and this is no right value to be referred.
Use the actual buffer size (ctx->buffer_size) as the upper limit
instead, which also avoids the regression on the device above.
Jeongjun Park [Mon, 19 Jan 2026 06:33:59 +0000 (15:33 +0900)]
netrom: fix double-free in nr_route_frame()
In nr_route_frame(), old_skb is immediately freed without checking if
nr_neigh->ax25 pointer is NULL. Therefore, if nr_neigh->ax25 is NULL,
the caller function will free old_skb again, causing a double-free bug.
Therefore, to prevent this, we need to modify it to check whether
nr_neigh->ax25 is NULL before freeing old_skb.
Cc: <stable@vger.kernel.org> Reported-by: syzbot+999115c3bf275797dc27@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69694d6f.050a0220.58bed.0029.GAE@google.com/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Link: https://patch.msgid.link/20260119063359.10604-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Laurent Vivier [Mon, 19 Jan 2026 07:55:18 +0000 (08:55 +0100)]
usbnet: limit max_mtu based on device's hard_mtu
The usbnet driver initializes net->max_mtu to ETH_MAX_MTU before calling
the device's bind() callback. When the bind() callback sets
dev->hard_mtu based the device's actual capability (from CDC Ethernet's
wMaxSegmentSize descriptor), max_mtu is never updated to reflect this
hardware limitation).
This allows userspace (DHCP or IPv6 RA) to configure MTU larger than the
device can handle, leading to silent packet drops when the backend sends
packet exceeding the device's buffer size.
Fix this by limiting net->max_mtu to the device's hard_mtu after the
bind callback returns.
See https://gitlab.com/qemu-project/qemu/-/issues/3268 and
https://bugs.passt.top/attachment.cgi?bugid=189
Timur Kristóf [Sun, 18 Jan 2026 13:03:45 +0000 (14:03 +0100)]
drm/amd/display: Only poll analog connectors
Analog connectors may be hot-plugged unlike other connector
types that don't support HPD.
Stop DRM from polling other connector types that don't
support HPD, such as eDP, LVDS, etc. These were wrongly
polled when analog connector support was added,
causing issues with the seamless boot process.
Fixes: c4f3f114e73c ("drm/amd/display: Poll analog connectors (v3)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e924c7004b08e4e173782bad60b27841d889e371)
Alex Deucher [Mon, 29 Dec 2025 20:24:10 +0000 (15:24 -0500)]
drm/amdgpu: fix error handling in ib_schedule()
If fence emit fails, free the fence if necessary.
Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5eb680a06007f2f6ea333d11a4e29039da90614b)
Jonathan Kim [Wed, 17 Dec 2025 16:03:12 +0000 (11:03 -0500)]
drm/amdkfd: fix gfx11 restrictions on debugging cooperative launch
Restrictions on debugging cooperative launch for GFX11 devices should
align to CWSR work around requirements.
i.e. devices without the need for the work around should not be subject
to such restrictions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: James Zhu <james.zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 230ef3977d6ffdd498ffa9baa6f5a061786189bf)
Jiqian Chen [Wed, 14 Jan 2026 10:06:10 +0000 (18:06 +0800)]
drm/amdgpu: free hw_vm_fence when fail in amdgpu_job_alloc
If drm_sched_job_init fails, hw_vm_fence is not freed currently,
then cause memory leak.
Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Link: https://lore.kernel.org/amd-gfx/a5a828cb-0e4a-41f0-94c3-df31e5ddad52@amd.com/T/#t Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Amos Kong <kongjianjun@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5d42ee457ccd1fb5da4c7f817825b2806ec36956)
Jiawen Wu [Mon, 19 Jan 2026 06:59:35 +0000 (14:59 +0800)]
net: txgbe: remove the redundant data return in SW-FW mailbox
For these two firmware mailbox commands, in txgbe_test_hostif() and
txgbe_set_phy_link_hostif(), there is no need to read data from the
buffer.
Under the current setting, OEM firmware will cause the driver to fail to
probe. Because OEM firmware returns more link information, with a larger
OEM structure txgbe_hic_ephy_getlink. However, the current driver does
not support the OEM function. So just fix it in the way that does not
involve reading the returned data.
====================
fix some bugs in the flow director of HNS3 driver
This patchset fixes two bugs in the flow director:
1. Incorrect definition of HCLGE_FD_AD_COUNTER_NUM_M
2. Incorrect assignment of HCLGE_FD_AD_NXT_KEY
====================
Tao Wang reports that sometimes, after resume, stmmac can watchdog:
NETDEV WATCHDOG: CPU: x: transmit queue x timed out xx ms
When this occurs, the DMA transmit descriptors contain:
eth0: 221 [0x0000000876d10dd0]: 0x73660cbe 0x8 0x42 0xb04416a0
eth0: 222 [0x0000000876d10de0]: 0x77731d40 0x8 0x16a0 0x90000000
where descriptor 221 is the TSO header and 222 is the TSO payload.
tdes3 for descriptor 221 (0xb04416a0) has both bit 29 (first
descriptor) and bit 28 (last descriptor) set, which is incorrect.
The following packet also has bit 28 set, but isn't marked as a
first descriptor, and this causes the transmit DMA to stall.
This occurs because stmmac_tso_allocator() populates the first
descriptor, but does not set .last_segment correctly. There are two
places where this matters: one is later in stmmac_tso_xmit() where
we use it to update the TSO header descriptor. The other is in the
ring/chain mode clean_desc3() which is a performance optimisation.
Rather than using tx_q->tx_skbuff_dma[].last_segment to determine
whether the first descriptor entry is the only segment, calculate the
number of descriptor entries used. If there is only one descriptor,
then the first is also the last, so mark it as such.
Further work will be necessary to either eliminate .last_segment
entirely or set it correctly. Code analysis also indicates that a
similar issue exists with .is_jumbo. These will be the subject of
a future patch.
Reported-by: Tao Wang <tao03.wang@horizon.auto> Fixes: c2837423cb54 ("net: stmmac: Rework TX Coalesce logic") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vhq8O-00000005N5s-0Ke5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Yang [Mon, 19 Jan 2026 15:34:36 +0000 (23:34 +0800)]
be2net: fix data race in be_get_new_eqd
In be_get_new_eqd(), statistics of pkts, protected by u64_stats_sync, are
read and accumulated in ignorance of possible u64_stats_fetch_retry()
events. Before the commit in question, these statistics were retrieved
one by one directly from queues. Fix this by reading them into temporary
variables first.
Fixes: 209477704187 ("be2net: set interrupt moderation for Skyhawk-R using EQ-DB") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260119153440.1440578-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Yang [Mon, 19 Jan 2026 16:27:16 +0000 (00:27 +0800)]
idpf: Fix data race in idpf_net_dim
In idpf_net_dim(), some statistics protected by u64_stats_sync, are read
and accumulated in ignorance of possible u64_stats_fetch_retry() events.
The correct way to copy statistics is already illustrated by
idpf_add_queue_stats(). Fix this by reading them into temporary variables
first.
Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260119162720.1463859-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Yang [Mon, 19 Jan 2026 16:07:37 +0000 (00:07 +0800)]
net: hns3: fix data race in hns3_fetch_stats
In hns3_fetch_stats(), ring statistics, protected by u64_stats_sync, are
read and accumulated in ignorance of possible u64_stats_fetch_retry()
events. These statistics are already accumulated by
hns3_ring_stats_update(). Fix this by reading them into a temporary
buffer first.
nfc: MAINTAINERS: Orphan the NFC and look for new maintainers
NFC stack in Linux is in poor shape, with several bugs being discovered
last years via fuzzing, not much new development happening and limited
review and testing. It requires some more effort than drive-by reviews
I have been offering last one or two years.
I don't have much time nor business interests to keep looking at NFC,
so let's drop me from the maintainers to clearly indicate that more
hands are needed.
Linus Torvalds [Tue, 20 Jan 2026 23:01:15 +0000 (15:01 -0800)]
Merge tag 'devicetree-fixes-for-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix a refcount leak in of_alias_scan()
- Support descending into child nodes when populating nodes
in /firmware
* tag 'devicetree-fixes-for-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix reference count leak in of_alias_scan()
of: platform: Use default match table for /firmware
Linus Torvalds [Tue, 20 Jan 2026 21:32:16 +0000 (13:32 -0800)]
Merge tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
- A patch series from David Hildenbrand which fixes a few things
related to hugetlb PMD sharing
- The remainder are singletons, please see their changelogs for details
* tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
mm/kfence: fix potential deadlock in reboot notifier
Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode
mm: do not copy page tables unnecessarily for VM_UFFD_WP
mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
mm/rmap: fix two comments related to huge_pmd_unshare()
mm/hugetlb: fix two comments related to huge_pmd_unshare()
mm/hugetlb: fix hugetlb_pmd_shared()
mm: remove unnecessary and incorrect mmap lock assert
x86/kfence: avoid writing L1TF-vulnerable PTEs
mm/vma: do not leak memory when .mmap_prepare swaps the file
migrate: correct lock ordering for hugetlb file folios
panic: only warn about deprecated panic_print on write access
fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
mm: take into account mm_cid size for mm_struct static definitions
mm: rename cpu_bitmap field to flexible_array
mm: add missing static initializer for init_mm::mm_cid.lock
Mina Almasry [Thu, 11 Dec 2025 10:19:29 +0000 (10:19 +0000)]
idpf: read lower clock bits inside the time sandwich
PCIe reads need to be done inside the time sandwich because PCIe
writes may get buffered in the PCIe fabric and posted to the device
after the _postts completes. Doing the PCIe read inside the time
sandwich guarantees that the write gets flushed before the _postts
timestamp is taken.
Cc: lrizzo@google.com Cc: namangulati@google.com Cc: willemb@google.com Cc: intel-wired-lan@lists.osuosl.org Cc: milena.olech@intel.com Cc: jacob.e.keller@intel.com Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock") Suggested-by: Shachar Raindel <shacharr@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paul Greenwalt [Mon, 29 Dec 2025 08:52:34 +0000 (03:52 -0500)]
ice: fix devlink reload call trace
Commit 4da71a77fc3b ("ice: read internal temperature sensor") introduced
internal temperature sensor reading via HWMON. ice_hwmon_init() was added
to ice_init_feature() and ice_hwmon_exit() was added to ice_remove(). As a
result if devlink reload is used to reinit the device and then the driver
is removed, a call trace can occur.
BUG: unable to handle page fault for address: ffffffffc0fd4b5d
Call Trace:
string+0x48/0xe0
vsnprintf+0x1f9/0x650
sprintf+0x62/0x80
name_show+0x1f/0x30
dev_attr_show+0x19/0x60
The call trace repeats approximately every 10 minutes when system
monitoring tools (e.g., sadc) attempt to read the orphaned hwmon sysfs
attributes that reference freed module memory.
The sequence is:
1. Driver load, ice_hwmon_init() gets called from ice_init_feature()
2. Devlink reload down, flow does not call ice_remove()
3. Devlink reload up, ice_hwmon_init() gets called from
ice_init_feature() resulting in a second instance
4. Driver unload, ice_hwmon_exit() called from ice_remove() leaving the
first hwmon instance orphaned with dangling pointer
Fix this by moving ice_hwmon_exit() from ice_remove() to
ice_deinit_features() to ensure proper cleanup symmetry with
ice_hwmon_init().
Fixes: 4da71a77fc3b ("ice: read internal temperature sensor") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on
error paths") removed ice_deinit_hw() from ice_deinit_dev(). As a result
ice_devlink_reinit_down() no longer calls ice_deinit_hw(), but
ice_devlink_reinit_up() still calls ice_init_hw(). Since the control
queues are not uninitialized, ice_init_hw() fails with -EBUSY.
Add ice_deinit_hw() to ice_devlink_reinit_down() to correspond with
ice_init_hw() in ice_devlink_reinit_up().
Fixes: 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on error paths") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cody Haas [Sat, 13 Dec 2025 00:22:26 +0000 (16:22 -0800)]
ice: Fix persistent failure in ice_get_rxfh
Several ioctl functions have the ability to call ice_get_rxfh, however
all of these ioctl functions do not provide all of the expected
information in ethtool_rxfh_param. For example, ethtool_get_rxfh_indir does
not provide an rss_key. This previously caused ethtool_get_rxfh_indir to
always fail with -EINVAL.
This change draws inspiration from i40e_get_rss to handle this
situation, by only calling the appropriate rss helpers when the
necessary information has been provided via ethtool_rxfh_param.
Fixes: b66a972abb6b ("ice: Refactor ice_set/get_rss into LUT and key specific functions") Signed-off-by: Cody Haas <chaas@riotgames.com> Closes: https://lore.kernel.org/intel-wired-lan/CAH7f-UKkJV8MLY7zCdgCrGE55whRhbGAXvgkDnwgiZ9gUZT7_w@mail.gmail.com/ Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Tue, 20 Jan 2026 17:46:29 +0000 (09:46 -0800)]
Merge tag 'pwm/for-6.19-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes and a maintainer update from Uwe Kleine-König:
- pwm: Ensure ioctl() returns a negative errno on error
This affects two ioctls on /dev/pwmchipX where the return value of
copy_to_user() was passed to userspace. This is fixed to return
-EFAULT now instead.
- pwm: max7360: Populate missing .sizeof_wfhw in max7360_pwm_ops
This fixes an oversight in the original commit that added support for
the max7360 driver (d93a75d94b79: "pwm: max7360: Add MAX7360 PWM
support"). There is no user-visible effect because the .sizeof_wfhw
member is just a safe guard that the memory provided by the core is
big enough. While it currently is big enough and there is no reason
to assume that will change, doing that correctly is necessary.
- MAINTAINERS: Add Michal Wilczynski as reviewer for PWM rust drivers
Michal cares for the Rust parts of the pwm subsystem. Several of the
patches sent recently for the (for now) only Rust pwm driver did not
add Michal to Cc which resulted in the patches waiting for review as
I thought Michal would care but he wasn't aware of them.
* tag 'pwm/for-6.19-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
MAINTAINERS: Add myself as reviewer for PWM rust drivers
pwm: max7360: Populate missing .sizeof_wfhw in max7360_pwm_ops
pwm: Ensure ioctl() returns a negative errno on error
Yosry Ahmed [Fri, 16 Jan 2026 20:52:47 +0000 (20:52 +0000)]
mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
Commit 2b7226af730c ("mm/memcg: make memory.reclaim interface generic")
moved proactive reclaim logic from memory.reclaim handler to a generic
user_proactive_reclaim() helper to be used for per-node proactive reclaim.
However, user_proactive_reclaim() was only defined under CONFIG_NUMA, with
a stub always returning 0 otherwise. This broke memory.reclaim on
!CONFIG_NUMA configs, causing it to report success without actually
attempting reclaim.
Move the definition of user_proactive_reclaim() outside CONFIG_NUMA, and
instead define a stub for __node_reclaim() in the !CONFIG_NUMA case.
__node_reclaim() is only called from user_proactive_reclaim() when a write
is made to sys/devices/system/node/nodeX/reclaim, which is only defined
with CONFIG_NUMA.
Link: https://lkml.kernel.org/r/20260116205247.928004-1-yosry.ahmed@linux.dev Fixes: 2b7226af730c ("mm/memcg: make memory.reclaim interface generic") Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Breno Leitao [Fri, 16 Jan 2026 14:10:11 +0000 (06:10 -0800)]
mm/kfence: fix potential deadlock in reboot notifier
The reboot notifier callback can deadlock when calling
cancel_delayed_work_sync() if toggle_allocation_gate() is blocked in
wait_event_idle() waiting for allocations, that might not happen on
shutdown path.
The issue is that cancel_delayed_work_sync() waits for the work to
complete, but the work is waiting for kfence_allocation_gate > 0 which
requires allocations to happen (each allocation is increased by 1) -
allocations that may have stopped during shutdown.
Fix this by:
1. Using cancel_delayed_work() (non-sync) to avoid blocking. Now the
callback succeeds and return.
2. Adding wake_up() to unblock any waiting toggle_allocation_gate()
3. Adding !kfence_enabled to the wait condition so the wake succeeds
The static_branch_disable() IPI will still execute after the wake, but at
this early point in shutdown (reboot notifier runs with INT_MAX priority),
the system is still functional and CPUs can respond to IPIs.
Link: https://lkml.kernel.org/r/20260116-kfence_fix-v1-1-4165a055933f@debian.org Fixes: ce2bba89566b ("mm/kfence: add reboot notifier to disable KFENCE on shutdown") Signed-off-by: Breno Leitao <leitao@debian.org> Reported-by: Chris Mason <clm@meta.com> Closes: https://lore.kernel.org/all/20260113140234.677117-1-clm@meta.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Breno Leitao <leitao@debian.org> Cc: Chris Mason <clm@meta.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode
When CONFIG_MEM_ALLOC_PROFILING_DEBUG=y, /proc/sys/vm/mem_profiling is
read-only to avoid debug warnings in a scenario when an allocation is
made while profiling is disabled (allocation does not get an allocation
tag), then profiling gets enabled and allocation gets freed (warning due
to the allocation missing allocation tag).
Link: https://lkml.kernel.org/r/20260116184423.2708363-1-surenb@google.com Fixes: ebdf9ad4ca98 ("memprofiling: documentation") Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Wed, 14 Jan 2026 11:00:06 +0000 (11:00 +0000)]
mm: do not copy page tables unnecessarily for VM_UFFD_WP
Commit ab04b530e7e8 ("mm: introduce copy-on-fork VMAs and make
VM_MAYBE_GUARD one") aggregates flags checks in vma_needs_copy(),
including VM_UFFD_WP.
However in doing so, it incorrectly performed this check against src_vma.
This check was done on the assumption that all relevant flags are copied
upon fork.
However the userfaultfd logic is very innovative in that it implements
custom logic on fork in dup_userfaultfd(), including a rather well hidden
case where lacking UFFD_FEATURE_EVENT_FORK causes VM_UFFD_WP to not be
propagated to the destination VMA.
And indeed, vma_needs_copy(), prior to this patch, did check this property
on dst_vma, not src_vma.
Since all the other relevant flags are copied on fork, we can simply fix
this by checking against dst_vma.
While we're here, we fix a comment against VM_COPY_ON_FORK (noting that it
did indeed already reference dst_vma) to make it abundantly clear that we
must check against the destination VMA.
Link: https://lkml.kernel.org/r/20260114110006.1047071-1-lorenzo.stoakes@oracle.com Fixes: ab04b530e7e8 ("mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Chris Mason <clm@meta.com> Closes: https://lore.kernel.org/all/20260113231257.3002271-1-clm@meta.com/ Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.
In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.
There are two optimizations to be had:
(1) When we process (unshare) multiple such PMD tables, such as during
exit(), it is sufficient to send a single IPI broadcast (as long as
we respect locking rules) instead of one per PMD table.
Locking prevents that any of these PMD tables could get reused before
we drop the lock.
(2) When we are not the last sharer (> 2 users including us), there is
no need to send the IPI broadcast. The shared PMD tables cannot
become exclusive (fully unshared) before an IPI will be broadcasted
by the last sharer.
Concurrent GUP-fast could walk into a PMD table just before we
unshared it. It could then succeed in grabbing a page from the
shared page table even after munmap() etc succeeded (and supressed
an IPI). But there is not difference compared to GUP-fast just
sleeping for a while after grabbing the page and re-enabling IRQs.
Most importantly, GUP-fast will never walk into page tables that are
no-longer shared, because the last sharer will issue an IPI
broadcast.
(if ever required, checking whether the PUD changed in GUP-fast
after grabbing the page like we do in the PTE case could handle
this)
So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.
To make initialization of the mmu_gather easier when working on a single
VMA (in particular, when dealing with hugetlb), provide
tlb_gather_mmu_vma().
We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.
Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().
From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.
Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.
Document it all properly.
Notes about tlb_remove_table_sync_one() interaction with unsharing:
There are two fairly tricky things:
(1) tlb_remove_table_sync_one() is a NOP on architectures without
CONFIG_MMU_GATHER_RCU_TABLE_FREE.
Here, the assumption is that the previous TLB flush would send an
IPI to all relevant CPUs. Careful: some architectures like x86 only
send IPIs to all relevant CPUs when tlb->freed_tables is set.
The relevant architectures should be selecting
MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
kernels and it might have been problematic before this patch.
Also, the arch flushing behavior (independent of IPIs) is different
when tlb->freed_tables is set. Do we have to enlighten them to also
take care of tlb->unshared_tables? So far we didn't care, so
hopefully we are fine. Of course, we could be setting
tlb->freed_tables as well, but that might then unnecessarily flush
too much, because the semantics of tlb->freed_tables are a bit
fuzzy.
This patch changes nothing in this regard.
(2) tlb_remove_table_sync_one() is not a NOP on architectures with
CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.
Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
we still issue IPIs during TLB flushes and don't actually need the
second tlb_remove_table_sync_one().
This optimized can be implemented on top of this, by checking e.g., in
tlb_remove_table_sync_one() whether we really need IPIs. But as
described in (1), it really must honor tlb->freed_tables then to
send IPIs to all relevant CPUs.
Notes on TLB flushing changes:
(1) Flushing for non-shared PMD tables
We're converting from flush_hugetlb_tlb_range() to
tlb_remove_huge_tlb_entry(). Given that we properly initialize the
MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to
__unmap_hugepage_range(), that should be fine.
(2) Flushing for shared PMD tables
We're converting from various things (flush_hugetlb_tlb_range(),
tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range().
tlb_flush_pmd_range() achieves the same that
tlb_remove_huge_tlb_entry() would achieve in these scenarios.
Note that tlb_remove_huge_tlb_entry() also calls
__tlb_remove_tlb_entry(), however that is only implemented on
powerpc, which does not support PMD table sharing.
Similar to (1), tlb_gather_mmu_vma() should make sure that TLB
flushing keeps on working as expected.
Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.
There are plenty more cleanups to be had, but they have to wait until
this is fixed.
[david@kernel.org: fix kerneldoc] Link: https://lkml.kernel.org/r/f223dd74-331c-412d-93fc-69e360a5006c@kernel.org Link: https://lkml.kernel.org/r/20251223214037.580860-5-david@kernel.org Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reported-by: Uschakow, Stanislav" <suschako@amazon.de> Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/ Tested-by: Laurence Oberman <loberman@redhat.com> Acked-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/rmap: fix two comments related to huge_pmd_unshare()
PMD page table unsharing no longer touches the refcount of a PMD page
table. Also, it is not about dropping the refcount of a "PMD page" but
the "PMD page table".
Let's just simplify by saying that the PMD page table was unmapped,
consequently also unmapping the folio that was mapped into this page.
This code should be deduplicated in the future.
Link: https://lkml.kernel.org/r/20251223214037.580860-4-david@kernel.org Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Uschakow, Stanislav" <suschako@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/hugetlb: fix two comments related to huge_pmd_unshare()
Ever since we stopped using the page count to detect shared PMD page
tables, these comments are outdated.
The only reason we have to flush the TLB early is because once we drop the
i_mmap_rwsem, the previously shared page table could get freed (to then
get reallocated and used for other purpose). So we really have to flush
the TLB before that could happen.
So let's simplify the comments a bit.
The "If we unshared PMDs, the TLB flush was not recorded in mmu_gather."
part introduced as in commit a4a118f2eead ("hugetlbfs: flush TLBs
correctly after huge_pmd_unshare") was confusing: sure it is recorded in
the mmu_gather, otherwise tlb_flush_mmu_tlbonly() wouldn't do anything.
So let's drop that comment while at it as well.
We'll centralize these comments in a single helper as we rework the code
next.
Link: https://lkml.kernel.org/r/20251223214037.580860-3-david@kernel.org Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Uschakow, Stanislav" <suschako@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using
mmu_gather)", v3.
One functional fix, one performance regression fix, and two related
comment fixes.
I cleaned up my prototype I recently shared [1] for the performance fix,
deferring most of the cleanups I had in the prototype to a later point.
While doing that I identified the other things.
The goal of this patch set is to be backported to stable trees "fairly"
easily. At least patch #1 and #4.
Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing
Patch #2 + #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().
The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated
There are plenty of cleanups in the future to be had + one reasonable
optimization on x86. But that's all out of scope for this series.
Runtime tested, with a focus on fixing the performance regression using
the original reproducer [2] on x86.
This patch (of 4):
We switched from (wrongly) using the page count to an independent shared
count. Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc->pt_share_count to identify
sharing.
We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.
Page migration, like mbind() or migrate_pages() would allow for migrating
folios mapped into such shared PMD tables, even though the folios are not
exclusive. In smaps we would account them as "private" although they are
"shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the
pagemap interface.
Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().
Lorenzo Stoakes [Wed, 14 Jan 2026 11:56:19 +0000 (11:56 +0000)]
mm: remove unnecessary and incorrect mmap lock assert
This check was introduced by commit 42fc541404f2 ("mmap locking API: add
mmap_assert_locked() and mmap_assert_write_locked()") which replaced a
VM_BUG_ON_VMA() over rwsem_is_locked from commit a00cc7d9dd93 ("mm, x86:
add support for PUD-sized transparent hugepages"), i.e. the commit that
introduced PUD THPs.
These seem to be careful asserts introduced to ensure that locks are held
in general, however for a zap we require that VMAs are kept stable, and
this is a requirement that has held perfectly well for a long time.
These were long before VMA locks and thus there appears to be no reason to
think this is assert is there for anything other than 'stabilised VMA'.
Asserting that the VMA under examination is stable only in the case of a
THP PUD is strange and unnecessary. If we wish to be careful and assert
such things, we should do so at the zap level.
However in any case the current situation is already simply incorrect - a
VMA lock suffices here.
Remove the assert for now as it is unnecessarily, incorrect and unhelpful,
subsequent work can introduce an assert in general for zapping if
required.
Link: https://lkml.kernel.org/r/20260114115619.1087466-1-lorenzo.stoakes@oracle.com Fixes: 2ab7f1bbafc9 ("mm/madvise: allow guard page install/remove under VMA lock") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Chris Mason <clm@meta.com> Closes: https://lore.kernel.org/all/20260113220856.2358195-1-clm@meta.com/ Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
iommu/io-pgtable-arm: fix size_t signedness bug in unmap path
__arm_lpae_unmap() returns size_t but was returning -ENOENT (negative
error code) when encountering an unmapped PTE. Since size_t is unsigned,
-ENOENT (typically -2) becomes a huge positive value (0xFFFFFFFFFFFFFFFE
on 64-bit systems).
This corrupted value propagates through the call chain:
__arm_lpae_unmap() returns -ENOENT as size_t
-> arm_lpae_unmap_pages() returns it
-> __iommu_unmap() adds it to iova address
-> iommu_pgsize() triggers BUG_ON due to corrupted iova
This can cause IOVA address overflow in __iommu_unmap() loop and
trigger BUG_ON in iommu_pgsize() from invalid address alignment.
Fix by returning 0 instead of -ENOENT. The WARN_ON already signals
the error condition, and returning 0 (meaning "nothing unmapped")
is the correct semantic for size_t return type. This matches the
behavior of other io-pgtable implementations (io-pgtable-arm-v7s,
io-pgtable-dart) which return 0 on error conditions.
Fixes: 3318f7b5cefb ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()") Cc: stable@vger.kernel.org Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Yun Lu [Fri, 16 Jan 2026 09:53:08 +0000 (17:53 +0800)]
netdevsim: fix a race issue related to the operation on bpf_bound_progs list
The netdevsim driver lacks a protection mechanism for operations on the
bpf_bound_progs list. When the nsim_bpf_create_prog() performs
list_add_tail, it is possible that nsim_bpf_destroy_prog() is
simultaneously performs list_del. Concurrent operations on the list may
lead to list corruption and trigger a kernel crash as follows:
Qu Wenruo [Sun, 11 Jan 2026 22:02:09 +0000 (08:32 +1030)]
btrfs: add extra device item checks at mount
[BUG]
There is a bug report where after a dev-replace, the replace source
device with devid 4 is properly erased (dump tree shows it's the old
devid 4), but the target device is still using devid 0.
When the user tries to mount the fs degraded, the mount failed with the
following errors:
BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 5 transid 1394395 /dev/sda (8:0) scanned by btrfs (261)
BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 6 transid 1394395 /dev/sde (8:64) scanned by btrfs (261)
BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 0 transid 1394395 /dev/sdd (8:48) scanned by btrfs (261)
BTRFS: device fsid 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9 devid 3 transid 1394395 /dev/sdf (8:80) scanned by btrfs (261)
BTRFS info (device sdd): first mount of filesystem 84a1ed4a-365c-45c3-a9ee-a7df525dc3c9
BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm
BTRFS warning (device sdd): devid 4 uuid 01e2081c-9c2a-4071-b9f4-e1b27e571ff5 is missing
BTRFS info (device sdd): bdev <missing disk> errs: wr 84994544, rd 15567, flush 65872, corrupt 0, gen 0
BTRFS info (device sdd): bdev /dev/sdd errs: wr 71489901, rd 0, flush 30001, corrupt 0, gen 0
BTRFS error (device sdd): replace without active item, run 'device scan --forget' on the target device
BTRFS error (device sdd): failed to init dev_replace: -117
BTRFS error (device sdd): open_ctree failed: -117
[CAUSE]
The devid 0 didn't get its devid updated is its own problem, here I'm
only focusing on the mount failure itself.
The mount is not caused by the missing device, as the fs has RAID1C3 for
metadata and RAID10 for data, thus is completely able to tolerate one
missing device.
The device tree shows the dev-replace has properly finished:
So this means device scan will register sdd as devid 0 into the fs, then
during btrfs_init_dev_replace(), we located the replace progress item,
found the previous replace is finished, but we still need to check if
the dev-replace target device (devid 0) exists.
If that device exists, we error out showing that error message.
But to be honest the end user may not really remember which device is
the replace target device, thus not sure what to do in the next step.
[ENHANCEMENT]
To make the error more obvious, and tell the end user which devices
should be unregistered:
- Introduce BTRFS_DEV_STATE_ITEM_FOUND flag
During device item read from the chunk tree, set the flag for each
found device item.
- Verify there is no device without the above flag during mount
Even missing device should have that flag set.
If we found a device without that flag set, it means it's an
unexpected one and should be rejected.
- More detailed error message on what to do next
This will show all unexpected devices and tell the end user to use
'btrfs dev scan --forget' to forget them or remove them before mount.
There is an example dmesg where a device of a valid filesystem is modified to
have devid 0, then try degraded mount:
BTRFS info (device dm-6): first mount of filesystem 7c873869-844c-4b39-bd75-a96148bf4656
BTRFS info (device dm-6): using crc32c checksum algorithm
BTRFS warning (device dm-6): devid 3 uuid b4a9f35b-db42-4ac4-b55a-cbf81d3b9683 is missing
BTRFS error (device dm-6): devid 0 path /dev/mapper/test-scratch3 is registered but not found in chunk tree
BTRFS error (device dm-6): please remove above devices or use 'btrfs device scan --forget <dev>' to unregister them before mount
BTRFS error (device dm-6): open_ctree failed: -117
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Mark Harmstone [Tue, 13 Jan 2026 18:37:56 +0000 (18:37 +0000)]
btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREE
When the BLOCK_GROUP_TREE compat_ro flag is set, the extent root and
csum root fields are getting missed.
This is because EXTENT_TREE_V2 treated these differently, and when
they were split off this special-casing was mistakenly assigned to
BGT rather than the rump EXTENT_TREE_V2. There's no reason why the
existence of the block group tree should mean that we don't record the
details of the last commit's extent root and csum root.
Fix the code in backup_super_roots() so that the correct check gets
made.
Fixes: 1c56ab991903 ("btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Since rescue mount options will mark the full fs read-only, there should
be no new transaction triggered.
But during unmount we will evict all inodes, which can trigger a new
transaction, and triggers warnings on a heavily corrupted fs.
[CAUSE]
Btrfs allows new transaction even on a read-only fs, this is to allow
log replay happen even on read-only mounts, just like what ext4/xfs do.
However with rescue mount options, the fs is fully read-only and cannot
be remounted read-write, thus in that case we should also reject any new
transactions.
[FIX]
If we find the fs has rescue mount options, we should treat the fs as
error, so that no new transaction can be started.
When the user performs a btrfs mount, the block device is not set
correctly. The user sets the block size of the block device to 0x4000
by executing the BLKBSZSET command.
Since the block size change also changes the mapping->flags value, this
further affects the result of the mapping_min_folio_order() calculation.
Let's analyze the following two scenarios:
Scenario 1: Without executing the BLKBSZSET command, the block size is
0x1000, and mapping_min_folio_order() returns 0;
Scenario 2: After executing the BLKBSZSET command, the block size is
0x4000, and mapping_min_folio_order() returns 2.
do_read_cache_folio() allocates a folio before the BLKBSZSET command
is executed. This results in the allocated folio having an order value
of 0. Later, after BLKBSZSET is executed, the block size increases to
0x4000, and the mapping_min_folio_order() calculation result becomes 2.
This leads to two undesirable consequences:
1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) <
mapping_min_folio_order(mapping)) assertion.
2. The syzbot report [1] shows a null pointer dereference in
create_empty_buffers() due to a buffer head allocation failure.
Synchronization should be established based on the inode between the
BLKBSZSET command and read cache page to prevent inconsistencies in
block size or mapping flags before and after folio allocation.
Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Hans de Goede [Thu, 11 Dec 2025 16:37:27 +0000 (17:37 +0100)]
leds: led-class: Only Add LED to leds_list when it is fully ready
Before this change the LED was added to leds_list before led_init_core()
gets called adding it the list before led_classdev.set_brightness_work gets
initialized.
This leaves a window where led_trigger_register() of a LED's default
trigger will call led_trigger_set() which calls led_set_brightness()
which in turn will end up queueing the *uninitialized*
led_classdev.set_brightness_work.
This race gets hit by the lenovo-thinkpad-t14s EC driver which registers
2 LEDs with a default trigger provided by snd_ctl_led.ko in quick
succession. The first led_classdev_register() causes an async modprobe of
snd_ctl_led to run and that async modprobe manages to exactly hit
the window where the second LED is on the leds_list without led_init_core()
being called for it, resulting in:
Close the race window by moving the adding of the LED to leds_list to
after the led_init_core() call.
Cc: stable@vger.kernel.org Fixes: d23a22a74fde ("leds: delay led_set_brightness if stopping soft-blink") Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Reviewed-by: Sebastian Reichel <sre@kernel.org> Link: https://patch.msgid.link/20251211163727.366441-1-johannes.goede@oss.qualcomm.com Signed-off-by: Lee Jones <lee@kernel.org>
Michal Luczaj [Fri, 16 Jan 2026 08:52:36 +0000 (09:52 +0100)]
vsock/test: Do not filter kallsyms by symbol type
Blamed commit implemented logic to discover available vsock transports by
grepping /proc/kallsyms for known symbols. It incorrectly filtered entries
by type 'd'.
During the rework of the fan behavior control code in commit d8e8362b09d3 ("platform/x86: acer-wmi: Fix setting of fan behavior"),
acer_toggle_turbo() was changed to use WMID_gaming_set_fan_behavior()
instead of WMID_gaming_set_u64() when switching the fans to turbo
mode. The new function however does not check if the necessary
capability (ACER_CAP_TURBO_FAN) is actually enabled on a given
machine, causing the driver to potentially access unsupported
features.
Fix this by manually checking if ACER_CAP_TURBO_FAN is enabled
on a given machine before changing the fan mode.
Cc: stable@vger.kernel.org Fixes: d8e8362b09d3 ("platform/x86: acer-wmi: Fix setting of fan behavior") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20260108164716.14376-2-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Andrew Jeffery [Fri, 16 Jan 2026 00:59:56 +0000 (11:29 +1030)]
MAINTAINERS: Add Andrew as M: to ARM/NUVOTON NPCM ARCHITECTURE
Nuvoton's NPCM SoCs are part of their iBMC product line[1]. NPCM arch
patches have historically gone through Joel's tree along with ASPEED
changes due to their relevance to OpenBMC. Commit df5e674c7a99
("MAINTAINERS: Switch ASPEED tree to shared BMC repository") does what
it says on the tin - we now have bmc/linux.git on git.kernel.org, and
I've picked up the maintainer role for it.
Document that I'm continuing to apply NPCM arch patches from the
openbmc@ list to the BMC tree for PRs to the SoC tree.
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Avi Fishman <avifishman70@gmail.com> Cc: Drew Fustini <fustini@kernel.org> Cc: Joel Stanley <joel@jms.id.au> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Rob Herring <robh@kernel.org> Cc: Tali Perry <tali.perry1@gmail.com> Cc: Tomer Maimon <tmaimon77@gmail.com> Link: https://www.nuvoton.com/products/cloud-computing/ibmc/ Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Jens Axboe [Tue, 20 Jan 2026 14:42:50 +0000 (07:42 -0700)]
io_uring/io-wq: check IO_WQ_BIT_EXIT inside work run loop
Currently this is checked before running the pending work. Normally this
is quite fine, as work items either end up blocking (which will create a
new worker for other items), or they complete fairly quickly. But syzbot
reports an issue where io-wq takes seemingly forever to exit, and with a
bit of debugging, this turns out to be because it queues a bunch of big
(2GB - 4096b) reads with a /dev/msr* file. Since this file type doesn't
support ->read_iter(), loop_rw_iter() ends up handling them. Each read
returns 16MB of data read, which takes 20 (!!) seconds. With a bunch of
these pending, processing the whole chain can take a long time. Easily
longer than the syzbot uninterruptible sleep timeout of 140 seconds.
This then triggers a complaint off the io-wq exit path:
There's really nothing wrong here, outside of processing these reads
will take a LONG time. However, we can speed up the exit by checking the
IO_WQ_BIT_EXIT inside the io_worker_handle_work() loop, as syzbot will
exit the ring after queueing up all of these reads. Then once the first
item is processed, io-wq will simply cancel the rest. That should avoid
syzbot running into this complaint again.
platform/x86: hp-bioscfg: Fix kernel panic in GET_INSTANCE_ID macro
The GET_INSTANCE_ID macro that caused a kernel panic when accessing sysfs
attributes:
1. Off-by-one error: The loop condition used '<=' instead of '<',
causing access beyond array bounds. Since array indices are 0-based
and go from 0 to instances_count-1, the loop should use '<'.
2. Missing NULL check: The code dereferenced attr_name_kobj->name
without checking if attr_name_kobj was NULL, causing a null pointer
dereference in min_length_show() and other attribute show functions.
The panic occurred when fwupd tried to read BIOS configuration attributes:
Oops: general protection fault [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:min_length_show+0xcf/0x1d0 [hp_bioscfg]
Add a NULL check for attr_name_kobj before dereferencing and corrects
the loop boundary to match the pattern used elsewhere in the driver.
Cc: stable@vger.kernel.org Fixes: 5f94f181ca25 ("platform/x86: hp-bioscfg: bioscfg-h") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20260115203725.828434-3-mario.limonciello@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
drm/bridge: synopsys: dw-dp: fix error paths of dw_dp_bind
Fix several issues in dw_dp_bind() error handling:
1. Missing return after drm_bridge_attach() failure - the function
continued execution instead of returning an error.
2. Resource leak: drm_dp_aux_register() is not a devm function, so
drm_dp_aux_unregister() must be called on all error paths after
aux registration succeeds. This affects errors from:
- drm_bridge_attach()
- phy_init()
- devm_add_action_or_reset()
- platform_get_irq()
- devm_request_threaded_irq()
3. Bug fix: platform_get_irq() returns the IRQ number or a negative
error code, but the error path was returning ERR_PTR(ret) instead
of ERR_PTR(dp->irq).
Use a goto label for cleanup to ensure consistent error handling.
The upper limit of the firmware queue fill state for each APQN
is reported by the hwinfo.qd field. This field shows the
numbers 0-7 for 1-8 queue spaces available. But the exploiting
code assumed the real boundary is stored there and thus stoppes
queuing in messages one tick too early.
Correct the limit calculation and thus offer a boost
of 12.5% performance for high traffic on one APQN.
Fixes: d4c53ae8e4948 ("s390/ap: store TAPQ hwinfo in struct ap_card") Cc: stable@vger.kernel.org Reported-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Berk Cem Goksel [Tue, 20 Jan 2026 10:28:55 +0000 (13:28 +0300)]
ALSA: usb-audio: Fix use-after-free in snd_usb_mixer_free()
When snd_usb_create_mixer() fails, snd_usb_mixer_free() frees
mixer->id_elems but the controls already added to the card still
reference the freed memory. Later when snd_card_register() runs,
the OSS mixer layer calls their callbacks and hits a use-after-free read.
Fix by calling snd_ctl_remove() for all mixer controls before freeing
id_elems. We save the next pointer first because snd_ctl_remove()
frees the current element.
Fixes: 6639b6c2367f ("[ALSA] usb-audio - add mixer control notifications") Cc: stable@vger.kernel.org Cc: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Berk Cem Goksel <berkcgoksel@gmail.com> Link: https://patch.msgid.link/20260120102855.7300-1-berkcgoksel@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tzung-Bi Shih [Tue, 20 Jan 2026 09:26:50 +0000 (09:26 +0000)]
gpio: cdev: Fix resource leaks on errors in gpiolib_cdev_register()
On error handling paths, gpiolib_cdev_register() doesn't free the
allocated resources which results leaks. Fix it.
Cc: stable@vger.kernel.org Fixes: 7b9b77a8bba9 ("gpiolib: add a per-gpio_device line state notification workqueue") Fixes: d83cee3d2bb1 ("gpio: protect the pointer to gpio_chip in gpio_device with SRCU") Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Link: https://lore.kernel.org/r/20260120092650.2305319-1-tzungbi@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Merge tag 'w1-drv-6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krzk/linux-w1 into char-misc-linus
1-Wire bus drivers fixes
Non critical (old issues) fixes:
1. Fix possible buffer overflow in W1 thermal driver sysfs interfasce,
2. Drop duplicated device put when attaching a slave device failed,
which could lead to memory corruption.
* tag 'w1-drv-6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krzk/linux-w1:
w1: fix redundant counter decrement in w1_attach_slave_device()
w1: therm: Fix off-by-one buffer overflow in alarms_store
Thomas Weißschuh [Tue, 20 Jan 2026 06:55:55 +0000 (07:55 +0100)]
timekeeping: Adjust the leap state for the correct auxiliary timekeeper
When __do_ajdtimex() was introduced to handle adjtimex for any
timekeeper, this reference to tk_core was not updated. When called on an
auxiliary timekeeper, the core timekeeper would be updated incorrectly.
This gets caught by the lock debugging diagnostics because the
timekeepers sequence lock gets written to without holding its
associated spinlock:
Jason Gunthorpe [Tue, 20 Jan 2026 00:19:49 +0000 (20:19 -0400)]
iommupt: Make it clearer to the compiler that pts.level == 0 for single page
Older versions of gcc and clang sometimes get tripped up by the build time
assertion in FIELD_PREP because they can see that the argument to
FIELD_PREP is constant but can't see that the if condition protecting it
is also a constant false.
In file included from <command-line>:
In function 'amdv1pt_install_leaf_entry',
inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:651:3,
inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1,
inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9,
inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:658:10,
inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1:
././include/linux/compiler_types.h:631:45: error: call to '__compiletime_assert_251' declared with attribute error: FIELD_PREP: value too large for the field
631 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
././include/linux/compiler_types.h:612:25: note: in definition of macro '__compiletime_assert'
612 | prefix ## suffix(); \
| ^~~~~~
././include/linux/compiler_types.h:631:9: note: in expansion of macro '_compiletime_assert'
631 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
./include/linux/bitfield.h:69:17: note: in expansion of macro 'BUILD_BUG_ON_MSG'
69 | BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ? \
| ^~~~~~~~~~~~~~~~
./include/linux/bitfield.h:90:17: note: in expansion of macro '__BF_FIELD_CHECK_MASK'
90 | __BF_FIELD_CHECK_MASK(mask, val, pfx); \
| ^~~~~~~~~~~~~~~~~~~~~
./include/linux/bitfield.h:137:17: note: in expansion of macro '__FIELD_PREP'
137 | __FIELD_PREP(_mask, _val, "FIELD_PREP: "); \
| ^~~~~~~~~~~~
drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP'
220 | FIELD_PREP(AMDV1PT_FMT_OA,
| ^~~~~~~~~~
Changing the caller to check pts.level == 0 avoids demanding a bit of
complex reasoning from the compiler that pts.level == level == 0. Instead
the compiler sees that pt_install_leaf_entry() is called with a constant
pts.level == 0 which makes it more reliable to see the constant false in
the if.
Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Reported-by: Chunyu Hu <chuhu@redhat.com> Closes: https://lore.kernel.org/all/aUn9uGPCooqB-RIF@gmail.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
On 32-bit machines with CONFIG_ARM_LPAE, it is possible for lowmem
allocations to be backed by addresses physical memory above the 32-bit
address limit, as found while experimenting with larger VMSPLIT
configurations.
This caused the qemu virt model to crash in the GICv3 driver, which
allocates the 'itt' object using GFP_KERNEL. Since all memory below
the 4GB physical address limit is in ZONE_DMA in this configuration,
kmalloc() defaults to higher addresses for ZONE_NORMAL, and the
ITS driver stores the physical address in a 32-bit 'unsigned long'
variable.
Change the itt_addr variable to the correct phys_addr_t type instead,
along with all other variables in this driver that hold a physical
address.
The gicv5 driver correctly uses u64 variables, while all other irqchip
drivers don't call virt_to_phys or similar interfaces. It's expected that
other device drivers have similar issues, but fixing this one is
sufficient for booting a virtio based guest.
Fixes: cc2d3216f53c ("irqchip: GICv3: ITS command queue") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260119201603.2713066-1-arnd@kernel.org