Philipp Zabel [Thu, 2 Apr 2026 12:30:10 +0000 (14:30 +0200)]
Merge tag 'reset-fixes-for-v7.0-2' into reset/next
Reset controller fixes for v7.0, part 2
* Decouple spacemit K3 reset lines that were incorrectly coupled
together as one, but are in fact separate resets in hardware.
* Fix a double free in the reset_add_gpio_aux_device() error path.
This has already been fixed on reset/next by commit a9b95ce36de4
("reset: gpio: add a devlink between reset-gpio and its consumer").
* Fix the MODULE_AUTHOR string in the rzg2l-usbphy-ctrl driver.
We merge this into reset/next to resolve a conflict between commits a9b95ce36de4 ("reset: gpio: add a devlink between reset-gpio and its
consumer") and fbffb8c7c7bb ("reset: gpio: fix double free in
reset_add_gpio_aux_device() error path").
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
====================
bpf: Migrate bpf_task_work and file dynptr to kmalloc_nolock
Now that kmalloc can be used from NMI context via kmalloc_nolock(),
migrate BPF internal allocations away from bpf_mem_alloc to use the
standard slab allocator.
Use kfree_rcu() for deferred freeing, which waits for a regular RCU
grace period before the memory is reclaimed. Sleepable BPF programs
hold rcu_read_lock_trace but not regular rcu_read_lock, so patch 1
adds explicit rcu_read_lock/unlock around the pointer-to-refcount
window to prevent kfree_rcu from freeing memory while a sleepable
program is still between reading the pointer and acquiring a
reference.
Patch 1 migrates bpf_task_work_ctx from bpf_mem_alloc/bpf_mem_free to
kmalloc_nolock/kfree_rcu.
Patch 2 migrates bpf_dynptr_file_impl from bpf_mem_alloc/bpf_mem_free
to kmalloc_nolock/kfree.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v2:
- Switch to scoped_guard in patch 1 (Kumar)
- Remove rcu gp wait in patch 2 (Kumar)
- Defer to irq_work when irqs disabled in patch 1
- use bpf_map_kmalloc_nolock() for bpf_task_work
- use kmalloc_nolock() for file dynptr
- Link to v1: https://lore.kernel.org/all/20260325-kmalloc_special-v1-0-269666afb1ea@meta.com/
====================
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:57 +0000 (15:27 -0700)]
bpf: Migrate dynptr file to kmalloc_nolock
Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_nolock for
bpf_dynptr_file_impl, continuing the migration away from bpf_mem_alloc
now that kmalloc can be used from NMI context.
freader_cleanup() runs before kfree_nolock() while the dynptr still
holds exclusive access, so plain kfree_nolock() is safe — no concurrent
readers can access the object.
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:56 +0000 (15:27 -0700)]
bpf: Migrate bpf_task_work to kmalloc_nolock
Replace bpf_mem_alloc/bpf_mem_free with
kmalloc_nolock/kfree_rcu for bpf_task_work_ctx.
Replace guard(rcu_tasks_trace)() with guard(rcu)() in
bpf_task_work_irq(). The function only accesses ctx struct members
(not map values), so tasks trace protection is not needed - regular
RCU is sufficient since ctx is freed via kfree_rcu. The guard in
bpf_task_work_callback() remains as tasks trace since it accesses map
values from process context.
Sleepable BPF programs hold rcu_read_lock_trace but not
regular rcu_read_lock. Since kfree_rcu
waits for a regular RCU grace period, the ctx memory can be freed
while a sleepable program is still running. Add scoped_guard(rcu)
around the pointer read and refcount tryget in
bpf_task_work_acquire_ctx to close this race window.
Since kfree_rcu uses call_rcu internally which is not safe from
NMI context, defer destruction via irq_work when irqs are disabled.
For the lost-cmpxchg path the ctx was never published, so
kfree_nolock is safe.
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer
Mario Limonciello has led amd-pstate maintenance in recent years and
has done excellent work. The amd-pstate driver is in good hands with
him. I am stepping down as co-maintainer as I move on to other things.
Add K Prateek Nayak as a reviewer. He has been actively contributing
to the driver including preferred-core and ITMT improvements, and has
been helping review amd-pstate patches for a while now.
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: K Prateek Nayak <kprateek.nayak@amd.com> Acked-by: Mario Limonciello (AMD) <superm1@kernel.org> Link: https://lore.kernel.org/r/20260402102611.16519-1-gautham.shenoy@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
K Prateek Nayak [Mon, 16 Mar 2026 08:18:49 +0000 (08:18 +0000)]
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
writer(s), however amd-pstate depends on fetching the cpudata via the
policy's driver data which necessitates grabbing the reference.
Since schedutil governor can call "cpufreq_driver->update_perf()"
during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
fetching the policy object using the cpufreq_cpu_get() helper in the
scheduler fast-path leads to "BUG: scheduling while atomic" on
PREEMPT_RT [1].
Pass the cached cpufreq policy object in sg_policy to the update_perf()
instead of just the CPU. The CPU can be inferred using "policy->cpu".
The lifetime of cpufreq_policy object outlasts that of the governor and
the cpufreq driver (allocated when the CPU is onlined and only reclaimed
when the CPU is offlined / the CPU device is removed) which makes it
safe to be referenced throughout the governor's lifetime.
K Prateek Nayak [Mon, 16 Mar 2026 08:18:48 +0000 (08:18 +0000)]
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
All callers of amd_pstate_update() already have a reference to the
cpufreq_policy object.
Pass the entire policy object and grab the cpudata using
"policy->driver_data" instead of passing the cpudata and unnecessarily
grabbing another read-side reference to the cpufreq policy object when
it is already available in the caller.
No functional changes intended.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Link: https://lore.kernel.org/r/20260316081849.19368-2-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
====================
bpf: Fix abuse of kprobe_write_ctx via freplace
The potential issue of kprobe_write_ctx+freplace was mentioned in
"bpf: Disallow !kprobe_write_ctx progs tail-calling kprobe_write_ctx progs" [1].
It is true issue, that the test in patch #2 verifies that kprobe_write_ctx=false
kprobe progs can be abused to modify struct pt_regs via kprobe_write_ctx=true
freplace progs.
When struct pt_regs is modified, bpf_prog_test_run_opts() gets -EFAULT instead
of 0.
Changes:
v2 -> v3:
* Add comment to the rejection of kprobe_write_ctx (per Jiri).
* Use libbpf_get_error() instead of errno in test (per Jiri).
* Collect Acked-by tags from Jiri and Song, thanks.
v2: https://lore.kernel.org/bpf/20260326141718.17731-1-leon.hwang@linux.dev/
v1 -> v2:
* Drop patch #1 in v1, as it wasn't an issue (per Toke).
* Check kprobe_write_ctx value at attach time instead of at load time, to
prevent attaching kprobe_write_ctx=true freplace progs on
kprobe_write_ctx=false kprobe progs (per Gemini/sashiko).
* Move kprobe_write_ctx test code to attach_probe.c and kprobe_write_ctx.c.
v1: https://lore.kernel.org/bpf/20260324150444.68166-1-leon.hwang@linux.dev/
====================
Leon Hwang [Tue, 31 Mar 2026 14:53:53 +0000 (22:53 +0800)]
selftests/bpf: Add test to verify the fix of kprobe_write_ctx abuse
Add a test to verify the issue: kprobe_write_ctx can be abused to modify
struct pt_regs of kernel functions via kprobe_write_ctx=true freplace
progs.
Without the fix, the issue is verified:
kprobe_write_ctx=true freplace prog is allowed to attach to
kprobe_write_ctx=false kprobe prog. Then, the first arg of
bpf_fentry_test1 will be set as 0, and bpf_prog_test_run_opts() gets
-EFAULT instead of 0.
With the fix, the issue is rejected at attach time.
Leon Hwang [Tue, 31 Mar 2026 14:53:52 +0000 (22:53 +0800)]
bpf: Fix abuse of kprobe_write_ctx via freplace
uprobe programs are allowed to modify struct pt_regs.
Since the actual program type of uprobe is KPROBE, it can be abused to
modify struct pt_regs via kprobe+freplace when the kprobe attaches to
kernel functions.
For example,
SEC("?kprobe")
int kprobe(struct pt_regs *regs)
{
return 0;
}
freplace_kprobe prog will attach to kprobe prog.
kprobe prog will attach to a kernel function.
Without this patch, when the kernel function runs, its first arg will
always be set as 0 via the freplace_kprobe prog.
To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow
attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.
Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers") Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
cpufreq/amd-pstate: Add support for raw EPP writes
The energy performance preference field of the CPPC request MSR
supports values from 0 to 255, but the strings only offer 4 values.
The other values are useful for tuning the performance of some
workloads.
Add support for writing the raw energy performance preference value
to the sysfs file. If the last value written was an integer then
an integer will be returned. If the last value written was a string
then a string will be returned.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
cpufreq/amd-pstate: add kernel command line to override dynamic epp
Add `amd_dynamic_epp=enable` and `amd_dynamic_epp=disable` to override
the kernel configuration option `CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP`
locally.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
cpufreq/amd-pstate: Add dynamic energy performance preference
Dynamic energy performance preference changes the EPP profile based on
whether the machine is running on AC or DC power.
A notification chain from the power supply core is used to adjust EPP
values on plug in or plug out events.
When enabled, the driver exposes a sysfs toggle for dynamic EPP, blocks
manual writes to energy_performance_preference while it "owns" the EPP
updates.
For non-server systems:
* the default EPP for AC mode is `performance`.
* the default EPP for DC mode is `balance_performance`.
For server systems dynamic EPP is mostly a no-op.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Ninad Naik [Mon, 30 Mar 2026 19:08:55 +0000 (00:38 +0530)]
Documentation: amd-pstate: fix dead links in the reference section
The links for AMD64 Architecture Programmer's Manual and PPR for AMD
Family 19h Model 51h, Revision A1 Processors redirect to a generic page.
Update the links to the working ones.
Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}
Add documentation for the sysfs files
/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_freq
and
/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_count.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file
Add the missing amd_pstate_prefcore_ranking filenames in the sysfs
listing example leading to the descriptions of these
parameters. Clarify when will the file be visible.
Fixes: 15a2b764ea7c ("amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file
Add the missing amd_pstate_hw_prefcore filenames in the sysfs listing
example leading to the descriptions of these parameters. Clarify when
will the file be visible.
Fixes: b96b82d1af7f ("cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate-ut: Add a testcase to validate the visibility of driver attributes
amd-pstate driver has per-attribute visibility functions to
dynamically control which sysfs freq_attrs are exposed based on the
platform capabilities and the current amd_pstate mode. However, there
is no test coverage to validate that the driver's live attribute list
matches the expected visibility for each mode.
Add amd_pstate_ut_check_freq_attrs() to the amd-pstate unit test
module. For each enabled mode (passive, active, guided), the test
independently derives the expected visibility of each attribute:
- Core attributes (max_freq, lowest_nonlinear_freq, highest_perf)
are always expected.
- Prefcore attributes (prefcore_ranking, hw_prefcore) are expected
only when cpudata->hw_prefcore indicates platform support.
- EPP attributes (energy_performance_preference,
energy_performance_available_preferences) are expected only in
active mode.
- Floor frequency attributes (floor_freq, floor_count) are expected
only when X86_FEATURE_CPPC_PERF_PRIO is present.
Compare these independent expectations against the live driver's attr
array, catching bugs such as attributes leaking into wrong modes or
visibility functions checking incorrect conditions.
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate-ut: Add module parameter to select testcases
Currently when amd-pstate-ut test module is loaded, it runs all the
tests from amd_pstate_ut_cases[] array.
Add a module parameter named "test_list" that accepts a
comma-delimited list of test names, allowing users to run a
selected subset of tests. When the parameter is omitted or empty, all
tests are run as before.
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2()
Introduce a new tracepoint trace_amd_pstate_cppc_req2() to track
updates to MSR_AMD_CPPC_REQ2.
Invoke this while changing the Floor Perf.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate: Add sysfs support for floor_freq and floor_count
When Floor Performance feature is supported by the platform, expose
two sysfs files:
* amd_pstate_floor_freq to allow userspace to request the floor
frequency for each CPU.
* amd_pstate_floor_count which advertises the number of distinct
levels of floor frequencies supported on this platform.
Reset the floor_perf to bios_floor_perf in the suspend, offline, and
exit paths, and restore the value to the cached user-request
floor_freq on the resume and online paths mirroring how bios_min_perf
is handled for MSR_AMD_CPPC_REQ.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF
Some future AMD processors have feature named "CPPC Performance
Priority" which lets userspace specify different floor performance
levels for different CPUs. The platform firmware takes these different
floor performance levels into consideration while throttling the CPUs
under power/thermal constraints. The presence of this feature is
indicated by bit 16 of the EDX register for CPUID leaf
0x80000007. More details can be found in AMD Publication titled "AMD64
Collaborative Processor Performance Control (CPPC) Performance
Priority" Revision 1.10.
The number of distinct floor performance levels supported on the
platform will be advertised through the bits 32:39 of the
MSR_AMD_CPPC_CAP1. Bits 0:7 of a new MSR MSR_AMD_CPPC_REQ2
(0xc00102b5) will be used to specify the desired floor performance
level for that CPU.
Add support for the aforementioned MSR_AMD_CPPC_REQ2, and macros for
parsing and updating the relevant bits from MSR_AMD_CPPC_CAP1 and
MSR_AMD_CPPC_REQ2.
On boot if the default value of the MSR_AMD_CPPC_REQ2[7:0] (Floor
Perf) is lower than CPPC.lowest_perf, and thus invalid, initialize it
to MSR_AMD_CPPC_CAP1.nominal_perf which is a sane default value.
Save the boot-time floor_perf during amd_pstate_init_floor_perf(). In
a subsequent patch it will be restored in the suspend, offline, and
exit paths, mirroring how bios_min_perf is handled for
MSR_AMD_CPPC_REQ.
Link: https://docs.amd.com/v/u/en-US/69206_1.10_AMD64_CPPC_PUB Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Some future AMD processors have feature named "CPPC Performance
Priority" which lets userspace specify different floor performance
levels for different CPUs. The platform firmware takes these different
floor performance levels into consideration while throttling the CPUs
under power/thermal constraints. The presence of this feature is
indicated by bit 16 of the EDX register for CPUID leaf
0x80000007. More details can be found in AMD Publication titled "AMD64
Collaborative Processor Performance Control (CPPC) Performance
Priority" Revision 1.10.
Define a new feature bit named X86_FEATURE_CPPC_PERF_PRIO to map to
CPUID 0x80000007.EDX[16].
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate: Make certain freq_attrs conditionally visible
Certain amd_pstate freq_attrs such as amd_pstate_hw_prefcore and
amd_pstate_prefcore_ranking are enabled even when preferred core is
not supported on the platform.
Similarly there are common freq_attrs between the amd-pstate and the
amd-pstate-epp drivers (eg: amd_pstate_max_freq,
amd_pstate_lowest_nonlinear_freq, etc.) but are duplicated in two
different freq_attr structs.
Unify all the attributes in a single place and associate each of them
with a visibility function that determines whether the attribute
should be visible based on the underlying platform support and the
current amd_pstate mode.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate: Update cppc_req_cached in fast_switch case
The function msr_update_perf() does not cache the new value that is
written to MSR_AMD_CPPC_REQ into the variable cpudata->cppc_req_cached
when the update is happening from the fast path.
Fix that by caching the value everytime the MSR_AMD_CPPC_REQ gets
updated.
This issue was discovered by Claude Opus 4.6 with the aid of Chris
Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel).
Assisted-by: Claude:claude-opus-4.6 review-prompts/linux Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Fixes: fff395796917 ("cpufreq/amd-pstate: Always write EPP value when updating perf") Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()
On failure to set the epp, the function amd_pstate_epp_cpu_init()
returns with an error code without freeing the cpudata object that was
allocated at the beginning of the function.
Ensure that the cpudata object is freed before returning from the
function.
This memory leak was discovered by Claude Opus 4.6 with the aid of
Chris Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel).
Assisted-by: Claude:claude-opus-4.6 review-prompts/linux Fixes: f9a378ff6443 ("cpufreq/amd-pstate: Set different default EPP policy for Epyc and Ryzen") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Zhiguo Niu [Thu, 5 Mar 2026 03:22:46 +0000 (11:22 +0800)]
f2fs: fix to preserve previous reserve_{blocks,node} value when remount
The following steps will change previous value of reserve_{blocks,node},
this dones not match the original intention.
1.mount -t f2fs -o reserve_root=8192 imgfile test_mount/
F2FS-fs (loop56): Mounted with checkpoint version = 1b69f8c7
mount info:
/dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=8192,reserve_node=0,resuid=0,resgid=0,xxx)
2.mount -t f2fs -o remount,reserve_root=4096 /data/test_mount
F2FS-fs (loop56): Preserve previous reserve_root=8192
check mount info: reserve_root change to 4096
/dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=4096,reserve_node=0,resuid=0,resgid=0,xxx)
Prior to commit d18535132523 ("f2fs: separate the options parsing and options checking"),
the value of reserve_{blocks,node} was only set during the first mount, along with
the corresponding mount option F2FS_MOUNT_RESERVE_{ROOT,NODE} . If the mount option
F2FS_MOUNT_RESERVE_{ROOT,NODE} was found to have been set during the mount/remount,
the previously value of reserve_{blocks,node} would also be preserved, as shown in
the code below.
if (test_opt(sbi, RESERVE_ROOT)) {
f2fs_info(sbi, "Preserve previous reserve_root=%u",
F2FS_OPTION(sbi).root_reserved_blocks);
} else {
F2FS_OPTION(sbi).root_reserved_blocks = arg;
set_opt(sbi, RESERVE_ROOT);
}
But commit d18535132523 ("f2fs: separate the options parsing and options checking")
only preserved the previous mount option; it did not preserve the previous value of
reserve_{blocks,node}. Since value of reserve_{blocks,node} value is assigned
or not depends on ctx->spec_mask, ctx->spec_mask should be alos handled in
f2fs_check_opt_consistency.
This patch will clear the corresponding ctx->spec_mask bits in f2fs_check_opt_consistency
to preserve the previously values of reserve_{blocks,node} if it already have a value.
Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yongpeng Yang [Tue, 24 Mar 2026 09:47:08 +0000 (17:47 +0800)]
f2fs: invalidate block device page cache on umount
Neither F2FS nor VFS invalidates the block device page cache, which
results in reading stale metadata. An example scenario is shown below:
Terminal A Terminal B
mount /dev/vdb /mnt/f2fs
touch mx // ino = 4
sync
dump.f2fs -i 4 /dev/vdb// block on "[Y/N]"
touch mx2 // ino = 5
sync
umount /mnt/f2fs
dump.f2fs -i 5 /dev/vdb // block addr is 0
After umount, the block device page cache is not purged, causing
`dump.f2fs -i 5 /dev/vdb` to read stale metadata and see inode 5 with
block address 0.
Btrfs has encountered a similar issue before, the solution there was to
call sync_blockdev() and invalidate_bdev() when the device is closed:
For the root user, the f2fs kernel calls sync_blockdev() on umount to
flush all cached data to disk, and f2fs-tools can release the page cache
by issuing ioctl(fd, BLKFLSBUF) when accessing the device. However,
non-root users are not permitted to drop the page cache, and may still
observe stale data.
This patch calls sync_blockdev() and invalidate_bdev() during umount to
invalidate the block device page cache, thereby preventing stale
metadata from being read.
Note that this may result in an extra sync_blockdev() call on the first
device, in both f2fs_put_super() and kill_block_super(). The second call
do nothing, as there are no dirty pages left to flush. It ensures that
non-root users do not observe stale data.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daeho Jeong [Mon, 16 Mar 2026 18:59:54 +0000 (11:59 -0700)]
f2fs: fix to freeze GC and discard threads quickly
Suspend can fail if kernel threads do not freeze for a while.
f2fs_gc and f2fs_discard threads can perform long-running operations
that prevent them from reaching a freeze point in a timely manner.
This patch adds explicit freezing checks in the following locations:
1. f2fs_gc: Added a check at the 'retry' label to exit the loop quickly
if freezing is requested, especially during heavy GC rounds.
2. __issue_discard_cmd: Added a 'suspended' flag to break both inner and
outer loops during discard command issuance if freezing is detected
after at least one command has been issued.
3. __issue_discard_cmd_orderly: Added a similar check for orderly discard
to ensure responsiveness.
These checks ensure that the threads release locks safely and enter the
frozen state.
The root cause is: in f2fs_finish_read_bio(), we may access uninit data
in folio if we failed to read the data from device into folio, let's add
a check condition to avoid such issue.
Cc: stable@kernel.org Fixes: 50ac3ecd8e05 ("f2fs: fix to do sanity check on node footer in {read,write}_end_io") Reported-by: syzbot+9aac813cdc456cdd49f8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/69a9ca26.a70a0220.305d9a.0000.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 6 Mar 2026 12:24:20 +0000 (12:24 +0000)]
f2fs: fix false alarm of lockdep on cp_global_sem lock
lockdep reported a potential deadlock:
a) TCMU device removal context:
- call del_gendisk() to get q->q_usage_counter
- call start_flush_work() to get work_completion of wb->dwork
b) f2fs writeback context:
- in wb_workfn(), which holds work_completion of wb->dwork
- call f2fs_balance_fs() to get sbi->gc_lock
c) f2fs vfs_write context:
- call f2fs_gc() to get sbi->gc_lock
- call f2fs_write_checkpoint() to get sbi->cp_global_sem
d) f2fs mount context:
- call recover_fsync_data() to get sbi->cp_global_sem
- call f2fs_check_and_fix_write_pointer() to call blkdev_report_zones()
that goes down to blk_mq_alloc_request and get q->q_usage_counter
Original callstack is in Closes tag.
However, I think this is a false alarm due to before mount returns
successfully (context d), we can not access file therein via vfs_write
(context c).
Let's introduce per-sb cp_global_sem_key, and assign the key for
cp_global_sem, so that lockdep can recognize cp_global_sem from
different super block correctly.
A lot of work are done by Shin'ichiro Kawasaki, thanks a lot for
the work.
Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones") Cc: stable@kernel.org Reported-and-tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20260218125237.3340441-1-shinichiro.kawasaki@wdc.com Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yongpeng Yang [Tue, 10 Mar 2026 09:36:14 +0000 (17:36 +0800)]
f2fs: fix data loss caused by incorrect use of nat_entry flag
Data loss can occur when fsync is performed on a newly created file
(before any checkpoint has been written) concurrently with a checkpoint
operation. The scenario is as follows:
create & write & fsync 'file A' write checkpoint
- f2fs_do_sync_file // inline inode
- f2fs_write_inode // inode folio is dirty
- f2fs_write_checkpoint
- f2fs_flush_merged_writes
- f2fs_sync_node_pages
- f2fs_flush_nat_entries
- f2fs_fsync_node_pages // no dirty node
- f2fs_need_inode_block_update // return false
SPO and lost 'file A'
f2fs_flush_nat_entries() sets the IS_CHECKPOINTED and HAS_LAST_FSYNC
flags for the nat_entry, but this does not mean that the checkpoint has
actually completed successfully. However, f2fs_need_inode_block_update()
checks these flags and incorrectly assumes that the checkpoint has
finished.
The root cause is that the semantics of IS_CHECKPOINTED and
HAS_LAST_FSYNC are only guaranteed after the checkpoint write fully
completes.
This patch modifies f2fs_need_inode_block_update() to acquire the
sbi->node_write lock before reading the nat_entry flags, ensuring that
once IS_CHECKPOINTED and HAS_LAST_FSYNC are observed to be set, the
checkpoint operation has already completed.
Fixes: e05df3b115e7 ("f2fs: add node operations") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daeho Jeong [Mon, 16 Mar 2026 18:59:21 +0000 (11:59 -0700)]
f2fs: fix to skip empty sections in f2fs_get_victim
In age-based victim selection (ATGC, AT_SSR, or GC_CB), f2fs_get_victim
can encounter sections with zero valid blocks. This situation often
arises when checkpoint is disabled or due to race conditions between
SIT updates and dirty list management.
In such cases, f2fs_get_section_mtime() returns INVALID_MTIME, which
subsequently triggers a fatal f2fs_bug_on(sbi, mtime == INVALID_MTIME)
in add_victim_entry() or get_cb_cost().
This patch adds a check in f2fs_get_victim's selection loop to skip
sections with no valid blocks. This prevents unnecessary age
calculations for empty sections and avoids the associated kernel panic.
This change also allows removing redundant checks in add_victim_entry().
Yongpeng Yang [Wed, 18 Mar 2026 08:46:35 +0000 (16:46 +0800)]
f2fs: fix inline data not being written to disk in writeback path
When f2fs_fiemap() is called with `fileinfo->fi_flags` containing the
FIEMAP_FLAG_SYNC flag, it attempts to write data to disk before
retrieving file mappings via filemap_write_and_wait(). However, there is
an issue where the file does not get mapped as expected. The following
scenario can occur:
The root cause of this issue is that f2fs_write_single_data_page() only
calls f2fs_write_inline_data() to copy data from the data folio to the
inode folio, and it clears the dirty flag on the data folio. However, it
does not mark the data folio as writeback. When
__filemap_fdatawait_range() checks for folios with the writeback flag,
it returns early, causing f2fs_fiemap() to report that the file has no
mapping.
To fix this issue, the solution is to call
f2fs_write_single_node_folio() in f2fs_inline_data_fiemap() when
getting fiemap with FIEMAP_FLAG_SYNC flags. This patch ensures that the
inode folio is written back and the writeback process completes before
proceeding.
Cc: stable@kernel.org Fixes: 9ffe0fb5f3bb ("f2fs: handle inline data operations") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yongpeng Yang [Wed, 18 Mar 2026 08:45:34 +0000 (16:45 +0800)]
f2fs: fix fsck inconsistency caused by FGGC of node block
During FGGC node block migration, fsck may incorrectly treat the
migrated node block as fsync-written data.
The reproduction scenario:
root@vm:/mnt/f2fs# seq 1 2048 | xargs -n 1 ./test_sync // write inline inode and sync
root@vm:/mnt/f2fs# rm -f 1
root@vm:/mnt/f2fs# sync
root@vm:/mnt/f2fs# f2fs_io gc_range // move data block in sync mode and not write CP
SPO, "fsck --dry-run" find inode has already checkpointed but still
with DENT_BIT_SHIFT set
The root cause is that GC does not clear the dentry mark and fsync mark
during node block migration, leading fsck to misinterpret them as
user-issued fsync writes.
In BGGC mode, node block migration is handled by f2fs_sync_node_pages(),
which guarantees the dentry and fsync marks are cleared before writing.
This patch move the set/clear of the fsync|dentry marks into
__write_node_folio to make the logic clearer, and ensures the
fsync|dentry mark is cleared in FGGC.
Cc: stable@kernel.org Fixes: da011cc0da8c ("f2fs: move node pages only in victim section during GC") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yongpeng Yang [Tue, 10 Mar 2026 09:36:12 +0000 (17:36 +0800)]
f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
f2fs_need_dentry_mark() reads nat_entry flags without mutual exclusion
with the checkpoint path, which can result in an incorrect inode block
marking state. The scenario is as follows:
create & write & fsync 'file A' write checkpoint
- f2fs_do_sync_file // inline inode
- f2fs_write_inode // inode folio is dirty
- f2fs_write_checkpoint
- f2fs_flush_merged_writes
- f2fs_sync_node_pages
- f2fs_fsync_node_pages // no dirty node
- f2fs_need_inode_block_update // return true
- f2fs_fsync_node_pages // inode dirtied
- f2fs_need_dentry_mark //return true
- f2fs_flush_nat_entries
- f2fs_write_checkpoint end
- __write_node_folio // inode with DENT_BIT_SHIFT set
SPO, "fsck --dry-run" find inode has already checkpointed but still
with DENT_BIT_SHIFT set
The state observed by f2fs_need_dentry_mark() can differ from the state
observed in __write_node_folio() after acquiring sbi->node_write. The
root cause is that the semantics of IS_CHECKPOINTED and
HAS_FSYNCED_INODE are only guaranteed after the checkpoint write has
fully completed.
This patch moves set_dentry_mark() into __write_node_folio() and
protects it with the sbi->node_write lock.
Cc: stable@kernel.org Fixes: 88bd02c9472a ("f2fs: fix conditions to remain recovery information in f2fs_sync_file") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This will only happen when fitrim races w/ remount rw, if we remount to
readonly filesystem, remount will wait until mnt_pcp.mnt_writers to zero,
that means fitrim is not in process at that time.
Yongpeng Yang [Wed, 18 Mar 2026 08:45:33 +0000 (16:45 +0800)]
f2fs: refactor node footer flag setting related code
This patch refactors the node footer flag setting code to simplify
redundant logic and adjust function parameters and return types. No
logical changes.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:20 +0000 (19:50 +0200)]
kbuild: vdso_install: hide readelf warnings
If 'readelf -n' encounters a note it does not recognize it emits a
warning. This for example happens when inspecting a compat vDSO for
which the main kernel toolchain was not used.
However the relevant build ID note is always readable, so the
warnings are pointless.
Hide the warnings to make it possible to extract build IDs for more
architectures in the future.
On systems with 64K pages, RX queues will be wedged if users set the
descriptor count to the current minimum (16). Fbnic fragments large
pages into 4K chunks, and scales down the ring size accordingly. With
64K pages and 16 descriptors, the ring size mask is 0 and will never
be filled.
32 descriptors is another special case that wedges the RX rings.
Internally, the rings track pages for the head/tail pointers, not page
fragments. So with 32 descriptors, there's only 1 usable page as one
ring slot is kept empty to disambiguate between an empty/full ring.
As a result, the head pointer never advances and the HW stalls after
consuming 16 page fragments.
This patchset contains few fixes for the bugs hit during testing with
Monza EVK platform
- around array out of bounds access on dai ids which keep extending but
the drivers seems to have hardcoded some numbers, fix this and clean
the mess up
- fix few issues discovered while trying to shut down dsp.
- flooding rpmsg with write requests due to not resetting queue pointer,
fix this resetting the pointer in trigger stop.
- possible multiple graph opens which can result in open failures.
Apart from this few new enhancements to the dsp side
- add new LPI MI2S and senary dai entries
- handle pipewire and Displayport issues by moving graph start to
trigger level, which should fix outstanding pipewire and DP issues on
Qualcomm SoCs.
- remove some unnessary loops in hot path
- support early memory map on DSP.
Tested this on top of linux-next on VENTUNO-Q platform.
ASoC: qcom: qdsp6: remove search for module iid in hot path
Remove searching for Shared Memory module instance id on every
read/write call, this is un-necessary if we can cache the shared
memory module instance id per PCM graph.
Add new member to graph struct to store shared memory module
instance id to avoid searching for this in hot path.
ASoC: qcom: q6apm-lpass-dai: move graph start to trigger
Start the graph at trigger callback. Staring the graph at prepare does
not make sense as there is no data transfer at this point.
Moving this to trigger will also help cope situation where pipewire
is not happy if display port is not connected during start.
ASoC: qcom: q6dsp: Add Senary MI2S audio interface support
Introduces support for the Senary MI2S audio interface in the Qualcomm
q6dsp. Add new AFE port IDs for Senary MI2S RX and TX and include the
necessary mappings in the port configuration to allow audio routing
over the Senary MI2S interface.
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Tested-by: Val Packett <val@packett.cool> # sm7325-motorola-dubai Link: https://patch.msgid.link/20260402081118.348071-10-srinivas.kandagatla@oss.qualcomm.com Signed-off-by: Mark Brown <broonie@kernel.org>
ASoC: qcom: common: validate cpu dai id during parsing
lpass ports numbers have been added but the afe/apm driver never got
updated with new max port value that it uses to store dai specific data.
There are more than one places these values are cached and always become
out of sync.
This will result in array out of bounds and weird driver behaviour.
To catch such issues, first add a single place where we can define max
port and second add a check in common parsing code which can error
out before corrupting the memory with out of bounds array access.
This should help both avoid and catch these type of mistakes in future.
ASoC: dt-bindings: qcom: add LPASS LPI MI2S dai ids
Add new dai ids entries for LPASS LPI MI2S and SENARY MI2S audio lines.
Co-developed-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com> Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260402081118.348071-7-srinivas.kandagatla@oss.qualcomm.com Signed-off-by: Mark Brown <broonie@kernel.org>
ASoC: qcom: q6apm-dai: reset queue ptr on trigger stop
Reset queue pointer on SNDRV_PCM_TRIGGER_STOP event to be inline
with resetting appl_ptr. Without this we will end up with a queue_ptr
out of sync and driver could try to send data that is not ready yet.
ASoC: qcom: qdsp6: topology: check widget type before accessing data
Check widget type before accessing the private data, as this could a
virtual widget which is no associated with a dsp graph, container and
module. Accessing witout check could lead to incorrect memory access.
ASoC: qcom: q6apm: move component registration to unmanaged version
q6apm component registers dais dynamically from ASoC toplology, which
are allocated using device managed version apis. Allocating both
component and dynamic dais using managed version could lead to incorrect
free ordering, dai will be freed while component still holding references
to it.
Fix this issue by moving component to unmanged version so
that the dai pointers are only freeded after the component is removed.
Ruide Cao [Thu, 2 Apr 2026 15:12:31 +0000 (23:12 +0800)]
batman-adv: reject oversized global TT response buffers
batadv_tt_prepare_tvlv_global_data() builds the allocation length for a
global TT response in 16-bit temporaries. When a remote originator
advertises a large enough global TT, the TT payload length plus the VLAN
header offset can exceed 65535 and wrap before kmalloc().
The full-table response path still uses the original TT payload length when
it fills tt_change, so the wrapped allocation is too small and
batadv_tt_prepare_tvlv_global_data() writes past the end of the heap object
before the later packet-size check runs.
Fix this by rejecting TT responses whose TVLV value length cannot fit in
the 16-bit TVLV payload length field.
Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Eric Dumazet [Wed, 1 Apr 2026 15:47:21 +0000 (15:47 +0000)]
ipv6: avoid overflows in ip6_datagram_send_ctl()
Yiming Qian reported :
<quote>
I believe I found a locally triggerable kernel bug in the IPv6 sendmsg
ancillary-data path that can panic the kernel via `skb_under_panic()`
(local DoS).
The core issue is a mismatch between:
- a 16-bit length accumulator (`struct ipv6_txoptions::opt_flen`, type
`__u16`) and
- a pointer to the *last* provided destination-options header (`opt->dst1opt`)
when multiple `IPV6_DSTOPTS` control messages (cmsgs) are provided.
- `include/net/ipv6.h`:
- `struct ipv6_txoptions::opt_flen` is `__u16` (wrap possible).
(lines 291-307, especially 298)
- `net/ipv6/datagram.c:ip6_datagram_send_ctl()`:
- Accepts repeated `IPV6_DSTOPTS` and accumulates into `opt_flen`
without rejecting duplicates. (lines 909-933)
- `net/ipv6/ip6_output.c:__ip6_append_data()`:
- Uses `opt->opt_flen + opt->opt_nflen` to compute header
sizes/headroom decisions. (lines 1448-1466, especially 1463-1465)
- `net/ipv6/ip6_output.c:__ip6_make_skb()`:
- Calls `ipv6_push_frag_opts()` if `opt->opt_flen` is non-zero.
(lines 1930-1934)
- `net/ipv6/exthdrs.c:ipv6_push_frag_opts()` / `ipv6_push_exthdr()`:
- Push size comes from `ipv6_optlen(opt->dst1opt)` (based on the
pointed-to header). (lines 1179-1185 and 1206-1211)
1. `opt_flen` is a 16-bit accumulator:
- `include/net/ipv6.h:298` defines `__u16 opt_flen; /* after fragment hdr */`.
2. `ip6_datagram_send_ctl()` accepts *repeated* `IPV6_DSTOPTS` cmsgs
and increments `opt_flen` each time:
- In `net/ipv6/datagram.c:909-933`, for `IPV6_DSTOPTS`:
- It computes `len = ((hdr->hdrlen + 1) << 3);`
- It checks `CAP_NET_RAW` using `ns_capable(net->user_ns,
CAP_NET_RAW)`. (line 922)
- Then it does:
- `opt->opt_flen += len;` (line 927)
- `opt->dst1opt = hdr;` (line 928)
There is no duplicate rejection here (unlike the legacy
`IPV6_2292DSTOPTS` path which rejects duplicates at
`net/ipv6/datagram.c:901-904`).
If enough large `IPV6_DSTOPTS` cmsgs are provided, `opt_flen` wraps
while `dst1opt` still points to a large (2048-byte)
destination-options header.
In the attached PoC (`poc.c`):
- 32 cmsgs with `hdrlen=255` => `len = (255+1)*8 = 2048`
- 1 cmsg with `hdrlen=0` => `len = 8`
- Total increment: `32*2048 + 8 = 65544`, so `(__u16)opt_flen == 8`
- The last cmsg is 2048 bytes, so `dst1opt` points to a 2048-byte header.
3. The transmit path sizes headers using the wrapped `opt_flen`:
- The `IPV6_DSTOPTS` cmsg path requires `CAP_NET_RAW` in the target
netns user namespace (`ns_capable(net->user_ns, CAP_NET_RAW)`).
- Root (or any task with `CAP_NET_RAW`) can trigger this without user
namespaces.
- An unprivileged `uid=1000` user can trigger this if unprivileged
user namespaces are enabled and it can create a userns+netns to obtain
namespaced `CAP_NET_RAW` (the attached PoC does this).
- Local denial of service: kernel BUG/panic (system crash).
- Reproducible with a small userspace PoC.
</quote>
This patch does not reject duplicated options, as this might break
some user applications.
Instead, it makes sure to adjust opt_flen and opt_nflen to correctly
reflect the size of the current option headers, preventing the overflows
and the potential for panics.
This applies to IPV6_DSTOPTS, IPV6_HOPOPTS, and IPV6_RTHDR.
Specifically:
When a new IPV6_DSTOPTS is processed, the length of the old opt->dst1opt
is subtracted from opt->opt_flen before adding the new length.
When a new IPV6_HOPOPTS is processed, the length of the old opt->dst0opt
is subtracted from opt->opt_nflen.
When a new Routing Header (IPV6_RTHDR or IPV6_2292RTHDR) is processed,
the length of the old opt->srcrt is subtracted from opt->opt_nflen.
In the special case within IPV6_2292RTHDR handling where dst1opt is moved
to dst0opt, the length of the old opt->dst0opt is subtracted from
opt->opt_nflen before the new one is added.
Fixes: 333fad5364d6 ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8JNzawgr5OX5m+3jnQDHry2XxhQT5=jThW1zDPtUikRYA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401154721.3740056-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: hsr: fixes for PRP duplication and VLAN unwind
This series addresses two logic bugs in the HSR/PRP implementation
identified during a protocol audit. These are targeted for the 'net'
tree as they fix potential memory corruption and state inconsistency.
The primary change resolves a race condition in the node merging path by
implementing address-based lock ordering. This ensures that concurrent
mutations of sequence blocks do not lead to state corruption or
deadlocks.
An additional fix corrects asymmetric VLAN error unwinding by
implementing a centralized unwind path on slave errors.
====================
Luka Gejak [Wed, 1 Apr 2026 09:22:43 +0000 (11:22 +0200)]
net: hsr: fix VLAN add unwind on slave errors
When vlan_vid_add() fails for a secondary slave, the error path calls
vlan_vid_del() on the failing port instead of the peer slave that had
already succeeded. This results in asymmetric VLAN state across the HSR
pair.
Fix this by switching to a centralized unwind path that removes the VID
from any slave device that was already programmed.
Luka Gejak [Wed, 1 Apr 2026 09:22:42 +0000 (11:22 +0200)]
net: hsr: serialize seq_blocks merge across nodes
During node merging, hsr_handle_sup_frame() walks node_curr->seq_blocks
to update node_real without holding node_curr->seq_out_lock. This
allows concurrent mutations from duplicate registration paths, risking
inconsistent state or XArray/bitmap corruption.
Fix this by locking both nodes' seq_out_lock during the merge.
To prevent ABBA deadlocks, locks are acquired in order of memory
address.
Reviewed-by: Felix Maurer <fmaurer@redhat.com> Fixes: 415e6367512b ("hsr: Implement more robust duplicate discard for PRP") Signed-off-by: Luka Gejak <luka.gejak@linux.dev> Link: https://patch.msgid.link/20260401092243.52121-2-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
vsock: initialize child_ns_mode_locked in vsock_net_init()
The `child_ns_mode_locked` field lives in `struct net`, which persists
across vsock module reloads. When the module is unloaded and reloaded,
`vsock_net_init()` resets `mode` and `child_ns_mode` back to their
default values, but does not reset `child_ns_mode_locked`.
The stale lock from the previous module load causes subsequent writes
to `child_ns_mode` to silently fail: `vsock_net_set_child_mode()` sees
the old lock, skips updating the actual value, and returns success
when the requested mode matches the stale lock. The sysctl handler
reports no error, but `child_ns_mode` remains unchanged.
Steps to reproduce:
$ modprobe vsock
$ echo local > /proc/sys/net/vsock/child_ns_mode
$ cat /proc/sys/net/vsock/child_ns_mode
local
$ modprobe -r vsock
$ modprobe vsock
$ echo local > /proc/sys/net/vsock/child_ns_mode
$ cat /proc/sys/net/vsock/child_ns_mode
global <--- expected "local"
Fix this by initializing `child_ns_mode_locked` to 0 (unlocked) in
`vsock_net_init()`, so the write-once mechanism works correctly after
module reload.
Fixes: 102eab95f025 ("vsock: lock down child_ns_mode as write-once") Reported-by: Jin Liu <jinl@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260401092153.28462-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kexin Sun [Sat, 21 Mar 2026 10:57:04 +0000 (18:57 +0800)]
drivers/base/memory: fix stale reference to memory_block_add_nid()
The function memory_block_add_nid() was renamed to
memory_block_add_nid_early() by commit 0a947c14e48c
("drivers/base: move memory_block_add_nid() into the
caller"). Update the stale reference in add_memory_block().
Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Link: https://patch.msgid.link/20260321105704.6093-1-kexinsun@smail.nju.edu.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Shevchenko [Wed, 18 Mar 2026 14:21:40 +0000 (15:21 +0100)]
device property: Document how to check for the property presence
Currently it's unclear if one may or may not rely on the error codes
returned from the property getters to check for the property presence.
Clarify this by updating kernel-doc for fwnode_property_*() and
device_property_*() where it's applicable.
Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/r/4b24f1f4-b395-467a-81b7-1334a2d48845@roeck-us.net Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260318142404.2526642-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mei: csc: wake device while reading firmware status
The CSC has firmware status registers in MMIO and they may be
unaccessible while device is suspended.
Wake device while reading firmware status via sysfs.
mei: csc: support controller with separate PCI device
Intel PCI driver for chassis controller embedded in Intel graphics
devices.
An MEI device here called CSC can be embedded in discrete
Intel graphics devices having separate PCI device, to support a range
of chassis tasks such as graphics card firmware update and security tasks.
Ensure that callers receive only < 0 return value on error.
Convert PCI error returned by pci_read_config_dword()
to common errno before returning from function.
In all cases in which a struct acpi_driver is used for binding a driver
to an ACPI device object, a corresponding platform device is created by
the ACPI core and that device is regarded as a proper representation of
underlying hardware. Accordingly, a struct platform_driver should be
used by driver code to bind to that device. There are multiple reasons
why drivers should not bind directly to ACPI device objects [1].
Overall, it is better to bind drivers to platform devices than to their
ACPI companions, so convert the sonypi ACPI driver to a platform one.
While this is not expected to alter functionality, it changes sysfs
layout and so it will be visible to user space.
Randy Dunlap [Thu, 26 Feb 2026 05:12:07 +0000 (21:12 -0800)]
misc: apds990x: fix all kernel-doc warnings
Move a #define so that it is not between kernel-doc and its struct
declaration.
Spell one struct member correctly.
Warning: include/linux/platform_data/apds990x.h:33 #define
APDS_PARAM_SCALE 4096; error: Cannot parse struct or union!
Warning: include/linux/platform_data/apds990x.h:62 struct member
'pdrive' not described in 'apds990x_platform_data'
Thorsten Blum [Wed, 25 Feb 2026 18:03:29 +0000 (19:03 +0100)]
most: usb: Use kzalloc_objs for endpoint address array
Replace kcalloc() with kzalloc_objs() when allocating the endpoint
address array to keep the size type-safe and match nearby allocations.
Reformat ->busy_urbs allocation to a single line. No functional change.
In all cases in which a struct acpi_driver is used for binding a driver
to an ACPI device object, a corresponding platform device is created by
the ACPI core and that device is regarded as a proper representation of
underlying hardware. Accordingly, a struct platform_driver should be
used by driver code to bind to that device. There are multiple reasons
why drivers should not bind directly to ACPI device objects [1].
Overall, it is better to bind drivers to platform devices than to their
ACPI companions, so convert the HPET ACPI driver to a platform one.
While this is not expected to alter functionality, it changes sysfs
layout and so it will be visible to user space.
Two char drivers have unnecessary module_init and module_exit functions
that are empty or just print a message. Remove them. Note that if a
module_init function exists, a module_exit function must also exist;
otherwise, the module cannot be unloaded.
Henry Zhang [Wed, 28 Jan 2026 01:45:01 +0000 (20:45 -0500)]
speakup: Document bleeps parameter values
The speakup documentation had a TODO about accepted values for the
bleeps parameter. drivers/accessibility/speakup/main.c indicates
that it's a bitmasked param where bit 0 controls beeping and bit 1
controls announcements.
Ivan Vera [Fri, 27 Mar 2026 13:16:45 +0000 (13:16 +0000)]
nvmem: zynqmp_nvmem: Fix buffer size in DMA and memcpy
Buffer size used in dma allocation and memcpy is wrong.
It can lead to undersized DMA buffer access and possible
memory corruption. use correct buffer size in dma_alloc_coherent
and memcpy.
Fixes: 737c0c8d07b5 ("nvmem: zynqmp_nvmem: Add support to access efuse") Cc: stable@vger.kernel.org Signed-off-by: Ivan Vera <ivanverasantos@gmail.com> Signed-off-by: Harish Ediga <harish.ediga@amd.com> Signed-off-by: Harsh Jain <h.jain@amd.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260327131645.3025781-3-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jori Koolstra [Mon, 2 Mar 2026 15:11:32 +0000 (16:11 +0100)]
pps: change pps_class to a const struct
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change pps_class to be a const struct class and drop the
class_create() call.
most: replace cdev_component->class with a const struct class
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Replace cdev_component->class with a const struct class and drop
the class_create() call. Compile tested only.
Jori Koolstra [Mon, 2 Mar 2026 14:24:36 +0000 (15:24 +0100)]
pps: change pps_gen_class to a const struct
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change pps_gen_class to be a const struct class and drop the
class_create() call.
Tyllis Xu [Sat, 14 Mar 2026 16:58:05 +0000 (11:58 -0500)]
ibmasm: fix heap over-read in ibmasm_send_i2o_message()
The ibmasm_send_i2o_message() function uses get_dot_command_size() to
compute the byte count for memcpy_toio(), but this value is derived from
user-controlled fields in the dot_command_header (command_size: u8,
data_size: u16) and is never validated against the actual allocation size.
A root user can write a small buffer with inflated header fields, causing
memcpy_toio() to read up to ~65 KB past the end of the allocation into
adjacent kernel heap, which is then forwarded to the service processor
over MMIO.
Silently clamping the copy size is not sufficient: if the header fields
claim a larger size than the buffer, the SP receives a dot command whose
own header is inconsistent with the I2O message length, which can cause
the SP to desynchronize. Reject such commands outright by returning
failure.
Validate command_size before calling get_mfa_inbound() to avoid leaking
an I2O message frame: reading INBOUND_QUEUE_PORT dequeues a hardware
frame from the controller's free pool, and returning without a
corresponding set_mfa_inbound() call would permanently exhaust it.
Additionally, clamp command_size to I2O_COMMAND_SIZE before the
memcpy_toio() so the MMIO write stays within the I2O message frame,
consistent with the clamping already performed by outgoing_message_size()
for the header field.
Tyllis Xu [Sat, 14 Mar 2026 16:53:54 +0000 (11:53 -0500)]
ibmasm: fix OOB reads in command_file_write due to missing size checks
The command_file_write() handler allocates a kernel buffer of exactly
count bytes and copies user data into it, but does not validate the
buffer against the dot command protocol before passing it to
get_dot_command_size() and get_dot_command_timeout().
Since both the allocation size (count) and the header fields (command_size,
data_size) are independently user-controlled, an attacker can cause
get_dot_command_size() to return a value exceeding the allocation,
triggering OOB reads in get_dot_command_timeout() and an out-of-bounds
memcpy_toio() that leaks kernel heap memory to the service processor.
Fix with two guards: reject writes smaller than sizeof(struct
dot_command_header) before allocation, then after copying user data
reject commands where the buffer is smaller than the total size declared
by the header (sizeof(header) + command_size + data_size). This ensures
all subsequent header and payload field accesses stay within the buffer.
Tyllis Xu [Sun, 8 Mar 2026 06:21:08 +0000 (00:21 -0600)]
misc: ibmasm: fix OOB MMIO read in ibmasm_handle_mouse_interrupt()
ibmasm_handle_mouse_interrupt() performs an out-of-bounds MMIO read
when the queue reader or writer index from hardware exceeds
REMOTE_QUEUE_SIZE (60).
A compromised service processor can trigger this by writing an
out-of-range value to the reader or writer MMIO register before
asserting an interrupt. Since writer is re-read from hardware on
every loop iteration, it can also be set to an out-of-range value
after the loop has already started.
The root cause is that get_queue_reader() and get_queue_writer() return
raw readl() values that are passed directly into get_queue_entry(),
which computes:
with no bounds check. This unchecked MMIO address is then passed to
memcpy_fromio(), reading 8 bytes from unintended device registers.
For sufficiently large values the address falls outside the PCI BAR
mapping entirely, triggering a machine check exception.
Fix by checking both indices against REMOTE_QUEUE_SIZE at the top of
the loop body, before any call to get_queue_entry(). On an out-of-range
value, reset the reader register to 0 via set_queue_reader() before
breaking, so that normal queue operation can resume if the corrupted
hardware state is transient.
Romain Gantois [Tue, 31 Mar 2026 09:20:58 +0000 (11:20 +0200)]
misc: ti_fpc202: Support special-purpose GPIO lines with LED features
The FPC202 dual port controller has 20 regular GPIO lines and 8 special
GPIO lines with LED features. Each one of these "LED GPIOs" can output PWM
and blink signals.
Add support for the eight special-purpose GPIO lines to the existing FPC202
driver's GPIO support. Add support for registering led-class devices on
these GPIO lines.
Romain Gantois [Tue, 31 Mar 2026 09:20:57 +0000 (11:20 +0200)]
dt-bindings: misc: Describe FPC202 LED features
The FPC202 dual port controller has 20 regular GPIO lines and 8 special
GPIO lines with LED features. Each one of these "LED GPIOs" can output PWM
and blink signals.