Hao Li [Mon, 30 Mar 2026 03:57:49 +0000 (11:57 +0800)]
mm/memory_hotplug: maintain N_NORMAL_MEMORY during hotplug
N_NORMAL_MEMORY is initialized from zone population at boot, but memory
hotplug currently only updates N_MEMORY. As a result, a node that gains
normal memory via hotplug can remain invisible to users iterating over
N_NORMAL_MEMORY, while a node that loses its last normal memory can stay
incorrectly marked as such.
The most visible effect is that
/sys/devices/system/node/has_normal_memory does not report a node even
after that node has gained normal memory via hotplug.
Also, list_lru-based shrinkers can undercount objects on such a node
and may skip reclaim on that node entirely, which can lead to a higher
memory footprint than expected.
Restore N_NORMAL_MEMORY maintenance directly in online_pages() and
offline_pages(). Set the bit when a node that currently lacks normal
memory onlines pages into a zone <= ZONE_NORMAL, and clear it when
offlining removes the last present pages from zones <= ZONE_NORMAL.
This restores the intended semantics without bringing back the old
status_change_nid_normal notifier plumbing which was removed in 8d2882a8edb8.
Current users that benefit include list_lru, zswap, nfsd filecache,
hugetlb_cgroup, and has_normal_memory sysfs reporting.
Link: https://lkml.kernel.org/r/20260330035941.518186-1-hao.li@linux.dev Fixes: 8d2882a8edb8 ("mm,memory_hotplug: remove status_change_nid_normal and update documentation") Signed-off-by: Hao Li <hao.li@linux.dev> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 27 Mar 2026 00:32:22 +0000 (17:32 -0700)]
mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails
damon_call() for repeat_call_control of DAMON_SYSFS could fail if somehow
the kdamond is stopped before the damon_call(). It could happen, for
example, when te damon context was made for monitroing of a virtual
address processes, and the process is terminated immediately, before the
damon_call() invocation. In the case, the dyanmically allocated
repeat_call_control is not deallocated and leaked.
Fix the leak by deallocating the repeat_call_control under the
damon_call() failure.
Joanne Koong [Thu, 26 Mar 2026 21:51:27 +0000 (14:51 -0700)]
mm: reinstate unconditional writeback start in balance_dirty_pages()
Commit 64dd89ae01f2 ("mm/block/fs: remove laptop_mode") removed this
unconditional writeback start from balance_dirty_pages():
if (unlikely(!writeback_in_progress(wb)))
wb_start_background_writeback(wb);
This logic needs to be reinstated to prevent performance regressions for
strictlimited BDIs and memcg setups. The problem occurs because:
a) For strictlimited BDIs, throttling is calculated using per-wb
thresholds. The per-wb threshold can be exceeded even when the global
dirty threshold was not exceeded (nr_dirty < gdtc->bg_thresh)
b) For memcg-based throttling, memcg uses its own dirty count /
thresholds and can trigger throttling even when the global threshold
isn't exceeded
Without the unconditional writeback start, IO is throttled as it waits for
dirty pages to be written back but there is no writeback running. This
leads to severe stalls. On fuse, buffered write performance dropped from
1400 MiB/s to 2000 KiB/s.
Reinstate the unconditional writeback start so that writeback is
guaranteed to be running whenever IO needs to be throttled.
Link: https://lkml.kernel.org/r/20260326215127.3857682-2-joannelkoong@gmail.com Fixes: 64dd89ae01f2 ("mm/block/fs: remove laptop_mode") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
luo_session_deserialize() ignored the return value from
luo_file_deserialize(). As a result, a session could be left partially
restored even though the /dev/liveupdate open path treats deserialization
failures as fatal.
Propagate the error so a failed file deserialization aborts session
deserialization instead of silently continuing.
After analyzing this page’s state, it is hard to understand why the
mapcount is not 0 while the refcount is 0, since this page is not where
the issue first occurred. By enabling the CONFIG_DEBUG_VM config, I can
reproduce the crash as well and captured the first warning where the issue
appears:
The code that triggers the warning is: "VM_WARN_ON_FOLIO(page_folio(page +
nr_pages - 1) != folio, folio)", which indicates that set_pte_range()
tried to map beyond the large folio’s size.
By adding more debug information, I found that 'nr_pages' had overflowed
in filemap_map_pages(), causing set_pte_range() to establish mappings for
a range exceeding the folio size, potentially corrupting fields of pages
that do not belong to this folio (e.g., page->_mapcount).
After above analysis, I think the possible race is as follows:
CPU 0 CPU 1
filemap_map_pages() ext4_setattr()
//get and lock folio with old inode->i_size
next_uptodate_folio()
.......
//shrink the inode->i_size
i_size_write(inode, attr->ia_size);
//calculate the end_pgoff with the new inode->i_size
file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
end_pgoff = min(end_pgoff, file_end);
......
//nr_pages can be overflowed, cause xas.xa_index > end_pgoff
end = folio_next_index(folio) - 1;
nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
......
//map large folio
filemap_map_folio_range()
......
//truncate folios
truncate_pagecache(inode, inode->i_size);
To fix this issue, move the 'end_pgoff' calculation before
next_uptodate_folio(), so the retrieved folio stays consistent with the
file end to avoid 'nr_pages' calculation overflow. After this patch, the
crash issue is gone.
Link: https://lkml.kernel.org/r/1cf1ac59018fc647a87b0dad605d4056a71c14e4.1773739704.git.baolin.wang@linux.alibaba.com Fixes: 743a2753a02e ("filemap: cap PTE range to be created to allowed zero fill in folio_map_range()") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reported-by: Yuanhe Shu <xiangzao@linux.alibaba.com> Tested-by: Yuanhe Shu <xiangzao@linux.alibaba.com> Acked-by: Kiryl Shutsemau (Meta) <kas@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cheng-Yang Chou [Tue, 31 Mar 2026 09:18:33 +0000 (17:18 +0800)]
tools/sched_ext: Fix off-by-one in scx_sdt payload zeroing
scx_alloc_free_idx() zeroes the payload of a freed arena allocation
one word at a time. The loop bound was alloc->pool.elem_size / 8, but
elem_size includes sizeof(struct sdt_data) (the 8-byte union sdt_id
header). This caused the loop to write one extra u64 past the
allocation, corrupting the tid field of the adjacent pool element.
Fix the loop bound to (elem_size - sizeof(struct sdt_data)) / 8 so
only the payload portion is zeroed.
Test plan:
- Add a temporary sanity check in scx_task_free() before the free call:
if (mval->data->tid.idx != mval->tid.idx)
scx_bpf_error("tid corruption: arena=%d storage=%d",
mval->data->tid.idx, (int)mval->tid.idx);
Without this fix, running scx_sdt under fork-heavy load triggers the
corruption error. With the fix applied, the same workload completes
without error.
Fixes: 36929ebd17ae ("tools/sched_ext: add arena based scheduler") Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
selftests/nolibc: only use libgcc when really necessary
nolibc should work without libgcc to be compatible with as many
toolchains as possible. Currently the functionality tested by
nolibc-test does not contain any dependencies, make sure it stays
this way by not linking libgcc anymore.
On the ppc target GCC always emits references to '_restgpr_' functions,
so keep linking libgcc there.
tools/nolibc: check for overflow in calloc() without divisions
On some architectures without native division instructions
the division can generate calls into libgcc/compiler-rt.
This library might not be available, so its use should be avoided.
Use the compiler builtin to check for overflows without needing a
division. The builtin has been available since GCC 3 and clang 3.8.
Richard Cheng [Thu, 2 Apr 2026 09:38:50 +0000 (17:38 +0800)]
PCI/NPEM: Set LED_HW_PLUGGABLE for hotplug-capable ports
NPEM registers LED classdevs on PCI endpoint that may be behind
hotplug-capable ports. During hot-removal, led_classdev_unregister() calls
led_set_brightness(LED_OFF) which leads to a PCI config read to a
disconnected device, which fails and returns -ENODEV (topology details in
msgid.link below):
leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19)
The LED core already suppresses this for devices with LED_HW_PLUGGABLE set,
but NPEM never sets it. Add the flag since NPEM LEDs are on hot-pluggable
hardware by nature.
cpupower: remove extern declarations in cmd functions
extern char *optarg and extern int optind, opterr, optopt are
already declared by <getopt.h>, which is included at the top of
the file. Repeating extern declarations inside a function body
is misleading and unnecessary.
Merge tag 'soc-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"The largest part here are devicetree fixes for Qualcomm, and NXP i.MX,
addressing a few regressions and incorrect settings in board and SoC
pecific dts files.
The largest single commits are a revert of a cleanup patch for i.MX
that caused regressions for the NAND flash controller and a fixup for
an incomplete cleanup of the PCIe controller on Qualcomm platforms
that broke because the state was left incompatible with both the old
and new behavior.
On the Rockchips, Hisilicon, Renesas, Allwinner and AT91 platforms,
only a single simple dts bugfix each was added since the last round of
fixes.
On the SoC specific device drivers, everything is relatively harmless:
three reset controller driver fixes, a compatibility for fix ASpeed
soc ID, and error handling fixes for Qualcomm and Microchip. One
regression fix on Qualcomm addresses a problem with a previous fix for
DisplayPort alt mode"
* tag 'soc-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
arm64: dts: qcom: hamoa: Fix incomplete Root Port property migration
dt-bindings: display/msm: qcm2290-mdss: Fix missing ranges in example
firmware: microchip: fail auto-update probe if no flash found
arm64: dts: renesas: sparrow-hawk: Reserve first 128 MiB of DRAM
arm64: dts: qcom: agatti: Fix IOMMU DT properties
dt-bindings: media: venus: Fix iommus property
dt-bindings: display: msm: qcm2290-mdss: Fix iommus property
arm64: dts: allwinner: sun55i: Fix r-spi DMA
reset: spacemit: k3: Decouple composite reset lines
reset: gpio: fix double free in reset_add_gpio_aux_device() error path
ARM: dts: microchip: sam9x7: fix gpio-lines count for pioB
arm64: dts: hisilicon: hi3798cv200: Add missing dma-ranges
arm64: dts: hisilicon: poplar: Correct PCIe reset GPIO polarity
reset: rzg2l-usbphy-ctrl: Fix malformed MODULE_AUTHOR string
soc: microchip: mpfs-mss-top-sysreg: Fix resource leak on driver unbind
soc: microchip: mpfs-control-scb: Fix resource leak on driver unbind
soc: qcom: pmic_glink_altmode: Fix TBT->SAFE->!TBT transition
arm64: dts: qcom: monaco: Reserve full Gunyah metadata region
arm64: dts: imx8mq-librem5: Bump BUCK1 suspend voltage up to 0.85V
Revert "arm64: dts: imx8mq-librem5: Set the DVS voltages lower"
...
Franz Schnyder [Wed, 25 Mar 2026 09:31:16 +0000 (10:31 +0100)]
PCI: imx6: Fix reference clock source selection for i.MX95
In the PCIe PHY init for the i.MX95, the reference clock source selection
uses a conditional instead of always passing the mask. This currently
breaks functionality if the internal refclk is used.
To fix this issue, always pass IMX95_PCIE_REF_USE_PAD as the mask and clear
bit if external refclk is not used. This essentially swaps the parameters.
PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM
pcie_tph_get_cpu_st() uses the Query Cache Locality Features _DSM [1]
to retrieve the TPH Steering Tag for memory associated with the CPU
identified by its "cpu_uid" parameter, a Linux logical CPU ID.
The _DSM requires an ACPI Processor UID, which pcie_tph_get_cpu_st()
previously assumed was the same as the Linux logical CPU ID. This is
true on x86 but not on arm64, so pcie_tph_get_cpu_st() returned the
wrong Steering Tag, resulting in incorrect TPH functionality on arm64.
Convert the Linux logical CPU ID to the ACPI Processor UID with
acpi_get_cpu_uid() before passing it to the _DSM. Additionally, rename
the pcie_tph_get_cpu_st() parameter from "cpu_uid" to "cpu" to reflect
that it represents a logical CPU ID (not an ACPI Processor UID).
[1] According to ECN_TPH-ST_Revision_20200924
(https://members.pcisig.com/wg/PCI-SIG/document/15470), the input
is defined as: "If the target is a processor, then this field
represents the ACPI Processor UID of the processor as specified in
the MADT. If the target is a processor container, then this field
represents the ACPI Processor UID of the processor container as
specified in the PPTT."
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support") Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260401081640.26875-9-fengchengwen@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
Update acpi/pptt.c to use acpi_get_cpu_uid() and remove unused
get_acpi_id_for_cpu() from arm64/loongarch/riscv, completing PPTT's
migration to the unified ACPI CPU UID interface
ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
Centralize acpi_get_cpu_uid() in include/linux/acpi.h (global scope) and
remove arch-specific declarations from arm64/loongarch/riscv/x86
asm/acpi.h. This unifies the interface across architectures and
simplifies maintenance by eliminating duplicate prototypes.
x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
x86. While at it, add input validation to make the code more robust.
Update Xen-related code to use acpi_get_cpu_uid() instead of the legacy
cpu_acpi_id() function, and remove the now-unused cpu_acpi_id() to clean
up redundant code.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://patch.msgid.link/20260401081640.26875-5-fengchengwen@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
riscv. While at it, add input validation to make the code more robust.
And also update acpi_numa.c and rhct.c to use the new interface instead
of the legacy get_acpi_id_for_cpu().
LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
loongarch. While at it, add input validation to make the code more
robust.
arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
arm64. While at it, add input validation to make the code more robust.
Reimplement get_cpu_for_acpi_id() based on acpi_get_cpu_uid() for
consistency, and move its implementation next to the new function for
code coherence.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://patch.msgid.link/20260401081640.26875-2-fengchengwen@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Abel Vesa [Fri, 27 Mar 2026 16:18:39 +0000 (18:18 +0200)]
remoteproc: qcom: pas: Add Eliza ADSP support
The ADSP found on Eliza SoC is similar to the one found on SM8550.
So just add the dedicated compatible for Eliza ADSP and reuse the
SM8550 resource configuration.
Kai-Heng Feng [Mon, 30 Mar 2026 09:41:57 +0000 (17:41 +0800)]
ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
Add support for decoding NVIDIA-specific CPER sections delivered via
the APEI GHES vendor record notifier chain. NVIDIA hardware generates
vendor-specific CPER sections containing error signatures and diagnostic
register dumps. This implementation registers a notifier_block with the
GHES vendor record notifier and decodes these sections, printing error
details via dev_info().
The driver binds to ACPI device NVDA2012, present on NVIDIA server
platforms. The NVIDIA CPER section contains a fixed header with error
metadata (signature, error type, severity, socket) followed by
variable-length register address-value pairs for hardware diagnostics.
Kai-Heng Feng [Mon, 30 Mar 2026 09:41:56 +0000 (17:41 +0800)]
PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
Switch to the device-managed variant so the notifier is automatically
unregistered on device removal, allowing the open-coded remove callback
to be dropped entirely.
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20260330094203.38022-3-kaihengf@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Abel Vesa [Fri, 27 Mar 2026 16:18:38 +0000 (18:18 +0200)]
dt-bindings: remoteproc: qcom,milos-pas: Document Eliza ADSP
Since the devicetree bindings are exactly the same between Eliza ADSP and
Milos ADSP, reuse the existing Milos schema, just add the Eliza specific
ADSP compatible.
Shawn Guo [Fri, 6 Mar 2026 14:56:07 +0000 (22:56 +0800)]
remoteproc: qcom: Add missing space before closing bracket
Add missing space before closing curly bracket for qcom_q6v5_mss and
qcom_q6v5_pas driver of_match[] lines, so that all qcom remoteproc
drivers are consistent on the common coding style.
Shawn Guo [Mon, 9 Mar 2026 12:33:57 +0000 (20:33 +0800)]
dt-bindings: remoteproc: qcom: Drop types for firmware-name
The type of firmware-name is already defined by core schemas. Drop it
from individual bindings that have either a redundant definition or
an override as string type. For the later cases, constrain the number
of expected firmware names to 1.
Mukesh Ojha [Tue, 31 Mar 2026 17:12:43 +0000 (22:42 +0530)]
remoteproc: qcom: Fix minidump out-of-bounds access on subsystems array
MAX_NUM_OF_SS was hardcoded to 10 in the minidump_global_toc struct,
which is a direct overlay on an SMEM item allocated by the firmware.
Newer Qualcomm SoC firmware allocates space for more subsystems, while
older firmware only allocates space for 10. Bumping the constant would
cause Linux to read/write beyond the SMEM item boundary on older
platforms.
Fix this by converting subsystems[] to a flexible array member and
deriving the actual number of subsystems at runtime from the size
returned by qcom_smem_get(). Add a bounds check on minidump_id against
the derived count before indexing into the array.
Wolfram Sang [Wed, 1 Apr 2026 07:11:39 +0000 (09:11 +0200)]
hwspinlock: u8500: delete driver
The U8500 platform was converted to DT around 2013 and is DT only
meanwhile. This driver has never been converted to a DT driver, so it
clearly hasn't been used since then. To ease upcoming refactoring in the
hwspinlock subsystem, remove this obsolete driver.
Xi Ruoyao [Wed, 1 Apr 2026 13:53:12 +0000 (21:53 +0800)]
ACPI: tables: Enable FPDT on LoongArch
FPDT provides system- and application-readable performance statistics,
useful for profiling and analyzing boot-time performance. FPDT table
support is now available as a pending patch at the EDK II upstream [1]
and has been tested on real hardware such as Loongson XA61200_V1.1 and
XB612B0_V1.2 with patched firmware.
We have also cross checked systemd-analyze(1) against a stop watch and
the `dp' command in EFI Shell to see that the timing information are
correct.
Now that the functionality of FPDT is verified on LoongArch hardware,
list LOONGARCH as a possible dependency, allowing it to be enabled.
media: platform: mtk-mdp3: Constify buffer passed to mdp_vpu_sendmsg()
mdp_vpu_sendmsg() passes the buffer to scp_ipi_send(), which takes now
pointer to const, so adjust this interface as well for increased code
safety and code readability.
ASoC: qcom: Constify GPR packet being send over GPR interface
gpr_send_pkt() and pkt_router_send_svc_pkt() only send the GPR packet
they receive, without any need to actually modify it, so mark the
pointer to GPR packet as pointer to const for code safety and code
self-documentation. Several users of this interface can follow up and
also operate on pointer to const.
The rpmsg_send(), rpmsg_sendto() and other variants of sending
interfaces should only send the passed data, without modifying its
contents, so mark pointer 'data' as pointer to const. All users of this
interface already follow this approach, so only the function
declarations have to be updated.
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260317-rpmsg-send-const-v3-3-4d7fd27f037f@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
selftests: ublk: test that teardown after incomplete recovery completes
Before the fix, teardown of a ublk server that was attempting to recover
a device, but died when it had submitted a nonempty proper subset of the
fetch commands to any queue would loop forever. Add a test to verify
that, after the fix, teardown completes. This is done by:
- Adding a new argument to the fault_inject target that causes it die
after fetching a nonempty proper subset of the IOs to a queue
- Using that argument in a new test while trying to recover an
already-created device
- Attempting to delete the ublk device at the end of the test; this
hangs forever if teardown from the fault-injected ublk server never
completed.
It was manually verified that the test passes with the fix and hangs
without it.
If a ublk server starts recovering devices but dies before issuing fetch
commands for all IOs, cancellation of the fetch commands that were
successfully issued may never complete. This is because the per-IO
canceled flag can remain set even after the fetch for that IO has been
submitted - the per-IO canceled flags for all IOs in a queue are reset
together only once all IOs for that queue have been fetched. So if a
nonempty proper subset of the IOs for a queue are fetched when the ublk
server dies, the IOs in that subset will never successfully be canceled,
as their canceled flags remain set, and this prevents ublk_cancel_cmd
from actually calling io_uring_cmd_done on the commands, despite the
fact that they are outstanding.
Fix this by resetting the per-IO cancel flags immediately when each IO
is fetched instead of waiting for all IOs for the queue (which may never
happen).
Signed-off-by: Uday Shankar <ushankar@purestorage.com> Fixes: 728cbac5fe21 ("ublk: move device reset into ublk_ch_release()") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: zhang, the-essence-of-life <zhangweize9@gmail.com> Link: https://patch.msgid.link/20260405-cancel-v2-1-02d711e643c2@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'opp-updates-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull OPP updates for 7.1 from Viresh Kumar:
"- Use performance level if available to distinguish between rates in
debugfs (Manivannan Sadhasivam).
- Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar)."
* tag 'opp-updates-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
OPP: Move break out of scoped_guard in dev_pm_opp_xlate_required_opp()
OPP: debugfs: Use performance level if available to distinguish between rates
batman-adv: hold claim backbone gateways by reference
batadv_bla_add_claim() can replace claim->backbone_gw and drop the old
gateway's last reference while readers still follow the pointer.
The netlink claim dump path dereferences claim->backbone_gw->orig and
takes claim->backbone_gw->crc_lock without pinning the underlying
backbone gateway. batadv_bla_check_claim() still has the same naked
pointer access pattern.
Reuse batadv_bla_claim_get_backbone_gw() in both readers so they operate
on a stable gateway reference until the read-side work is complete.
This keeps the dump and claim-check paths aligned with the lifetime
rules introduced for the other BLA claim readers.
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Fixes: 04f3f5bf1883 ("batman-adv: add B.A.T.M.A.N. Dump BLA claims via netlink") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Ao Zhou <n05ec@lzu.edu.cn> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Document the WCN6755 PMU using a fallback to WCN6750 since the two chips
seem to be completely pin and software compatible. In fact the original
downstream kernel just pretends the WCN6755 is a WCN6750.
regulator: dt-bindings: regulator-max77620: convert to DT schema
Convert regulator-max77620 devicetree bindings for the MAX77620 PMIC from
TXT to YAML format. This patch does not change any functionality; the
bindings remain the same.
commit f0fba2ad1b6b ("ASoC: multi-component - ASoC Multi-Component
Support") has replaced "card->pmdown_time" to "rtd->pmdown_time".
card->pmdown_time has been not used this 15 years. Let's remove it.
Maciej Strozek [Thu, 2 Apr 2026 06:45:31 +0000 (14:45 +0800)]
ASoC: SOF: Intel: fix iteration in is_endpoint_present()
is_endpoint_present() iterates over sdca_data.num_functions, but checks
the dai_type according to codec info list, which will cause problems if
not all endpoints from the codec info list are present. Make sure the
type of actually present functions is compared against target dai_type.
Fixes: 5226d19d4cae ("ASoC: SOF: Intel: use sof_sdw as default SDW machine driver") Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20260402064531.2287261-3-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
Charles Keepax [Mon, 16 Mar 2026 14:14:49 +0000 (14:14 +0000)]
ASoC: SDCA: Fix errors in IRQ cleanup
IRQs are enabled through sdca_irq_populate() from component probe
using devm_request_threaded_irq(), this however means the IRQs can
persist if the sound card is torn down. Some of the IRQ handlers
store references to the card and the kcontrols which can then
fail. Some detail of the crash was explained in [1].
Generally it is not advised to use devm outside of bus probe, so
the code is updated to not use devm. The IRQ requests are not moved
to bus probe time as it makes passing the snd_soc_component into
the IRQs very awkward and would the require a second step once the
component is available, so it is simpler to just register the IRQs
at this point, even though that necessitates some manual cleanup.
Rong Zhang [Sun, 15 Mar 2026 18:44:00 +0000 (02:44 +0800)]
MIPS: dts: loongson64g-package: Switch to Loongson UART driver
Loongson64g is Loongson 3A4000, whose UART controller is compatible with
Loongson 2K1500, which is NS16550A-compatible with an additional
fractional frequency divisor register.
Update the compatible strings to reflect this, so that 3A4000 can
benefit from the fractional frequency divisor provided by loongson-uart.
This is required on some devices, otherwise their UART can't work at
some high baud rates, e.g., 115200.
Tested on Loongson-LS3A4000-7A1000-NUC-SE with a 25MHz UART clock.
Without fractional frequency divisor, the actual baud rate was 111607
(25MHz / 16 / 14, measured value: 111545) and some USB-to-UART
converters couldn't work with it at all. With fractional frequency
divisor, the measured baud rate becomes 115207, which is quite accurate.
Signed-off-by: Rong Zhang <rongrong@oss.cipunited.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cássio Gabriel [Wed, 25 Mar 2026 20:05:11 +0000 (17:05 -0300)]
ASoC: SOF: compress: return the configured codec from get_params
The SOF compressed offload path accepts codec parameters in
sof_compr_set_params() and forwards them to firmware as
extended data in the SOF IPC stream params message.
However, sof_compr_get_params() still returns success without
filling the snd_codec structure. Since the compress core allocates
that structure zeroed and copies it back to userspace on success,
SNDRV_COMPRESS_GET_PARAMS returns an all-zero codec description
even after the stream has been configured successfully.
The stale TODO in this callback conflates get_params() with capability
discovery. Supported codec enumeration belongs in get_caps() and
get_codec_caps(). get_params() should report the current codec settings.
Cache the codec accepted by sof_compr_set_params() in the per-stream SOF
compress state and return it from sof_compr_get_params().
Add missing sound-sai-cells for this codec into schema.
At the same time, drop trailing spaces from description.
Fixes: 506e0825a4c9 ("ASoC: dt-bindings: Convert ti,tas2552 to DT schema") Signed-off-by: Marek Vasut <marex@nabladev.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260405234502.154227-1-marex@nabladev.com Signed-off-by: Mark Brown <broonie@kernel.org>
Speaker protection and VI feedback modules are disabled by default.
Explicitly enable them when configuring speaker protection.
Fixes: 3e43a8c033c3 ("ASoC: qcom: audioreach: Add support for VI Sense module") Fixes: 0db76f5b2235 ("ASoC: qcom: audioreach: Add support for Speaker Protection module") Signed-off-by: Ravi Hothi <ravi.hothi@oss.qualcomm.com> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Link: https://patch.msgid.link/20260326113531.3144998-1-ravi.hothi@oss.qualcomm.com Signed-off-by: Mark Brown <broonie@kernel.org>
Shiji Yang [Wed, 18 Jun 2025 03:42:07 +0000 (11:42 +0800)]
mips: pci-mt7620: rework initialization procedure
Move the reset operation to the common part to reduce the code
redundancy. They are actually the same and needed for all SoCs.
Disabling power and clock are unnecessary for MT7620 and will be
removed. In vendor SDK, it's used to save the power when the PCI
driver is not selected. The MT7628 GPIO pinctrl has been removed
because this should be done in device-tree. Some delay intervals
have also been increased to follow the recommendations of the SoC
SDK and datasheet. Tested on both MT7620 and MT7628.
Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Shiji Yang [Wed, 18 Jun 2025 03:42:05 +0000 (11:42 +0800)]
mips: pci-mt7620: fix bridge register access
Host bridge registers and PCI RC control registers have different
memory base. pcie_m32() is used to write the RC control registers
instead of bridge registers. This patch introduces bridge_m32()
and use it to operate bridge registers to fix the access issue.
Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Théo Lebrun [Wed, 25 Feb 2026 16:55:22 +0000 (17:55 +0100)]
dt-bindings: soc: mobileye: OLB is an Ethernet PHY provider on EyeQ5
OLB on EyeQ5 ("mobileye,eyeq5-olb" compatible) is now declared as a
generic PHY provider. Under the hood, it provides Ethernet RGMII/SGMII
PHY support for both MAC instances.
ASoC: rt5640: Handle 0Hz sysclk during stream shutdown
Commit 2458adb8f92a ("SoC: simple-card-utils: set 0Hz to sysclk when
shutdown") sends a 0Hz sysclk request during stream shutdown to clear
codec rate constraints. The rt5640 codec forwards this 0Hz to
clk_set_rate(), which can cause clock controller firmware faults on
platforms where MCLK is SoC-driven (e.g. Tegra) and 0Hz falls below
the hardware minimum rate.
Handle the 0Hz case by clearing the internal sysclk state and
returning early, avoiding the invalid clk_set_rate() call.
MIPS: DEC: Rate-limit memory errors for non-KN01 parity systems
Similarly to memory errors in ECC systems also rate-limit memory parity
errors for KN02-BA, KN02-CA, KN04-BA, KN04-CA DECstation and DECsystem
models. Unlike with ECC these events are always fatal and are less
likely to cause a message flood, but handle them the same way for
consistency.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS: DEC: Rate-limit memory errors for KN01 systems
Similarly to memory errors in ECC systems also rate-limit memory parity
errors for KN01 DECstation and DECsystem models. Unlike with ECC these
events are always fatal and are less likely to cause a message flood,
but handle them the same way for consistency.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS: DEC: Rate-limit memory errors for ECC systems
Prevent the system from becoming unusable due to a flood of memory error
messages with DECstation and DECsystem models using ECC, that is KN02,
KN03 and KN05 systems. It seems common for gradual oxidation of memory
module contacts to cause memory errors to eventually develop and while
ECC takes care of correcting them and the system affected can continue
operating normally until the contacts have been cleaned, the unlimited
messages make the system spend all its time on producing them, therefore
preventing it from being used.
Rate-limiting removes the load from the system and enables its normal
operation, e.g.:
Bus error interrupt: CPU memory read ECC error at 0x139cfb04
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU partial memory write ECC error at 0x138c1f5c
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU partial memory write ECC error at 0x138c1f6c
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x139cff64
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af00c
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af044
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0cc
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0cc
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0e4
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af104
ECC syndrome 0x54 -- corrected single bit error at data bit D3
dec_ecc_be_backend: 34455 callbacks suppressed
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS: kernel: Remove $0 clobber from `mult_sh_align_mod'
Remove rubbish $0 clobber added to inline asm within `mult_sh_align_mod'
with the removal of support for GCC versions below 3.4 made with commit 57810ecb581a ("MIPS: Remove GCC_IMM_ASM & GCC_REG_ACCUM macros").
Previously a macro was used that, depending on GCC version, expanded to
either `accum' or $0. Since the latter choice was only a placeholder to
keep the syntax consistent and the register referred is hardwired, there
is no point in having it here as it has no effect on code generation.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The second Late Binding version allows to send payload bigger
than client MTU by splitting it to chunks and uses separate
firmware client for transfer.
The component interface is unchanged and driver doing all splitting.
Only one Late Binding version is supported by firmware.
When Late binding version 2 is supported, the new client is advertised
by firmware and existing MKHI will have version 2.
This helps driver to select the right mode of work.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://patch.msgid.link/20260405112326.1535208-3-alexander.usyskin@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cássio Gabriel [Mon, 6 Apr 2026 03:20:06 +0000 (00:20 -0300)]
ALSA: gusmax: add ISA suspend and resume callbacks
gusmax still leaves its ISA PM callbacks disabled even though the shared
GF1 suspend and resume path now exists.
This board needs one extra piece of PM glue around the shared GF1 helpers.
The attached WSS codec has its own register image that must be saved and
restored across suspend, and the MAX control register must be rewritten on
resume before the codec and GF1 sides are brought back.
Use the existing wss->suspend() and wss->resume() hooks for the codec, then
wire the driver up to the shared GUS suspend and resume helpers for the GF1
side.
Cássio Gabriel [Mon, 6 Apr 2026 03:20:05 +0000 (00:20 -0300)]
ALSA: gusextreme: add ISA suspend and resume callbacks
gusextreme still leaves its ISA PM callbacks disabled because the shared
GF1 core had no suspend and resume path suitable for PM recovery.
Resume on this board needs one extra step before the shared GF1 path can
touch the chip again: the ES1688 side must restore the GF1 routing. Split
that routing sequence into a helper, reuse it for probe and resume, reset
the ES1688 side first on resume, and then wire the driver up to the shared
GUS PM helpers.
This restores usable post-resume GF1 operation on GUS Extreme without
rerunning probe-only detection in the shared GF1 path.
Cássio Gabriel [Mon, 6 Apr 2026 03:20:04 +0000 (00:20 -0300)]
ALSA: gusclassic: add ISA suspend and resume callbacks
gusclassic still leaves its ISA PM callbacks disabled because the shared
GF1 core had no suspend and resume path suitable for PM recovery.
Wire the driver up to the new shared GUS suspend and resume helpers so a
suspend/resume cycle restores usable GF1 operation without rerunning
probe-only detection or tearing down the runtime bookkeeping kept by the
card instance.
Cássio Gabriel [Mon, 6 Apr 2026 03:20:03 +0000 (00:20 -0300)]
ALSA: gus: add shared GF1 suspend and resume helpers
gusclassic and gusextreme still leave their ISA PM callbacks disabled
because the shared GF1 core only provides probe-time startup and full
shutdown paths.
Those helpers are not suitable for suspend and resume. They reset software
handlers and tear down runtime state such as the DRAM allocator, timer
state, DMA queues, PCM state and UART setup. Resume instead needs a
narrower recovery path that rebuilds the GF1 hardware state without
rerunning probe-only detection or discarding the bookkeeping kept by the
card instance.
Add shared GF1 suspend and resume helpers for that recovery path. Suspend
now quiesces GF1 PCM, aborts queued GF1 DMA work, resets the UART and
powers the chip down without tearing down allocator, timer or rawmidi
bookkeeping. Resume rebuilds the GF1 hardware state, restores timer and
UART handlers, and brings the chip back to a usable post-resume state for
the ISA front-ends.
The scope is limited to restoring post-resume usability. It does not
attempt transparent continuation of active GF1 PCM or synth state across
suspend, and userspace may still need to reprepare streams or reload
onboard sample data after resume. Open rawmidi substreams are restored
only to a usable post-resume state.
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IAH10
The bass speakers are not working, and add the following entry
in /etc/modprobe.d/snd.conf:
options snd-sof-intel-hda-generic hda_model=alc287-yoga9-bass-spk-pin
Fixes the bass speakers.
So add the quick ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN here.
Harin Lee [Mon, 6 Apr 2026 07:49:13 +0000 (16:49 +0900)]
ALSA: ctxfi: Add fallback to default RSR for S/PDIF
spdif_passthru_playback_get_resources() uses atc->pll_rate as the RSR
for the MSR calculation loop. However, pll_rate is only updated in
atc_pll_init() and not in hw_pll_init(), so it remains 0 after the
card init.
When spdif_passthru_playback_setup() skips atc_pll_init() for
32000 Hz, (rsr * desc.msr) always becomes 0, causing the loop to spin
indefinitely.
Add fallback to use atc->rsr when atc->pll_rate is 0. This reflects
the hardware state, since hw_card_init() already configures the PLL
to the default RSR.
Harin Lee [Mon, 6 Apr 2026 07:48:57 +0000 (16:48 +0900)]
ALSA: ctxfi: Limit PTP to a single page
Commit 391e69143d0a increased CT_PTP_NUM from 1 to 4 to support 256
playback streams, but the additional pages are not used by the card
correctly. The CT20K2 hardware already has multiple VMEM_PTPAL
registers, but using them separately would require refactoring the
entire virtual memory allocation logic.
ct_vm_map() always uses PTEs in vm->ptp[0].area regardless of
CT_PTP_NUM. On AMD64 systems, a single PTP covers 512 PTEs (2M). When
aggregate memory allocations exceed this limit, ct_vm_map() tries to
access beyond the allocated space and causes a page fault:
ALSA: scarlett2: Add missing sentinel initializer field
A "-Wmissing-field-initializers" warning was emitted when compiling the
module using the W=2 option. There is a sentinel initializer field
missing in the end of scarlett2_devices[]. Tested using a
Scarlett Solo 4th gen.
Cássio Gabriel [Fri, 3 Apr 2026 03:47:13 +0000 (00:47 -0300)]
ALSA: aoa: onyx: Update IEC958 sample-rate status for PCM playback
onyx_prepare() accepts 32/44.1/48 kHz PCM playback, but it leaves the
Onyx IEC958 sample-rate status bits at the driver's initial 44.1 kHz
setting in DIG_INFO3. As a result, 32 kHz and 48 kHz PCM streams
advertise a stale IEC958 sample rate unless userspace rewrites IEC958
Playback Default first.
Update only the consumer sample-frequency bits in DIG_INFO3 from the PCM
runtime during prepare, resolving the long-standing FIXME in the PCM
playback path while leaving the other user-controlled IEC958 status bits
unchanged.
Mark IEC958 Playback Default as volatile as well, since prepare() now
changes the exposed register contents outside the control put callback.
Ian Rogers [Sat, 4 Apr 2026 06:05:52 +0000 (23:05 -0700)]
perf cgroup: Update metric leader in evlist__expand_cgroup
When the evlist is expanded the metric leader wasn't being updated. As
the original evsel is deleted this creates a use-after-free in
stat-shadow's prepare_metric. This was detected running the "perf stat
--bpf-counters --for-each-cgroup test" with sanitizers.
The change itself puts the copied evsel into the priv field (known
unused because of evsel__clone use) and then in a second pass over the
list updates the copied values using the priv pointer.
Fixes: d1c5a0e86a4e ("perf stat: Add --for-each-cgroup option") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Sun Jian <sun.jian.kdev@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Sat, 4 Apr 2026 03:43:03 +0000 (20:43 -0700)]
perf sample: Add evsel to struct perf_sample
Add the evsel from evsel__parse_sample into the struct
perf_sample. Sometimes we want to alter the evsel associated with a
sample, such as with off-cpu bpf-output events. In general the evsel
and perf_sample are passed as a pair, but this makes an altered evsel
something of a chore to keep checking for and setting up. Later
patches will remove passing an evsel with the perf_sample and switch
to just using the perf_sample's value.
Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Sat, 4 Apr 2026 03:43:02 +0000 (20:43 -0700)]
perf sample: Make sure perf_sample__init/exit are used
The deferred stack trace code wasn't using perf_sample__init/exit. Add
the deferred stack trace clean up to perf_sample__exit which requires
proper NULL initialization in perf_sample__init. Make the
perf_sample__exit robust to being called more than once by using
zfree. Make the error paths in evsel__parse_sample exit the
sample. Add a merged_callchain boolean to capture that callchain is
allocated, deferred_callchain doen't suffice for this. Pack the struct
variables to avoid padding bytes for this.
Similiarly powerpc_vpadtl_sample wasn't using perf_sample__init/exit,
use it for consistency and potential issues with uninitialized
variables.
Similarly guest_session__inject_events in builtin-inject wasn't using
perf_sample_init/exit. The lifetime management for fetched events is
somewhat complex there, but when an event is fetched the sample should
be initialized and needs exiting on error. The sample may be left in
place so that future injects have access to it.
Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Sat, 21 Mar 2026 06:14:48 +0000 (23:14 -0700)]
perf tests sched stats: Write output to temp file
Writing to the perf.data file can fail in various contexts such as
continual test. Other tests write to a mktemp-ed file, make the "perf
sched stats tests" follow this convention.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Namhyung Kim [Mon, 6 Apr 2026 05:18:16 +0000 (22:18 -0700)]
perf sched: Avoid crash for unexpected perf sched stats report
Doing a `perf sched record` then `perf sched stats report` crashes as
the tp_handler isn't set. Add a dummy tp_handler for it rather than
adding an extra check.
Reported-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
RISC-V: KVM: Fix shift-out-of-bounds in make_xfence_request()
The make_xfence_request() function uses a shift operation to check if a
vCPU is in the hart mask:
if (!(hmask & (1UL << (vcpu->vcpu_id - hbase))))
However, when the difference between vcpu_id and hbase
is >= BITS_PER_LONG, the shift operation causes undefined behavior.
This was detected by UBSAN:
UBSAN: shift-out-of-bounds in arch/riscv/kvm/tlb.c:343:23
shift exponent 256 is too large for 64-bit type 'long unsigned int'
Fix this by adding a bounds check before the shift operation.
This bug was found by fuzzing the KVM RISC-V interface.
Fixes: 13acfec2dbcc ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260403232011.2394966-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
Yu Kuai [Mon, 30 Mar 2026 05:52:13 +0000 (13:52 +0800)]
md: fix array_state=clear sysfs deadlock
When "clear" is written to array_state, md_attr_store() breaks sysfs
active protection so the array can delete itself from its own sysfs
store method.
However, md_attr_store() currently drops the mddev reference before
calling sysfs_unbreak_active_protection(). Once do_md_stop(..., 0)
has made the mddev eligible for delayed deletion, the temporary
kobject reference taken by sysfs_break_active_protection() can become
the last kobject reference protecting the md kobject.
That allows sysfs_unbreak_active_protection() to drop the last
kobject reference from the current sysfs writer context. kobject
teardown then recurses into kernfs removal while the current sysfs
node is still being unwound, and lockdep reports recursive locking on
kn->active with kernfs_drain() in the call chain.
Reproducer on an existing level:
1. Create an md0 linear array and activate it:
mknod /dev/md0 b 9 0
echo none > /sys/block/md0/md/metadata_version
echo linear > /sys/block/md0/md/level
echo 1 > /sys/block/md0/md/raid_disks
echo "$(cat /sys/class/block/sdb/dev)" > /sys/block/md0/md/new_dev
echo "$(($(cat /sys/class/block/sdb/size) / 2))" > \
/sys/block/md0/md/dev-sdb/size
echo 0 > /sys/block/md0/md/dev-sdb/slot
echo active > /sys/block/md0/md/array_state
2. Wait briefly for the array to settle, then clear it:
sleep 2
echo clear > /sys/block/md0/md/array_state
The warning looks like:
WARNING: possible recursive locking detected
bash/588 is trying to acquire lock:
(kn->active#65) at __kernfs_remove+0x157/0x1d0
but task is already holding lock:
(kn->active#65) at sysfs_unbreak_active_protection+0x1f/0x40
...
Call Trace:
kernfs_drain
__kernfs_remove
kernfs_remove_by_name_ns
sysfs_remove_group
sysfs_remove_groups
__kobject_del
kobject_put
md_attr_store
kernfs_fop_write_iter
vfs_write
ksys_write
Restore active protection before mddev_put() so the extra sysfs
kobject reference is dropped while the mddev is still held alive. The
actual md kobject deletion is then deferred until after the sysfs
write path has fully returned.
bpf: Fix stale offload->prog pointer after constant blinding
When a dev-bound-only BPF program (BPF_F_XDP_DEV_BOUND_ONLY) undergoes
JIT compilation with constant blinding enabled (bpf_jit_harden >= 2),
bpf_jit_blind_constants() clones the program. The original prog is then
freed in bpf_jit_prog_release_other(), which updates aux->prog to point
to the surviving clone, but fails to update offload->prog.
This leaves offload->prog pointing to the freed original program. When
the network namespace is subsequently destroyed, cleanup_net() triggers
bpf_dev_bound_netdev_unregister(), which iterates ondev->progs and calls
__bpf_prog_offload_destroy(offload->prog). Accessing the freed prog
causes a page fault:
1. Set net.core.bpf_jit_harden=2 (echo 2 > /proc/sys/net/core/bpf_jit_harden)
2. Run xdp_metadata selftest, which creates a dev-bound-only XDP
program on a veth inside a netns (./test_progs -t xdp_metadata)
3. cleanup_net -> page fault in __bpf_prog_offload_destroy
Dev-bound-only programs are unique in that they have an offload structure
but go through the normal JIT path instead of bpf_prog_offload_compile().
This means they are subject to constant blinding's prog clone-and-replace,
while also having offload->prog that must stay in sync.
Fix this by updating offload->prog in bpf_jit_prog_release_other(),
alongside the existing aux->prog update. Both are back-pointers to
the prog that must be kept in sync when the prog is replaced.
tc_tunnel test is based on a send_and_test_data function which takes a
subtest configuration, and a boolean indicating whether the connection
is supposed to fail or not. This boolean is systematically passed to
true, and is a remnant from the first (not integrated) attempts to
convert tc_tunnel to test_progs: those versions validated for
example that a connection properly fails when only one side of the
connection has tunneling enabled. This specific testing has not been
integrated because it involved large timeouts which increased quite a
lot the test duration, for little added value.
Remove the unused boolean from send_and_test_data to simplify the
generic part of subtests.
====================
bpf: fix end-of-list detection in cgroup_storage_get_next_key()
list_next_entry() never returns NULL, so the NULL check in
cgroup_storage_get_next_key() is dead code. When iterating past the last
element, the function reads storage->key from a bogus pointer that aliases
internal map fields and copies the result to userspace.
Patch 1 replaces the NULL check with list_entry_is_head() so the function
correctly returns -ENOENT when there are no more entries.
Patch 2 adds a selftest to cover this corner case, as suggested by Sun Jian
and Paul Chaignon.
Weiming Shi [Fri, 3 Apr 2026 13:29:51 +0000 (21:29 +0800)]
selftests/bpf: add get_next_key boundary test for cgroup_storage
Verify that bpf_map__get_next_key() correctly returns -ENOENT when
called on the last (and only) key in a cgroup_storage map. Before the
fix in the previous patch, this would succeed with bogus key data
instead of failing.
Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260403132951.43533-3-bestswngs@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Weiming Shi [Fri, 3 Apr 2026 13:29:50 +0000 (21:29 +0800)]
bpf: fix end-of-list detection in cgroup_storage_get_next_key()
list_next_entry() never returns NULL -- when the current element is the
last entry it wraps to the list head via container_of(). The subsequent
NULL check is therefore dead code and get_next_key() never returns
-ENOENT for the last element, instead reading storage->key from a bogus
pointer that aliases internal map fields and copying the result to
userspace.
Replace it with list_entry_is_head() so the function correctly returns
-ENOENT when there are no more entries.
Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260403132951.43533-2-bestswngs@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
bpf: Fix torn writes in non-prealloc htab with BPF_F_LOCK
A torn write issue was reported in htab_map_update_elem() with
BPF_F_LOCK on hash maps. The BPF_F_LOCK fast path performs
a lockless lookup and copies the value under the element's embedded
spin_lock. A concurrent delete can free the element via
bpf_mem_cache_free(), which allows immediate reuse. When
alloc_htab_elem() recycles the same memory, it writes the value with
plain copy_map_value() without taking the spin_lock, racing with the
stale lock holder and producing torn writes.
Patch 1 fixes alloc_htab_elem() to use copy_map_value_locked() when
BPF_F_LOCK is set.
Patch 2 adds a selftest that reliably detects the torn writes on an
unpatched kernel.