EDAC/i7300: disable error reporting if init fails and refactor helper
If error reporting is enabled during initialization but initialization
fails immediately after, or during normal driver exit, error reporting
is left enabled in the mask register even after exit.
Replace i7300_enable_error_reporting() with i7300_set_error_reporting()
to combine enabling/disabling. Disable reporting at initialization
failure and driver exit, before call to i7300_put_devices() for cleanup.
Add enabled reporting flag to i7300_pvt.
This ensures clean hardware handling by disabling any unused error
reporting bits before exiting.
Ulf Hansson [Fri, 29 May 2026 14:42:41 +0000 (16:42 +0200)]
mmc: Merge branch fixes into next
Merge the mmc fixes for v7.1-rc[n] into the next branch, to allow them to
get tested together with the mmc changes that are targeted for the next
release.
Crystal Wood [Mon, 11 May 2026 22:30:35 +0000 (17:30 -0500)]
tracing/osnoise: Array printk init and cleanup
None of the calls to trace_array_printk_buf() will do anything
if we don't initialize the buffer on instance creation (unless
some other tracer called it), so do that.
Add an osnoise_print() function to facilitate adding debug prints
(without tainting).
Use trace_array_printk() instead of trace_array_printk_buf(), as we're
only writing to the main buffer (of a non-main instance) anyway -- and
trace_array_printk_buf() skips the check to make sure we're not printing
to the global instance.
Shawn Lin [Fri, 29 May 2026 01:17:39 +0000 (09:17 +0800)]
mmc: dw_mmc: Add desc_num field for clarity
The ring_size field in struct dw_mci is misleadingly named.
Despite its name, it does not represent the size of the descriptor
ring buffer in bytes, but rather the number of descriptors allocated
within the fixed-size ring buffer.
The actual ring buffer size is fixed at PAGE_SIZE (or DESC_RING_BUF_SZ,
which equals PAGE_SIZE). Within this buffer, we allocate either
struct idmac_desc or struct idmac_desc_64addr descriptors, and
ring_size stores the count of these descriptors.
This naming has caused confusion, as it's also used to set
mmc->max_segs (the maximum number of scatter-gather segments),
which logically corresponds to the number of descriptors, not a
size in bytes.
No functional change is introduced by this naming-only patch.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulfh@kernel.org>
dt-bindings: mmc: sdhci-msm: Rename the binding to include 'qcom' prefix
This is the only Qcom binding that doesn't have 'qcom' prefix in the
bindings name. This doesn't match with the regex in MAINTAINERS file and
the 'get_maintainer.pl' script fails to list the 'linux-arm-msm' list:
Ulf Hansson <ulfh@kernel.org> (maintainer:MULTIMEDIA CARD (MMC), SECURE DIGITAL (SD) AND...)
Rob Herring <robh@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Krzysztof Kozlowski <krzk+dt@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Conor Dooley <conor+dt@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Bjorn Andersson <andersson@kernel.org> (in file)
Konrad Dybcio <konradybcio@kernel.org> (in file)
linux-mmc@vger.kernel.org (open list:MULTIMEDIA CARD (MMC), SECURE DIGITAL (SD) AND...)
devicetree@vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
linux-kernel@vger.kernel.org (open list)
Hence, rename the binding to include 'qcom' prefix so that the regex
matches correctly.
Jisheng Zhang [Sun, 24 May 2026 02:34:55 +0000 (10:34 +0800)]
mmc: sdhci: add signal voltage switch in sdhci_resume_host
I met one suspend/resume issue with sdr104 capable sdio wifi card (with
"keep-power-in-suspend" set in DT property):
After resuming from suspend to ram, the sdio wifi card stops working.
Further debug shows that although ios shows the sdio card is at sdr104
mode, the voltage is still at 3V3. This is due to missing the calling
of ->start_signal_voltage_switch() in sdhci_resume_host().
Fix this issue by adding ->start_signal_voltage_switch() in
sdhci_resume_host(). This also matches what we do for
sdhci_runtime_resume_host().
Then the question is: why this issue hasn't reported and fixed for so
long time. IMHO, several reasons: Some host controllers just kick off
the runtime resume for system resume, so they benefit from the well
supported runtime pm code; Some platforms just use the old sdio wifi
card which doesn't need signal voltage switch at all, the default
voltage is 3v3 after resuming.
Fixes: 6308d2905bd3 ("mmc: sdhci: add quirk for keeping card power during suspend") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org>
Heiko Stuebner [Fri, 22 May 2026 18:43:07 +0000 (20:43 +0200)]
mmc: dw_mmc-rockchip: Add missing private data for very old controllers
The really old controllers (rk2928, rk3066, rk3188) do not support UHS
speeds at all, and thus never handled phase data.
For that reason it never had a parse_dt callback and no driver private
data at all.
Commit ff6f0286c896 ("mmc: dw_mmc-rockchip: Add memory clock auto-gating
support") makes the private data sort of mandatory, because the init
function checks whether phases are configured internally or through the
clock controller.
This results in the old SoCs then experiencing NULL-pointer dereferences
when they try to access that private-data struct.
While we could have if (priv) conditionals in all places, it's way less
cluttery to just give the old types their private-data struct.
Artem Shimko [Fri, 22 May 2026 07:31:32 +0000 (10:31 +0300)]
mmc: sdhci-of-dwcmshc: use dev_err_probe() to simplify error paths
Replace common pattern of dev_err() + return with dev_err_probe() in
probe functions and their callees. This macro provides standardized
error message format with symbolic error names and adds deferred probe
debugging information.
The conversion makes the code more compact and ensures consistent error
logging across all initialization paths.
The clk_disable_unprepare() function has internal protection against
ERR_PTR and NULL pointers (IS_ERR_OR_NULL). Remove the redundant
IS_ERR() check for bus_clk in dwcmshc_suspend() and in the error
path of dwcmshc_resume() to simplify the code.
Note that the clk_prepare_enable() call in dwcmshc_resume() must retain
its IS_ERR() check because clk_prepare() only handles NULL pointers,
not ERR_PTR.
Merge updates that introduce devm_acpi_install_notify_handler()
and convert some drivers for core ACPI devices previously using
acpi_dev_install_notify_handler() to devres-based resource
management.
* acpi-driver-devm:
ACPI: video: Switch over to devres-based resource management
ACPI: video: Use devm for video->entry and backlight cleanup
ACPI: video: Use devm action for freeing video devices
ACPI: video: Use devm action for video bus object cleanup
ACPI: video: Rearrange probe and remove code
ACPI: video: Reduce the number of auxiliary device dereferences
ACPI: PAD: Switch over to devres-based resource management
ACPI: PAD: Fix teardown ordering in acpi_pad_remove()
ACPI: PAD: Pass struct device pointer to acpi_pad_notify()
ACPI: PAD: Rearrange acpi_pad_notify()
ACPI: thermal: Switch over to devres-based resource management
ACPI: HED: Switch over to devres-based resource management
ACPI: HED: Refine guarding against adding a second instance
ACPI: battery: Switch over to devres-based resource management
ACPI: AC: Switch over to devres-based resource management
ACPI: NFIT: core: Use devm_acpi_install_notify_handler()
ACPI: bus: Introduce devm_acpi_install_notify_handler()
Inochi Amaoto [Thu, 21 May 2026 07:21:21 +0000 (15:21 +0800)]
mmc: litex_mmc: Set mandatory idle clocks before CMD0
The litex_mmc driver assumes the card is already probed in the BIOS
and skip the phy initialization. This will cause the command fail
like the following when the old card is unplugged and then insert
a new card:
[ 62.923593] litex-mmc f0004000.mmc: Command (cmd 8) error, status -110
[ 62.949717] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110
[ 62.976606] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110
[ 63.002516] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110
[ 63.028442] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110
Add required clock settings and initialization for the CMD 0, so it can
probe the new card.
Inochi Amaoto [Thu, 21 May 2026 07:21:20 +0000 (15:21 +0800)]
mmc: litex_mmc: Use DIV_ROUND_UP for more accurate clock calculation
The previous clock uses roundup_pow_of_two() to calculate the core
clock frequency. It does not meet the actual hardware meaning.
The actual frequency is calculated by "ref_clk / ((div >> 1) << 1)".
Lad Prabhakar [Tue, 19 May 2026 13:53:42 +0000 (14:53 +0100)]
mmc: renesas_sdhi: Add OF entry for RZ/G2E SoC
The RZ/G2E (R8A774C0) SoC was previously handled via the generic
"renesas,rcar-gen3-sdhi" fallback compatible string. However, because
the SDHI IP on RZ/G2E is identical with the R-Car E3 (R8A77990), it
requires the specific quirks and configuration defined in
`of_r8a77990_compatible` rather than the generic Gen3 data.
Add the explicit "renesas,sdhi-r8a774c0" match entry to map it correctly.
Note that the DT binding file renesas,sdhi.yaml does not need an update
as the entry for this SoC is already present.
Lad Prabhakar [Tue, 19 May 2026 13:53:41 +0000 (14:53 +0100)]
mmc: renesas_sdhi: Add OF entry for RZ/G2N SoC
The RZ/G2N (R8A774B1) SoC was previously handled via the generic
"renesas,rcar-gen3-sdhi" fallback compatible string. However, because
the SDHI IP on RZ/G2N is identical with the R-Car M3-N (R8A77965), it
requires the specific quirks and configuration defined in
`of_r8a77965_compatible` rather than the generic Gen3 data.
Add the explicit "renesas,sdhi-r8a774b1" match entry to map it correctly.
Note that the DT binding file renesas,sdhi.yaml does not need an update
as the entry for this SoC is already present.
Ulf Hansson [Fri, 29 May 2026 12:29:16 +0000 (14:29 +0200)]
mmc: Merge branch fixes into next
Merge the mmc fixes for v7.1-rc[n] into the next branch, to allow them to
get tested together with the mmc changes that are targeted for the next
release.
Lad Prabhakar [Tue, 19 May 2026 13:53:40 +0000 (14:53 +0100)]
mmc: renesas_sdhi: Add OF entry for RZ/G2H SoC
The RZ/G2H (R8A774E1) SoC was previously handled via the generic
"renesas,rcar-gen3-sdhi" fallback compatible string. However, because
the SDHI IP on RZ/G2H is identical with the R-Car H3-N (R8A77951), it
requires the specific quirks and configuration defined in
`of_r8a7795_compatible` rather than the generic Gen3 data.
Add the explicit "renesas,sdhi-r8a774e1" match entry to map it correctly.
Note that the DT binding file renesas,sdhi.yaml does not need an update
as the entry for this SoC is already present.
Abel Vesa [Wed, 13 May 2026 11:19:37 +0000 (14:19 +0300)]
dt-bindings: mmc: sdhci-msm: Add Eliza compatible
Document the compatible string for the SDHCI controller on the
Eliza platform.
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Ulf Hansson <ulfh@kernel.org>
Osama Abdelkader [Sun, 10 May 2026 16:29:39 +0000 (18:29 +0200)]
mmc: davinci: fix mmc_add_host order in probe
mmc_add_host() makes the host visible to the MMC core. Register the
interrupt handlers and advertise MMC_CAP_SDIO_IRQ before that, so the
core cannot start using the host before IRQ handling is set up.
Huan He [Sat, 9 May 2026 08:49:07 +0000 (16:49 +0800)]
mmc: sdhci-of-dwcmshc: Fix reset, clk, and SDIO support for Eswin EIC7700
The EIC7700 code in sdhci-of-dwcmshc uses host->mmc->caps2 to select
different configuration paths for different card types. The current logic
distinguishes eMMC and SD, but does not handle SDIO separately.
Update the EIC7700 card-type checks so that eMMC, SD and SDIO are
distinguished explicitly.
Switch the reset path to dwcmshc_reset() so that pending interrupt state
is cleared consistently, and use sdhci_enable_clk() so the clock enable
sequence follows the standard SDHCI flow.
Fixes: 32b2633219d3 ("mmc: sdhci-of-dwcmshc: Add support for Eswin EIC7700") Signed-off-by: Huan He <hehuan1@eswincomputing.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org>
Tudor Ambarus [Tue, 5 May 2026 13:13:04 +0000 (13:13 +0000)]
firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
Sashiko identified a possible infinite loop [1].
ACPM IPC sequence numbers are tracked via a 64-bit bitmap. Previously,
acpm_prepare_xfer() used a do...while loop to search for a free
sequence number.
If all 63 available sequence numbers are leaked due to transient
hardware timeouts or mailbox failures, the bitmap becomes full.
The next call to acpm_prepare_xfer() would enter an infinite loop.
Fix this by utilizing the kernel's optimized bitmap search functions
(find_next_zero_bit / find_first_zero_bit). If the pool is completely
exhausted, log the failure and return -EBUSY to allow the kernel to
fail gracefully instead of hanging.
Furthermore, drop the allocation loop entirely. Because
acpm_prepare_xfer() is strictly called under the 'tx_lock' mutex,
sequence number allocations are perfectly serialized. If
find_next_zero_bit() locates a free bit, a single
test_and_set_bit_lock() is mathematically guaranteed to succeed.
To enforce this locking invariant, wrap the allocation in a
WARN_ON_ONCE. If the atomic set fails, it indicates the driver's
mutex serialization is fundamentally broken. The warning generates a
stack trace for debugging, while returning -EIO immediately aborts the
transfer to prevent silent payload corruption.
Tudor Ambarus [Tue, 5 May 2026 13:13:03 +0000 (13:13 +0000)]
firmware: samsung: acpm: Fix missing LKMM barriers in sequence allocator
Sashiko identified memory ordering races in [1].
The ACPM driver uses a globally shared 'bitmap_seqnum' to track
available sequence numbers. Even though threads now strictly free their
own sequence numbers, the allocation and freeing of these bits across
concurrent threads are effectively lockless operations and require
explicit LKMM memory barriers.
Previously, the driver used plain bitwise operators (test_bit, set_bit,
clear_bit), which lack ordering guarantees. This creates two race
conditions on weakly ordered architectures like ARM64:
1. Polling Release Violation: The polling thread copies its payload and
calls clear_bit(). Without a release barrier, the CPU can reorder
the memory operations, making the cleared bit globally visible
before the payload reads have fully completed.
2. TX Acquire Violation: The TX thread loops on test_bit(), calls
set_bit(), and then wipes the payload buffer via memset(). Without
an acquire barrier, the CPU can speculatively execute the memset()
before the bit is safely and formally claimed.
If these reorderings overlap, a new TX thread can claim the sequence
number and overwrite the buffer while the original polling thread is
still actively reading from it.
Fix this by upgrading the bitwise operators. Wrap the TX allocation in
test_and_set_bit_lock() to establish formal LKMM Acquire semantics, and
pair it with clear_bit_unlock() in the polling path to enforce Release
semantics.
Tudor Ambarus [Tue, 5 May 2026 13:13:02 +0000 (13:13 +0000)]
firmware: samsung: acpm: Fix false timeouts and Use-After-Free in polling
Sashiko identified severe races in the polling state machine [1].
In the ACPM driver's polling mode, threads waited for responses by
monitoring the globally shared 'bitmap_seqnum'. This caused false
timeouts because if a thread processed its response and freed the
sequence number, a concurrent TX thread could immediately reallocate
it before the polling thread woke up.
Additionally, the driver suffered from a cross-thread Use-After-Free
(UAF) preemption race. Previously, acpm_get_rx() cleared the sequence
number of whichever RX message it drained from the hardware queue. This
meant Thread A could globally free Thread B's sequence slot while
Thread B was asleep. A new Thread C could then steal the slot,
overwrite the buffer, and leave Thread B to wake up to corrupted state
or a timeout.
Fix this by rewriting the polling state machine:
1. Decouple polling from the global allocator by introducing a per-slot
'completed' flag, synchronized via smp_store_release() and
smp_load_acquire().
2. Strip acpm_get_saved_rx() out of acpm_get_rx() to make it a pure
queue-draining function. Introduce a 'native_match' boolean argument
which evaluates to true only if the thread natively processed its
own sequence number during the call. This explicitly informs the
polling loop whether it must retrieve its payload from the
cross-thread cache.
3. Centralize the cache fallback and sequence number free (clear_bit)
inside the polling loop. Crucially, the free operation now strictly
targets the thread's own TX sequence number (xfer->txd[0]), rather
than the drained RX sequence number. This enforces strict ownership:
a thread only ever frees its own allocated sequence slot, and only
at the exact moment it completes its poll, eliminating the UAF
window.
Furthermore, explicitly guard the 'native_match' assignment with an
if (rx_seqnum == tx_seqnum) check, even for zero-length (no payload)
responses. While an unguarded assignment wouldn't crash (because the
cache fallback acpm_get_saved_rx() safely returns early on zero-length
transfers) doing so would "lie" to the state machine. If a thread
drained the queue and found another thread's zero-length message,
setting native_match = true would falsely convince the polling loop
that it natively handled its own response. Maintaining a rigorous state
machine requires that native_match is only set when a thread explicitly
processes its own sequence number.
Monish Chunara [Fri, 8 May 2026 10:15:44 +0000 (15:45 +0530)]
dt-bindings: mmc: sdhci-msm: Document the Shikra compatible
Document the Shikra-specific SDHCI compatible in the sdhci-msm binding.
Use "qcom,sdhci-msm-v5" as the fallback compatible for the MSM SDHCI v5
controller used on Shikra.
This is a simple helper which replaces page_folio(bvec->bv_page).
Minor improvement in readability, but the real motivation is to reduce
the number of references to bvec->bv_page so that it can be changed
with less work.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Leon Romanovsky <leon@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@linux.dev> Link: https://patch.msgid.link/20260528175905.1102280-2-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
The change to format propagation in the BRx broke configuration of the
DRM pipeline. Revert it to fix the regression.
The original commit was meant to fix a v4l2-compliance failure, with no
known userspace applications being affected beside test tools. Reverting
is the simplest option, a more comprehensive fix can be developed (and
tested more thoroughly) later.
The change to format initialization, along with the change to format
propagation in the BRx in commit 937f3e6b51f1 ("media: renesas: vsp1:
brx: Fix format propagation"), broke configuration of the DRM pipeline.
Revert it to fix the regression.
The original commit was meant to fix a v4l2-compliance failure, with no
known userspace applications being affected beside test tools. Reverting
is the simplest option, a more comprehensive fix can be developed (and
tested more thoroughly) later.
Fixes: 133ac42af0a1 ("media: renesas: vsp1: Initialize format on all pads") Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/T2H Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260506215650.1897177-2-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Rik van Riel [Tue, 26 May 2026 19:43:29 +0000 (12:43 -0700)]
sched/fair: Use rq_clock() in update_tg_load_avg() rate-limit
update_tg_load_avg() is called once per leaf cfs_rq from the
__update_blocked_fair() walk that runs inside the NOHZ idle-balance
softirq, and again from update_load_avg() with UPDATE_TG. Its first
operation after the trivial early-outs is unconditionally:
now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
return;
Jakub ran into a system where nohz_idle_balance() was taking 75%
of a CPU (which is handling network traffic and doing many irq_exit_cpu
calls), with 35% of that CPU spent in update_load_avg, and 17% of the
CPU in sched_clock_cpu(), reading the TSC.
In a quick synthetic test, it looks like this patch reduces the
CPU use of sched_balance_update_blocked_averages by about 20%.
Switch the rate-limit to read rq_clock(rq_of(cfs_rq)) instead.
This eliminates the rdtsc, and uses a fairly fresh timestamp,
because all callers of update_tg_load_avg() and clear_tg_load_avg()
hold rq->lock and have called update_rq_clock(rq) within microseconds:
caller pre-state
__update_blocked_fair encloser did update_rq_clock(rq)
update_load_avg's three UPDATE_TG sites under rq->lock after enqueue/dequeue/update_curr
attach_/detach_entity_cfs_rq preceded by update_load_avg(...)
clear_tg_load_avg via offline path rq_clock_start_loop_update(rq) upfront
so rq->clock is fresh at every call. Since cfs_rqs are per-CPU
per-task_group, cfs_rq->last_update_tg_load_avg is always compared
against the same rq's clock; no cross-rq drift.
Signed-off-by: Rik van Riel <riel@surriel.com> Assisted-by: Claude (Anthropic) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://patch.msgid.link/20260527110250.6a91718d@fangorn
Andrea Righi [Tue, 26 May 2026 16:42:49 +0000 (18:42 +0200)]
selftests/sched_ext: Validate dl_server attach/detach in total_bw test
Extend the total_bw selftest to validate the fair/ext dl_server
auto-attach/detach operations.
After the existing consistency checks, the test now doubles the
fair_server's runtime on every CPU via debugfs and verifies that:
1. total_bw grew after the customization (proves fair_server was
attached and apply_params() honored the dl_bw_attached flag),
2. with the minimal BPF scheduler loaded, total_bw drops back to the
baseline value (proves fair_server was detached and ext_server was
attached at its own default runtime),
3. after unload total_bw matches the doubled value from step 1 (proves
fair_server was re-attached with the runtime customization preserved
across the load/unload cycle).
Commit cd959a3562050d ("sched_ext: Add a DL server for sched_ext tasks")
introduced an ext_server deadline server to protect sched_ext tasks from
fair/RT starvation, mirroring the existing fair_server.
Currently, both servers reserve their 50ms/1000ms bandwidth at boot,
regardless of whether a BPF scheduler is loaded. Unused bandwidth is
still reclaimed at runtime by other classes, but the static reservation
prevents the RT class from implicitly using that headroom when one of
the two classes is guaranteed to be empty.
A sysadmin can work around this by writing
/sys/kernel/debug/sched/{fair,ext}_server/cpu*/runtime, but that
requires manual action and not all systems expose debugfs.
A better approach is to make server bandwidth reservations dynamic: only
the scheduling policy that is currently active should register its
reservation, while the inactive one should not artificially hold
capacity (keeping both reservations only when the BPF scheduler is
running in partial mode):
+---------------------------------------------+-------------+------------+
| BPF scheduler state | fair server | ext server |
+---------------------------------------------+-------------+------------+
| not loaded (default boot) | reserved | none |
| loaded full mode (!SCX_OPS_SWITCH_PARTIAL) | none | reserved |
| loaded partial mode (SCX_OPS_SWITCH_PARTIAL)| reserved | reserved |
+---------------------------------------------+-------------+------------+
To achieve this, introduce an "attached/detached" state for each
deadline server, so the kernel can decide whether a server's bandwidth
should be accounted in global bandwidth tracking.
At boot, the system starts with only the fair server contributing to
bandwidth accounting. When a BPF scheduler is enabled, the ext server is
attached and may replace or complement the fair server depending on
whether full or partial mode is used. When sched_ext is disabled, the
system restores the previous deadline bandwidth values and behavior.
The transition logic ensures that switching between scheduling modes is
consistent and reversible, without losing runtime configuration or
requiring manual intervention.
Andrea Righi [Tue, 26 May 2026 10:05:02 +0000 (12:05 +0200)]
sched/deadline: Reject debugfs dl_server writes for offline CPUs
Writing runtime or period via the per-CPU dl_server debugfs files
(/sys/kernel/debug/sched/{fair,ext}_server/cpu*/{runtime,period}) on an
offline CPU can trigger two distinct kernel issues:
Both __dl_sub() and __dl_add() divide by cpus internally, which can be
0 once the CPU has been removed from any active root-domain span (this
has been latent since the debugfs interface was introduced).
2) WARN_ON_ONCE in dl_server_start():
WARNING: kernel/sched/deadline.c:1805 at dl_server_start+0x232/0x270
Commit ee6e44dfe6e5 ("sched/deadline: Stop dl_server before CPU goes
offline") added this check to catch enqueueing the server on an
offline rq.
There's no meaningful semantics for re-configuring the per-CPU dl_server
bandwidth while the CPU is offline, so simply reject the write with
-EBUSY so userspace gets a clear error.
Closes: https://lore.kernel.org/all/20260526092228.3B6891F00A3A@smtp.kernel.org/ Fixes: d741f297bcea ("sched/fair: Fair server interface") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: abaci-kreproducer <abaci@linux.alibaba.com> Link: https://patch.msgid.link/20260526100502.575774-1-arighi@nvidia.com
On powerpc, cpu_coregroup_mask is available only when the underlying
hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
coregroup isn't supported. In such cases llc_mask was being referenced
when it was null leading to panic.
On powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
domain point to LLC is wrong. Provide a way for archs to say where its
LLC is if it not at MC domain.
slab->partial is assigned by get_obj("partial") and then immediately
overwritten by get_obj_and_str("partial", &t). Remove the first
redundant assignment.
Xuewen Wang [Mon, 18 May 2026 06:21:58 +0000 (14:21 +0800)]
tools/mm/slabinfo: remove dead assignment in get_obj_and_str()
The assignment `x = NULL` sets the local parameter variable instead of
`*x`, which is a no-op since `*x` was already set to NULL on the line
above. Remove the dead assignment.
The disable trace path in slab_debug() had a logic error where it would
set trace=1 instead of trace=0. This made trace functionality permanently
enabled once turned on for any slab cache.
Mark Brown [Thu, 28 May 2026 23:01:44 +0000 (00:01 +0100)]
KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor
ZCR_EL2 can be updated by a VHE guest hypervisor either using ZCR_EL2
(which traps) or ZCR_EL1 (which does not trap). KVM handles both in
different way:
- on ZCR_EL2 trap, ZCR_EL2.LEN is immediately capped at the VM's own
VL limit. This has the potential to break existing SW that relies
on the full LEN field to be stateful.
- on ZCR_EL1 access, we do absolutely nothing.
On restoring the SVE context for an L2 guest, we directly restore the
guest hypervisor's view of ZCR_EL2 into the physical ZCR_EL2. If the
guest's view of the register was updated using the ZCR_EL2 accessor,
the value has already been sanitised (with the caveat mentioned above).
But if the guest used ZCR_EL1, the raw value is written into the HW,
and the L2 guest can now access VLs that it shouldn't.
Fix all the above by moving the VL capping to the restore points,
ensuring that:
- the HW is always programmed with a capped value, irrespective of
the accessor being used,
- the ZCR_EL2.LEN field is always completely stateful, irrespective
of the accessor being used.
Additionally, move ZCR_EL2 to be a sanitised register, ensuring that
only the LEN field is actually stateful. This requires some creative
construction of the RES0 mask, as the sysreg generation script does
not yet generate RAZ/WI fields.
Jori Koolstra [Thu, 28 May 2026 17:58:47 +0000 (17:58 +0000)]
vfs: replace ints with enum last_type for LAST_XXX
Several functions in namei.c take an "int *type" parameter, such as
filename_parentat(). To know what values this can take you have to find
the anonymous struct that defines the LAST_XXX values. Define an enum
last_type to make this type explicit.
Jori Koolstra [Thu, 28 May 2026 17:58:46 +0000 (17:58 +0000)]
vfs: make LAST_XXX private to fs/namei.c
The only user of LAST_XXX outside of fs/namei.c is fs/smb/server/vfs.c;
ksmbd_vfs_path_lookup() calls vfs_path_parent_lookup() and expects a
LAST_NORM last type (or it will be ENOENT). ksmbd_vfs_rename() also calls
vfs_path_parent_lookup() but forgets the LAST_NORM check.
It does not really make sense to have vfs_path_parent_lookup() expose
the last_type because it is only needed to ensure it is LAST_NORM. So
let's do this check in vfs_path_parent_lookup() instead and keep the
LAST_XXX internal to fs/namei.c. This changes the ENOENT errno in
ksmbd_vfs_path_lookup() to EINVAL, which matches better with how this is
handled by callers of filename_parentat().
Tomas Glozar [Thu, 14 May 2026 07:30:38 +0000 (09:30 +0200)]
rtla: Document tests in README
RTLA tests are not documented anywhere. Mention both runtime and unit
tests in the README, with instructions on how to run them and a list of
dependencies and required system configuration.
Rajat Gupta [Thu, 21 May 2026 05:11:21 +0000 (22:11 -0700)]
drm: prevent integer overflows in dumb buffer creation helpers
Fix integer overflow issues in the dumb buffer creation path:
1. drm_mode_create_dumb() does not bound width, height, or bpp
before passing them to driver callbacks. Downstream helpers
(e.g. drm_gem_dma_dumb_create_internal) perform pitch/size
alignment in u32 arithmetic that can overflow for extreme
values. Add hard limits: width and height < 8192, bpp <= 32.
No legitimate software rendering use case exceeds these.
2. drm_mode_align_dumb() uses roundup(pitch, hw_pitch_align)
without checking for overflow. If pitch is near U32_MAX,
roundup() wraps to a small value, making subsequent
check_mul_overflow() pass with a much smaller pitch than
intended. Add an overflow check after roundup.
3. drm_mode_align_dumb() uses ALIGN(size, hw_size_align) which
only works correctly for power-of-two alignment values.
Replace with roundup() which works for any alignment.
Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
crypto: af_alg - Document that it is *always* slower
Without support for zero-copy or off-CPU offloads, AF_ALG is always
slower than software cryptography. Its only advantage is that it might
save code size. However, this is largely mitigated by lightweight
userspace cryptographic libraries.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: af_alg - Drop support for off-CPU cryptography
AF_ALG is deprecated and exposed to unprivileged userspace. Only
use the least buggy algorithm implementations: the pure software ones.
This removes one of the main advantages of AF_ALG, which is the
ability to use it with off-CPU accelerators. However, using off-CPU
accelerators has huge overheads, both in performance and attack surface.
I have yet to see real-world, performance-critical workloads where using
an accelerator via AF_ALG is actually a win over doing cryptography in
userspace.
If using an off-CPU accelerator really does turn out to be a win, a new
API should be developed that is actually a good fit for it.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The only user of msg->msg_iocb was AF_ALG, but that's deprecated.
It can be removed entirely at the cost of only supporting synchronous
operations. This doesn't break userspace, which will silently block
(for a bounded amount of time) in io_submit instead of operating
asynchronously.
This also makes struct msghdr smaller, helping every other caller of
sendmsg().
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: ccp/tsm - Enable the root port after the endpoint
The PCIe r7.0, chapter "6.33.8 Other IDE Rules" mandates if selective IDE
is enabled for config requersts, a stream must be enabled on the endpoint
before enabling it on the rootport:
===
For Selective IDE, the Stream must not be used until it has been enabled in
both Partner Ports. For cases where one of the Partner Ports is a Root Port
and Selective IDE for Configuration Requests is enabled, the other
Partner Port must be enabled prior to the Root Port. For other scenarios,
the mechanisms to satisfy this requirement are implementation-specific.
===
Do what the spec says.
Fixes: 4be423572da1 ("crypto/ccp: Implement SEV-TIO PCIe IDE (phase1)") Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 20 May 2026 12:51:50 +0000 (13:51 +0100)]
crypto: qat - use pci logging variants for PCI-specific messages
Replace dev_err(&pdev->dev, ...), dev_info(&pdev->dev, ...) and
dev_dbg(&pdev->dev, ...) with pci_err(), pci_info() and pci_dbg()
where the log message relates to a PCI subsystem operation such as
device enable, BAR mapping, PCI region requests, PCI state
save/restore, and SR-IOV management.
Messages about driver-level logic (NUMA topology, device matching,
accelerator units, capabilities, configuration, DMA) are intentionally
left as dev_err() even when a struct pci_dev pointer is in scope,
since those concern the device or driver rather than the PCI bus.
No functional change.
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 20 May 2026 12:41:55 +0000 (13:41 +0100)]
crypto: qat - protect service table iterations with service_lock
The service_table list is protected by service_lock when entries are
added or removed (in adf_service_add() and adf_service_remove()), but
several functions iterate over the list without holding this lock.
A concurrent adf_service_register() or adf_service_unregister() call
could modify the list during traversal, leading to list corruption or
a use-after-free.
Fix this by holding service_lock across all list_for_each_entry()
iterations of service_table in adf_dev_init(), adf_dev_start(),
adf_dev_stop(), adf_dev_shutdown(), adf_dev_restarting_notify(),
adf_dev_restarted_notify(), and adf_error_notifier().
The lock ordering is safe: callers of the static helpers (adf_dev_up()
and adf_dev_down()) acquire state_lock before service_lock, and no
event_hld callback or service_lock holder ever acquires state_lock in
the reverse order.
Cc: stable@vger.kernel.org Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Co-developed-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com> Signed-off-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 20 May 2026 12:33:00 +0000 (13:33 +0100)]
crypto: qat - fix restarting state leak on allocation failure
In adf_dev_aer_schedule_reset(), ADF_STATUS_RESTARTING is set before
allocating reset_data. If the allocation fails, the function returns
-ENOMEM without queuing reset work, so nothing ever clears the bit.
This leaves the device permanently stuck in the restarting state,
causing all subsequent reset attempts to be silently skipped.
Fix this by using test_and_set_bit() to atomically claim the
RESTARTING state, preventing duplicate reset scheduling races under
concurrent fatal error reporting. If the subsequent allocation fails,
clear the bit to restore clean state so future reset attempts can
proceed.
Cc: stable@vger.kernel.org Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Co-developed-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com> Signed-off-by: Maksim Lukoshkov <maksim.lukoshkov@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Wed, 20 May 2026 10:00:30 +0000 (12:00 +0200)]
crypto: octeontx - use strscpy_pad in ucode_load_store
Instead of zero-initializing the temporary buffer and then copying into
it with strscpy(), use strscpy_pad() to copy the string and zero-pad any
trailing bytes. Drop the explicit size argument to further simplify the
code since strscpy_pad() can determine it automatically when the
destination buffer has a fixed length.
Also use strscpy_pad() to check for string truncation instead of the
hard-coded OTX_CPT_UCODE_NAME_LENGTH.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Wed, 20 May 2026 07:38:44 +0000 (09:38 +0200)]
crypto: s390 - add select CRYPTO_AEAD for aes
The aes driver registers both skcipher and aead algorithms,
but when aead is not enabled this causes a link failure:
s390-linux-ld: arch/s390/crypto/aes_s390.o: in function `aes_s390_fini':
arch/s390/crypto/aes_s390.c:969:(.text+0x115e): undefined reference to `crypto_unregister_aead'
s390-linux-ld: arch/s390/crypto/aes_s390.o: in function `aes_s390_init':
arch/s390/crypto/aes_s390.c:1028:(.init.text+0x294): undefined reference to `crypto_register_aead'
Add the missing 'select' statement.
Fixes: bf7fa038707c ("s390/crypto: add s390 platform specific aes gcm support.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: atmel-ecc - Use named initializers for struct i2c_device_id
While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.
This patch doesn't modify the compiled array, only its representation in
source form benefits. The former was confirmed with x86 and arm64
builds.
Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: atmel-sha204a - Use named initializers for struct i2c_device_id
While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.
This patch doesn't modify the compiled array, only its representation in
source form benefits. The former was confirmed with x86 and arm64
builds.
For consistency also assign .driver_data for the array item that the
driver relies on i2c_get_match_data() returning NULL for.
Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver binds to i2c devices only and thus in the absence of an
assignment for .data in the of_device_id array i2c_get_match_data()
falls back to .driver_data from the i2c_device_id array. So only provide
&atsha204_quality once to reduce duplication.
Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ecrdsa_exit_tfm() is empty, and sig_alg .exit is optional. The
corresponding .init callback is not set either, so there is nothing to
release in .exit.
Herbert Xu [Tue, 19 May 2026 04:22:18 +0000 (12:22 +0800)]
crypto: tegra - Fix dma_free_coherent size error
When freeing a coherent DMA buffer, the size must match the value
that was used during the allocation.
Unfortunately the size field in the tegra driver gets overwritten
by this point so it no longer matches and creates a warning.
Fix this by saving a copy of the size on the stack.
Note that the ccm function actually mixes up the inbuf and outbuf
sizes, but it doesn't matter because the two sizes are actually
equal.
Fixes: 1cb328da4e8f ("crypto: tegra - Do not use fixed size buffers") Reporeted-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Vladislav Dronov <vdronov@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Zongyu Wu [Mon, 18 May 2026 14:29:56 +0000 (22:29 +0800)]
crypto: hisilicon/qm - support doorbell enable control
The driver notifies the hardware to handle task through
doorbell. Currently, doorbell is enabled by default. To
prevent the process from sending doorbells during hardware
reset scenarios, which could cause the hardware to process
doorbells and trigger new errors:
For example, when the physical machine is resetting the device,
doorbells are still being sent from the virtual machine.
Therefore, the driver disables doorbell during hardware
unavailability. After hardware initialization is completed,
doorbell is enabled, and any task sent during the unavailability
period will return errors.
The hardware supports the PF to disable doorbells for all functions,
while the VF can only disable its own doorbell function. When the PF
is reset, it will disable doorbells for all functions. When VF is
reset, it only disables its own doorbell and does not affect tasks
on other functions.
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Weili Qian [Mon, 18 May 2026 14:29:55 +0000 (22:29 +0800)]
crypto: hisilicon - mask all error type when removing driver
Each bit in the error interrupt register corresponds to a specific
error type. A bit value of 0 enables the interrupt, and a bit value
of 1 disables the interrupt. Currently, when disabling interrupts,
it incorrectly enables the interrupt types that were not enabled.
Therefore, when disabling interrupts, all bits should be directly
written to 1.
Weili Qian [Mon, 18 May 2026 14:29:54 +0000 (22:29 +0800)]
crypto: hisilicon/qm - disable error report before flr
Before function level reset, driver first disable device error report
and then waits for the device reset to complete. However, when the
error is recovered, the error bits will be enabled again, resulting in
invalid disable. It is modified to detect that there is no error
before disable error report, and then do FLR.
Zhushuai Yin [Mon, 18 May 2026 14:29:53 +0000 (22:29 +0800)]
crypto: hisilicon/qm - support function-level error reset
When executing operations on crypto devices, hardware errors
are inevitable. For certain errors, a full device reset is
required to recover. However, in certain cases, only a
specific function may fail, while other functions can still
operate normally. A system-wide RAS reset in such cases would
unnecessarily impact functioning components.
This patch introduces function-level granularity handling,
enabling targeted resets of only the error-reporting
functions without affecting other operational functions.
Zhushuai Yin [Mon, 18 May 2026 14:29:52 +0000 (22:29 +0800)]
crypto: hisilicon/qm - place the interrupt status interface after the PM usage counter
To avoid accessing memory of a suspended device, and since the counter
interface used by PM involves sleep operations, the counter interface
cannot be placed in the interrupt top half. Therefore, the interface for
acquiring the interrupt status in the RAS reset flow that resides in the
interrupt context needs to be moved to the bottom half for processing.
Zhushuai Yin [Mon, 18 May 2026 14:29:51 +0000 (22:29 +0800)]
crypto: hisilicon/qm - allow VF devices to query hardware isolation status
The problem that the VF device cannot obtain the isolation
status and isolation threshold of the device is resolved.
The accelerator driver can query the device isolation status
and threshold via the VF device using the fault query sysfs
interface under uacce. Note that only the PF device supports
isolation policy configuration, while the VF device is
limited to read-only query operations.
Gao Xiang [Fri, 22 May 2026 08:27:16 +0000 (16:27 +0800)]
erofs: fix use-after-free on sbi->sync_decompress
z_erofs_decompress_kickoff() can race with filesystem unmount, causing
a use-after-free on sbi->sync_decompress.
When I/O completes, z_erofs_endio() calls z_erofs_decompress_kickoff()
to queue z_erofs_decompressqueue_work() asynchronously. Then, after all
folios are unlocked, unmount workflow can proceed and sbi will be freed
before accessing to sbi->sync_decompress.
Muchun Song [Sat, 23 May 2026 06:01:23 +0000 (14:01 +0800)]
mm/cma: fix reserved page leak on activation failure
If cma_activate_area() fails after allocating only part of the range
bitmaps, the cleanup path still has to release the reserved pages when
CMA_RESERVE_PAGES_ON_ERROR is clear.
That is still worth doing even in this __init path. A bitmap_zalloc()
failure does not necessarily mean the system cannot make further progress:
freeing the reserved CMA pages can return a substantial amount of memory
to the buddy allocator and may relieve the temporary memory shortage that
caused the allocation failure in the first place.
However, the cleanup path currently uses the bitmap-freeing bound for page
release as well. That is only correct for ranges whose bitmap allocation
already succeeded. The failed range and all later ranges still keep their
reserved pages, so a partial bitmap allocation failure can permanently
leak them.
Fix this by releasing reserved pages for all ranges. Use the saved
early_pfn[] value for ranges whose bitmap allocation already succeeded and
for the failed range, and use cmr->early_pfn for later ranges whose bitmap
allocation was never attempted.
Link: https://lore.kernel.org/20260523060123.2207992-1-songmuchun@bytedance.com Fixes: c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: Usama Arif <usama.arif@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Frank van der Linden <fvdl@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wupeng Ma [Fri, 22 May 2026 01:03:05 +0000 (09:03 +0800)]
mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page can
trigger a recursive spinlock self-deadlock (AA deadlock) on hugetlb_lock
when racing with a concurrent unmap:
The out: path in __get_huge_page_for_hwpoison() calls folio_put() to drop
the GUP reference while the hugetlb_lock is still held by the hugetlb.c
wrapper get_huge_page_for_hwpoison(). If concurrent unmap has released
the page table mapping reference, folio_put() drops the folio refcount to
zero, triggering free_huge_folio() which attempts to re-acquire the
non-recursive hugetlb_lock.
Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper
into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the
folio_put() at the out: label so the folio is always released outside the
lock.
[akpm@linux-foundation.org: fix race, rename label per Miaohe] Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@huawei.com Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@huawei.com Link: https://lore.kernel.org/20260522010305.4099834-1-mawupeng1@huawei.com Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Carlier [Wed, 20 May 2026 04:49:12 +0000 (05:49 +0100)]
mm/hugetlb: restore reservation on error in hugetlb folio copy paths
Two sites in mm/hugetlb.c allocate a hugetlb folio via
alloc_hugetlb_folio() (consuming a VMA reservation) and then call
copy_user_large_folio(), which became int-returning in commit 1cb9dc4b475c
("mm: hwpoison: support recovery from HugePage copy-on-write faults") and
can now fail (e.g. -EHWPOISON on a hwpoisoned source page). On the
failure path, folio_put() restores the global hugetlb pool count through
free_huge_folio(), but the per-VMA reservation map entry is left marked
consumed:
- hugetlb_mfill_atomic_pte() resubmission path (UFFDIO_COPY)
- copy_hugetlb_page_range() fork-time CoW path when
hugetlb_try_dup_anon_rmap() fails (rare: pinned hugetlb anon
folio under fork)
User-visible effect: on UFFDIO_COPY into a private hugetlb VMA where the
resubmission copy fails, the reservation for that address is leaked from
the VMA's reserve map. A subsequent fault at the same address takes the
no-reservation path, and under hugetlb pool pressure the task is SIGBUSed
at an address it had previously reserved. The fork-time CoW path leaks
the same way in the child VMA's reserve map, though it requires the much
rarer combination of pinned hugetlb anon page + hwpoisoned source.
Add the missing restore_reserve_on_error() call before folio_put() on both
error paths.
Link: https://lore.kernel.org/20260520044912.6751-1-devnexen@gmail.com Fixes: 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults") Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: yuehaibing <yuehaibing@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Wed, 20 May 2026 06:10:25 +0000 (14:10 +0800)]
mm/cma_debug: fix invalid accesses for inactive CMA areas
cma_activate_area() can fail after allocating range bitmaps. Its cleanup
path frees those bitmaps, but only clears cma->count and
cma->available_count. It leaves cma->nranges and each range's count in
place, so cma_debugfs_init() can still register debugfs files for an area
that never activated successfully.
That exposes two problems. Reading the bitmap file can make debugfs walk
a freed range bitmap and trigger an invalid memory access. Reading
maxchunk can also take cma->lock even though that lock is initialized only
on the successful activation path.
Fix this by creating debugfs entries only for CMA areas that reached
CMA_ACTIVATED.
c009da4258f9 introduced the invalid access to bitmap file. 2e32b947606d
introduced the invalid access to cma->lock. This change applies to both
issues. So I added two Fixes tags.
Link: https://lore.kernel.org/20260520061025.3971821-1-songmuchun@bytedance.com Fixes: c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested") Fixes: 2e32b947606d ("mm: cma: add functions to get region pages counters") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Frank van der Linden <fvdl@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Stefan Strogin <stefan.strogin@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shakeel Butt [Thu, 21 May 2026 22:37:51 +0000 (15:37 -0700)]
memcg: use round-robin victim selection in refill_stock
Harry Yoo reported that get_random_u32_below() is not safe to call in the
nmi context and memcg charge draining can happen in nmi context.
More specifically get_random_u32_below() is neither reentrant- nor
NMI-safe: it acquires a per-cpu local_lock via local_lock_irqsave() on the
batched_entropy_u32 state. An NMI that lands on a CPU mid-update of the
ChaCha batch state and recurses into the random subsystem would corrupt
that state. The memcg_stock local_trylock prevents re-entry on the percpu
stock itself, but cannot protect an unrelated subsystem's per-cpu lock.
Replace the random pick with a per-cpu round-robin counter stored in
memcg_stock_pcp and serialized by the same local_trylock that already
guards cached[] and nr_pages[]. No atomics, no random calls, no extra
locks needed.
Link: https://lore.kernel.org/20260521223751.3794625-1-shakeel.butt@linux.dev Fixes: f735eebe55f8f ("memcg: multi-memcg percpu charge cache") Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reported-by: Harry Yoo <harry@kernel.org> Closes: https://lore.kernel.org/4e20f643-6983-4b6e-b12d-c6c4eb20ae0c@kernel.org/ Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 081056dc00a2 ("mm/hugetlb: unshare page tables during VMA split,
not before") changed the locking model around hugetlbfs PMD unsharing on
VMA split, but did not update the function which asserts the locks,
hugetlb_vma_assert_locked().
This function asserts that either the hugetlb VMA lock is held (if a
shared mapping) or that the reservation map lock is held (if private).
If you get an unfortunate race between something which results in one of
these locks being released and a hugetlb VMA split and you have
CONFIG_LOCKDEP enabled, you can therefore see a false positive assertion
arise when there is in fact no issue.
Since this change introduced a new take_locks parameter to
hugetlb_unshare_pmds(), which, when set to false, indicates that locking
is sufficient, simply pass this to the unsharing logic and predicate the
lock assertions on this.
This is safe, as we already asserted the file rmap lock and the VMA write
lock prior to this (implying exclusive mmap write lock), so we cannot be
raced by either rmap or page fault page table walkers which the asserted
locks are intended to protect against (we don't mind GUP-fast).
Separate out huge_pmd_unshare() into __huge_pmd_unshare() to add a
check_locks parameter, and update hugetlb_unshare_pmds() to pass this
parameter to it.
This leaves all other callers of huge_pmd_unshare() still correctly
asserting the locks.
The below reproducer will trigger the assert in a kernel with
CONFIG_LOCKDEP enabled by racing process teardown (which will release the
hugetlb lock) against a hugetlb split.
void execute_one(void)
{
void *ptr;
pid_t pid;
/*
* Create a hugetlb mapping spanning a PUD entry.
*
* We force the hugetlb page allocation with populate and
* noreserve.
*
* |---------------------|
* | |
* |---------------------|
* 0 PUD boundary
*/
ptr = mmap(0, PUD_SIZE, PROT_READ | PROT_WRITE,
MAP_FIXED | MAP_SHARED | MAP_ANON |
MAP_NORESERVE | MAP_HUGETLB | MAP_POPULATE,
-1, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
/*
* Fork but with a bogus stack pointer so we try to execute code in
* a non-VM_EXEC VMA, causing segfault + teardown via exit_mmap().
*
* The clone will cause PMD page table sharing between the
* processes first via:
* copy_process() -> ... -> huge_pte_alloc() -> huge_pmd_share()
*
* Then tear down and release the hugetlb 'VMA' lock via:
* exit_mmap() -> ... -> vma_close() -> hugetlb_vma_lock_free()
*/
pid = syscall(__NR_clone, 0, 2 * PMD_SIZE, 0, 0, 0);
if (pid < 0) {
perror("clone");
exit(EXIT_FAILURE);
} if (pid == 0) {
/* Pop stack... */
return;
}
/*
* We are the parent process.
*
* Race the child process's teardown with a PMD unshare.
*
* We do this by triggering:
*
* __split_vma() -> hugetlb_split() -> hugetlb_unshare_pmds()
*
* Which, importantly, doesn't hold the hugetlb VMA lock (nor can
* it), meaning we assert in hugetlb_vma_assert_locked().
*
* .
* |----------.----------|
* | . |
* |----------.----------|
* 0 . PUD boundary
*/
mmap(0, PUD_SIZE / 2, PROT_READ | PROT_WRITE,
MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
}
int main(void)
{
int i;
/* Kick off fork children. */
for (i = 0; i < NUM_FORKS; i++) {
pid_t pid = fork();
if (pid < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
/* Fork children do their work and exit. */
if (!pid) {
int j;
/* If we succeeded, wait on children. */
for (i = 0; i < NUM_FORKS; i++)
wait(NULL);
return EXIT_SUCCESS;
}
[ljs@kernel.org: account for the !CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING case] Link: https://lore.kernel.org/agWZsPGYid08uU6O@lucifer Link: https://lore.kernel.org/20260513085658.45264-1-ljs@kernel.org Fixes: 081056dc00a2 ("mm/hugetlb: unshare page tables during VMA split, not before") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sohil Mehta [Thu, 28 May 2026 18:48:25 +0000 (11:48 -0700)]
x86/cpu: Fix a F00F bug warning and clean up surrounding code
On x86 SMP systems with the F00F bug present, do_clear_cpu_cap()
rightfully warns that the code clears the X86_BUG_F00F flag after
alternatives have been patched.
X86_BUG_F00F is first cleared in intel_workarounds() and then set for
the affected models. This sequence works fine on the BSP but on AP
bringup, where alternatives have already been patched and clearing the
flag there triggers the warning.
There is no technical reason for clearing the flag before setting it. It
is mainly an artifact of introducing the X86_BUG_F00F flag in
While at it, remove the kernel notification and the surrounding logic to
inform the user about the workaround exactly once. If needed, the
presence of the F00F bug can be determined through /proc/cpuinfo.
Additionally, the F00F bug was the last remaining user of clear_cpu_bug().
With no users left, get rid of this helper as well.
[ bp: Massage commit message. ]
Co-developed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ahmed S. Darwish <darwi@linutronix.de> Link: https://patch.msgid.link/20260528184826.3642051-1-sohil.mehta@intel.com
Stephen Boyd [Fri, 29 May 2026 01:21:54 +0000 (18:21 -0700)]
Merge tag 'qcom-clk-fixes-for-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into clk-fixes
Pull Qualcomm clk driver fixes from Bjorn Andersson:
The parking of shared RCGs during registration parks the MDP source
clock, disabling display until the msm display driver successfully
initializes the hardware again. Mark this clock on Makena and Hamoa as
"no_init_park" to retain a working recovery console etc.
* tag 'qcom-clk-fixes-for-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
clk: qcom: dispcc-sc8280xp: Don't park mdp_clk_src at registration time
clk: qcom: x1e80100-dispcc: Stop disp_cc_mdss_mdp_clk_src from getting parked
Stephen Boyd [Fri, 29 May 2026 01:19:26 +0000 (18:19 -0700)]
Merge tag 'samsung-clk-fixes-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into clk-fixes
Pull a Samsung clk driver fix from Krzysztof Kozlowski:
Google GS101: Correct the register name for saving and restoring state
during system suspend and resume. Lack of proper save/restore leads to
incorrect clock values after system resume.
* tag 'samsung-clk-fixes-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
clk: samsung: gs101: Fix missing USI7_USI DIV clock in peric0_clk_regs
Dave Airlie [Fri, 29 May 2026 01:12:48 +0000 (11:12 +1000)]
Merge tag 'drm-intel-fixes-2026-05-27' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix potential UAF in TTM object purge (Janusz Krzysztofik)
- Use polling when irqs are unavailable [aux] (Michał Grzelak)
- Fix HDR pre-CSC LUT programming loop [color] (Pranay Samala)
- Block DC states on vblank enable when Panel Replay supported [psr] (Jouni Högander)
- Use DC_OFF wake reference to block DC6 on vblank enable [psr] (Jouni Högander)
Commit 8871389da151 introduces common pcs dts properties which writes
rx=normal,tx=normal polarity to register SGMSYS_QPHY_WRAP_CTRL of switch.
This is initialized with tx-bit set and so change inverts polarity
compared to before.
It looks like mt7531 has tx polarity inverted in hardware and set tx-bit
by default to restore the normal polarity.
The MT7531 datasheet quite clearly states:
Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control
Reset value: 0x00000501
BIT 1 RX_BIT_POLARITY -- RX bit polarity control
1'b0: normal
1'b1: inverted
BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed
in MT7531)
1'b0: normal
1'b1: inverted
Till this patch the register write was only called when mediatek,pnswap
property was set which cannot be done for switch because the fw-node param
was always NULL from switch driver in the mtk_pcs_lynxi_create call.
Do not configure switch side like it's done before.
Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260526153239.30194-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 29 May 2026 00:02:54 +0000 (17:02 -0700)]
Merge tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_core: Rework hci_dev_do_reset() to use hci_sync functions
- hci_conn: Fix memory leak in hci_le_big_terminate()
- hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close
- hci_sync: Reset device counters in hci_dev_close_sync()
- hci_sync: fix UAF in hci_le_create_cis_sync
- L2CAP: Fix possible crash on l2cap_ecred_conn_rsp
- L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn
- L2CAP: use chan timer to close channels in cleanup_listen()
- L2CAP: clear chan->ident on ECRED reconfiguration success
- ISO: fix UAF in iso_recv_frame
- ISO: serialize iso_sock_clear_timer with socket lock
- HIDP: fix missing length checks in hidp_input_report()
- 6lowpan: check skb_clone() return value in send_mcast_pkt()
- btusb: Allow firmware re-download when version matches
- hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
* tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Reset device counters in hci_dev_close_sync()
Bluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close
Bluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functions
Bluetooth: ISO: serialize iso_sock_clear_timer with socket lock
Bluetooth: ISO: fix UAF in iso_recv_frame
Bluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rsp
Bluetooth: l2cap: clear chan->ident on ECRED reconfiguration success
Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
Bluetooth: hci_sync: fix UAF in hci_le_create_cis_sync
Bluetooth: 6lowpan: check skb_clone() return value in send_mcast_pkt()
Bluetooth: btusb: Allow firmware re-download when version matches
Bluetooth: HIDP: fix missing length checks in hidp_input_report()
Bluetooth: L2CAP: use chan timer to close channels in cleanup_listen()
Bluetooth: L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn
Bluetooth: hci_conn: Fix memory leak in hci_le_big_terminate()
====================
Zhenghang Xiao [Wed, 27 May 2026 03:24:11 +0000 (11:24 +0800)]
sctp: fix race between sctp_wait_for_connect and peeloff
sctp_wait_for_connect() drops and re-acquires the socket lock while
waiting for the association to reach ESTABLISHED state. During this
window, another thread can peeloff the association to a new socket via
getsockopt(SCTP_SOCKOPT_PEELOFF), changing asoc->base.sk. After
re-acquiring the old socket lock, sctp_wait_for_connect() returns
success without noticing the migration — the caller then accesses
the association under the wrong lock in sctp_datamsg_from_user().
Add the same sk != asoc->base.sk check that sctp_wait_for_sndbuf()
already has, returning an error if the association was migrated while
we slept.
Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260527032411.60959-1-kipreyyy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: mana: Fix NULL dereferences during teardown after attach failure
When mana_attach() fails (e.g. during queue allocation), the error
cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple
subsequent teardown paths can then dereference these NULL pointers,
causing kernel panics.
Patch 1 adds NULL guards in the low-level teardown functions
(mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are
safe to call regardless of queue initialization state. This covers all
callers: mana_remove(), mana_change_mtu() recovery, and internal error
paths in mana_alloc_queues().
Patch 2 adds an early exit in mana_detach() for already-detached ports,
making it safe for non-close callers. This allows the queue reset
handler to safely retry mana_attach() without redundant teardown.
====================
Dipayaan Roy [Mon, 25 May 2026 08:08:25 +0000 (01:08 -0700)]
net: mana: Skip redundant detach on already-detached port
When mana_per_port_queue_reset_work_handler() runs after a previous
detach succeeded but attach failed, the port is left in a detached
state with apc->tx_qp and apc->rxqs already freed. Calling
mana_detach() again unconditionally leads to NULL pointer dereferences
during queue teardown.
Add an early exit in mana_detach() when the port is already in
detached state (!netif_device_present) for non-close callers, making
it safe to call idempotently. This allows the queue reset handler and
other recovery paths to simply retry mana_attach() without redundant
teardown.
Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-3-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dipayaan Roy [Mon, 25 May 2026 08:08:24 +0000 (01:08 -0700)]
net: mana: Add NULL guards in teardown path to prevent panic on attach failure
When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
mana_remove(), mana_change_mtu() recovery, and internal error handling
in mana_alloc_queues() can subsequently call into functions that
dereference these pointers without NULL checks:
- mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer
dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26).
- mana_destroy_vport() iterates apc->rxqs without a NULL check.
- mana_fence_rqs() iterates apc->rxqs without a NULL check.
- mana_dealloc_queues() iterates apc->tx_qp without a NULL check.
Add NULL guards for apc->rxqs in mana_fence_rqs(),
mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a
NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue
draining when TX queues were never allocated or already freed.
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-2-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danilo Krummrich [Tue, 26 May 2026 00:04:41 +0000 (02:04 +0200)]
rust: devres: add 'static bound to Devres<T>
Devres::new() registers a callback with the C devres subsystem via
devres_node_add(). If the Devres is leaked (e.g. via
core::mem::forget(), which is safe), its Drop impl never runs, and the
devres release callback will revoke the inner Revocable on device
unbind, which drops T in place. If T contains non-'static references,
those may be dangling by that point.
Add a 'static bound to prevent storing types with borrowed data in
Devres.
When the I2C controller attempts a new transaction while the target
controller is shutting down or restarting, it can lead to bus lockups
and system bootloops if the hardware enters an inconsistent state.
Address this by ensuring that the internal state machines are properly
cleared when disabling the controller if target activity is detected.
If the controller remains active after disabling, perform a bus recovery
to reset it to a known good state.
Signed-off-by: William A. Kennington III <william@wkennington.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260527-dw-i2c-v5-4-3483057f8d67@wkennington.com
i2c: designware: Convert platform driver to use shutdown hook
Convert the platform driver to use the new i2c_dw_shutdown() hook,
allowing the controller to gracefully NACK controllers requests during
system shutdown.
i2c: designware: Introduce shutdown exported function
Introduce an exported shutdown function to safely shutdown the
DesignWare I2C controller.
This shutdown hook gracefully sets the target disable bit before disabling
the controller. This guarantees that any incoming requests from the
controller are immediately NACKed during shutdown, preventing the bus from
hanging.
Signed-off-by: William A. Kennington III <william@wkennington.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260527-dw-i2c-v5-1-3483057f8d67@wkennington.com
Danilo Krummrich [Thu, 28 May 2026 22:38:54 +0000 (00:38 +0200)]
Merge patch series "rust: device: Higher-Ranked Lifetime Types for device drivers"
Danilo Krummrich <dakr@kernel.org> says:
Currently, Rust device drivers access device resources such as PCI BAR mappings
and I/O memory regions through Devres<T>.
Devres::access() provides zero-overhead access by taking a &Device<Bound>
reference as proof that the device is still bound. Since a &Device<Bound> is
available in almost all contexts by design, Devres is mostly a type-system level
proof that the resource is valid, but it can also be used from scopes without
this guarantee through its try_access() accessor.
This works well in general, but has a few limitations:
- Every access to a device resource goes through Devres::access(), which
despite zero cost, adds boilerplate to every access site.
- Destructors do not receive a &Device<Bound>, so they must use try_access(),
which can fail. In practice the access succeeds if teardown ordering is
correct, but the type system can't express this, forcing drivers to handle a
failure path that should never be taken.
- Sharing a resource across components (e.g. passing a BAR to a sub-component)
requires Arc<Devres<T>>.
- Device references must be stored as ARef<Device> rather than plain &Device
borrows.
These limitations stem from the driver's bus device private data being 'static
-- the driver struct cannot borrow from the device reference it receives in
probe(), even though it structurally cannot outlive the device binding.
This series introduces Higher-Ranked Lifetime Types (HRT) for Rust device
drivers. An HRT is a type that is generic over a lifetime -- it does not have a
fixed lifetime, but can be instantiated with any lifetime chosen by the caller.
Bus driver traits use a Generic Associated Type (GAT) type Data<'bound> to
introduce the lifetime on the private data, rather than parameterizing the
Driver trait itself. This avoids a driver trait global lifetime and avoids the
need for ForLt for bus device private data, making the bus implementations much
simpler. ForLt is only needed for auxiliary registration data, where the
lifetime is not introduced by a trait callback but must be threaded through
Registration.
With HRT, driver structs carry a lifetime parameter tied to the device binding
scope -- the interval of a bus device being bound to a driver. Device resources
like pci::Bar<'bound> and IoMem<'bound> are handed out with this lifetime, so
the compiler enforces at build time that they do not escape the binding scope.
Before:
struct MyDriver {
pdev: ARef<pci::Device>,
bar: Devres<pci::Bar<BAR_SIZE>>,
}
let io = self.bar.access(dev)?;
io.read32(OFFSET);
After:
struct MyDriver<'bound> {
pdev: &'bound pci::Device,
bar: pci::Bar<'bound, BAR_SIZE>,
}
self.bar.read32(OFFSET);
Lifetime-parameterized device resources can be put into a Devres at any point
via Bar::into_devres() / IoMem::into_devres(), providing the exact same
semantics as before. This is useful for resources shared across subsystem
boundaries where revocation is needed.
This also synergizes with the upcoming self-referential initialization support
in pin-init, which allows one field of the driver struct to borrow another
during initialization without unsafe code.
The same pattern is applied to auxiliary device registration data as a first
example beyond bus device private data. Registration<F: ForLt> can hold
lifetime-parameterized data tied to the parent driver's binding scope. Since the
auxiliary bus guarantees that the parent remains bound while the auxiliary
device is registered, the registration data can safely borrow the parent's
device resources.
More generally, binding resource lifetimes to a registration scope applies to
every registration that is scoped to a driver binding -- auxiliary devices,
class devices, IRQ handlers, workqueues.
A follow-up series extends this to class device registrations, starting with
DRM, so that class device callbacks (IOCTLs, etc.) can safely access device
resources through the separate registration data bound to the registration's
lifetime without Devres indirection.
Thanks to Gary for coming up with the ForLt implementation; thanks to Alice for
the early discussions around lifetime-parameterized private data that helped
shape the direction of this work.