Philip Yang [Tue, 9 Dec 2025 20:13:23 +0000 (15:13 -0500)]
drm/amdkfd: Unreserve bo if queue update failed
Error handling path should unreserve bo then return failed.
Fixes: 305cd109b761 ("drm/amdkfd: Validate user queue update") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c24afed7de9ecce341825d8ab55a43a254348b33)
Ivan Lipski [Thu, 26 Feb 2026 02:48:36 +0000 (21:48 -0500)]
drm/amd/display: Check for S0i3 to be done before DCCG init on DCN21
[WHY]
On DCN21, dccg2_init() is called in dcn10_init_hw() before
bios_golden_init(). During S0i3 resume, BIOS sets MICROSECOND_TIME_BASE_DIV
to 0x00120464 as a marker. dccg2_init() overwrites this to 0x00120264,
causing dcn21_s0i3_golden_init_wa() to misdetect the state and skip golden
init.
Eventually during the resume sequence, a flip timeout occurs.
[HOW]
Skip DCCG on dccg2_is_s0i3_golden_init_wa_done() on DCN21.
Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c61eda434336cf2c033aa35efdc9a08b31d2fdfa)
Ivan Lipski [Tue, 24 Feb 2026 21:28:00 +0000 (16:28 -0500)]
drm/amd/display: Add missing DCCG register entries for DCN20-DCN316
Commit 4c595e75110e ("drm/amd/display: Migrate DCCG registers access
from hwseq to dccg component.") moved register writes from hwseq to
dccg2_*() functions but did not add the registers to the DCCG register
list macros. The struct fields default to 0, so REG_WRITE() targets
MMIO offset 0, causing a GPU hang on resume (seen on DCN21/DCN30
during IGT kms_cursor_crc@cursor-suspend).
Add
- MICROSECOND_TIME_BASE_DIV
- MILLISECOND_TIME_BASE_DIV
- DCCG_GATE_DISABLE_CNTL
- DCCG_GATE_DISABLE_CNTL2
- DC_MEM_GLOBAL_PWR_REQ_CNTL
to macros in dcn20_dccg.h, dcn301_dccg.h, dcn31_dccg.h, and dcn314_dccg.h.
Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reported-by: Rafael Passos <rafael@rcpassos.me> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e6e2b956fc814de766d3480be7018297c41d3ce0)
drm/amd: Fix a few more NULL pointer dereference in device cleanup
I found a few more paths that cleanup fails due to a NULL version pointer
on unsupported hardware.
Add NULL checks as applicable.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f5a05f8414fc10f307eb965f303580c7778f8dd2) Cc: stable@vger.kernel.org
Yang Wang [Wed, 4 Mar 2026 23:45:45 +0000 (18:45 -0500)]
drm/amdgpu: fix gpu idle power consumption issue for gfx v12
Older versions of the MES firmware may cause abnormal GPU power consumption.
When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
the GPU may show abnormal power consumption in idle state and incorrect GPU load information.
This issue has been fixed in firmware version 0x8b and newer.
Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4e22a5fe6ea6e0b057e7f246df4ac3ff8bfbc46a)
drm/amd: Fix NULL pointer dereference in device cleanup
When GPU initialization fails due to an unsupported HW block
IP blocks may have a NULL version pointer. During cleanup in
amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and
amdgpu_device_set_cg_state which iterate over all IP blocks and access
adev->ip_blocks[i].version without NULL checks, leading to a kernel
NULL pointer dereference.
Add NULL checks for adev->ip_blocks[i].version in both
amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent
dereferencing NULL pointers during GPU teardown when initialization has
failed.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b7ac77468cda92eecae560b05f62f997a12fe2f2) Cc: stable@vger.kernel.org
Yang Wang [Wed, 4 Mar 2026 02:14:10 +0000 (21:14 -0500)]
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14.0.2/14.0.3
Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b5cf07d80bb16d1593579ccdb23f08ea4262c14)
Yang Wang [Wed, 4 Mar 2026 02:10:11 +0000 (21:10 -0500)]
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13.0.0/13.0.7
Fixes: cfffd980bf21 ("drm/amd/pm: add zero RPM OD setting support for SMU13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 576a10797b607ee9e4068218daf367b481564120)
Dave Airlie [Fri, 6 Mar 2026 09:40:23 +0000 (19:40 +1000)]
Merge tag 'drm-misc-fixes-2026-03-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Another early drm-misc-fixes PR to revert the previous uapi fix sent in
drm-misc-fixes-2026-03-05, together with a UAF fix in TTM, an argument
order fix for panthor, a fix for the firmware getting stuck on
resource allocation error handling for amdxdna, and a few fixes for
ethosu (size calculation and reference underflows, and a validation
fix).
Dave Airlie [Fri, 6 Mar 2026 07:45:47 +0000 (17:45 +1000)]
Merge tag 'drm-misc-fixes-2026-03-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A return type fix for ttm, a display fix for solomon, several misc fixes
for amdxdna, a DSI clock rate fix for rz-du, a uapi fix for syncobj, a
possible build failure fix for dma-buf, a doc warning fix for sched, a
build failure fix for ttm tests, and a crash fix when suspended for
nouveau.
accel: ethosu: Handle possible underflow in IFM size calculations
If the command stream has larger padding sizes than the IFM and OFM
diminsions, then the calculations will underflow to a negative value.
The result is a very large region bounds which is caught on submit, but
it's better to catch it earlier.
Current mesa ethosu driver has a signedness bug which resulted in
padding of 127 (the max) and triggers this issue.
accel: ethosu: Fix NPU_OP_ELEMENTWISE validation with scalar
The NPU_OP_ELEMENTWISE instruction uses a scalar value for IFM2 if the
IFM2_BROADCAST "scalar" mode is set. It is a bit (7) on the u65 and
part of a field (bits 3:0) on the u85. The driver was hardcoded to the
u85.
If the job submit fails before adding the job to the scheduler queue
such as when the GEM buffer bounds checks fail, then doing a
ethosu_job_put() results in a pm_runtime_put_autosuspend() without the
corresponding pm_runtime_resume_and_get(). The dma_fence_put()'s are
also unnecessary, but seem to be harmless.
Split the ethosu_job_cleanup() function into 2 parts for the before
and after the job is queued.
Lizhi Hou [Thu, 5 Mar 2026 06:20:41 +0000 (22:20 -0800)]
accel/amdxdna: Split mailbox channel create function
The management channel used for firmware control command submission is
currently created after the firmware is started. If channel creation
fails (for example, due to memory allocation failure or workqueue
creation interruption), the firmware remains in a pending state and is
unable to receive any control commands.
To avoid leaving the firmware in this inconsistent state, split
xdna_mailbox_create_channel() into two separate functions so that
resource allocation can be completed before interacting with the
hardware.
xdna_mailbox_alloc_channel()
Allocates memory and initializes the workqueue. This can be called
earlier, before interacting with the hardware.
xdna_mailbox_start_channel()
Performs the hardware interaction required to start the channel.
Rename xdna_mailbox_destroy_channel() to xdna_mailbox_free_channel().
Ensure that xdna_mailbox_stop_channel() and xdna_mailbox_free_channel()
properly unwind the corresponding start and allocation steps, respectively.
Akash Goel [Thu, 5 Mar 2026 11:07:23 +0000 (11:07 +0000)]
drm/panthor: Correct the order of arguments passed to gem_sync
This commit corrects the order of arguments passed to panthor_gem_sync()
function, called when the SYNC_WAIT condition has to be evaluated for a
blocked GPU queue.
Fixes: cd2c9c3015e6 ("drm/panthor: Add flag to map GEM object Write-Back Cacheable") Signed-off-by: Akash Goel <akash.goel@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20260305110723.2871733-1-akash.goel@arm.com Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
The problem occurs when userspace is compiled against new headers
with new members, but don't correctly initialise those new members.
This is not a kernel problem, and should be fixed in userspace by
correctly zero'ing all members.
Cc: Rob Clark <robdclark@chromium.org> Cc: Julian Orth <ju.orth@gmail.com> Cc: Christian König <christian.koenig@amd.com> Cc: Michel Dänzer <michel.daenzer@mailbox.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Julian Orth <ju.orth@gmail.com> Link: https://patch.msgid.link/20260305113734.1309238-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
When allocating a lot of buffers and putting the TTM under memory pressure,
during swapout, it might crash the system with the stack trace below.
It turns out that ttm_bo_swapout_cb might replace bo->resource when it
moves it to system cached.
When commit c06da4b3573a ("drm/ttm: Tidy usage of local variables a little
bit") used a local variable for bo->resource, it used the freed resource
later in the function, leading to a UAF.
Move back to using bo->resource in all cases in that function instead of a
local variable.
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Fixes: c06da4b3573a ("drm/ttm: Tidy usage of local variables a little bit") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20260304-ttm_bo_res_uaf-v1-1-43f20125b67f@igalia.com
This is a simple fix to get backported. We should probably engineer a
proper power domain solution to wake up devices and keep them awake
while fw updates are happening.
Cc: stable@vger.kernel.org Fixes: 8894f4919bc4 ("drm/nouveau: register a drm_dp_aux channel for each dp connector") Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260224031750.791621-1-airlied@gmail.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Sunil Khatri [Mon, 2 Mar 2026 13:20:46 +0000 (18:50 +0530)]
drm/amdgpu/userq: refcount userqueues to avoid any race conditions
To avoid race condition and avoid UAF cases, implement kref
based queues and protect the below operations using xa lock
a. Getting a queue from xarray
b. Increment/Decrement it's refcount
Every time some one want to access a queue, always get via
amdgpu_userq_get to make sure we have locks in place and get
the object if active.
A userqueue is destroyed on the last refcount is dropped which
typically would be via IOCTL or during fini.
v2: Add the missing drop in one the condition in the signal ioclt [Alex]
v3: remove the queue from the xarray first in the free queue ioctl path
[Christian]
- Pass queue to the amdgpu_userq_put directly.
- make amdgpu_userq_put xa_lock free since we are doing put for each get
only and final put is done via destroy and we remove the queue from xa
with lock.
- use userq_put in fini too so cleanup is done fully.
v4: Use xa_erase directly rather than doing load and erase in free
ioctl. Also remove some of the error logs which could be exploited
by the user to flood the logs [Christian]
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4952189b284d4d847f92636bb42dd747747129c0) Cc: <stable@vger.kernel.org> # 048c1c4e5171: drm/amdgpu/userq: Consolidate wait ioctl exit path Cc: <stable@vger.kernel.org>
If we gate the fence destruction with a check telling us whether there are
valid pointers in there we can eliminate the need for dual, basically
identical, exit paths.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bea29bb0dd29012949cd44fdb122465a9fd5cf91)
sguttula [Wed, 25 Feb 2026 08:27:01 +0000 (13:57 +0530)]
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
The reason the RAP is not granting access to 0x58200 is that
a dedicated RSMU slot would have to be spent for this address range,
and MPASP is close to running out of RSMU slots.
This will help to fix PSP TOC load failure during secureboot.
GFX Driver Need to use indirect access for SMN address regs.
Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9b822e26eea3899003aa8a89d5e2c4408e066e20)
Alysa Liu [Thu, 5 Feb 2026 16:21:45 +0000 (11:21 -0500)]
drm/amdgpu: Fix use-after-free race in VM acquire
Replace non-atomic vm->process_info assignment with cmpxchg()
to prevent race when parent/child processes sharing a drm_file
both try to acquire the same VM after fork().
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c7c573275ec20db05be769288a3e3bb2250ec618) Cc: stable@vger.kernel.org
Yang Wang [Thu, 26 Feb 2026 03:51:06 +0000 (22:51 -0500)]
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
v1:
The metrics->EnergyAccumulator field has been deprecated on newer pmfw.
v2:
add smu 13.0.0/13.0.7/13.0.10 support.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8de9edb35976fa56565dc8fbb5d1310e8e10187c) Cc: stable@vger.kernel.org
Varun Gupta [Mon, 23 Feb 2026 17:51:45 +0000 (23:21 +0530)]
drm/xe: Fix memory leak in xe_vm_madvise_ioctl
When check_bo_args_are_sane() validation fails, jump to the new
free_vmas cleanup label to properly free the allocated resources.
This ensures proper cleanup in this error path.
Fixes: 293032eec4ba ("drm/xe/bo: Update atomic_access attribute on madvise") Cc: stable@vger.kernel.org # v6.18+ Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Varun Gupta <varun.gupta@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260223175145.1532801-1-varun.gupta@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 29bd06faf727a4b76663e4be0f7d770e2d2a7965) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Fri, 27 Feb 2026 16:43:41 +0000 (08:43 -0800)]
drm/xe/xe2_hpg: Correct implementation of Wa_16025250150
Wa_16025250150 asks us to set five register fields of the register to
0x1 each. However we were just OR'ing this into the existing register
value (which has a default of 0x4 for each nibble-sized field) resulting
in final field values of 0x5 instead of the desired 0x1. Correct the
RTP programming (use FIELD_SET instead of SET) to ensure each field is
assigned to exactly the value we want.
Zhanjun Dong [Fri, 20 Feb 2026 22:53:08 +0000 (17:53 -0500)]
drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and
xe_gsc_proxy_start; however, if we fail between those 2 calls, it is
possible that the HW forcewake access hasn't been initialized yet and so
we hit errors when the cleanup code tries to write GSC register. To
avoid that, split the cleanup in 2 functions so that the HW cleanup is
only called if the HW setup was completed successfully.
Since the HW cleanup (interrupt disabling) is now removed from
xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start
must be updated to disable interrupts before returning.
With commit a69d1ab971a6 ("mm: Fix a hmm_range_fault() livelock / starvation problem")
device-to-device migration is not functional again and the
disabling can be reverted.
Add the above commit as a Fixes: tag in order for the revert to not
take place unless that commit is present.
Jouni Högander [Wed, 25 Feb 2026 07:42:21 +0000 (09:42 +0200)]
drm/i915/psr: Fix for Panel Replay X granularity DPCD register handling
DP specification is saying value 0xff 0xff in PANEL REPLAY SELECTIVE UPDATE
X GRANULARITY CAPABILITY registers (0xb2 and 0xb3) means full-line
granularity. Take this into account when handling Panel Replay X
granularity informed by the panel.
Fixes: 1cc854647450 ("drm/i915/psr: Use SU granularity information available in intel_connector") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7284 Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Link: https://patch.msgid.link/20260225074221.1744330-2-jouni.hogander@intel.com
(cherry picked from commit f5c8f824a495e849492f09a43bd965a8f4d86cb2) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Jouni Högander [Wed, 25 Feb 2026 07:42:20 +0000 (09:42 +0200)]
drm/dp: Add definition for Panel Replay full-line granularity
DP specification is saying value 0xff 0xff in PANEL REPLAY SELECTIVE UPDATE
X GRANULARITY CAPABILITY registers (0xb2 and 0xb3) means full-line
granularity. Add definition for this.
include/uapi/linux/dma-buf.h uses several macros from ioctl.h to define
its ioctl commands. However, it does not include ioctl.h itself. So,
if userspace source code tries to include the dma-buf.h file without
including ioctl.h, it can result in build failures.
Therefore, include ioctl.h in the dma-buf UAPI header.
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Reviewed-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260303002309.1401849-1-isaacmanjarres@google.com
Dillon Varone [Wed, 18 Feb 2026 19:34:28 +0000 (14:34 -0500)]
drm/amd/display: Fallback to boot snapshot for dispclk
[WHY & HOW]
If the dentist is unavailable, fallback to reading CLKIP via the boot
snapshot to get the current dispclk.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2ab77600d1e55a042c02437326d3c7563e853c6c) Cc: stable@vger.kernel.org
Alex Hung [Fri, 27 Feb 2026 19:30:38 +0000 (12:30 -0700)]
drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT
[WHAT]
Create DEGAMMA properties even if color pipeline is enabled, and enforce
the mutual exclusion in atomic check by rejecting any commit that
attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on
the CRTC simultaneously.
Fixes: 18a4127e9315 ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963 Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 196a6aa727f1f15eb54dda5e60a41543ea9397ee)
Lizhi Hou [Thu, 26 Feb 2026 21:38:57 +0000 (13:38 -0800)]
accel/amdxdna: Fix NULL pointer dereference of mgmt_chann
mgmt_chann may be set to NULL if the firmware returns an unexpected
error in aie2_send_mgmt_msg_wait(). This can later lead to a NULL
pointer dereference in aie2_hw_stop().
Fix this by introducing a dedicated helper to destroy mgmt_chann
and by adding proper NULL checks before accessing it.
Thomas Hellström [Tue, 10 Feb 2026 11:56:53 +0000 (12:56 +0100)]
mm: Fix a hmm_range_fault() livelock / starvation problem
If hmm_range_fault() fails a folio_trylock() in do_swap_page,
trying to acquire the lock of a device-private folio for migration,
to ram, the function will spin until it succeeds grabbing the lock.
However, if the process holding the lock is depending on a work
item to be completed, which is scheduled on the same CPU as the
spinning hmm_range_fault(), that work item might be starved and
we end up in a livelock / starvation situation which is never
resolved.
This can happen, for example if the process holding the
device-private folio lock is stuck in
migrate_device_unmap()->lru_add_drain_all()
sinc lru_add_drain_all() requires a short work-item
to be run on all online cpus to complete.
A prerequisite for this to happen is:
a) Both zone device and system memory folios are considered in
migrate_device_unmap(), so that there is a reason to call
lru_add_drain_all() for a system memory folio while a
folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
at least one migration PTE entry insertion to be deferred to
try_to_migrate(), which can happen after the call to
lru_add_drain_all().
c) No or voluntary only preemption.
This all seems pretty unlikely to happen, but indeed is hit by
the "xe_exec_system_allocator" igt test.
Resolve this by waiting for the folio to be unlocked if the
folio_trylock() fails in do_swap_page().
Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_unlock() and update its documentation to
indicate the new use-case.
Future code improvements might consider moving
the lru_add_drain_all() call in migrate_device_unmap() to be
called *after* all pages have migration entries inserted.
That would eliminate also b) above.
v2:
- Instead of a cond_resched() in hmm_range_fault(),
eliminate the problem by waiting for the folio to be unlocked
in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
!CONFIG_MIGRATION case. (Kernel Test Robot)
v4:
- Rename migrate_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
v5:
- Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
version of softleaf_entry_wait_on_locked().
- Modify wording around function names in the commit message
(Andrew Morton)
Suggested-by: Alistair Popple <apopple@nvidia.com> Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page") Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: linux-mm@kvack.org Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3 Reviewed-by: Alistair Popple <apopple@nvidia.com> Link: https://patch.msgid.link/20260210115653.92413-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit a69d1ab971a624c6f112cea61536569d579c3215) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tomasz Lis [Thu, 26 Feb 2026 21:26:58 +0000 (22:26 +0100)]
drm/xe/queue: Call fini on exec queue creation fail
Every call to queue init should have a corresponding fini call.
Skipping this would mean skipping removal of the queue from GuC list
(which is part of guc_id allocation). A damaged queue stored in
exec_queue_lookup list would lead to invalid memory reference,
sooner or later.
Call fini to free guc_id. This must be done before any internal
LRCs are freed.
Since the finalization with this extra call became very similar to
__xe_exec_queue_fini(), reuse that. To make this reuse possible,
alter xe_lrc_put() so it can survive NULL parameters, like other
similar functions.
v2: Reuse _xe_exec_queue_fini(). Make xe_lrc_put() aware of NULLs.
Matthew Brost [Thu, 15 Jan 2026 00:45:46 +0000 (16:45 -0800)]
drm/xe: Do not preempt fence signaling CS instructions
If a batch buffer is complete, it makes little sense to preempt the
fence signaling instructions in the ring, as the largest portion of the
work (the batch buffer) is already done and fence signaling consists of
only a few instructions. If these instructions are preempted, the GuC
would need to perform a context switch just to signal the fence, which
is costly and delays fence signaling. Avoid this scenario by disabling
preemption immediately after the BB start instruction and re-enabling it
after executing the fence signaling instructions.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Carlos Santa <carlos.santa@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260115004546.58060-1-matthew.brost@intel.com
(cherry picked from commit 2bcbf2dcde0c839a73af664a3c77d4e77d58a3eb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
int main(void) {
int fd = open("/dev/dri/renderD128", O_RDWR);
struct drm_syncobj_create arg1;
ioctl(fd, DRM_IOCTL_SYNCOBJ_CREATE, &arg1);
struct drm_syncobj_handle arg2;
memset(&arg2, 1, sizeof(arg2)); // simulate dirty stack
arg2.handle = arg1.handle;
arg2.flags = 0;
arg2.fd = 0;
arg2.pad = 0;
// arg2.point = 0; // userspace is required to set point to 0
ioctl(fd, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, &arg2);
}
The last ioctl returns EINVAL because args->point is not 0. However,
userspace developed against older kernel versions is not aware of the
new point field and might therefore not initialize it.
The correct check would be
if (args->flags & DRM_SYNCOBJ_FD_TO_HANDLE_FLAGS_TIMELINE)
return -EINVAL;
However, there might already be userspace that relies on this not
returning an error as long as point == 0. Therefore use the more lenient
check.
Fixes: c2d3a7300695 ("drm/syncobj: Extend EXPORT_SYNC_FILE for timeline syncobjs") Signed-off-by: Julian Orth <ju.orth@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260301-point-v1-1-21fc5fd98614@gmail.com
Linus Torvalds [Sun, 1 Mar 2026 23:34:47 +0000 (15:34 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature
set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered... [his words, not mine]
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
Generic:
- Remove internal Kconfigs that are now set on all architectures
- Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all
architectures finally enable it in Linux 7.0"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: always define KVM_CAP_SYNC_MMU
KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER
KVM: arm64: Deduplicate ASID retrieval code
irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs
KVM: arm64: Fix protected mode handling of pages larger than 4kB
KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
KVM: arm64: Fix ID register initialization for non-protected pKVM guests
KVM: arm64: Optimise away S1POE handling when not supported by host
KVM: arm64: Hide S1POE from guests when not supported by the host
Linus Torvalds [Sun, 1 Mar 2026 21:32:32 +0000 (13:32 -0800)]
Merge tag 'core-debugobjects-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fix from Thomas Gleixner:
"A single fix for debugobjects.
The deferred page initialization prevents debug objects from
allocating slab pages until the initialization is complete. That
causes depletion of the pool and disabling of debugobjects.
The reason is that debugobjects uses __GFP_HIGH for allocations as it
might be invoked from arbitrary contexts. When PREEMPT_COUNT is
disabled there is no way to know whether the context is safe to set
__GFP_KSWAPD_RECLAIM.
This worked until v6.18. Since then allocations w/o a reclaim flag
cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(),
which returns early when deferred page initialization has not yet
completed.
Work around that when PREEMPT_COUNT is enabled as the preempt counter
allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when
the context is preemtible. When PREEMPT_COUNT is disabled the context
is unknown and the reclaim bit can't be set because the caller might
hold locks which might deadlock in the allocator.
That makes debugobjects depend on PREEMPT_COUNT ||
!DEFERRED_STRUCT_PAGE_INIT, which limits the coverage slightly, but
keeps it functional for most cases"
* tag 'core-debugobjects-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Make it work with deferred page initialization - again
Linus Torvalds [Sun, 1 Mar 2026 21:16:35 +0000 (13:16 -0800)]
Merge tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix speculative safety in fred_extint()
- Fix __WARN_printf() trap in early_fixup_exception()
- Fix clang-build boot bug for unusual alignments, triggered by
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y
- Replace the final few __ASSEMBLY__ stragglers that snuck in lately
into non-UAPI x86 headers and use __ASSEMBLER__ consistently (again)
* tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Replace __ASSEMBLY__ stragglers with __ASSEMBLER__
x86/cfi: Fix CFI rewrite for odd alignments
x86/bug: Handle __WARN_printf() trap in early_fixup_exception()
x86/fred: Correct speculative safety in fred_extint()
Linus Torvalds [Sun, 1 Mar 2026 20:15:58 +0000 (12:15 -0800)]
Merge tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs(), for
the common HZ=100, 250 or 1000 cases. Only use a function call for odd
HZ values like HZ=300 that generate more code.
The function call overhead showed up in performance tests of the TCP
code"
* tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/jiffies: Inline jiffies_to_msecs() and jiffies_to_usecs()
Linus Torvalds [Sun, 1 Mar 2026 19:09:24 +0000 (11:09 -0800)]
Merge tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix zero_vruntime tracking when there's a single task running
- Fix slice protection logic
- Fix the ->vprot logic for reniced tasks
- Fix lag clamping in mixed slice workloads
- Fix objtool uaccess warning (and bug) in the
!CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining,
which triggers with older compilers
- Fix a comment in the rseq registration rseq_size bound check code
- Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes
differently, which special size we now reached naturally and want to
avoid. The visible ugliness of the new reserved field will be avoided
the next time the RSEQ area is extended.
* tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: slice ext: Ensure rseq feature size differs from original rseq size
rseq: Clarify rseq registration rseq_size bound check comment
sched/core: Fix wakeup_preempt's next_class tracking
rseq: Mark rseq_arm_slice_extension_timer() __always_inline
sched/fair: Fix lag clamp
sched/eevdf: Update se->vprot in reweight_entity()
sched/fair: Only set slice protection at pick time
sched/fair: Fix zero_vruntime tracking
Linus Torvalds [Sun, 1 Mar 2026 19:07:20 +0000 (11:07 -0800)]
Merge tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fixes from Ingo Molnar:
- Fix lock ordering bug found by lockdep in perf_event_wakeup()
- Fix uncore counter enumeration on Granite Rapids and Sierra Forest
- Fix perf_mmap() refcount bug found by Syzkaller
- Fix __perf_event_overflow() vs perf_remove_from_context() race
* tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race
perf/core: Fix refcount bug and potential UAF in perf_mmap
perf/x86/intel/uncore: Add per-scheduler IMC CAS count events
perf/core: Fix invalid wait context in ctx_sched_in()
Linus Torvalds [Sun, 1 Mar 2026 19:00:43 +0000 (11:00 -0800)]
Merge tag 'locking-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Now that LLVM 22 has been released officially, require a release
version to use the new CONFIG_WARN_CONTEXT_ANALYSIS feature.
In particular this avoids the widely used Android clang 22.0.1
pre-release build which is known to be broken for this usecase"
* tag 'locking-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/Kconfig.debug: Require a release version of LLVM 22 for context analysis
Linus Torvalds [Sun, 1 Mar 2026 18:58:16 +0000 (10:58 -0800)]
Merge tag 'irq-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irqchip driver fixes from Ingo Molnar:
- Fix frozen interrupt bug in the sifive-plic driver
- Limit per-device MSI interrupts on uncommon gic-v3-its hardware
variants
- Address Sparse warning by constifying a variable in the MMP driver
- Revert broken commit and also fix an error check in the ls-extirq
driver
* tag 'irq-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ls-extirq: Fix devm_of_iomap() error check
Revert "irqchip/ls-extirq: Use for_each_of_imap_item iterator"
irqchip/mmp: Make icu_irq_chip variable static const
irqchip/gic-v3-its: Limit number of per-device MSIs to the range the ITS supports
irqchip/sifive-plic: Fix frozen interrupt due to affinity setting
Linus Torvalds [Sun, 1 Mar 2026 17:59:29 +0000 (09:59 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"All changes in drivers (well technically SES is enclosure services,
but its change is minor). The biggest is the write combining change in
lpfc followed by the additional NULL checks in mpi3mr"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix shift out of bounds when MAXQ=32
scsi: ufs: core: Move link recovery for hibern8 exit failure to wl_resume
scsi: ufs: core: Fix possible NULL pointer dereference in ufshcd_add_command_trace()
scsi: snic: MAINTAINERS: Update snic maintainers
scsi: snic: Remove unused linkstatus
scsi: pm8001: Fix use-after-free in pm8001_queue_command()
scsi: mpi3mr: Add NULL checks when resetting request and reply queues
scsi: ufs: core: Reset urgent_bkops_lvl to allow runtime PM power mode
scsi: ses: Fix devices attaching to different hosts
scsi: ufs: core: Fix RPMB region size detection for UFS 2.2
scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT
scsi: lpfc: Properly set WC for DPP mapping
Linus Torvalds [Sun, 1 Mar 2026 03:54:28 +0000 (19:54 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix alignment of arm64 JIT buffer to prevent atomic tearing (Fuad
Tabba)
- Fix invariant violation for single value tnums in the verifier
(Harishankar Vishwanathan, Paul Chaignon)
- Fix a bunch of issues found by ASAN in selftests/bpf (Ihor Solodrai)
- Fix race in devmpa and cpumap on PREEMPT_RT (Jiayuan Chen)
- Fix show_fdinfo of kprobe_multi when cookies are not present (Jiri
Olsa)
- Fix race in freeing special fields in BPF maps to prevent memory
leaks (Kumar Kartikeya Dwivedi)
- Fix OOB read in dmabuf_collector (T.J. Mercier)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (36 commits)
selftests/bpf: Avoid simplification of crafted bounds test
selftests/bpf: Test refinement of single-value tnum
bpf: Improve bounds when tnum has a single possible value
bpf: Introduce tnum_step to step through tnum's members
bpf: Fix race in devmap on PREEMPT_RT
bpf: Fix race in cpumap on PREEMPT_RT
selftests/bpf: Add tests for special fields races
bpf: Retire rcu_trace_implies_rcu_gp() from local storage
bpf: Delay freeing fields in local storage
bpf: Lose const-ness of map in map_check_btf()
bpf: Register dtor for freeing special fields
selftests/bpf: Fix OOB read in dmabuf_collector
selftests/bpf: Fix a memory leak in xdp_flowtable test
bpf: Fix stack-out-of-bounds write in devmap
bpf: Fix kprobe_multi cookies access in show_fdinfo callback
bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing
selftests/bpf: Don't override SIGSEGV handler with ASAN
selftests/bpf: Check BPFTOOL env var in detect_bpftool_path()
selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN
selftests/bpf: Fix array bounds warning in jit_disasm_helpers
...
Linus Torvalds [Sun, 1 Mar 2026 03:35:30 +0000 (19:35 -0800)]
Merge tag 'driver-core-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Do not register imx_clk_scu_driver in imx8qxp_clk_probe(); besides
fixing two other issues, this avoids a deadlock in combination with
commit dc23806a7c47 ("driver core: enforce device_lock for
driver_match_device()")
- Move secondary node lookup from device_get_next_child_node() to
fwnode_get_next_child_node(); this avoids issues when users switch
from the device API to the fwnode API
- Export io_define_{read,write}!() to avoid unused import warnings when
CONFIG_PCI=n
* tag 'driver-core-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
clk: scu/imx8qxp: do not register driver in probe()
rust: io: macro_export io_define_read!() and io_define_write!()
device property: Allow secondary lookup in fwnode_get_next_child_node()
Linus Torvalds [Sat, 28 Feb 2026 18:45:56 +0000 (10:45 -0800)]
Merge tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Two multichannel fixes
- Locking fix for superblock flags
- Fix to remove debug message that could log password
- Cleanup fix for setting credentials
* tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Use snprintf in cifs_set_cifscreds
smb: client: Don't log plaintext credentials in cifs_set_cifscreds
smb: client: fix broken multichannel with krb5+signing
smb: client: use atomic_t for mnt_cifs_flags
smb: client: fix cifs_pick_channel when channels are equally loaded
Takashi Sakamoto [Sat, 28 Feb 2026 02:56:03 +0000 (11:56 +0900)]
firewire: ohci: initialize page array to use alloc_pages_bulk() correctly
The call of alloc_pages_bulk() skips to fill entries of page array when
the entries already have values. While, 1394 OHCI PCI driver passes the
page array without initializing. It could cause invalid state at PFN
validation in vmap().
Fixes: f2ae92780ab9 ("firewire: ohci: split page allocation from dma mapping") Reported-by: John Ogness <john.ogness@linutronix.de> Reported-and-tested-by: Harald Arnesen <linux@skogtun.org> Reported-and-tested-by: David Gow <david@davidgow.net> Closes: https://lore.kernel.org/lkml/87tsv1vig5.fsf@jogness.linutronix.de/ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 28 Feb 2026 17:21:18 +0000 (09:21 -0800)]
Merge tag 'spi-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"One fix for the stm32 driver which got broken for DMA chaining cases,
plus a removal of some straggling bindings for the Bikal SoC which has
been pulled out of the kernel"
* tag 'spi-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: stm32: fix missing pointer assignment in case of dma chaining
spi: dt-bindings: snps,dw-abp-ssi: Remove unused bindings
Linus Torvalds [Sat, 28 Feb 2026 17:18:02 +0000 (09:18 -0800)]
Merge tag 'regulator-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A small pile of fixes, none of which are super major - the code fixes
are improved error handling and fixing a leak of a device node.
We also have a typo fix and an improvement to make the binding example
for mt6359 more directly usable"
* tag 'regulator-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Kconfig: fix a typo
regulator: bq257xx: Fix device node reference leak in bq257xx_reg_dt_parse_gpio()
regulator: fp9931: Fix PM runtime reference leak in fp9931_hwmon_read()
regulator: tps65185: check devm_kzalloc() result in probe
regulator: dt-bindings: mt6359: make regulator names unique
Linus Torvalds [Sat, 28 Feb 2026 17:01:33 +0000 (09:01 -0800)]
Merge tag 's390-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix guest pfault init to pass a physical address to DIAG 0x258,
restoring pfault interrupts and avoiding vCPU stalls during host
page-in
- Fix kexec/kdump hangs with stack protector by marking
s390_reset_system() __no_stack_protector; set_prefix(0) switches
lowcore and the canary no longer matches
- Fix idle/vtime cputime accounting (idle-exit ordering, vtimer
double-forwarding) and small cleanups
* tag 's390-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pfault: Fix virtual vs physical address confusion
s390/kexec: Disable stack protector in s390_reset_system()
s390/idle: Remove psw_idle() prototype
s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()
s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()
s390/irq/idle: Remove psw bits early
s390/idle: Inline update_timer_idle()
s390/idle: Slightly optimize idle time accounting
s390/idle: Add comment for non obvious code
s390/vtime: Fix virtual timer forwarding
s390/idle: Fix cpu idle exit cpu time accounting
Lizhi Hou [Fri, 27 Feb 2026 00:48:41 +0000 (16:48 -0800)]
accel/amdxdna: Fill invalid payload for failed command
Newer userspace applications may read the payload of a failed command
to obtain detailed error information. However, the driver and old firmware
versions may not support returning advanced error information.
In this case, initialize the command payload with an invalid value so
userspace can detect that no detailed error information is available.
====================
Fix invariant violation for single-value tnums
We're hitting an invariant violation in Cilium that sometimes leads to
BPF programs being rejected and Cilium failing to start [1]. As far as
I know this is the first case of invariant violation found in a real
program (i.e., not by a fuzzer). The following extract from verifier
logs shows what's happening:
More details are given in the second patch, but in short, the verifier
should be able to detect that the false branch of instruction 237 is
never true. After instruction 236, the u64 range and the tnum overlap
in a single value, 0xf00.
The long-term solution to invariant violation is likely to rely on the
refinement + invariant violation check to detect dead branches, as
started by Eduard. To fix the current issue, we need something with
less refactoring that we can backport to affected kernels.
The solution implemented in the second patch is to improve the bounds
refinement to avoid this case. It relies on a new tnum helper,
tnum_step, first sent as an RFC in [2]. The last two patches extend and
update the selftests.
Link: https://github.com/cilium/cilium/issues/44216 Link: https://lore.kernel.org/bpf/20251107192328.2190680-2-harishankar.vishwanathan@gmail.com/
Changes in v3:
- Fix commit description error spotted by AI bot.
- Simplify constants in first two tests (Eduard).
- Rework comment on third test (Eduard).
- Add two new negative test cases (Eduard).
- Rebased.
Changes in v2:
- Add guard suggested by Hari in tnum_step, to avoid undefined
behavior spotted by AI code review.
- Add explanation diagrams in code as suggested by Eduard.
- Rework conditions for readability as suggested by Eduard.
- Updated reference to SMT formula.
- Rebased.
====================
Paul Chaignon [Fri, 27 Feb 2026 21:42:45 +0000 (22:42 +0100)]
selftests/bpf: Avoid simplification of crafted bounds test
The reg_bounds_crafted tests validate the verifier's range analysis
logic. They focus on the actual ranges and thus ignore the tnum. As a
consequence, they carry the assumption that the tested cases can be
reproduced in userspace without using the tnum information.
Unfortunately, the previous change the refinement logic breaks that
assumption for one test case:
The tested bytecode is shown below. Without our previous improvement, on
the false branch of the condition, R7 is only known to have u64 range
[0xfffffffe; 0x100000000]. With our improvement, and using the tnum
information, we can deduce that R7 equals 0x100000000.
R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op
refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds
refines the s32 range to 0 using u32 and finally also sets u32=0.
From this, __reg_bound_offset improves the tnum to (0; 0x100000000).
Finally, our previous patch uses this new tnum to deduce that it only
intersect with u64=[0xfffffffe; 0x100000000] in a single value:
0x100000000.
Because the verifier uses the tnum to reach this constant value, the
selftest is unable to reproduce it by only simulating ranges. The
solution implemented in this patch is to change the test case such that
there is more than one overlap value between u64 and the tnum. The max.
u64 value is thus changed from 0x100000000 to 0x300000000.
Paul Chaignon [Fri, 27 Feb 2026 21:36:30 +0000 (22:36 +0100)]
selftests/bpf: Test refinement of single-value tnum
This patch introduces selftests to cover the new bounds refinement
logic introduced in the previous patch. Without the previous patch,
the first two tests fail because of the invariant violation they
trigger. The last test fails because the R10 access is not detected as
dead code. In addition, all three tests fail because of R0 having a
non-constant value in the verifier logs.
In addition, the last two cases are covering the negative cases: when we
shouldn't refine the bounds because the u64 and tnum overlap in at least
two values.
Paul Chaignon [Fri, 27 Feb 2026 21:35:02 +0000 (22:35 +0100)]
bpf: Improve bounds when tnum has a single possible value
We're hitting an invariant violation in Cilium that sometimes leads to
BPF programs being rejected and Cilium failing to start [1]. The
following extract from verifier logs shows what's happening:
We reach instruction 236 with two possible values for R9, 0xe00 and
0xf00. This is perfectly reflected in the tnum, but of course the ranges
are less accurate and cover [0xe00; 0xf00]. Taking the fallthrough path
at instruction 236 allows the verifier to reduce the range to
[0xe01; 0xf00]. The tnum is however not updated.
With these ranges, at instruction 237, the verifier is not able to
deduce that R9 is always equal to 0xf00. Hence the fallthrough pass is
explored first, the verifier refines the bounds using the assumption
that R9 != 0xf00, and ends up with an invariant violation.
This pattern of impossible branch + bounds refinement is common to all
invariant violations seen so far. The long-term solution is likely to
rely on the refinement + invariant violation check to detect dead
branches, as started by Eduard. To fix the current issue, we need
something with less refactoring that we can backport.
This patch uses the tnum_step helper introduced in the previous patch to
detect the above situation. In particular, three cases are now detected
in the bounds refinement:
1. The u64 range and the tnum only overlap in umin.
u64: ---[xxxxxx]-----
tnum: --xx----------x-
2. The u64 range and the tnum only overlap in the maximum value
represented by the tnum, called tmax.
u64: ---[xxxxxx]-----
tnum: xx-----x--------
3. The u64 range and the tnum only overlap in between umin (excluded)
and umax.
u64: ---[xxxxxx]-----
tnum: xx----x-------x-
To detect these three cases, we call tnum_step(tnum, umin), which
returns the smallest member of the tnum greater than umin, called
tnum_next here. We're in case (1) if umin is part of the tnum and
tnum_next is greater than umax. We're in case (2) if umin is not part of
the tnum and tnum_next is equal to tmax. Finally, we're in case (3) if
umin is not part of the tnum, tnum_next is inferior or equal to umax,
and calling tnum_step a second time gives us a value past umax.
This change implements these three cases. With it, the above bytecode
looks as follows:
In addition to the new selftests, this change was also verified with
Agni [3]. For the record, the raw SMT is available at [4]. The property
it verifies is that: If a concrete value x is contained in all input
abstract values, after __update_reg_bounds, it will continue to be
contained in all output abstract values.
bpf: Introduce tnum_step to step through tnum's members
This commit introduces tnum_step(), a function that, when given t, and a
number z returns the smallest member of t larger than z. The number z
must be greater or equal to the smallest member of t and less than the
largest member of t.
The first step is to compute j, a number that keeps all of t's known
bits, and matches all unknown bits to z's bits. Since j is a member of
the t, it is already a candidate for result. However, we want our result
to be (minimally) greater than z.
There are only two possible cases:
(1) Case j <= z. In this case, we want to increase the value of j and
make it > z.
(2) Case j > z. In this case, we want to decrease the value of j while
keeping it > z.
(Case 1.1) Let's first consider the case where j < z. We will address j
== z later.
Since z > j, there had to be a bit position that was 1 in z and a 0 in
j, beyond which all positions of higher significance are equal in j and
z. Further, this position could not have been unknown in a, because the
unknown positions of a match z. This position had to be a 1 in z and
known 0 in t.
Let k be position of the most significant 1-to-0 flip. In our example, k
= 3 (starting the count at 1 at the least significant bit). Setting (to
1) the unknown bits of t in positions of significance smaller than
k will not produce a result > z. Hence, we must set/unset the unknown
bits at positions of significance higher than k. Specifically, we look
for the next larger combination of 1s and 0s to place in those
positions, relative to the combination that exists in z. We can achieve
this by concatenating bits at unknown positions of t into an integer,
adding 1, and writing the bits of that result back into the
corresponding bit positions previously extracted from z.
>From our example, considering only positions of significance greater
than k:
t = xx..x
z = 10..1
+ 1
-----
11..0
This is the exact combination 1s and 0s we need at the unknown bits of t
in positions of significance greater than k. Further, our result must
only increase the value minimally above z. Hence, unknown bits in
positions of significance smaller than k should remain 0. We finally
have,
Matching the unknown bits of the t to the bits of z yielded exactly z.
To produce a number greater than z, we must set/unset the unknown bits
in t, and *all* the unknown bits of t candidates for being set/unset. We
can do this similar to Case 1.1, by adding 1 to the bits extracted from
the masked bit positions of z. Essentially, this case is equivalent to
Case 1.1, with k = 0.
t = 1x1x0xxx
z = .0.1.100
+ 1
---------
.0.1.101
This is the exact combination of bits needed in the unknown positions of
t. After recalling the known positions of t, we get
Since j > z, there had to be a bit position which was 0 in z, and a 1 in
j, beyond which all positions of higher significance are equal in j and
z. This position had to be a 0 in z and known 1 in t. Let k be the
position of the most significant 0-to-1 flip. In our example, k = 4.
Because of the 0-to-1 flip at position k, a member of t can become
greater than z if the bits in positions greater than k are themselves >=
to z. To make that member *minimally* greater than z, the bits in
positions greater than k must be exactly = z. Hence, we simply match all
of t's unknown bits in positions more significant than k to z's bits. In
positions less significant than k, we set all t's unknown bits to 0
to retain minimality.
In our example, in positions of greater significance than k (=4),
t=x000. These positions are matched with z (1000) to produce 1000. In
positions of lower significance than k, t=10x1. All unknown bits are set
to 0 to produce 1001. The final result is:
This concludes the computation for a result > z that is a member of t.
The procedure for tnum_step() in this commit implements the idea
described above. As a proof of correctness, we verified the algorithm
against a logical specification of tnum_step. The specification asserts
the following about the inputs t, z and output res that:
1. res is a member of t, and
2. res is strictly greater than z, and
3. there does not exist another value res2 such that
3a. res2 is also a member of t, and
3b. res2 is greater than z
3c. res2 is smaller than res
We checked the implementation against this logical specification using
an SMT solver. The verification formula in SMTLIB format is available
at [1]. The verification returned an "unsat": indicating that no input
assignment exists for which the implementation and the specification
produce different outputs.
In addition, we also automatically generated the logical encoding of the
C implementation using Agni [2] and verified it against the same
specification. This verification also returned an "unsat", confirming
that the implementation is equivalent to the specification. The formula
for this check is also available at [3].
====================
bpf: Fix per-CPU bulk queue races on PREEMPT_RT
On PREEMPT_RT kernels, local_bh_disable() only calls migrate_disable()
(when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption. This means CFS scheduling can preempt a task inside the
per-CPU bulk queue (bq) operations in cpumap and devmap, allowing
another task on the same CPU to concurrently access the same bq,
leading to use-after-free, list corruption, and kernel panics.
Patch 1 fixes the cpumap race in bq_flush_to_queue(), originally
reported by syzbot [1].
Patch 2 fixes the same class of race in devmap's bq_xmit_all(),
identified by code inspection after Sebastian Andrzej Siewior pointed
out that devmap has the same per-CPU bulk queue pattern [2].
Both patches use local_lock_nested_bh() to serialize access to the
per-CPU bq. On non-RT this is a pure lockdep annotation with no
overhead; on PREEMPT_RT it provides a per-CPU sleeping lock.
To reproduce the devmap race, insert an mdelay(100) in bq_xmit_all()
after "cnt = bq->count" and before the actual transmit loop. Then pin
two threads to the same CPU, each running BPF_PROG_TEST_RUN with an XDP
program that redirects to a DEVMAP entry (e.g. a veth pair). CFS
timeslicing during the mdelay window causes interleaving. Without the
fix, KASAN reports null-ptr-deref due to operating on freed frames:
BUG: KASAN: null-ptr-deref in __build_skb_around+0x22d/0x340
Write of size 32 at addr 0000000000000d50 by task devmap_race_rep/449
v3 -> v4: https://lore.kernel.org/all/20260213034018.284146-1-jiayuan.chen@linux.dev/
- Move panic trace to cover letter. (Sebastian Andrzej Siewior)
- Add Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> to both patches
from cover letter.
v2 -> v3: https://lore.kernel.org/bpf/20260212023634.366343-1-jiayuan.chen@linux.dev/
- Fix commit message: remove incorrect "spin_lock() becomes rt_mutex"
claim, the per-CPU bq has no spin_lock at all. (Sebastian Andrzej Siewior)
- Fix commit message: accurately describe local_lock_nested_bh()
behavior instead of referencing local_lock(). (Sebastian Andrzej Siewior)
- Remove incomplete discussion of snapshot alternative.
(Sebastian Andrzej Siewior)
- Remove panic trace from commit message. (Sebastian Andrzej Siewior)
- Add patch 2/2 for devmap, same race pattern. (Sebastian Andrzej Siewior)
v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@linux.dev/
- Use local_lock_nested_bh()/local_unlock_nested_bh() instead of
local_lock()/local_unlock(), since these paths already run under
local_bh_disable(). (Sebastian Andrzej Siewior)
- Replace "Caller must hold bq->bq_lock" comment with
lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior)
- Fix Fixes tag to 3253cb49cbad ("softirq: Allow to drop the
softirq-BKL lock on PREEMPT_RT") which is the actual commit that
makes the race possible. (Sebastian Andrzej Siewior)
====================
Jiayuan Chen [Wed, 25 Feb 2026 12:14:56 +0000 (20:14 +0800)]
bpf: Fix race in devmap on PREEMPT_RT
On PREEMPT_RT kernels, the per-CPU xdp_dev_bulk_queue (bq) can be
accessed concurrently by multiple preemptible tasks on the same CPU.
The original code assumes bq_enqueue() and __dev_flush() run atomically
with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption, which allows CFS scheduling to preempt a task during
bq_xmit_all(), enabling another task on the same CPU to enter
bq_enqueue() and operate on the same per-CPU bq concurrently.
This leads to several races:
1. Double-free / use-after-free on bq->q[]: bq_xmit_all() snapshots
cnt = bq->count, then iterates bq->q[0..cnt-1] to transmit frames.
If preempted after the snapshot, a second task can call bq_enqueue()
-> bq_xmit_all() on the same bq, transmitting (and freeing) the
same frames. When the first task resumes, it operates on stale
pointers in bq->q[], causing use-after-free.
2. bq->count and bq->q[] corruption: concurrent bq_enqueue() modifying
bq->count and bq->q[] while bq_xmit_all() is reading them.
3. dev_rx/xdp_prog teardown race: __dev_flush() clears bq->dev_rx and
bq->xdp_prog after bq_xmit_all(). If preempted between
bq_xmit_all() return and bq->dev_rx = NULL, a preempting
bq_enqueue() sees dev_rx still set (non-NULL), skips adding bq to
the flush_list, and enqueues a frame. When __dev_flush() resumes,
it clears dev_rx and removes bq from the flush_list, orphaning the
newly enqueued frame.
4. __list_del_clearprev() on flush_node: similar to the cpumap race,
both tasks can call __list_del_clearprev() on the same flush_node,
the second dereferences the prev pointer already set to NULL.
The race between task A (__dev_flush -> bq_xmit_all) and task B
(bq_enqueue -> bq_xmit_all) on the same CPU:
Task A (xdp_do_flush) Task B (ndo_xdp_xmit redirect)
---------------------- --------------------------------
__dev_flush(flush_list)
bq_xmit_all(bq)
cnt = bq->count /* e.g. 16 */
/* start iterating bq->q[] */
<-- CFS preempts Task A -->
bq_enqueue(dev, xdpf)
bq->count == DEV_MAP_BULK_SIZE
bq_xmit_all(bq, 0)
cnt = bq->count /* same 16! */
ndo_xdp_xmit(bq->q[])
/* frames freed by driver */
bq->count = 0
<-- Task A resumes -->
ndo_xdp_xmit(bq->q[])
/* use-after-free: frames already freed! */
Fix this by adding a local_lock_t to xdp_dev_bulk_queue and acquiring
it in bq_enqueue() and __dev_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is
a pure annotation with no overhead, and on PREEMPT_RT provides a
per-CPU sleeping lock that serializes access to the bq.
Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260225121459.183121-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiayuan Chen [Wed, 25 Feb 2026 12:14:55 +0000 (20:14 +0800)]
bpf: Fix race in cpumap on PREEMPT_RT
On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed
concurrently by multiple preemptible tasks on the same CPU.
The original code assumes bq_enqueue() and __cpu_map_flush() run
atomically with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption, which allows CFS scheduling to preempt a task during
bq_flush_to_queue(), enabling another task on the same CPU to enter
bq_enqueue() and operate on the same per-CPU bq concurrently.
This leads to several races:
1. Double __list_del_clearprev(): after bq->count is reset in
bq_flush_to_queue(), a preempting task can call bq_enqueue() ->
bq_flush_to_queue() on the same bq when bq->count reaches
CPU_MAP_BULK_SIZE. Both tasks then call __list_del_clearprev()
on the same bq->flush_node, the second call dereferences the
prev pointer that was already set to NULL by the first.
2. bq->count and bq->q[] races: concurrent bq_enqueue() can corrupt
the packet queue while bq_flush_to_queue() is processing it.
The race between task A (__cpu_map_flush -> bq_flush_to_queue) and
task B (bq_enqueue -> bq_flush_to_queue) on the same CPU:
Task A (xdp_do_flush) Task B (cpu_map_enqueue)
---------------------- ------------------------
bq_flush_to_queue(bq)
spin_lock(&q->producer_lock)
/* flush bq->q[] to ptr_ring */
bq->count = 0
spin_unlock(&q->producer_lock)
bq_enqueue(rcpu, xdpf)
<-- CFS preempts Task A --> bq->q[bq->count++] = xdpf
/* ... more enqueues until full ... */
bq_flush_to_queue(bq)
spin_lock(&q->producer_lock)
/* flush to ptr_ring */
spin_unlock(&q->producer_lock)
__list_del_clearprev(flush_node)
/* sets flush_node.prev = NULL */
<-- Task A resumes -->
__list_del_clearprev(flush_node)
flush_node.prev->next = ...
/* prev is NULL -> kernel oops */
Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
in bq_enqueue() and __cpu_map_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is
a pure annotation with no overhead, and on PREEMPT_RT provides a
per-CPU sleeping lock that serializes access to the bq.
To reproduce, insert an mdelay(100) between bq->count = 0 and
__list_del_clearprev() in bq_flush_to_queue(), then run reproducer
provided by syzkaller.
Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/ Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260225121459.183121-2-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
Close race in freeing special fields and map value
There exists a race across various map types where the freeing of
special fields (tw, timer, wq, kptr, etc.) can be done eagerly when a
logical delete operation is done on a map value, such that the program
which continues to have access to such a map value can recreate the
fields and cause them to leak.
The set contains fixes for this case. It is a continuation of Mykyta's
previous attempt in [0], but applies to all fields. A test is included
which reproduces the bug reliably in absence of the fixes.
Local Storage Benchmarks
------------------------
Evaluation Setup: Benchmarked on a dual-socket Intel Xeon Gold 6348 (Ice
Lake) @ 2.60GHz (56 cores / 112 threads), with the CPU governor set to
performance. Bench was pinned to a single NUMA node throughout the test.
Benchmark comes from [1] using the following command:
./bench -p 1 local-storage-create --storage-type <socket,task> --batch-size <16,32,64>
Before the test, 10 runs of all cases ([socket|task] x 3 batch sizes x 7
iterations per batch size) are done to warm up and prime the machine.
Then, 3 runs of all cases are done (with and without the patch, across
reboots).
For each comparison, we have 21 samples, i.e. per batch size (e.g.
socket 16) of a given local storage, we have 3 runs x 7 iterations.
The statistics (mean, median, stddev) and t-test is done for each
scenario (local storage and batch size pair) individually (21 samples
for either case). All values are for local storage creations in thousand
creations / sec (k/s).
The cases for socket are within the range of noise, and improvements in task
local storage are due to high variance (CV ~4%-6% across batch sizes). The only
statistically significant case worth mentioning is socket with batch size 64
with p-value from t-test < 0.05, but the absolute difference is small (~2k/s).
TL;DR there doesn't appear to be any significant regression or improvement.
* Add Paul's Reviewed-by.
* Fix use-after-free in accessing bpf_mem_alloc embedded in map. (syzbot CI)
* Add benchmark numbers for local storage.
* Add extra test case for per-cpu hashmap coverage with up to 16 refcount leaks.
* Target bpf tree.
====================
Add a couple of tests to ensure that the refcount drops to zero when we
exercise the race where creation of a special field succeeds the logical
bpf_obj_free_fields done when deleting an element. Prior to previous
changes, the fields would be freed eagerly and repopulate and end up
leaking, causing the reference to not drop down correctly. Running this
test on a kernel without fixes will cause a hang in delete_module, since
the module reference stays active due to the leaked kptr not dropping
it. After the fixes tests succeed as expected.
Currently, when use_kmalloc_nolock is false, the freeing of fields for a
local storage selem is done eagerly before waiting for the RCU or RCU
tasks trace grace period to elapse. This opens up a window where the
program which has access to the selem can recreate the fields after the
freeing of fields is done eagerly, causing memory leaks when the element
is finally freed and returned to the kernel.
Make a few changes to address this. First, delay the freeing of fields
until after the grace periods have expired using a __bpf_selem_free_rcu
wrapper which is eventually invoked after transitioning through the
necessary number of grace period waits. Replace usage of the kfree_rcu
with call_rcu to be able to take a custom callback. Finally, care needs
to be taken to extend the rcu barriers for all cases, and not just when
use_kmalloc_nolock is true, as RCU and RCU tasks trace callbacks can be
in flight for either case and access the smap field, which is used to
obtain the BTF record to walk over special fields in the map value.
While we're at it, drop migrate_disable() from bpf_selem_free_rcu, since
migration should be disabled for RCU callbacks already.
BPF hash map may now use the map_check_btf() callback to decide whether
to set a dtor on its bpf_mem_alloc or not. Unlike C++ where members can
opt out of const-ness using mutable, we must lose the const qualifier on
the callback such that we can avoid the ugly cast. Make the change and
adjust all existing users, and lose the comment in hashtab.c.
There is a race window where BPF hash map elements can leak special
fields if the program with access to the map value recreates these
special fields between the check_and_free_fields done on the map value
and its eventual return to the memory allocator.
Several ways were explored prior to this patch, most notably [0] tried
to use a poison value to reject attempts to recreate special fields for
map values that have been logically deleted but still accessible to BPF
programs (either while sitting in the free list or when reused). While
this approach works well for task work, timers, wq, etc., it is harder
to apply the idea to kptrs, which have a similar race and failure mode.
Instead, we change bpf_mem_alloc to allow registering destructor for
allocated elements, such that when they are returned to the allocator,
any special fields created while they were accessible to programs in the
mean time will be freed. If these values get reused, we do not free the
fields again before handing the element back. The special fields thus
may remain initialized while the map value sits in a free list.
When bpf_mem_alloc is retired in the future, a similar concept can be
introduced to kmalloc_nolock-backed kmem_cache, paired with the existing
idea of a constructor.
Note that the destructor registration happens in map_check_btf, after
the BTF record is populated and (at that point) avaiable for inspection
and duplication. Duplication is necessary since the freeing of embedded
bpf_mem_alloc can be decoupled from actual map lifetime due to logic
introduced to reduce the cost of rcu_barrier()s in mem alloc free path in 9f2c6e96c65e ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.").
As such, once all callbacks are done, we must also free the duplicated
record. To remove dependency on the bpf_map itself, also stash the key
size of the map to obtain value from htab_elem long after the map is
gone.
Linus Torvalds [Fri, 27 Feb 2026 21:40:30 +0000 (13:40 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The diffstat is dominated by changes to our TLB invalidation errata
handling and the introduction of a new GCS selftest to catch one of
the issues that is fixed here relating to PROT_NONE mappings.
- Fix cpufreq warning due to attempting a cross-call with interrupts
masked when reading local AMU counters
- Fix DEBUG_PREEMPT warning from the delay loop when it tries to
access per-cpu errata workaround state for the virtual counter
- Re-jig and optimise our TLB invalidation errata workarounds in
preparation for more hardware brokenness
- Fix GCS mappings to interact properly with PROT_NONE and to avoid
corrupting the pte on CPUs with FEAT_LPA2
- Fix ioremap_prot() to extract only the memory attributes from the
user pte and ignore all the other 'prot' bits"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: topology: Fix false warning in counters_read_on_cpu() for same-CPU reads
arm64: Fix sampling the "stable" virtual counter in preemptible section
arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI
arm64: tlb: Allow XZR argument to TLBI ops
kselftest: arm64: Check access to GCS after mprotect(PROT_NONE)
arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappings
arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled
arm64: io: Extract user memory type in ioremap_prot()
arm64: io: Rename ioremap_prot() to __ioremap_prot()
Linus Torvalds [Fri, 27 Feb 2026 21:32:52 +0000 (13:32 -0800)]
Merge tag 'pci-v7.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Update MAINTAINERS email address (Shawn Guo)
- Refresh cached Endpoint driver MSI Message Address to fix a v7.0
regression when kernel changes the address after firmware has
configured it (Niklas Cassel)
- Flush Endpoint MSI-X writes so they complete before the outbound ATU
entry is unmapped (Niklas Cassel)
- Correct the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value, which broke VMM use
of PCI capabilities (Bjorn Helgaas)
* tag 'pci-v7.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value
PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry
PCI: dwc: ep: Refresh MSI Message Address cache on change
MAINTAINERS: Update Shawn Guo's address for HiSilicon PCIe controller driver
Linus Torvalds [Fri, 27 Feb 2026 18:52:57 +0000 (10:52 -0800)]
Merge tag 'cxl-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:
- Fix incorrect usages of decoder flags
- Validate payload size before accessing contents
- Fix race condition when creating nvdimm objects
- Fix deadlock on attach failure
* tag 'cxl-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Test CXL_DECODER_F_NORMALIZED_ADDRESSING as a bitmask
cxl: Test CXL_DECODER_F_LOCK as a bitmask
cxl/mbox: validate payload size before accessing contents in cxl_payload_from_user_allowed()
cxl: Fix race of nvdimm_bus object when creating nvdimm objects
cxl: Move devm_cxl_add_nvdimm_bridge() to cxl_pmem.ko
cxl/port: Hold port host lock during dport adding.
cxl/port: Introduce port_to_host() helper
cxl/memdev: fix deadlock in cxl_memdev_autoremove() on attach failure
Linus Torvalds [Fri, 27 Feb 2026 18:49:54 +0000 (10:49 -0800)]
Merge tag 'mmc-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Avoid bitfield RMW for claim/retune flags
MMC host:
- dw_mmc-rockchip: Fix runtime PM support for internal phase support
- mmci: Fix device_node reference leak in of_get_dml_pipe_index()
- sdhci-brcmstb: Use correct register offset for V1 pin_sel restore"
* tag 'mmc-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Avoid bitfield RMW for claim/retune flags
mmc: sdhci-brcmstb: use correct register offset for V1 pin_sel restore
mmc: dw_mmc-rockchip: Fix runtime PM support for internal phase support
mmc: mmci: Fix device_node reference leak in of_get_dml_pipe_index()
Linus Torvalds [Fri, 27 Feb 2026 18:42:02 +0000 (10:42 -0800)]
Merge tag 'block-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
"Two sets of fixes, one for drbd, and one for the zoned loop driver"
* tag 'block-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
zloop: check for spurious options passed to remove
zloop: advertise a volatile write cache
drbd: fix null-pointer dereference on local read error
drbd: Replace deprecated strcpy with strscpy
drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()
Linus Torvalds [Fri, 27 Feb 2026 18:39:11 +0000 (10:39 -0800)]
Merge tag 'io_uring-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
"Just two minor patches in here, ensuring the use of READ_ONCE() for
sqe field reading is consistent across the codebase. There were two
missing cases, now they are covered too"
* tag 'io_uring-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/timeout: READ_ONCE sqe->addr
io_uring/cmd_net: use READ_ONCE() for ->addr3 read
Linus Torvalds [Fri, 27 Feb 2026 18:21:06 +0000 (10:21 -0800)]
Merge tag 'xfs-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"Nothing reeeally stands out here: a few bug fixes, some refactoring to
easily fit the bug fixes, and a couple cosmetic changes"
* tag 'xfs-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: add static size checks for ioctl UABI
xfs: remove duplicate static size checks
xfs: Add comments for usages of some macros.
xfs: Update lazy counters in xfs_growfs_rt_bmblock()
xfs: Add a comment in xfs_log_sb()
xfs: Fix xfs_last_rt_bmblock()
xfs: don't report half-built inodes to fserror
xfs: don't report metadata inodes to fserror
xfs: fix potential pointer access race in xfs_healthmon_get
xfs: fix xfs_group release bug in xfs_dax_notify_dev_failure
xfs: fix xfs_group release bug in xfs_verify_report_losses
xfs: fix copy-paste error in previous fix
xfs: Fix error pointer dereference
xfs: remove metafile inodes from the active inode stat
xfs: cleanup inode counter stats
xfs: fix code alignment issues in xfs_ondisk.c
xfs: Replace &rtg->rtg_group with rtg_group()
xfs: Refactoring the nagcount and delta calculation
xfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary()
Linus Torvalds [Fri, 27 Feb 2026 17:54:02 +0000 (09:54 -0800)]
Merge tag 'slab-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Fix for spurious page allocation warnings on sheaf refill (Harry Yoo)
- Fix for CONFIG_MEM_ALLOC_PROFILING_DEBUG warnings (Suren
Baghdasaryan)
- Fix for kernel-doc warning on ksize() (Sanjay Chitroda)
- Fix to avoid setting slab->stride later than on slab allocation.
Doesn't yet fix the reports from powerpc; debugging is making
progress (Harry Yoo)
* tag 'slab-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: initialize slab->stride early to avoid memory ordering issues
mm/slub: drop duplicate kernel-doc for ksize()
mm/slab: mark alloc tags empty for sheaves allocated with __GFP_NO_OBJ_EXT
mm/slab: pass __GFP_NOWARN to refill_sheaf() if fallback is available
Linus Torvalds [Fri, 27 Feb 2026 17:42:17 +0000 (09:42 -0800)]
Merge tag 'gpio-fixes-for-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix memory leaks in shared GPIO management
- normalize the return values of gpio_chip::get() in GPIO core on
behalf of drivers that return invalid values (this is done because
adding stricter sanitization of callback retvals led to breakages in
existing users, we'll revert that once all are fixed)
* tag 'gpio-fixes-for-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: normalize the return value of gc->get() on behalf of buggy drivers
gpio: shared: fix memory leaks
Linus Torvalds [Fri, 27 Feb 2026 17:34:02 +0000 (09:34 -0800)]
Merge tag 'sound-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A bunch of small device-specific fixes. Mostly quirks and fix-ups for
USB- and HD-audio at this time, in addition to a couple of ASoC AMD
and Cirrus fixes"
* tag 'sound-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ASoC: SDCA: Fix comments for sdca_irq_request()
ALSA: us144mkii: Drop kernel-doc markers
ALSA: usb: qcom: Correct parameter comment for uaudio_transfer_buffer_setup()
ALSA: usb-audio: Drop superfluous kernel-doc markers
ALSA: hda: cs35l56: Remove unnecessary struct cs_dsp_client_ops
ALSA: hda: cs35l56: Fix signedness error in cs35l56_hda_posture_put()
ALSA: usb-audio: Use correct version for UAC3 header validation
ALSA: hda/realtek: add quirk for Acer Nitro ANV15-51
ALSA: hda/intel: increase default bdl_pos_adj for Nvidia controllers
ALSA: usb-audio: Use inclusive terms
ALSA: usb-audio: Avoid implicit feedback mode on DIYINHK USB Audio 2.0
ALSA: usb-audio: Check max frame size for implicit feedback mode, too
ALSA: usb-audio: Cap the packet size pre-calculations
ASoC: amd: yc: Add ASUS EXPERTBOOK BM1503CDA to quirk table
ASoC: cs42l43: Report insert for exotic peripherals
ALSA: usb-audio: Skip clock selector for Focusrite devices
ALSA: usb-audio: Add QUIRK_FLAG_SKIP_IFACE_SETUP
ALSA: usb-audio: Remove VALIDATE_RATES quirk for Focusrite devices
ALSA: usb-audio: Improve Focusrite sample rate filtering
ALSA: hda/realtek: add quirk for Samsung Galaxy Book Flex (NT950QCT-A38A)
...
Linus Torvalds [Fri, 27 Feb 2026 16:56:07 +0000 (08:56 -0800)]
Merge tag 'drm-fixes-2026-02-27' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes pull, amdxdna and amdgpu are the main ones, with a
couple of intel fixes, then a scattering of fixes across drivers,
nothing too major.
i915/display:
- Fix Panel Replay stuck with X during mode transitions on Panther
Lake
vmwgfx:
- A reference count and error handling fix"
* tag 'drm-fixes-2026-02-27' of https://gitlab.freedesktop.org/drm/kernel: (39 commits)
drm/amd: Disable MES LR compute W/A
drm/amdgpu: Fix error handling in slot reset
drm/amdgpu/vcn5: Add SMU dpm interface type
drm/amdgpu: Fix locking bugs in error paths
drm/amdgpu: Unlock a mutex before destroying it
drm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sink
drm/amdgpu: add upper bound check on user inputs in wait ioctl
drm/amdgpu: add upper bound check on user inputs in signal ioctl
drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings
drm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctl
accel/amdxdna: Use a different name for latest firmware
drm/client: Do not destroy NULL modes
drm/gpusvm: Fix drm_gpusvm_pages_valid_unlocked() kernel-doc
drm/xe/sync: Fix user fence leak on alloc failure
drm/xe/sync: Cleanup partially initialized sync on parse failure
drm/xe/wa: Steer RMW of MCR registers while building default LRC
accel/amdxdna: Validate command buffer payload count
accel/amdxdna: Prevent ubuf size overflow
accel/amdxdna: Fix out-of-bounds memset in command slot handling
accel/amdxdna: Fix command hang on suspended hardware context
...
Bjorn Helgaas [Fri, 27 Feb 2026 12:10:08 +0000 (06:10 -0600)]
PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value
fb82437fdd8c ("PCI: Change capability register offsets to hex") incorrectly
converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex
0x32:
-#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 /* end of v2 EPs w/ link */
This broke PCI capabilities in a VMM because subsequent ones weren't
DWORD-aligned.
Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34.
fb82437fdd8c was from Baruch Siach <baruch@tkos.co.il>, but this was not
Baruch's fault; it's a mistake I made when applying the patch.
Fixes: fb82437fdd8c ("PCI: Change capability register offsets to hex") Reported-by: David Woodhouse <dwmw2@infradead.org> Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Thorsten Blum [Thu, 26 Feb 2026 22:15:22 +0000 (23:15 +0100)]
smb: client: Use snprintf in cifs_set_cifscreds
Replace unbounded sprintf() calls with the safer snprintf(). Avoid using
magic numbers and use strlen() to calculate the key descriptor buffer
size. Save the size in a local variable and reuse it for the bounded
snprintf() calls. Remove CIFSCREDS_DESC_SIZE.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Harry Yoo [Mon, 23 Feb 2026 07:58:09 +0000 (16:58 +0900)]
mm/slab: initialize slab->stride early to avoid memory ordering issues
When alloc_slab_obj_exts() is called later (instead of during slab
allocation and initialization), slab->stride and slab->obj_exts are
updated after the slab is already accessible by multiple CPUs.
The current implementation does not enforce memory ordering between
slab->stride and slab->obj_exts. For correctness, slab->stride must be
visible before slab->obj_exts. Otherwise, concurrent readers may observe
slab->obj_exts as non-zero while stride is still stale.
With stale slab->stride, slab_obj_ext() could return the wrong obj_ext.
This could cause two problems:
- obj_cgroup_put() is called on the wrong objcg, leading to
a use-after-free due to incorrect reference counting [1] by
decrementing the reference count more than it was incremented.
- refill_obj_stock() is called on the wrong objcg, leading to
a page_counter overflow [2] by uncharging more memory than charged.
Fix this by unconditionally initializing slab->stride in
alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the function.
This ensures updates to slab->stride become visible before the slab
can be accessed by other CPUs via the per-node partial slab list
(protected by spinlock with acquire/release semantics).
Thanks to Shakeel Butt for pointing out this issue [3].
[vbabka@kernel.org: the bug reports [1] and [2] are not yet fully fixed,
with investigation ongoing, but it is nevertheless a step in the right
direction to only set stride once after allocating the slab and not
change it later ]
Tvrtko Ursulin [Fri, 27 Feb 2026 12:49:01 +0000 (12:49 +0000)]
drm/ttm: Fix ttm_pool_beneficial_order() return type
Fix a nasty copy and paste bug, where the incorrect boolean return type of
the ttm_pool_beneficial_order() helper had a consequence of avoiding
direct reclaim too eagerly for drivers which use this feature (currently
amdgpu).
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 7e9c548d3709 ("drm/ttm: Allow drivers to specify maximum beneficial TTM pool size") Cc: Christian König <christian.koenig@amd.com> Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.19+ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20260227124901.3177-1-tvrtko.ursulin@igalia.com
Thorsten Blum [Thu, 26 Feb 2026 21:28:45 +0000 (22:28 +0100)]
smb: client: Don't log plaintext credentials in cifs_set_cifscreds
When debug logging is enabled, cifs_set_cifscreds() logs the key
payload and exposes the plaintext username and password. Remove the
debug log to avoid exposing credentials.
Fixes: 8a8798a5ff90 ("cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Steve French <stfrench@microsoft.com>