git.ipfire.org Git - thirdparty/kernel/linux.git/log
2 weeks ago  drm/amd/pm: Use smu vram copy in SMUv15
Lijo Lazar [Fri, 27 Mar 2026 07:50:40 +0000 (13:20 +0530)] 
drm/amd/pm: Use smu vram copy in SMUv15

Use the smu vram copy wrapper function for vram copy operations in
SMUv15.0.8.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/pm: Use smu vram copy in SMUv13
Lijo Lazar [Fri, 27 Mar 2026 07:46:33 +0000 (13:16 +0530)] 
drm/amd/pm: Use smu vram copy in SMUv13

Use smu vram copy wrapper function for vram copy operations in
SMUv13.0.6 and SMUv13.0.12.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/pm: Add smu vram copy function
Lijo Lazar [Fri, 27 Mar 2026 07:43:00 +0000 (13:13 +0530)] 
drm/amd/pm: Add smu vram copy function

Add a wrapper function for copying data to/from vram. It additionally
checks for any RAS fatal error. The copy cannot be trusted if a RAS
fatal error has happened, as VRAM becomes inaccessible.
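As a rough userspace sketch of the idea (the names ras_fatal_error_occurred and smu_copy_vram are illustrative stand-ins, not the driver's actual symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the RAS fatal-error query; in the driver this
 * would consult the RAS context instead of a global flag. */
static bool ras_fatal_error_occurred = false;

/* Sketch of the wrapper: refuse the copy when a RAS fatal error has been
 * recorded, since VRAM contents can no longer be trusted. */
static int smu_copy_vram(void *dst, const void *src, size_t size)
{
	if (ras_fatal_error_occurred)
		return -1;	/* stand-in for an -EIO style error */

	memcpy(dst, src, size);
	return 0;
}
```

The point of centralizing this in one wrapper is that every vram copy path inherits the same fatal-error check.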

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: add CONFIG_GCOV_PROFILE_AMDGPU Kconfig option
Vitaly Prosyak [Tue, 24 Mar 2026 23:53:59 +0000 (19:53 -0400)] 
drm/amdgpu: add CONFIG_GCOV_PROFILE_AMDGPU Kconfig option

Add a Kconfig option to enable GCOV code coverage profiling for the
amdgpu driver, following the established upstream pattern used by
CONFIG_GCOV_PROFILE_FTRACE (kernel/trace), CONFIG_GCOV_PROFILE_RDS
(net/rds), and CONFIG_GCOV_PROFILE_URING (io_uring).

This allows CI systems to enable amdgpu code coverage entirely via
.config (e.g., scripts/config --enable GCOV_PROFILE_AMDGPU) without
manually editing the amdgpu Makefile. The option depends on both
DRM_AMDGPU and GCOV_KERNEL, defaults to n, and is therefore never
enabled in production or distro builds.
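A sketch of what such an option and its Makefile hook typically look like, following the GCOV_PROFILE_FTRACE pattern (the exact help text and Makefile location in the actual patch may differ):

```
config GCOV_PROFILE_AMDGPU
	bool "Enable GCOV profiling on the amdgpu driver"
	depends on DRM_AMDGPU && GCOV_KERNEL
	default n
	help
	  Enable GCOV code coverage profiling on the amdgpu driver.

# and in the amdgpu Makefile:
ifdef CONFIG_GCOV_PROFILE_AMDGPU
GCOV_PROFILE := y
endif
```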

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/display: Replace use of system_wq with system_percpu_wq
Marco Crivellari [Fri, 13 Mar 2026 14:47:15 +0000 (15:47 +0100)] 
drm/amd/display: Replace use of system_wq with system_percpu_wq

This patch continues the effort to refactor workqueue APIs, which began
with the changes that introduced new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually make workqueues unbound by
default, so that their workload placement can be optimized by the
scheduler.

Before that can happen, workqueue users must be converted to the
better-named new workqueues, with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.
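The conversion is mechanical; an illustrative hunk (a hypothetical call site, not an actual change from this patch) looks like:

```diff
-	queue_work(system_wq, &work);
+	queue_work(system_percpu_wq, &work);

-	queue_work(system_unbound_wq, &work);
+	queue_work(system_dfl_wq, &work);
```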

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: add an option to allow gpu partition allocate all available memory
Xiaogang Chen [Tue, 31 Mar 2026 18:24:17 +0000 (13:24 -0500)] 
drm/amdgpu: add an option to allow gpu partition allocate all available memory

The driver currently reports and limits memory allocation for each
partition equally among partitions that share the same memory partition.
An application may not be able to use all available memory when running
on a partitioned gpu even though the system still has enough free
memory.

Add an option that applications can use to have a gpu partition allocate
all available memory.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Consolidate reserve region allocations
Lijo Lazar [Thu, 26 Mar 2026 05:51:15 +0000 (11:21 +0530)] 
drm/amdgpu: Consolidate reserve region allocations

Move the marking of reserve regions to a single function that loops
through all the reserve region ids and reserves those with non-zero
size. Some reservations, such as the firmware extended reservation
region, may still happen later at runtime.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Move validation of reserve region info
Lijo Lazar [Thu, 26 Mar 2026 05:41:39 +0000 (11:11 +0530)] 
drm/amdgpu: Move validation of reserve region info

Keep validation of reserved regions as part of filling in their details.
If the information is invalid, the size is kept as 0 so that the region
is not considered for reservation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add function to fill training region
Lijo Lazar [Thu, 26 Mar 2026 05:36:10 +0000 (11:06 +0530)] 
drm/amdgpu: Add function to fill training region

Add a function to fill in the memory training reservation region. The
memory training context is initialized only if the reservation for the
region succeeds.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add function to fill fw reserve region
Lijo Lazar [Thu, 26 Mar 2026 05:24:38 +0000 (10:54 +0530)] 
drm/amdgpu: Add function to fill fw reserve region

Add a function to fill in details for the firmware reserve region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Group filling reserve region details
Lijo Lazar [Thu, 26 Mar 2026 05:09:16 +0000 (10:39 +0530)] 
drm/amdgpu: Group filling reserve region details

Add a function that groups the filling of reserve region information. It
may not cover everything, as info on some regions, such as those from
atomfirmware tables, is still filled elsewhere.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add memory training reserve-region
Lijo Lazar [Wed, 25 Mar 2026 13:40:16 +0000 (19:10 +0530)] 
drm/amdgpu: Add memory training reserve-region

Use reserve region helpers for initializing/reserving memory training
region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add host driver reserved-region
Lijo Lazar [Wed, 25 Mar 2026 13:10:24 +0000 (18:40 +0530)] 
drm/amdgpu: Add host driver reserved-region

Use reserve region helpers for initializing/reserving host driver
reserved region in virtualization environment.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add fw vram usage reserve-region
Lijo Lazar [Wed, 25 Mar 2026 13:02:25 +0000 (18:32 +0530)] 
drm/amdgpu: Add fw vram usage reserve-region

Use reserve region helpers for initializing/reserving firmware usage
region in virtualized environments.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add firmware extended reserve-region
Lijo Lazar [Wed, 25 Mar 2026 11:50:11 +0000 (17:20 +0530)] 
drm/amdgpu: Add firmware extended reserve-region

Use reserve region helpers for initializing/reserving extended firmware
reservation area.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add fw_reserved reserve-region
Lijo Lazar [Wed, 25 Mar 2026 11:47:10 +0000 (17:17 +0530)] 
drm/amdgpu: Add fw_reserved reserve-region

Use reserve region helpers for initializing/reserving fw_reserved region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add stolen_reserved reserve-region
Lijo Lazar [Wed, 25 Mar 2026 11:38:45 +0000 (17:08 +0530)] 
drm/amdgpu: Add stolen_reserved reserve-region

Use reserve region helpers for initializing/reserving stolen_reserved region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add extended stolen vga reserve-region
Lijo Lazar [Wed, 25 Mar 2026 11:34:36 +0000 (17:04 +0530)] 
drm/amdgpu: Add extended stolen vga reserve-region

Use reserve region helpers for initializing/reserving extended stolen
vga region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add stolen vga reserve-region
Lijo Lazar [Wed, 25 Mar 2026 11:30:31 +0000 (17:00 +0530)] 
drm/amdgpu: Add stolen vga reserve-region

Use reserve region helpers for initializing/reserving stolen vga region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu: Add reserved region ids
Lijo Lazar [Wed, 25 Mar 2026 11:06:16 +0000 (16:36 +0530)] 
drm/amdgpu: Add reserved region ids

Add reserved regions and helper functions to the memory manager.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/ras: enable uniras via IP version check
Ce Sun [Mon, 30 Mar 2026 06:22:03 +0000 (14:22 +0800)] 
drm/amd/ras: enable uniras via IP version check

Enable uniras via an IP version check.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amdgpu/vcn4: Prevent OOB reads when parsing IB
Benjamin Cheng [Tue, 24 Mar 2026 20:42:05 +0000 (16:42 -0400)] 
drm/amdgpu/vcn4: Prevent OOB reads when parsing IB

Rewrite the IB parsing to use amdgpu_ib_get_value(), which handles the
bounds checks.

Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 weeks ago  drm/amdgpu/vcn4: Prevent OOB reads when parsing dec msg
Benjamin Cheng [Wed, 25 Mar 2026 13:09:27 +0000 (09:09 -0400)] 
drm/amdgpu/vcn4: Prevent OOB reads when parsing dec msg

Check bounds against the end of the BO whenever we access the msg.

Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 weeks ago  drm/amdgpu/vcn3: Prevent OOB reads when parsing dec msg
Benjamin Cheng [Tue, 24 Mar 2026 20:25:56 +0000 (16:25 -0400)] 
drm/amdgpu/vcn3: Prevent OOB reads when parsing dec msg

Check bounds against the end of the BO whenever we access the msg.

Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 weeks ago  drm/amdgpu/vce: Prevent partial address patches
Benjamin Cheng [Mon, 30 Mar 2026 19:01:27 +0000 (15:01 -0400)] 
drm/amdgpu/vce: Prevent partial address patches

If only one of lo/hi is valid, the patching could result in a bad
address being written to FW.

Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 weeks ago  Merge branch 'for-7.0-fixes' into for-7.1
Tejun Heo [Fri, 3 Apr 2026 17:48:28 +0000 (07:48 -1000)] 
Merge branch 'for-7.0-fixes' into for-7.1

Conflict in kernel/sched/ext.c between:

  7e0ffb72de8a ("sched_ext: Fix stale direct dispatch state in
  ddsp_dsq_id")

which clears ddsp state at individual call sites instead of
dispatch_enqueue(), and sub-sched related code reorg and API updates on
for-7.1. Resolved by applying the ddsp fix with for-7.1's signatures.

Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks ago  drm/amdgpu: Add bounds checking to ib_{get,set}_value
Benjamin Cheng [Wed, 25 Mar 2026 12:39:19 +0000 (08:39 -0400)] 
drm/amdgpu: Add bounds checking to ib_{get,set}_value

The uvd/vce/vcn code accesses the IB at predefined offsets without
checking that the IB is large enough. Check the bounds here. The caller
is responsible for making sure it can handle arbitrary return values.

Also make the idx a uint32_t to prevent overflows causing the condition
to fail.
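The shape of such a bounds-checked accessor, sketched in plain C (the struct and names are simplified stand-ins, not the actual amdgpu code):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the IB: just the dword count and the data. */
struct ib {
	uint32_t length_dw;
	uint32_t *ptr;
};

/* Sketch of a bounds-checked getter in the spirit of amdgpu_ib_get_value():
 * idx is a uint32_t, so a "negative" index cannot wrap the comparison, and
 * out-of-range reads return a defined value. */
static uint32_t ib_get_value(const struct ib *ib, uint32_t idx)
{
	if (idx >= ib->length_dw)
		return 0;	/* caller must tolerate arbitrary values */
	return ib->ptr[idx];
}
```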

Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2 weeks ago  drm/amd/display: Fix missing parameter details in amdgpu_dm_ism
Srinivasan Shanmugam [Sun, 29 Mar 2026 09:56:42 +0000 (15:26 +0530)] 
drm/amd/display: Fix missing parameter details in amdgpu_dm_ism

Update comments in dm_ism_get_idle_allow_delay() and
dm_ism_insert_record() to better reflect their behavior and inputs.

dm_ism_get_idle_allow_delay() computes the delay before allowing
idle optimizations based on history and stream timing.

dm_ism_insert_record() stores idle duration records in the
circular history buffer.

The existing comments explain what these functions do, but they do not
explain what their inputs mean.

Fixes the below with gcc W=1:
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'current_state' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'event' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'next_state' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:153 function parameter 'ism' not described in 'dm_ism_get_idle_allow_delay'
../display/amdgpu_dm/amdgpu_dm_ism.c:153 function parameter 'stream' not described in 'dm_ism_get_idle_allow_delay'
../display/amdgpu_dm/amdgpu_dm_ism.c:216 function parameter 'ism' not described in 'dm_ism_insert_record'

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Cc: Ray Wu <ray.wu@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/display: Fix parameter mismatch in panel self-refresh helper
Srinivasan Shanmugam [Mon, 30 Mar 2026 03:12:29 +0000 (08:42 +0530)] 
drm/amd/display: Fix parameter mismatch in panel self-refresh helper

Align parameter names with function arguments.

The function controls panel self-refresh enable/disable based on vblank
and VRR state.

Fixes the below with gcc W=1:
../display/amdgpu_dm/amdgpu_dm_crtc.c:131 function parameter 'dm' not described in 'amdgpu_dm_crtc_set_panel_sr_feature'
../display/amdgpu_dm/amdgpu_dm_crtc.c:131 function parameter 'acrtc' not described in 'amdgpu_dm_crtc_set_panel_sr_feature'
../display/amdgpu_dm/amdgpu_dm_crtc.c:131 function parameter 'stream' not described in 'amdgpu_dm_crtc_set_panel_sr_feature'

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Cc: Ray Wu <ray.wu@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/display: Use drm_display_info for AMD VSDB data
Chenyu Chen [Tue, 31 Mar 2026 03:15:02 +0000 (11:15 +0800)] 
drm/amd/display: Use drm_display_info for AMD VSDB data

Replace the raw EDID byte-walking in parse_amd_vsdb() with a read
from connector->display_info.amd_vsdb, now populated by drm_edid.

Factor out panel type determination into dm_set_panel_type(), which
checks VSDB panel_type, DPCD ext caps, and a luminance heuristic as
fallbacks.

Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/edid: Parse AMD Vendor-Specific Data Block
Chenyu Chen [Tue, 31 Mar 2026 03:14:26 +0000 (11:14 +0800)] 
drm/edid: Parse AMD Vendor-Specific Data Block

Parse the AMD VSDB v3 from CTA extension blocks and store the result
in struct drm_amd_vsdb_info, a new field of drm_display_info. This
includes replay mode, panel type, and luminance ranges.

Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  drm/amd/display: Fix dc_is_fp_enabled name mismatch
Srinivasan Shanmugam [Mon, 30 Mar 2026 02:56:03 +0000 (08:26 +0530)] 
drm/amd/display: Fix dc_is_fp_enabled name mismatch

Fix incorrect function name in comment to match dc_is_fp_enabled.

This function checks whether the FPU is currently active by reading a counter.
The FPU helpers manage safe usage of FPU in the kernel by tracking when
it starts and stops, avoiding misuse or crashes.

Fixes: 3539437f354b ("drm/amd/display: Move FPU Guards From DML To DC - Part 1")
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Dillon Varone <dillon.varone@amd.com>
Cc: Rafal Ostrowski <rafal.ostrowski@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks ago  software node: remove software_node_exit()
Bartosz Golaszewski [Thu, 2 Apr 2026 14:15:03 +0000 (16:15 +0200)] 
software node: remove software_node_exit()

software_node_exit() is an __exitcall() in a built-in compilation unit,
so it is effectively dead code. Remove it.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260402-nokia770-gpio-swnodes-v5-2-d730db3dd299@oss.qualcomm.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2 weeks ago  kernel: ksysfs: initialize kernel_kobj earlier
Bartosz Golaszewski [Thu, 2 Apr 2026 14:15:02 +0000 (16:15 +0200)] 
kernel: ksysfs: initialize kernel_kobj earlier

Software nodes depend on kernel_kobj, which is initialized pretty late
in the boot process, as a core_initcall(). Before moving the software
node initialization to driver_init(), we must first make kernel_kobj
available earlier.

Make ksysfs_init() visible in a new header - ksysfs.h - and call it in
do_basic_setup() right before driver_init().

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260402-nokia770-gpio-swnodes-v5-1-d730db3dd299@oss.qualcomm.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2 weeks ago  drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations
Ionut Nechita [Mon, 23 Mar 2026 21:13:43 +0000 (23:13 +0200)] 
drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations

Description:
 - Commit b82f0759346617b2 ("drm/amd/display: Migrate DIO registers access
   from hwseq to dio component") moved DIO_MEM_PWR_CTRL register access
   behind the new dio abstraction layer but only created the dio object for
   DCN 4.01. On all other generations (DCN 10/20/21/201/30/301/302/303/
   31/314/315/316/32/321/35/351/36), the dio pointer is NULL, causing the
   register write to be silently skipped.

   This results in AFMT HDMI memory not being powered on during init_hw,
   which can cause HDMI audio failures and display issues on affected
   hardware including Renoir/Cezanne (DCN 2.1) APUs that use dcn10_init_hw.

   Call dcn10_dio_construct() in each older DCN generation's resource.c
   to create the dio object, following the same pattern as DCN 4.01. This
   ensures the dio pointer is non-NULL and the mem_pwr_ctrl callback works
   through the dio abstraction for all DCN generations.

Fixes: b82f07593466 ("drm/amd/display: Migrate DIO registers access from hwseq to dio component.")
Reviewed-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a4983968fa5b3179ab090407d325a71cdc96874e)

2 weeks ago  Merge tag 'spi-fix-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Fri, 3 Apr 2026 17:19:52 +0000 (10:19 -0700)] 
Merge tag 'spi-fix-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A small collection of fixes, mostly probe/remove issues that are the
  result of Felix Gu going and auditing those areas, plus one error
  handling fix for the Cadence QSPI driver"

* tag 'spi-fix-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cadence-qspi: Fix exec_mem_op error handling
  spi: amlogic: spifc-a4: unregister ECC engine on probe failure and remove() callback
  spi: stm32-ospi: Fix DMA channel leak on stm32_ospi_dma_setup() failure
  spi: stm32-ospi: Fix reset control leak on probe error
  spi: stm32-ospi: Fix resource leak in remove() callback

2 weeks ago  sched_ext: Fix stale direct dispatch state in ddsp_dsq_id
Andrea Righi [Fri, 3 Apr 2026 06:57:20 +0000 (08:57 +0200)] 
sched_ext: Fix stale direct dispatch state in ddsp_dsq_id

@p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) triggering a
spurious warning in mark_direct_dispatch() when the next wakeup's
ops.select_cpu() calls scx_bpf_dsq_insert(), such as:

 WARNING: kernel/sched/ext.c:1273 at scx_dsq_insert_commit+0xcd/0x140

The root cause is that ddsp_dsq_id was only cleared in dispatch_enqueue(),
which is not reached in all paths that consume or cancel a direct dispatch
verdict.

Fix it by clearing it at the right places:

 - direct_dispatch(): cache the direct dispatch state in local variables
   and clear it before dispatch_enqueue() on the synchronous path. For
   the deferred path, the direct dispatch state must remain set until
   process_ddsp_deferred_locals() consumes them.

 - process_ddsp_deferred_locals(): cache the dispatch state in local
   variables and clear it before calling dispatch_to_local_dsq(), which
   may migrate the task to another rq.

 - do_enqueue_task(): clear the dispatch state on the enqueue path
   (local/global/bypass fallbacks), where the direct dispatch verdict is
   ignored.

 - dequeue_task_scx(): clear the dispatch state after dispatch_dequeue()
   to handle both the deferred dispatch cancellation and the holding_cpu
   race, covering all cases where a pending direct dispatch is
   cancelled.

 - scx_disable_task(): clear the direct dispatch state when
   transitioning a task out of the current scheduler. Waking tasks may
   have had the direct dispatch state set by the outgoing scheduler's
   ops.select_cpu() and then been queued on a wake_list via
   ttwu_queue_wakelist(), when SCX_OPS_ALLOW_QUEUED_WAKEUP is set. Such
   tasks are not on the runqueue and are not iterated by scx_bypass(),
   so their direct dispatch state won't be cleared. Without this clear,
   any subsequent SCX scheduler that tries to direct dispatch the task
   will trigger the WARN_ON_ONCE() in mark_direct_dispatch().
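The common pattern in the fix is: cache the verdict in locals, then clear the shared per-task field before acting on it, so no later path ever sees a stale verdict. A minimal sketch of that pattern (names are illustrative, not the actual sched_ext code):

```c
#include <assert.h>
#include <stdint.h>

#define DSQ_INVALID ((uint64_t)-1)	/* stand-in for SCX_DSQ_INVALID */

/* Minimal stand-in for the per-task direct-dispatch verdict. */
struct task {
	uint64_t ddsp_dsq_id;
};

/* Cache the verdict locally and clear the shared field *before*
 * dispatching, so a re-entrant or subsequent path never observes a
 * stale direct-dispatch verdict. */
static uint64_t consume_ddsp(struct task *p)
{
	uint64_t dsq_id = p->ddsp_dsq_id;

	p->ddsp_dsq_id = DSQ_INVALID;	/* clear before acting on it */
	return dsq_id;
}
```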

Fixes: 5b26f7b920f7 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches")
Cc: stable@vger.kernel.org # v6.12+
Cc: Daniel Hodges <hodgesd@meta.com>
Cc: Patrick Somaru <patsomaru@meta.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks ago  workqueue: avoid unguarded 64-bit division
Arnd Bergmann [Thu, 2 Apr 2026 20:59:03 +0000 (22:59 +0200)] 
workqueue: avoid unguarded 64-bit division

The printk() performs a 64-bit division, which is not allowed on 32-bit architectures:

x86_64-linux-ld: lib/test_workqueue.o: in function `test_workqueue_init':
test_workqueue.c:(.init.text+0x36f): undefined reference to `__udivdi3'

Use div_u64() to print the resulting elapsed microseconds.
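For reference, div_u64() divides a 64-bit dividend by a 32-bit divisor. A userspace approximation of its semantics (the kernel's 32-bit implementation avoids the libgcc call in a different way):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace sketch of the kernel's div_u64(): 64-by-32 divide. In kernel
 * code on 32-bit targets, a plain u64/u64 division emits a libgcc helper
 * call (__udivdi3) that the kernel does not link, so helpers like
 * div_u64() must be used instead. */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}
```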

Fixes: 24b2e73f9700 ("workqueue: add test_workqueue benchmark module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 weeks ago  Merge tag 'pm-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 3 Apr 2026 16:56:32 +0000 (09:56 -0700)] 
Merge tag 'pm-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a potential NULL pointer dereference in the energy model
  netlink interface and a potential double free in an error path in
  the common cpufreq governor management code:

   - Fix a NULL pointer dereference in the energy model netlink
     interface that may occur if a given perf domain ID is not
     recognized (Changwoo Min)

   - Avoid double free in the cpufreq_dbs_governor_init() error
     path when kobject_init_and_add() fails (Guangshuo Li)"

* tag 'pm-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: governor: fix double free in cpufreq_dbs_governor_init() error path
  PM: EM: Fix NULL pointer dereference when perf domain ID is not found

2 weeks ago  Merge tag 'thermal-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 3 Apr 2026 16:49:06 +0000 (09:49 -0700)] 
Merge tag 'thermal-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "Address potential races between thermal zone removal and system
  resume that may lead to a use-after-free (in two different ways)
  and a potential use-after-free in the thermal zone unregistration
  path (Rafael Wysocki)"

* tag 'thermal-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Fix thermal zone device registration error path
  thermal: core: Address thermal zone removal races with resume

2 weeks ago  KVM: SEV: Disallow LAUNCH_FINISH if vCPUs are actively being created
Sean Christopherson [Tue, 10 Mar 2026 23:48:12 +0000 (16:48 -0700)] 
KVM: SEV: Disallow LAUNCH_FINISH if vCPUs are actively being created

Reject LAUNCH_FINISH for SEV-ES and SNP VMs if KVM is actively creating
one or more vCPUs, as KVM needs to process and encrypt each vCPU's VMSA.
Letting userspace create vCPUs while LAUNCH_FINISH is in-progress is
"fine", at least in the current code base, as kvm_for_each_vcpu() operates
on online_vcpus, LAUNCH_FINISH (all SEV+ sub-ioctls) holds kvm->mutex, and
fully onlining a vCPU in kvm_vm_ioctl_create_vcpu() is done under
kvm->mutex.  I.e. there's no difference between an in-progress vCPU and a
vCPU that is created entirely after LAUNCH_FINISH.

However, given that concurrent LAUNCH_FINISH and vCPU creation can't
possibly work (for any reasonable definition of "work"), since userspace
can't guarantee whether a particular vCPU will be encrypted or not,
disallow the combination as a hardening measure, to reduce the probability
of introducing bugs in the future, and to avoid having to reason about the
safety of future changes related to LAUNCH_FINISH.

Cc: Jethro Beekman <jethro@fortanix.com>
Closes: https://lore.kernel.org/all/b31f7c6e-2807-4662-bcdd-eea2c1e132fa@fortanix.com
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310234829.2608037-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks ago  KVM: SEV: Protect *all* of sev_mem_enc_register_region() with kvm->lock
Sean Christopherson [Tue, 10 Mar 2026 23:48:11 +0000 (16:48 -0700)] 
KVM: SEV: Protect *all* of sev_mem_enc_register_region() with kvm->lock

Take and hold kvm->lock before checking sev_guest() in
sev_mem_enc_register_region(), as sev_guest() isn't stable unless kvm->lock
is held (or KVM can guarantee KVM_SEV_INIT{2} has completed and can't
roll back state).  If KVM_SEV_INIT{2} fails, KVM can end up trying to add to
a not-yet-initialized sev->regions_list, e.g. triggering a #GP:

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  CPU: 110 UID: 0 PID: 72717 Comm: syz.15.11462 Tainted: G     U  W  O        6.16.0-smp-DEV #1 NONE
  Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
  RIP: 0010:sev_mem_enc_register_region+0x3f0/0x4f0 ../include/linux/list.h:83
  Code: <41> 80 3c 04 00 74 08 4c 89 ff e8 f1 c7 a2 00 49 39 ed 0f 84 c6 00
  RSP: 0018:ffff88838647fbb8 EFLAGS: 00010256
  RAX: dffffc0000000000 RBX: 1ffff92015cf1e0b RCX: dffffc0000000000
  RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffff888367870000
  RBP: ffffc900ae78f050 R08: ffffea000d9e0007 R09: 1ffffd4001b3c000
  R10: dffffc0000000000 R11: fffff94001b3c001 R12: 0000000000000000
  R13: ffff8982ab0bde00 R14: ffffc900ae78f058 R15: 0000000000000000
  FS:  00007f34e9dc66c0(0000) GS:ffff89ee64d33000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fe180adef98 CR3: 000000047210e000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   kvm_arch_vm_ioctl+0xa72/0x1240 ../arch/x86/kvm/x86.c:7371
   kvm_vm_ioctl+0x649/0x990 ../virt/kvm/kvm_main.c:5363
   __se_sys_ioctl+0x101/0x170 ../fs/ioctl.c:51
   do_syscall_x64 ../arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x6f/0x1f0 ../arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f34e9f7e9a9
  Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f34e9dc6038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f34ea1a6080 RCX: 00007f34e9f7e9a9
  RDX: 0000200000000280 RSI: 000000008010aebb RDI: 0000000000000007
  RBP: 00007f34ea000d69 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000000000000000 R14: 00007f34ea1a6080 R15: 00007ffce77197a8
   </TASK>

with a syzlang reproducer that looks like:

  syz_kvm_add_vcpu$x86(0x0, &(0x7f0000000040)={0x0, &(0x7f0000000180)=ANY=[], 0x70}) (async)
  syz_kvm_add_vcpu$x86(0x0, &(0x7f0000000080)={0x0, &(0x7f0000000180)=ANY=[@ANYBLOB="..."], 0x4f}) (async)
  r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000200), 0x0, 0x0)
  r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
  r2 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000240), 0x0, 0x0)
  r3 = ioctl$KVM_CREATE_VM(r2, 0xae01, 0x0)
  ioctl$KVM_SET_CLOCK(r3, 0xc008aeba, &(0x7f0000000040)={0x1, 0x8, 0x0, 0x5625e9b0}) (async)
  ioctl$KVM_SET_PIT2(r3, 0x8010aebb, &(0x7f0000000280)={[...], 0x5}) (async)
  ioctl$KVM_SET_PIT2(r1, 0x4070aea0, 0x0) (async)
  r4 = ioctl$KVM_CREATE_VM(0xffffffffffffffff, 0xae01, 0x0)
  openat$kvm(0xffffffffffffff9c, 0x0, 0x0, 0x0) (async)
  ioctl$KVM_SET_USER_MEMORY_REGION(r4, 0x4020ae46, &(0x7f0000000400)={0x0, 0x0, 0x0, 0x2000, &(0x7f0000001000/0x2000)=nil}) (async)
  r5 = ioctl$KVM_CREATE_VCPU(r4, 0xae41, 0x2)
  close(r0) (async)
  openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x8000, 0x0) (async)
  ioctl$KVM_SET_GUEST_DEBUG(r5, 0x4048ae9b, &(0x7f0000000300)={0x4376ea830d46549b, 0x0, [0x46, 0x0, 0x0, 0x0, 0x0, 0x1000]}) (async)
  ioctl$KVM_RUN(r5, 0xae80, 0x0)

Opportunistically use guard() to avoid having to define a new error label
and goto usage.

Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active")
Cc: stable@vger.kernel.org
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://patch.msgid.link/20260310234829.2608037-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Reject attempts to sync VMSA of an already-launched/encrypted vCPU
Sean Christopherson [Tue, 10 Mar 2026 23:48:10 +0000 (16:48 -0700)] 
KVM: SEV: Reject attempts to sync VMSA of an already-launched/encrypted vCPU

Reject synchronizing vCPU state to its associated VMSA if the vCPU has
already been launched, i.e. if the VMSA has already been encrypted.  On a
host with SNP enabled, accessing guest-private memory generates an RMP #PF
and panics the host.

  BUG: unable to handle page fault for address: ff1276cbfdf36000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x80000003) - RMP violation
  PGD 5a31801067 P4D 5a31802067 PUD 40ccfb5063 PMD 40e5954063 PTE 80000040fdf36163
  SEV-SNP: PFN 0x40fdf36, RMP entry: [0x6010fffffffff001 - 0x000000000000001f]
  Oops: Oops: 0003 [#1] SMP NOPTI
  CPU: 33 UID: 0 PID: 996180 Comm: qemu-system-x86 Tainted: G           OE
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: Dell Inc. PowerEdge R7625/0H1TJT, BIOS 1.5.8 07/21/2023
  RIP: 0010:sev_es_sync_vmsa+0x54/0x4c0 [kvm_amd]
  Call Trace:
   <TASK>
   snp_launch_update_vmsa+0x19d/0x290 [kvm_amd]
   snp_launch_finish+0xb6/0x380 [kvm_amd]
   sev_mem_enc_ioctl+0x14e/0x720 [kvm_amd]
   kvm_arch_vm_ioctl+0x837/0xcf0 [kvm]
   kvm_vm_ioctl+0x3fd/0xcc0 [kvm]
   __x64_sys_ioctl+0xa3/0x100
   x64_sys_call+0xfe0/0x2350
   do_syscall_64+0x81/0x10f0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7ffff673287d
   </TASK>

Note, the KVM flaw has been present since commit ad73109ae7ec ("KVM: SVM:
Provide support to launch and run an SEV-ES guest"), but has only been
actively dangerous for the host since SNP support was added.  With SEV-ES,
KVM would "just" clobber guest state, which is totally fine from a host
kernel perspective since userspace can clobber guest state any time before
sev_launch_update_vmsa().

Fixes: ad27ce155566 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command")
Reported-by: Jethro Beekman <jethro@fortanix.com>
Closes: https://lore.kernel.org/all/d98692e2-d96b-4c36-8089-4bc1e5cc3d57@fortanix.com
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310234829.2608037-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: selftests: Remove duplicate LAUNCH_UPDATE_VMSA call in SEV-ES migrate test
Sean Christopherson [Tue, 10 Mar 2026 23:48:09 +0000 (16:48 -0700)] 
KVM: selftests: Remove duplicate LAUNCH_UPDATE_VMSA call in SEV-ES migrate test

Drop the explicit KVM_SEV_LAUNCH_UPDATE_VMSA call when creating an SEV-ES
VM in the SEV migration test, as sev_vm_create() automatically updates the
VMSA pages for SEV-ES guests.  The only reason the duplicate call doesn't
cause visible problems is because the test doesn't actually try to run the
vCPUs.  That will change when KVM adds a check to prevent userspace from
re-launching a VMSA (which corrupts the VMSA page due to KVM writing
encrypted private memory).

Fixes: 69f8e15ab61f ("KVM: selftests: Use the SEV library APIs in the intra-host migration test")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310234829.2608037-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Use kvzalloc_objs() when pinning userpages
Sean Christopherson [Fri, 13 Mar 2026 00:33:02 +0000 (17:33 -0700)] 
KVM: SEV: Use kvzalloc_objs() when pinning userpages

Use kvzalloc_objs() instead of sev_pin_memory()'s open coded (rough)
equivalent to harden and simplify the code.

Note!  This sanity check in __kvmalloc_node_noprof()

  /* Don't even allow crazy sizes */
  if (unlikely(size > INT_MAX)) {
          WARN_ON_ONCE(!(flags & __GFP_NOWARN));
          return NULL;
  }

will artificially limit the maximum size of any single pinned region to
just under 1TiB.  While there do appear to be providers that support SEV
VMs with more than 1TiB of _total_ memory, it's unlikely any KVM-based
providers pin 1TiB in a single request.

Allocate with NOWARN so that fuzzers can't trip the WARN_ON_ONCE() when
they inevitably run on systems with copious amounts of RAM, i.e. when they
can get past KVM's "total_npages > totalram_pages()" restriction.

Note #2, KVM's usage of vmalloc()+kmalloc() instead of kvmalloc() predates
commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") by 4+
years (see commit 89c505809052 ("KVM: SVM: Add support for
KVM_SEV_LAUNCH_UPDATE_DATA command")).  I.e. the open coded behavior wasn't
intended to avoid the aforementioned sanity check.  The implementation
appears to be pure oversight at the time the code was written, as it showed
up in v3[1] of the early RFCs, whereas v2[2] simply used kmalloc().

Cc: Liam Merwick <liam.merwick@oracle.com>
Link: https://lore.kernel.org/all/20170724200303.12197-17-brijesh.singh@amd.com
Link: https://lore.kernel.org/all/148846786714.2349.17724971671841396908.stgit__25299.4950431914$1488470940$gmane$org@brijesh-build-machine
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://patch.msgid.link/20260313003302.3136111-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Use PFN_DOWN() to simplify "number of pages" math when pinning memory
Sean Christopherson [Fri, 13 Mar 2026 00:33:01 +0000 (17:33 -0700)] 
KVM: SEV: Use PFN_DOWN() to simplify "number of pages" math when pinning memory

Use PFN_DOWN() instead of open coded equivalents in sev_pin_memory() to
simplify the code and make it easier to read.

No functional change intended (verified before and after versions of the
generated code are identical).

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://patch.msgid.link/20260313003302.3136111-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Disallow pinning more pages than exist in the system
Sean Christopherson [Fri, 13 Mar 2026 00:33:00 +0000 (17:33 -0700)] 
KVM: SEV: Disallow pinning more pages than exist in the system

Explicitly disallow pinning more pages for an SEV VM than exist in the
system to defend against absurd userspace requests without relying on
somewhat arbitrary kernel functionality to prevent truly stupid KVM
behavior.  E.g. even with the INT_MAX check, userspace can request that
KVM pin nearly 8TiB of memory, regardless of how much RAM exists in the
system.

Opportunistically rename "locked" to a more descriptive "total_npages".

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://patch.msgid.link/20260313003302.3136111-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Drop useless sanity checks in sev_mem_enc_register_region()
Sean Christopherson [Fri, 13 Mar 2026 00:32:59 +0000 (17:32 -0700)] 
KVM: SEV: Drop useless sanity checks in sev_mem_enc_register_region()

Drop sev_mem_enc_register_region()'s sanity checks on the incoming address
and size, as SEV is 64-bit only, making ULONG_MAX a 64-bit, all-ones value,
and thus making it impossible for kvm_enc_region.{addr,size} to be greater
than ULONG_MAX.

Note, sev_pin_memory() verifies the incoming address is non-NULL (which
isn't strictly required, but whatever), and that addr+size don't wrap to
zero (which _is_ needed and what really needs to be guarded against).

Note #2, pin_user_pages_fast() guards against the end address walking into
kernel address space, so lack of an access_ok() check is also safe (maybe
not ideal, but safe).

No functional change intended (the generated code is literally the same,
i.e. the compiler was smart enough to know the checks were useless).

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://patch.msgid.link/20260313003302.3136111-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoKVM: SEV: Drop WARN on large size for KVM_MEMORY_ENCRYPT_REG_REGION
Sean Christopherson [Fri, 13 Mar 2026 00:32:58 +0000 (17:32 -0700)] 
KVM: SEV: Drop WARN on large size for KVM_MEMORY_ENCRYPT_REG_REGION

Drop the WARN in sev_pin_memory() on npages overflowing an int, as the
WARN is comically trivial to trigger from userspace, e.g. by doing:

  struct kvm_enc_region range = {
          .addr = 0,
          .size = -1ul,
  };

  __vm_ioctl(vm, KVM_MEMORY_ENCRYPT_REG_REGION, &range);

Note, the checks in sev_mem_enc_register_region() that presumably exist to
verify the incoming address+size are completely worthless, as both "addr"
and "size" are u64s and SEV is 64-bit only, i.e. they _can't_ be greater
than ULONG_MAX.  That wart will be cleaned up in the near future.

  if (range->addr > ULONG_MAX || range->size > ULONG_MAX)
          return -EINVAL;

Opportunistically add a comment to explain why the code calculates the
number of pages the "hard" way, e.g. instead of just shifting @ulen.

Fixes: 78824fabc72e ("KVM: SVM: fix svn_pin_memory()'s use of get_user_pages_fast()")
Cc: stable@vger.kernel.org
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://patch.msgid.link/20260313003302.3136111-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agomisc: pci_endpoint_test: Use -EINVAL for small subrange size
Koichiro Den [Fri, 20 Mar 2026 14:01:39 +0000 (23:01 +0900)] 
misc: pci_endpoint_test: Use -EINVAL for small subrange size

The sub_size check ensures that each subrange is large enough for 32-bit
accesses. Subranges smaller than sizeof(u32) do not satisfy this
assumption, so this is a local sanity check rather than a resource
exhaustion case.

Return -EINVAL instead of -ENOSPC for this case.

Suggested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260320140139.2415480-1-den@valinux.co.jp
2 weeks agoKVM: x86: Suppress WARNs on nested_run_pending after userspace exit
Sean Christopherson [Thu, 12 Mar 2026 23:48:23 +0000 (16:48 -0700)] 
KVM: x86: Suppress WARNs on nested_run_pending after userspace exit

To end an ongoing game of whack-a-mole between KVM and syzkaller, WARN on
illegally cancelling a pending nested VM-Enter if and only if userspace
has NOT gained control of the vCPU since the nested run was initiated.  As
proven time and time again by syzkaller, userspace can clobber vCPU state
so as to force a VM-Exit that violates KVM's architectural modelling of
VMRUN/VMLAUNCH/VMRESUME.

To detect that userspace has gained control, while minimizing the risk of
operating on stale data, convert nested_run_pending from a pure boolean to
a tri-state of sorts, where '0' is still "not pending", '1' is "pending",
and '2' is "pending but untrusted".  Then on KVM_RUN, if the flag is in
the "trusted pending" state, move it to "untrusted pending".

Note, moving the state to "untrusted" even if KVM_RUN is ultimately
rejected is a-ok, because for the "untrusted" state to matter, KVM must
get past kvm_x86_vcpu_pre_run() at some point for the vCPU.

Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260312234823.3120658-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoMerge tag 'gpio-fixes-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 3 Apr 2026 16:33:38 +0000 (09:33 -0700)] 
Merge tag 'gpio-fixes-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix kerneldocs for gpio-timberdale and gpio-nomadik

 - clear the "requested" flag in error path in gpiod_request_commit()

 - call of_xlate() if provided when setting up shared GPIOs

 - handle pins shared by child firmware nodes of consumer devices

 - fix return value check in gpio-qixis-fpga

 - fix suspend on gpio-mxc

 - fix gpio-microchip DT bindings

* tag 'gpio-fixes-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  dt-bindings: gpio: fix microchip #interrupt-cells
  gpio: shared: shorten the critical section in gpiochip_setup_shared()
  gpio: mxc: map Both Edge pad wakeup to Rising Edge
  gpio: qixis-fpga: Fix error handling for devm_regmap_init_mmio()
  gpio: shared: handle pins shared by child nodes of devices
  gpio: shared: call gpio_chip::of_xlate() if set
  gpiolib: clear requested flag if line is invalid
  gpio: nomadik: repair some kernel-doc comments
  gpio: timberdale: repair kernel-doc comments
  gpio: Fix resource leaks on errors in gpiochip_add_data_with_key()

2 weeks agoKVM: x86: Move nested_run_pending to kvm_vcpu_arch
Yosry Ahmed [Thu, 12 Mar 2026 23:48:22 +0000 (16:48 -0700)] 
KVM: x86: Move nested_run_pending to kvm_vcpu_arch

Move the nested_run_pending field present in both svm_nested_state and
nested_vmx to the common kvm_vcpu_arch.  This allows common code to use it
without plumbing it through per-vendor helpers.

nested_run_pending remains zero-initialized, as the entire kvm_vcpu
struct is, and all further accesses are done through vcpu->arch instead
of svm->nested or vmx->nested.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
[sean: expand the comment in the field declaration]
Link: https://patch.msgid.link/20260312234823.3120658-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2 weeks agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 3 Apr 2026 15:47:13 +0000 (08:47 -0700)] 
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:

 - Implement a basic static call trampoline to fix CFI failures with the
   generic implementation

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Use static call trampolines when kCFI is enabled

2 weeks agoselftests/seccomp: Add hard-coded __NR_uprobe for x86_64
Oleg Nesterov [Fri, 3 Apr 2026 13:30:40 +0000 (15:30 +0200)] 
selftests/seccomp: Add hard-coded __NR_uprobe for x86_64

This complements the commit 18f7686a1ce6 ("selftests/seccomp:
Add hard-coded __NR_uretprobe for x86_64").

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://patch.msgid.link/ac_BAMSggw-_ABPE@redhat.com
Signed-off-by: Kees Cook <kees@kernel.org>
2 weeks agoMerge branch 'bpf-prep-patches-for-static-stack-liveness'
Alexei Starovoitov [Fri, 3 Apr 2026 15:33:48 +0000 (08:33 -0700)] 
Merge branch 'bpf-prep-patches-for-static-stack-liveness'

Alexei Starovoitov says:

====================
bpf: Prep patches for static stack liveness.

v4->v5:
- minor test fixup

v3->v4:
- fixed invalid recursion detection when callback is called multiple times

v3: https://lore.kernel.org/bpf/20260402212856.86606-1-alexei.starovoitov@gmail.com/

v2->v3:
- added recursive call detection
- fixed ubsan warning
- removed double declaration in the header
- added Acks

v2: https://lore.kernel.org/bpf/20260402061744.10885-1-alexei.starovoitov@gmail.com/

v1->v2:
. fixed bugs spotted by Eduard, Mykyta, claude and gemini
. fixed selftests that were failing in unpriv
. gemini(sashiko) found several precision improvements in patch 6,
  but they made no difference in real programs.

v1: https://lore.kernel.org/bpf/20260401021635.34636-1-alexei.starovoitov@gmail.com/
First 6 prep patches for static stack liveness.

. do src/dst_reg validation early and remove defensive checks

. sort subprog in topo order. We wanted to do this long ago
  to process global subprogs this way and in other cases.

. Add constant folding pass that computes map_ptr, subprog_idx,
  loads from readonly maps, and other constants that fit into 32-bit

. Use these constants to eliminate dead code. Replace predicted
  conditional branches with "jmp always". That reduces JIT prog size.

. Add two helpers that return access size from their arguments.
====================

Link: https://patch.msgid.link/20260403024422.87231-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Add helper and kfunc stack access size resolution
Alexei Starovoitov [Fri, 3 Apr 2026 02:44:21 +0000 (19:44 -0700)] 
bpf: Add helper and kfunc stack access size resolution

The static stack liveness analysis needs to know how many bytes a
helper or kfunc accesses through a stack pointer argument, so it can
precisely mark the affected stack slots as stack 'def' or 'use'.

Add bpf_helper_stack_access_bytes() and bpf_kfunc_stack_access_bytes()
which resolve the access size for a given call argument.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-7-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Move verifier helpers to header
Alexei Starovoitov [Fri, 3 Apr 2026 02:44:20 +0000 (19:44 -0700)] 
bpf: Move verifier helpers to header

Move several helpers to header as preparation for
the subsequent stack liveness patches.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-6-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Add bpf_compute_const_regs() and bpf_prune_dead_branches() passes
Alexei Starovoitov [Fri, 3 Apr 2026 02:44:19 +0000 (19:44 -0700)] 
bpf: Add bpf_compute_const_regs() and bpf_prune_dead_branches() passes

Add two passes before the main verifier pass:

bpf_compute_const_regs() is a forward dataflow analysis that tracks
register values in R0-R9 across the program using fixed-point
iteration in reverse postorder. Each register is tracked with
a six-state lattice:

  UNVISITED -> CONST(val) / MAP_PTR(map_index) /
               MAP_VALUE(map_index, offset) / SUBPROG(num) -> UNKNOWN

At merge points, if two paths produce the same state and value for
a register, it stays; otherwise it becomes UNKNOWN.

The analysis handles:
 - MOV, ADD, SUB, AND with immediate or register operands
 - LD_IMM64 for plain constants, map FDs, map values, and subprogs
 - LDX from read-only maps: constant-folds the load by reading the
   map value directly via bpf_map_direct_read()

Results that fit in 32 bits are stored per-instruction in
insn_aux_data and bitmasks.

bpf_prune_dead_branches() uses the computed constants to evaluate
conditional branches. When both operands of a conditional jump are
known constants, the branch outcome is determined statically and the
instruction is rewritten to an unconditional jump.
The CFG postorder is then recomputed to reflect new control flow.
This eliminates dead edges so that subsequent liveness analysis
doesn't propagate through dead code.

Also add runtime sanity check to validate that precomputed
constants match the verifier's tracked state.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-5-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoselftests/bpf: Add tests for subprog topological ordering
Alexei Starovoitov [Fri, 3 Apr 2026 02:44:18 +0000 (19:44 -0700)] 
selftests/bpf: Add tests for subprog topological ordering

Add a few tests for topo sort:
- linear chain: main -> A -> B
- diamond: main -> A, main -> B, A -> C, B -> C
- mixed global/static: main -> global -> static leaf
- shared callee: main -> leaf, main -> global -> leaf
- duplicate calls: main calls same subprog twice
- no calls: single subprog

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-4-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Sort subprogs in topological order after check_cfg()
Alexei Starovoitov [Fri, 3 Apr 2026 02:44:17 +0000 (19:44 -0700)] 
bpf: Sort subprogs in topological order after check_cfg()

Add a pass that sorts subprogs in topological order so that iterating
subprog_topo_order[] walks leaf subprogs first, then their callers.
This is computed as a DFS post-order traversal of the CFG.

The pass runs after check_cfg() to ensure the CFG has been validated
before traversing and after postorder has been computed to avoid
walking dead code.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-3-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agobpf: Do register range validation early
Alexei Starovoitov [Fri, 3 Apr 2026 02:44:16 +0000 (19:44 -0700)] 
bpf: Do register range validation early

Instead of checking src/dst range multiple times during
the main verifier pass do them once.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agoMerge tag 'drm-fixes-2026-04-03' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 3 Apr 2026 15:23:51 +0000 (08:23 -0700)] 
Merge tag 'drm-fixes-2026-04-03' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Hopefully no Easter eggs in this bunch of fixes. Usual stuff across
  the amd/intel with some misc bits. Thanks to Thorsten and Alex for
  making sure a regression fix that was hanging around in process land
  finally made it in, that is probably the biggest change in here.

  core:
   - revert unplug/framebuffer fix as it caused problems
   - compat ioctl speculation fix

  bridge:
   - refcounting fix

  sysfb:
   - error handling fix

  amdgpu:
   - fix renoir audio regression
   - UserQ fixes
   - PASID handling fix
   - S4 fix for smu11 chips
   - Misc small fixes

  amdkfd:
   - Non-4K page fixes

  i915:
   - Fix for #12045: Huawei Matebook E (DRR-WXX): Persistent Black
     Screen on Boot with i915 and Gen11: Modesetting and Backlight
     Control Malfunction
   - Fix for #15826: i915: Raptor Lake-P [UHD Graphics] display
     flicker/corruption on eDP panel
   - Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDP

  xe:
   - uapi: Accept canonical GPU addresses in xe_vm_madvise_ioctl
   - Disallow writes to read-only VMAs
   - PXP fixes
   - Disable garbage collector work item on SVM close
   - Avoid memory allocations in xe_device_declare_wedged

  qaic:
   - hang fix

  ast:
   - initialisation fix"

* tag 'drm-fixes-2026-04-03' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
  drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations
  drm/ioc32: stop speculation on the drm_compat_ioctl path
  drm/sysfb: Fix efidrm error handling and memory type mismatch
  drm/i915/dp: Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDP
  drm/i915/cdclk: Do the full CDCLK dance for min_voltage_level changes
  drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size
  drm/amdgpu: Fix wait after reset sequence in S4
  drm/amd/display: Fix NULL pointer dereference in dcn401_init_hw()
  drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB
  drm/amdgpu/userq: fix memory leak in MQD creation error paths
  drm/amd: Fix MQD and control stack alignment for non-4K
  drm/amdkfd: Align expected_queue_size to PAGE_SIZE
  drm/amdgpu: fix the idr allocation flags
  drm/amdgpu: validate doorbell_offset in user queue creation
  drm/amdgpu/pm: drop SMU driver if version not matched messages
  drm/xe: Avoid memory allocations in xe_device_declare_wedged()
  drm/xe: Disable garbage collector work item on SVM close
  drm/xe/pxp: Don't allow PXP on older PTL GSC FWs
  drm/xe/pxp: Clear restart flag in pxp_start after jumping back
  drm/xe/pxp: Remove incorrect handling of impossible state during suspend
  ...

2 weeks agoipmi: ssif_bmc: add unit test for state machine
Jian Zhang [Fri, 3 Apr 2026 14:39:38 +0000 (22:39 +0800)] 
ipmi: ssif_bmc: add unit test for state machine

Add unit tests for the state machine when in the SSIF_ABORTING state.

Fixes: dd2bc5cc9e25 ("ipmi: ssif_bmc: Add SSIF BMC driver")
Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
Message-ID: <20260403143939.434017-1-zhangjian.3032@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
2 weeks agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc6+
Alexei Starovoitov [Fri, 3 Apr 2026 15:12:58 +0000 (08:12 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc6+

Cross-merge BPF and other fixes after downstream PR.

Minor conflict in kernel/bpf/verifier.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 weeks agox86/alternative: delay freeing of smp_locks section
Mike Rapoport (Microsoft) [Mon, 30 Mar 2026 19:10:00 +0000 (22:10 +0300)] 
x86/alternative: delay freeing of smp_locks section

On SMP systems alternative_instructions() frees memory occupied by
smp_locks section immediately after patching the lock instructions.

The memory is freed using free_init_pages() that calls free_reserved_area()
that essentially does __free_page() for every page in the range.

Up until recently it didn't update memblock state so in cases when
CONFIG_ARCH_KEEP_MEMBLOCK is enabled (on x86 it is selected by
INTEL_TDX_HOST), the state of memblock and the memory map would be
inconsistent.

Additionally, with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, freeing of
smp_locks happens before the memory map is fully initialized and freeing
reserved memory may cause an access to not-yet-initialized struct page when
__free_page() searches for a buddy page.

Following the discussion in [1], implementation of memblock_free_late() and
free_reserved_area() was unified to ensure that reserved memory that's
freed after memblock transfers the pages to the buddy allocator is actually
freed and that the memblock and the memory map are consistent. As a part of
these changes, free_reserved_area() now WARN()s when it is called before
the initialization of the memory map is complete.

The memory map is fully initialized in page_alloc_init_late() that
completes before initcalls are executed, so it is safe to free reserved
memory in any initcall except early_initcall().

Move freeing of smp_locks section to an initcall to ensure it will happen
after the memory map is fully initialized. Since it does not matter which
exactly initcall to use and the code lives in arch/, pick arch_initcall.

[1] https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org

Reported-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202603302154.b50adaf1-lkp@intel.com
Tested-by: Bert Karwatzki <spasswolf@web.de>
Link: https://lore.kernel.org/r/20260327140109.7561-1-spasswolf@web.de
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Fixes: b2129a39511b ("memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2 weeks agoASoC: Intel: Fix MCLK leaks and clean up error
Mark Brown [Fri, 3 Apr 2026 14:15:04 +0000 (15:15 +0100)] 
ASoC: Intel: Fix MCLK leaks and clean up error

aravindanilraj0702@gmail.com <aravindanilraj0702@gmail.com> says:

From: Aravind Anilraj <aravindanilraj0702@gmail.com>

This series fixes MCLK resource leaks in the platform_clock_control()
implementations for bytcr_rt5640, bytcr_rt5651, and cht_bsw_rt5672.

In the SND_SOC_DAPM_EVENT_ON() path, clk_prepare_enable() is called to
enable MCLK, but subsequent failures in codec clock configuration (eg:
*_prepare_and_enable_pll1() or snd_soc_dai_set_sysclk()) return without
disabling the clock, leaking a reference.

Patches 1-3 fix this by adding the missing clk_disable_unprepare() calls
in the relevant error paths, ensuring proper symmetry between enable and
disable operations within the EVENT_ON scope.

Patch 4 moves unrelated logging changes into a separate patch and
standardizes error messages.

2 weeks agoASoC: Intel: Standardize MCLK error logs across RT boards
Aravind Anilraj [Wed, 1 Apr 2026 22:05:07 +0000 (18:05 -0400)] 
ASoC: Intel: Standardize MCLK error logs across RT boards

Standardize the error logging in platform_clock_control() by adding
missing newline characters to dev_err() strings. Additionally, include
the return code in the error messages to assist with debugging.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260401220507.23557-5-aravindanilraj0702@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: Intel: cht_bsw_rt5672: Fix MCLK leak on platform_clock_control error
Aravind Anilraj [Wed, 1 Apr 2026 22:05:06 +0000 (18:05 -0400)] 
ASoC: Intel: cht_bsw_rt5672: Fix MCLK leak on platform_clock_control error

If snd_soc_dai_set_pll() or snd_soc_dai_set_sysclk() fail inside the
EVENT_ON path, the function returns without calling
clk_disable_unprepare() on ctx->mclk, which was already enabled earlier
in the same code path. Add the missing clk_disable_unprepare() calls
before returning the error.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260401220507.23557-4-aravindanilraj0702@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: Intel: bytcr_rt5651: Fix MCLK leak on platform_clock_control error
Aravind Anilraj [Wed, 1 Apr 2026 22:05:05 +0000 (18:05 -0400)] 
ASoC: Intel: bytcr_rt5651: Fix MCLK leak on platform_clock_control error

If byt_rt5651_prepare_and_enable_pll1() fails, the function returns
without calling clk_disable_unprepare() on priv->mclk, which was
already enabled earlier in the same code path. Add the missing
cleanup call to prevent the clock from leaking.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260401220507.23557-3-aravindanilraj0702@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: Intel: bytcr_rt5640: Fix MCLK leak on platform_clock_control error
Aravind Anilraj [Wed, 1 Apr 2026 22:05:04 +0000 (18:05 -0400)] 
ASoC: Intel: bytcr_rt5640: Fix MCLK leak on platform_clock_control error

If byt_rt5640_prepare_and_enable_pll1() fails, the function returns
without calling clk_disable_unprepare() on priv->mclk, which was
already enabled earlier in the same code path. Add the missing
cleanup call to prevent the clock from leaking.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260401220507.23557-2-aravindanilraj0702@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: imx-rpmsg: Add DSD format support with dynamic DAI format switching
Chancel Liu [Thu, 26 Mar 2026 05:56:14 +0000 (14:56 +0900)] 
ASoC: imx-rpmsg: Add DSD format support with dynamic DAI format switching

Add hw_params callback to dynamically switch DAI format between I2S
and PDM based on audio stream format. When DSD formats are detected,
the DAI format is switched to PDM mode.

Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Link: https://patch.msgid.link/20260326055614.3614104-1-chancel.liu@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: SDCA: Export Q7.8 volume control helpers
Niranjan H Y [Wed, 1 Apr 2026 13:21:45 +0000 (18:51 +0530)] 
ASoC: SDCA: Export Q7.8 volume control helpers

Export the Q7.8 volume control helpers to allow reuse
by other ASoC drivers. These functions handle 16-bit
signed Q7.8 fixed-point format values for volume controls.

Changes include:
- Rename q78_get_volsw to sdca_asoc_q78_get_volsw
- Rename q78_put_volsw to sdca_asoc_q78_put_volsw
- Add convenience macros SDCA_SINGLE_Q78_TLV and
  SDCA_DOUBLE_Q78_TLV for creating mixer controls

This allows other ASoC drivers to easily implement controls
using the Q7.8 fixed-point format without duplicating code.

Signed-off-by: Niranjan H Y <niranjan.hy@ti.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260401132148.2367-1-niranjan.hy@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: codecs: tlv320dac33: remove kmemdup_array
Rosen Penev [Thu, 2 Apr 2026 02:50:40 +0000 (19:50 -0700)] 
ASoC: codecs: tlv320dac33: remove kmemdup_array

Use a flexible array member and struct_size() so a single allocation suffices.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260402025040.93569-1-rosenp@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: SDCA: Add RJ support to class driver
Charles Keepax [Fri, 27 Mar 2026 16:27:32 +0000 (16:27 +0000)] 
ASoC: SDCA: Add RJ support to class driver

Add the retaskable jack Function to the list of Functions supported by
the class driver; it shouldn't require anything that isn't already
supported.

Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260327162732.877257-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoexfat: use exfat_chain_advance helper
Chi Zhiling [Fri, 3 Apr 2026 08:05:38 +0000 (16:05 +0800)] 
exfat: use exfat_chain_advance helper

Replace open-coded cluster chain walking logic with exfat_chain_advance()
across exfat_readdir, exfat_find_dir_entry, exfat_count_dir_entries,
exfat_search_empty_slot and exfat_check_dir_empty.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2 weeks agoexfat: introduce exfat_chain_advance helper
Chi Zhiling [Fri, 3 Apr 2026 08:05:37 +0000 (16:05 +0800)] 
exfat: introduce exfat_chain_advance helper

Introduce exfat_chain_advance() to walk an exfat_chain structure by a
given step, updating both ->dir and ->size fields atomically. This
helper handles both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes with
proper boundary checking.

Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2 weeks agoexfat: remove NULL cache pointer case in exfat_ent_get
Chi Zhiling [Fri, 3 Apr 2026 08:05:36 +0000 (16:05 +0800)] 
exfat: remove NULL cache pointer case in exfat_ent_get

Since exfat_get_next_cluster has been updated, no callers pass a NULL
pointer to exfat_ent_get, so remove the handling logic for this case.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2 weeks agoexfat: use exfat_cluster_walk helper
Chi Zhiling [Fri, 3 Apr 2026 08:05:35 +0000 (16:05 +0800)] 
exfat: use exfat_cluster_walk helper

Replace the custom exfat_walk_fat_chain() function and open-coded
FAT chain walking logic with the exfat_cluster_walk() helper across
exfat_find_location, __exfat_get_dentry_set, and exfat_map_cluster.

Suggested-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2 weeks agoexfat: introduce exfat_cluster_walk helper
Chi Zhiling [Fri, 3 Apr 2026 08:05:34 +0000 (16:05 +0800)] 
exfat: introduce exfat_cluster_walk helper

Introduce exfat_cluster_walk() to walk the FAT chain by a given step,
handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes. Also
redefine exfat_get_next_cluster as a thin wrapper around it for
backward compatibility.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2 weeks agoexfat: fix incorrect directory checksum after rename to shorter name
Chi Zhiling [Fri, 3 Apr 2026 08:05:33 +0000 (16:05 +0800)] 
exfat: fix incorrect directory checksum after rename to shorter name

When renaming a file in-place to a shorter name, exfat_remove_entries
marks excess entries as DELETED, but es->num_entries is not updated
accordingly. As a result, exfat_update_dir_chksum iterates over the
deleted entries and computes an incorrect checksum.

This does not lead to persistent corruption because mark_inode_dirty()
is called afterward, and __exfat_write_inode later recomputes the
checksum using the correct num_entries value.

Fix by setting es->num_entries = num_entries in exfat_init_ext_entry.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2 weeks agoNFSD: Docs: clean up pnfs server timeout docs
Randy Dunlap [Wed, 18 Mar 2026 22:21:05 +0000 (15:21 -0700)] 
NFSD: Docs: clean up pnfs server timeout docs

Make various changes to the documentation formatting to avoid docs
build errors and otherwise improve the produced output format:

- use bullets for lists
- don't use a '.' at the end of echo commands
- fix indentation

Documentation/admin-guide/nfs/pnfs-block-server.rst:55: ERROR: Unexpected indentation. [docutils]
Documentation/admin-guide/nfs/pnfs-scsi-server.rst:37: ERROR: Unexpected indentation. [docutils]

Fixes: 6a97f70b45e7 ("NFSD: Enforce timeout on layout recall and integrate lease manager fencing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agonfsd: fix comment typo in nfsxdr
Joseph Salisbury [Mon, 16 Mar 2026 18:28:45 +0000 (14:28 -0400)] 
nfsd: fix comment typo in nfsxdr

The file contains a spelling error in a source comment (occured).

Typos in comments reduce readability and make text searches less reliable
for developers and maintainers.

Replace 'occured' with 'occurred' in the affected comment. This is a
comment-only cleanup and does not change behavior.

Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agonfsd: fix comment typo in nfs3xdr
Joseph Salisbury [Mon, 16 Mar 2026 18:25:16 +0000 (14:25 -0400)] 
nfsd: fix comment typo in nfs3xdr

The file contains a spelling error in a source comment (occured).

Typos in comments reduce readability and make text searches less reliable
for developers and maintainers.

Replace 'occured' with 'occurred' in the affected comment. This is a
comment-only cleanup and does not change behavior.

Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agoNFSD: convert callback RPC program to per-net namespace
Dai Ngo [Fri, 13 Mar 2026 16:31:48 +0000 (12:31 -0400)] 
NFSD: convert callback RPC program to per-net namespace

The callback channel's rpc_program, rpc_version, rpc_stat,
and per-procedure counts are declared as file-scope statics in
nfs4callback.c, shared across all network namespaces.
Forechannel RPC statistics are already maintained per-netns
(via nfsd_svcstats in struct nfsd_net); the backchannel
has no such separation. When backchannel statistics are
eventually surfaced to userspace, the global counters would
expose cross-namespace data.

Allocate per-netns copies of these structures through a new
opaque struct nfsd_net_cb, managed by nfsd_net_cb_init()
and nfsd_net_cb_shutdown(). The struct definition is private
to nfs4callback.c; struct nfsd_net holds only a pointer.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agoNFSD: use per-operation statidx for callback procedures
Chuck Lever [Fri, 13 Mar 2026 16:31:47 +0000 (12:31 -0400)] 
NFSD: use per-operation statidx for callback procedures

The callback RPC procedure table uses NFSPROC4_CB_##call for
p_statidx, which maps CB_NULL to index 0 and every
compound-based callback (CB_RECALL, CB_LAYOUT, CB_OFFLOAD,
etc.) to index 1. All compound callback operations therefore
share a single statistics counter, making per-operation
accounting impossible.

Assign p_statidx from the NFSPROC4_CLNT_##proc enum instead,
giving each callback operation its own counter slot. The
counts array is already sized by ARRAY_SIZE(nfs4_cb_procedures),
so no allocation change is needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agosvcrdma: Use contiguous pages for RDMA Read sink buffers
Chuck Lever [Tue, 10 Mar 2026 19:39:25 +0000 (15:39 -0400)] 
svcrdma: Use contiguous pages for RDMA Read sink buffers

svc_rdma_build_read_segment() constructs RDMA Read sink
buffers by consuming pages one-at-a-time from rq_pages[]
and building one bvec per page. A 64KB NFS READ payload
produces 16 separate bvecs, 16 DMA mappings, and
potentially multiple RDMA Read WRs (on platforms with
4KB pages).

A single higher-order allocation followed by split_page()
yields physically contiguous memory while preserving
per-page refcounts. A single bvec spanning the contiguous
range causes rdma_rw_ctx_init_bvec() to take the
rdma_rw_init_single_wr_bvec() fast path: one DMA mapping,
one SGE, one WR.

The split sub-pages replace the original rq_pages[] entries,
so all downstream page tracking, completion handling, and
xdr_buf assembly remain unchanged.

Allocation uses __GFP_NORETRY | __GFP_NOWARN and falls back
through decreasing orders. If even order-1 fails, the
existing per-page path handles the segment.

When nr_pages is not a power of two, get_order() rounds up
and the allocation yields more pages than needed. The extra
split pages replace existing rq_pages[] entries (freed via
put_page() first), so there is no net increase in per-
request page consumption. Successive segments reuse the
same padding slots, preventing accumulation. The
rq_maxpages guard rejects any allocation that would
overrun the array, falling back to the per-page path.
Under memory pressure, __GFP_NORETRY causes the higher-
order allocation to fail without stalling.

The contiguous path is attempted when the segment starts
page-aligned (rc_pageoff == 0) and spans at least two
pages. NFS WRITE segments carry application-modified byte
ranges of arbitrary length, so the optimization is not
restricted to power-of-two page counts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agoSUNRPC: Add svc_rqst_page_release() helper
Chuck Lever [Wed, 11 Mar 2026 16:18:54 +0000 (12:18 -0400)] 
SUNRPC: Add svc_rqst_page_release() helper

svc_rqst_replace_page() releases displaced pages through a
per-rqst folio batch, but exposes the add-or-flush sequence
directly. svc_tcp_restore_pages() releases displaced pages
individually with put_page().

Introduce svc_rqst_page_release() to encapsulate the
batched release mechanism. Convert svc_rqst_replace_page()
and svc_tcp_restore_pages() to use it. The latter now
benefits from the same batched release that
svc_rqst_replace_page() already uses.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agoipmi: ssif_bmc: change log level to dbg in irq callback
Jian Zhang [Fri, 3 Apr 2026 09:06:01 +0000 (17:06 +0800)] 
ipmi: ssif_bmc: change log level to dbg in irq callback

Long-running tests indicate that this logging can occasionally disrupt
timing and lead to request/response corruption.

The IRQ handler needs to execute as fast as possible. Most I2C slave
IRQ implementations are byte-level, so logging here can significantly
affect transfer behavior and timing. It is recommended to use dev_dbg()
for these messages.

Fixes: dd2bc5cc9e25 ("ipmi: ssif_bmc: Add SSIF BMC driver")
Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
Message-ID: <20260403090603.3988423-4-zhangjian.3032@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
2 weeks agoipmi: ssif_bmc: fix message desynchronization after truncated response
Jian Zhang [Fri, 3 Apr 2026 09:06:00 +0000 (17:06 +0800)] 
ipmi: ssif_bmc: fix message desynchronization after truncated response

A truncated response, caused by host power-off, or other conditions,
can lead to message desynchronization.

Raw trace data (STOP loss scenario, add state transition comment):

1. T-1: Read response phase (SSIF_RES_SENDING)
8271.955342  WR_RCV [03]                          <- Read polling cmd
8271.955348  RD_REQ [04]  <== SSIF_RES_SENDING    <- start sending response
8271.955436  RD_PRO [b4]
8271.955527  RD_PRO [00]
8271.955618  RD_PRO [c1]
8271.955707  RD_PRO [00]
8271.955814  RD_PRO [ad]  <== SSIF_RES_SENDING     <- last byte
<- !! STOP lost (truncated response)

2. T: New Write request arrives, BMC still in SSIF_RES_SENDING
8271.967973  WR_REQ []    <== SSIF_RES_SENDING >> SSIF_ABORTING  <- log: unexpected WR_REQ in RES_SENDING
8271.968447  WR_RCV [02]  <== SSIF_ABORTING  <- do nothing
8271.968452  WR_RCV [02]  <== SSIF_ABORTING  <- do nothing
8271.968454  WR_RCV [18]  <== SSIF_ABORTING  <- do nothing
8271.968456  WR_RCV [01]  <== SSIF_ABORTING  <- do nothing
8271.968458  WR_RCV [66]  <== SSIF_ABORTING  <- do nothing
8271.978714  STOP []      <== SSIF_ABORTING >> SSIF_READY  <- log: unexpected SLAVE STOP in state=SSIF_ABORTING

3. T+1: Next Read polling, treated as a fresh transaction
8271.979125  WR_REQ []    <== SSIF_READY >> SSIF_START
8271.979326  WR_RCV [03]  <== SSIF_START >> SSIF_SMBUS_CMD        <- smbus_cmd=0x03
8271.979331  RD_REQ [04]  <== SSIF_RES_SENDING      <- sending response
8271.979427  RD_PRO [b4]                            <- !! this is T's stale response -> desynchronization

When in SSIF_ABORTING state, a newly arrived command should still be
handled to avoid dropping the request or causing message
desynchronization.

Fixes: dd2bc5cc9e25 ("ipmi: ssif_bmc: Add SSIF BMC driver")
Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
Message-ID: <20260403090603.3988423-3-zhangjian.3032@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
2 weeks agoipmi: ssif_bmc: fix missing check for copy_to_user() partial failure
Jian Zhang [Fri, 3 Apr 2026 09:05:59 +0000 (17:05 +0800)] 
ipmi: ssif_bmc: fix missing check for copy_to_user() partial failure

copy_to_user() returns the number of bytes that could not be copied,
with a non-zero value indicating a partial or complete failure. The
current code only checks for negative return values and treats all
non-negative results as success.

Treat any positive return value from copy_to_user() as an error and
return -EFAULT.

Fixes: dd2bc5cc9e25 ("ipmi: ssif_bmc: Add SSIF BMC driver")
Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
Message-ID: <20260403090603.3988423-2-zhangjian.3032@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
2 weeks agoipmi: ssif_bmc: cancel response timer on remove
Jian Zhang [Fri, 3 Apr 2026 09:05:58 +0000 (17:05 +0800)] 
ipmi: ssif_bmc: cancel response timer on remove

The response timer can stay armed across device teardown. If it fires after
remove, the callback dereferences the SSIF context and the i2c client after
teardown has started.

Cancel the timer in remove so the callback cannot run after the device is
unregistered.

Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
Message-ID: <20260403090603.3988423-1-zhangjian.3032@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
2 weeks agoASoC: rsnd: Fix potential out-of-bounds access of component_dais[]
Denis Rastyogin [Fri, 27 Mar 2026 10:33:11 +0000 (13:33 +0300)] 
ASoC: rsnd: Fix potential out-of-bounds access of component_dais[]

The component_dais[RSND_MAX_COMPONENT] array is initially zero-initialized
and later populated in rsnd_dai_of_node(). However, the existing boundary check:
  if (i >= RSND_MAX_COMPONENT)

does not guarantee that the last valid element remains zero. As a result,
the loop can end up relying on component_dais[RSND_MAX_COMPONENT] being zero,
which may lead to an out-of-bounds access.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 547b02f74e4a ("ASoC: rsnd: enable multi Component support for Audio Graph Card/Card2")
Signed-off-by: Denis Rastyogin <gerben@altlinux.org>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20260327103311.459239-1-gerben@altlinux.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoASoC: amd: acp-sdw-legacy: remove unnecessary condition check
Vijendar Mukunda [Fri, 3 Apr 2026 06:34:25 +0000 (12:04 +0530)] 
ASoC: amd: acp-sdw-legacy: remove unnecessary condition check

Currently there is no mechanism to read dmic_num into the mach_params
structure, so the mach_params->dmic_num check always sees 0.
Remove the unnecessary condition check for mach_params->dmic_num.

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://patch.msgid.link/20260403063452.159800-1-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoperf/x86/msr: Make SMI and PPERF on by default
Kan Liang [Fri, 27 Mar 2026 05:28:44 +0000 (13:28 +0800)] 
perf/x86/msr: Make SMI and PPERF on by default

SMI_COUNT and PPERF are model-specific MSRs. A very long CPU ID list is
maintained to indicate the supported platforms; as more and more
platforms are introduced, new CPU IDs keep having to be added, and older
kernels have to be updated to pick up each new CPU ID.

These MSRs were introduced a long time ago and there is no plan to
change them in the near future. Furthermore, the current code uses
rdmsr_safe() to check the availability of the MSRs before using them.

Make them on by default. It should be good enough to rely only on
rdmsr_safe() to check their availability, for both existing and future
platforms.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260327052844.818218-1-dapeng1.mi@linux.intel.com
2 weeks agosched/fair: Prevent negative lag increase during delayed dequeue
Vincent Guittot [Tue, 31 Mar 2026 16:23:52 +0000 (18:23 +0200)] 
sched/fair: Prevent negative lag increase during delayed dequeue

The delayed dequeue feature aims to reduce the negative lag of a
dequeued task while it sleeps, but it can happen that newly enqueued
tasks move the avg vruntime backward and increase its negative lag.
When the delayed dequeued task wakes up, it has more negative lag than
if it had been dequeued immediately, and more than other tasks that were
dequeued just before these new enqueues.

Ensure that the negative lag of a delayed dequeued task doesn't
increase during its delayed dequeue phase while waiting for its
negative lag to disappear. Similarly, remove any positive lag that the
delayed dequeued task could have gained during this period.

Short-slice tasks are particularly impacted on overloaded systems.

Test on snapdragon rb5:

hackbench -T -p -l 16000000 -g 2 1> /dev/null &
cyclictest -t 1 -i 2777 -D 333 --policy=fair --mlock  -h 20000 -q

The scheduling latency of cyclictest is:

                       |  tip/sched/core  tip/sched/core   +this patch
cyclictest slice  (ms) |   2.8 (default)        8               8
hackbench slice   (ms) |   2.8 (default)       20              20
Total Samples          |   115632          119733          119806
Average           (us) |      364              64 (-82%)       61 (- 5%)
Median (P50)      (us) |       60              56 (- 7%)       56 (  0%)
90th Percentile   (us) |     1166              62 (-95%)       62 (  0%)
99th Percentile   (us) |     4192              73 (-98%)       72 (- 1%)
99.9th Percentile (us) |     8528            2707 (-68%)     1300 (-52%)
Maximum           (us) |    17735           14273 (-20%)    13525 (- 5%)

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260331162352.551501-1-vincent.guittot@linaro.org
2 weeks agosched/fair: Use sched_energy_enabled()
Vincent Guittot [Fri, 27 Mar 2026 13:20:13 +0000 (14:20 +0100)] 
sched/fair: Use sched_energy_enabled()

Use helper sched_energy_enabled() everywhere we want to test if EAS is
enabled instead of mixing sched_energy_enabled() and direct call to
static_branch_unlikely().

No functional change

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260327132013.2800517-1-vincent.guittot@linaro.org
2 weeks agosched: Handle blocked-waiter migration (and return migration)
John Stultz [Tue, 24 Mar 2026 19:13:25 +0000 (19:13 +0000)] 
sched: Handle blocked-waiter migration (and return migration)

Add logic to handle migrating a blocked waiter to a remote
cpu where the lock owner is runnable.

Additionally, as the blocked task may not be able to run
on the remote cpu, add logic to handle return migration once
the waiting task is given the mutex.

Because tasks may get migrated to where they cannot run, also
modify the scheduling classes to avoid sched class migrations on
mutex blocked tasks, leaving find_proxy_task() and related logic
to do the migrations and return migrations.

This was split out from the larger proxy patch, and
significantly reworked.

Credits for the original patch go to:
  Peter Zijlstra (Intel) <peterz@infradead.org>
  Juri Lelli <juri.lelli@redhat.com>
  Valentin Schneider <valentin.schneider@arm.com>
  Connor O'Brien <connoro@google.com>

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260324191337.1841376-11-jstultz@google.com
2 weeks agosched: Move attach_one_task and attach_task helpers to sched.h
John Stultz [Tue, 24 Mar 2026 19:13:24 +0000 (19:13 +0000)] 
sched: Move attach_one_task and attach_task helpers to sched.h

The fair scheduler locally introduced the attach_one_task() and
attach_task() helpers, but these could be generically useful, so move
this code to sched.h so we can use them elsewhere.

One minor tweak was made to utilize guard(rq_lock)(rq) to simplify
the function.

Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260324191337.1841376-10-jstultz@google.com
2 weeks agosched: Add logic to zap balance callbacks if we pick again
John Stultz [Tue, 24 Mar 2026 19:13:23 +0000 (19:13 +0000)] 
sched: Add logic to zap balance callbacks if we pick again

With proxy-exec, a task is selected to run via pick_next_task(),
and then if it is a mutex blocked task, we call find_proxy_task()
to find a runnable owner. If the runnable owner is on another
cpu, we will need to migrate the selected donor task away, after
which we loop back to pick_again and call pick_next_task() to choose
something else.

However, in the first call to pick_next_task(), we may have
had a balance_callback set up by the class scheduler. After we
pick again, it's possible pick_next_task_fair() will be called,
which calls sched_balance_newidle() and sched_balance_rq().

This will throw a warning:
[    8.796467] rq->balance_callback && rq->balance_callback != &balance_push_callback
[    8.796467] WARNING: CPU: 32 PID: 458 at kernel/sched/sched.h:1750 sched_balance_rq+0xe92/0x1250
...
[    8.796467] Call Trace:
[    8.796467]  <TASK>
[    8.796467]  ? __warn.cold+0xb2/0x14e
[    8.796467]  ? sched_balance_rq+0xe92/0x1250
[    8.796467]  ? report_bug+0x107/0x1a0
[    8.796467]  ? handle_bug+0x54/0x90
[    8.796467]  ? exc_invalid_op+0x17/0x70
[    8.796467]  ? asm_exc_invalid_op+0x1a/0x20
[    8.796467]  ? sched_balance_rq+0xe92/0x1250
[    8.796467]  sched_balance_newidle+0x295/0x820
[    8.796467]  pick_next_task_fair+0x51/0x3f0
[    8.796467]  __schedule+0x23a/0x14b0
[    8.796467]  ? lock_release+0x16d/0x2e0
[    8.796467]  schedule+0x3d/0x150
[    8.796467]  worker_thread+0xb5/0x350
[    8.796467]  ? __pfx_worker_thread+0x10/0x10
[    8.796467]  kthread+0xee/0x120
[    8.796467]  ? __pfx_kthread+0x10/0x10
[    8.796467]  ret_from_fork+0x31/0x50
[    8.796467]  ? __pfx_kthread+0x10/0x10
[    8.796467]  ret_from_fork_asm+0x1a/0x30
[    8.796467]  </TASK>

This is because if a RT task was originally picked, it will
setup the rq->balance_callback with push_rt_tasks() via
set_next_task_rt().

Once the task is migrated away and we pick again, we haven't
processed any balance callbacks, so rq->balance_callback is not
in the same state as it was the first time pick_next_task was
called.

To handle this, add a zap_balance_callbacks() helper function
which cleans up the balance callbacks without running them. This
should be ok, as we are effectively undoing the state set in
the first call to pick_next_task(), and when we pick again,
the new callback can be configured for the donor task actually
selected.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260324191337.1841376-9-jstultz@google.com