Sunil Khatri [Fri, 20 Mar 2026 11:59:01 +0000 (17:29 +0530)]
drm/amdgpu/userq: cleanup amdgpu_userq_get/put where not needed
amdgpu_userq_put/get are not needed when we are already holding
the userq_mutex and the reference is already valid from queue create
time or from the signal ioctl. These additional get/put calls could be a
potential cause of deadlock if the ref count reaches zero
and destroy is called, which again tries to take the userq_mutex.
With this change we avoid a deadlock in which suspend/restore
calls destroy queues, which tries to take the userq_mutex again.
Cc: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 18 Mar 2026 05:52:57 +0000 (13:52 +0800)]
drm/amd/pm: Return -EOPNOTSUPP for unsupported OD_MCLK on smu_v13_0_6
When SET_UCLK_MAX capability is absent, return -EOPNOTSUPP from
smu_v13_0_6_emit_clk_levels() for OD_MCLK instead of 0. This makes
unsupported OD_MCLK reporting consistent with other clock types
and allows callers to skip the entry cleanly.
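
The calling convention described above can be sketched in userspace as follows. This is only an illustration: emit_clk_levels(), count_emitted() and the clk_type enum are hypothetical stand-ins, not the smu_v13_0_6 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical clock types; not the driver's real enum. */
enum clk_type { CLK_SCLK, CLK_OD_MCLK, CLK_COUNT };

/*
 * Hypothetical emitter: writes one line for a clock type, or reports
 * that the type is unsupported instead of silently returning 0.
 */
static int emit_clk_levels(enum clk_type type, bool has_set_uclk_max,
                           char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (type == CLK_OD_MCLK && !has_set_uclk_max)
        return -EOPNOTSUPP;

    return snprintf(buf, len, "%s: ok\n",
                    type == CLK_SCLK ? "SCLK" : "OD_MCLK");
}

/*
 * Caller: counts the entries that were actually emitted and skips
 * unsupported ones without treating them as failures.
 */
static int count_emitted(bool has_set_uclk_max)
{
    char buf[32];
    int emitted = 0;

    for (int t = 0; t < CLK_COUNT; t++) {
        int ret = emit_clk_levels((enum clk_type)t, has_set_uclk_max,
                                  buf, sizeof(buf));
        if (ret == -EOPNOTSUPP)
            continue; /* skip the entry cleanly */
        if (ret < 0)
            return ret; /* any other error is a real failure */
        emitted++;
    }
    return emitted;
}
```

With a distinct error code the caller can tell "nothing to print" apart from "not supported", which a return value of 0 cannot express.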
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 18 Mar 2026 05:48:30 +0000 (13:48 +0800)]
drm/amd/pm: Skip redundant UCLK restore in smu_v13_0_6
Only reapply UCLK soft limits during PP_OD_RESTORE_DEFAULT when the
current max differs from the DPM table max. This avoids redundant
SMC updates and prevents -EINVAL on restore when no change is needed.
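
A minimal sketch of the "only reapply when different" check, assuming a hypothetical send_soft_max() helper that stands in for the firmware message. The names and the counter are illustrative and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>

static int smc_updates; /* counts simulated firmware messages */

/* Hypothetical stand-in for sending a new soft maximum to firmware. */
static void send_soft_max(unsigned int mhz)
{
    (void)mhz;
    smc_updates++;
}

/* Restore the default maximum only when it differs from the current one. */
static void restore_default_max(unsigned int *current_max,
                                unsigned int table_max)
{
    assert(current_max != NULL);

    if (*current_max == table_max)
        return; /* nothing to change, so no firmware message */

    send_soft_max(table_max);
    *current_max = table_max;
}
```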
Fixes: b7a900344546 ("drm/amd/pm: Allow setting max UCLK on SMU v13.0.6") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 13 Mar 2026 22:42:59 +0000 (17:42 -0500)]
drm/amd/display: Promote DC to 3.2.375
This version brings along the following fixes:
- Rework YCbCr422 DSC policy
- Restore full update for tiling change to linear
- add dccg FGCG mask init
- Remove unnecessary completion flag for secure display
- Augment live + capture with CVT case.
- remove dc_clock_limit for apu
- Fix Signed/Unsigned Int Usage Compiler Warning
- Hardcode dtbclk value in bw_params
- Revert inbox0 lock for cursor due to deadlock
- Add 3DLUT DMA broadcast support
- Fix Silence warnings
- export get_power_profile interface for later use
- pg cntl update based on previous asic.
- remove disable_sutter touch pstate debug code
- Refactor DC update checks
- Fix drm_edid leak in amdgpu_dm
- Add Extra SMU Log for dtbclk
- Clamp min DS DCFCLK value to DCN limit
- Update dpia supported configuration
- Multiple DCN42 updates
Joshua Aberback [Thu, 12 Mar 2026 22:33:49 +0000 (18:33 -0400)]
drm/amd/display: Restore full update for tiling change to linear
[Why]
There was previously a dc debug flag to indicate that tiling
changes should only be a medium update instead of full. The
function get_plane_info_type was refactored to not rely on dc
state, but in the process the logic was unintentionally changed,
which leads to screen corruption in some cases.
[How]
- add flag to tiling struct to avoid full update when necessary
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 13 Mar 2026 04:05:40 +0000 (12:05 +0800)]
drm/amd/display: Remove unnecessary completion flag for secure display
The completion flag is not used in secure display today.
Remove unnecessary code.
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ChunTao Tso [Wed, 4 Mar 2026 08:37:58 +0000 (16:37 +0800)]
drm/amd/display: Augment live + capture with CVT case.
1. Add LIVE_CAPTURE_WITH_CVT bit (bit[2]) in union replay_optimization
to control this feature via DalRegKey_ReplayOptimization.
2. Check the bit in mod_power_set_live_capture_with_cvt_activate function
before enabling live capture with CVT.
3. Use LIVE_CAPTURE_WITH_CVT to control if Replay want to send CVT in
live + capture or not.
Reviewed-by: Leon Huang <leon.huang1@amd.com> Signed-off-by: ChunTao Tso <chuntao.tso@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Wed, 11 Mar 2026 21:05:19 +0000 (17:05 -0400)]
drm/amd/display: remove dc_clock_limit for apu
[why]
current apu pmfw does not support dc_clock_limit
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix Signed/Unsigned Int Usage Compiler Warning
[Why] The compiler generates warnings when signed enum
constants or literal -1 are implicitly converted to unsigned
integer types, cluttering build output and masking genuine issues.
[How] Use UINT_MAX as the invalid sentinel for unsigned IDs and align
loop/index types to unsigned where appropriate to remove implicit
signed-to-unsigned conversions, with no functional behavior change.
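
The sentinel pattern can be illustrated with a small standalone sketch. INVALID_ID and find_id() are hypothetical names chosen for this example, not identifiers from the DC code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical name for the invalid sentinel of an unsigned ID. */
#define INVALID_ID UINT_MAX

/*
 * Returns the index of `id` in `ids`, or INVALID_ID when it is absent.
 * Index, count and return type are all unsigned, so no signed value
 * (such as a literal -1) is ever implicitly converted.
 */
static unsigned int find_id(const unsigned int *ids, unsigned int count,
                            unsigned int id)
{
    assert(ids != NULL || count == 0);

    for (unsigned int i = 0; i < count; i++) {
        if (ids[i] == id)
            return i;
    }
    return INVALID_ID;
}
```

The stored value is the same bit pattern that -1 would convert to, which is why such a change can be made without altering behavior.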
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Wed, 11 Mar 2026 19:16:00 +0000 (15:16 -0400)]
drm/amd/display: Hardcode dtbclk value in bw_params
[why&how]
dtbclk should always be 600MHz. Previous logic was to get the real value
from SMU, but this returns 0 when dtbclk is off. Not a problem during
boot when pre-OS enables dtbclk, but PnP was broken due to this.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Revert inbox0 lock for cursor due to deadlock
[Why]
A deadlock occurs when using the inbox0 lock for cursor operations on
PSR-SU and Replay that does not occur when using the inbox1 locking path.
This is because of a priority inversion issue where inbox1 work
cannot be serviced while holding the HW lock from driver and sending
cursor notifications to DMUB.
Typically the lower priority of inbox1 for the lock command would
allow the PSR and Replay FSMs to complete their transition prior
to giving driver the lock but this is no longer the case with inbox0
having the highest priority in servicing.
[How]
Revert the inbox0 lock for cursor operations.
This will reintroduce any synchronization bugs that were there
with Replay or PSR-SU touching the cursor at the same time as the driver.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Sat, 7 Mar 2026 05:53:03 +0000 (05:53 +0000)]
drm/amd/display: Add 3DLUT DMA broadcast support
[WHY&HOW]
A single HUBP can be used to fetch 3DLUT and broadcast to a
single HUBP. Add logic to select the top pipe for a given
plane and use its HUBP as the broadcast source for multiple
MPCs.
Charlene Liu [Fri, 6 Mar 2026 15:40:07 +0000 (10:40 -0500)]
drm/amd/display: export get_power_profile interface for later use
[why]
export dcn401 get_power_profile for later asic.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Thu, 5 Mar 2026 21:42:29 +0000 (16:42 -0500)]
drm/amd/display: Refactor DC update checks
[WHY&HOW]
DC currently has fragmented definitions of update types. This change
consolidates them into a single interface, and adds expanded
functionality to accommodate all use cases.
- adds `dc_check_update_state_and_surfaces_for_stream` to determine
update type including state, surface, and stream changes.
- adds missing surface/stream update checks to
`dc_check_update_surfaces_for_stream`
- adds new update type `UPDATE_TYPE_ADDR_ONLY` to accommodate flows where
further distinction from `UPDATE_TYPE_FAST` was needed
- removes caller reliance on `enable_legacy_fast_update` to determine
which commit function to use, instead embedding it in the update type
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Mon, 9 Mar 2026 17:16:08 +0000 (11:16 -0600)]
drm/amd/display: Fix drm_edid leak in amdgpu_dm
[WHAT]
When a sink is connected, aconnector->drm_edid was overwritten without
freeing the previous allocation, causing a memory leak on resume.
[HOW]
Free the previous drm_edid before updating it.
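
The free-before-overwrite pattern looks roughly like this in a userspace sketch. struct connector, blob_alloc() and blob_free() are hypothetical stand-ins for the connector and the drm_edid allocation helpers, and live_blobs exists only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a connector that caches an EDID blob. */
struct connector {
    char *edid;
};

static int live_blobs; /* outstanding allocations, for the demo only */

static char *blob_alloc(const char *data)
{
    size_t n = strlen(data) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, data, n);
        live_blobs++;
    }
    return p;
}

static void blob_free(char *p)
{
    if (p) {
        free(p);
        live_blobs--;
    }
}

/* Release the previous blob before storing the new one. */
static void connector_update_edid(struct connector *c, const char *data)
{
    assert(c != NULL);

    blob_free(c->edid); /* without this line the old blob leaks */
    c->edid = data ? blob_alloc(data) : NULL;
}
```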
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix DCN42 memory clock table using MemClk instead of UClk
[Why]
DCN42 was using UClk values instead of MemClk from MemPstateTable, causing
DML to see half the actual DRAM bandwidth on DDR5 systems and reject high
refresh rate modes.
[How]
Change dcn42_init_clocks() to use MemPstateTable[i].MemClk instead of
MemPstateTable[i].UClk for memclk_mhz initialization.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Alexander Chechik <alexander.chechik@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Tue, 17 Mar 2026 00:17:57 +0000 (20:17 -0400)]
drm/amd/display: Update underflow detection for DCN42
[Why]
The DCN42 underflow detection functions in dcn42_optc.c use
OPTC_RSMU_UNDERFLOW register but the register offset definitions
were missing from dcn_4_2_0_offset.h and dcn42_resource.h.
[How]
Add missing register definitions.
Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 3 Feb 2026 16:30:57 +0000 (17:30 +0100)]
drm/amdgpu: fix some more bug in amdgpu_gem_va_ioctl
Some illegal combinations of input flags were not checked, and we need to
take the PDEs into account when returning the fence as well.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Sat, 14 Mar 2026 00:34:48 +0000 (20:34 -0400)]
drm/amd/display: Clamp min DS DCFCLK value to DCN limit
[why & how]
DCN has a global limit for minimum DS DCFCLK during any operation.
Adhere to that limit and add a debug flag.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Split arbiter programming for DCN42
[Why]
We don't want to update the timeout threshold for stall recovery in
firmware dynamically for DCN42 as we're not using FAMS.
Firmware should own programming of this register since the recovery
can be broken if driver updates the value to 0.
[How]
Split program_arbiter for dcn42 and skip the part that updates the
timeout threshold.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Sat, 14 Mar 2026 00:27:54 +0000 (20:27 -0400)]
drm/amd/display: Add missing dcn42 hubbub function pointers
This aligning commit combines:
- fix dcn42 det programming
- fix missing dcn42 pointers
- fix SDPIF_Request_Rate_Limit programming value
V2: Add back dchvm_init for DCN42
Reviewed-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Fri, 13 Mar 2026 23:34:02 +0000 (19:34 -0400)]
drm/amd/display: Add get_default_tiling_info for dcn42
Add the DCN42 portion that was previously stripped.
Fixes: 8333f22e44a9 ("drm/amd/display: Query DC for gfx handling when setting linear tiling") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move it out of smu present block for cases where it isn't
Reviewed-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 5 Mar 2026 15:14:39 +0000 (10:14 -0500)]
drm/amd/display: System Hang When System enters to S0i3 w/ iGPU
[why]
The system hangs when entering S0i3 with an iGPU.
Some link_enc pointers are NULL because the BIOS integration info table is
incorrect, but the driver should have sufficient NULL pointer protection.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Wed, 4 Mar 2026 01:07:58 +0000 (20:07 -0500)]
drm/amd/display: Move DPM clk read to clk_mgr_construct in DCN42
[Why&How]
The DPM clocks on DCN42 are currently read on every dm_resume, which can
result in GPU memory being freed while the device is still in suspend.
Move the DPM clock read functionality to clk_mgr_construct() so it
completes once on driver enablement.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DCN401 didn't have a MRQ present so these fields didn't exist.
They are still present on DCN42, so we need to continue programming
them like we did on DCN35, or we can have poor meta requesting
efficiency, which blocks p-state.
[How]
Add `hubp42_program_requestor` which takes DML21 input and programs
the registers like DCN35 and prior.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Mon, 2 Mar 2026 20:45:41 +0000 (15:45 -0500)]
drm/amd/display: dcn42 don't round up dispclk and dppclk
[why]
dml2 maps clocks to DPM levels when the number of enabled clock levels is not 2.
The APU has 8 levels of dispclk/dppclk/dcfclk/fclk, but only 4 levels of memclk.
To avoid mapping dispclk/dppclk to DPM clocks,
force the dispclk/dppclk num_level to 2, based on arch review.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kexin Sun [Sat, 21 Mar 2026 10:59:40 +0000 (18:59 +0800)]
spi: pl022: update outdated references to pump_transfers()
The pump_transfers tasklet was removed when the driver
switched to the SPI core message processing loop in commit 9b2ef250b31d ("spi: spl022: switch to use default
spi_transfer_one_message()"). The kdoc of
pl022_interrupt_handler() still describes the old flow: scheduling
the pump_transfers tasklet, calling giveback(), and flagging
STATE_ERROR — none of which exist in the current code. Rewrite
the kdoc to reflect the current paths, which flag errors via
SPI_TRANS_FAIL_IO and call spi_finalize_current_transfer()
directly.
Eric Huang [Mon, 16 Mar 2026 15:01:30 +0000 (11:01 -0400)]
drm/amdgpu: prevent immediate PASID reuse case
PASID reuse can cause interrupt issues when a new process
immediately runs into hardware state left behind by a previous
process that exited with the same PASID. For example, page faults
may still be pending in the IH ring buffer when the process exits
and frees its PASID. To prevent this, use the idr cyclic
allocator, the same approach the kernel uses for PIDs.
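
To show why a cyclic allocator avoids immediate reuse, here is a tiny userspace sketch. It is not the kernel's idr implementation: struct cyclic_ids, the fixed-size table and the ID_MIN/ID_MAX bounds are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ID_MIN 1
#define ID_MAX 8 /* exclusive upper bound, kept tiny for illustration */

struct cyclic_ids {
    bool used[ID_MAX];
    int next; /* where the next search starts */
};

static void cyclic_init(struct cyclic_ids *a)
{
    for (int i = 0; i < ID_MAX; i++)
        a->used[i] = false;
    a->next = ID_MIN;
}

/*
 * Allocate the first free ID at or after a->next, wrapping around to
 * ID_MIN. Returns -1 when every ID is in use.
 */
static int cyclic_alloc(struct cyclic_ids *a)
{
    int range = ID_MAX - ID_MIN;

    for (int n = 0; n < range; n++) {
        int id = ID_MIN + (a->next - ID_MIN + n) % range;

        if (!a->used[id]) {
            a->used[id] = true;
            a->next = (id + 1 < ID_MAX) ? id + 1 : ID_MIN;
            return id;
        }
    }
    return -1;
}

static void cyclic_free(struct cyclic_ids *a, int id)
{
    assert(id >= ID_MIN && id < ID_MAX);
    a->used[id] = false;
}
```

A freed ID is handed out again only after the search position has wrapped around the whole range, which gives stale state tied to the old ID time to drain.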
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ruijing Dong [Tue, 17 Mar 2026 17:54:11 +0000 (13:54 -0400)]
drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)
amdgpu_device_get_job_timeout_settings() passes a pointer directly
to the global amdgpu_lockup_timeout[] buffer into strsep().
strsep() destructively replaces delimiter characters with '\0'
in-place.
On multi-GPU systems, this function is called once per device.
When a multi-value setting like "0,0,0,-1" is used, the first
GPU's call transforms the global buffer into "0\00\00\0-1". The
second GPU then sees only "0" (terminated at the first '\0'),
parses a single value, hits the single-value fallthrough
(index == 1), and applies timeout=0 to all rings — causing
immediate false job timeouts.
Fix this by copying into a stack-local array before calling
strsep(), so the global module parameter buffer remains intact
across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
(256) bytes, which is safe for the stack.
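
The failure mode and the fix can be reproduced in userspace. In this sketch sep_token() is a minimal stand-in that mimics the destructive behaviour of strsep() for a single delimiter, global_param stands in for the shared module parameter buffer, and strncpy() plus explicit termination replaces the kernel-only strscpy(). All of these names are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PARAM_LEN 256

/* Stand-in for the module parameter buffer shared by every device. */
static char global_param[PARAM_LEN] = "0,0,0,-1";

/*
 * Minimal strsep()-like helper: returns the next token and overwrites
 * the delimiter with '\0' in place, just as strsep() does.
 */
static char *sep_token(char **cursor, char delim)
{
    char *start;
    char *end;

    assert(cursor != NULL);
    start = *cursor;
    if (!start)
        return NULL;

    end = strchr(start, delim);
    if (end) {
        *end = '\0';
        *cursor = end + 1;
    } else {
        *cursor = NULL;
    }
    return start;
}

/* Buggy variant: tokenizes the shared buffer directly. */
static int count_values_in_place(void)
{
    char *cursor = global_param;
    int n = 0;

    while (sep_token(&cursor, ','))
        n++;
    return n;
}

/* Fixed variant: tokenizes a stack-local copy of the shared buffer. */
static int count_values_on_copy(void)
{
    char local[PARAM_LEN];
    char *cursor = local;
    int n = 0;

    strncpy(local, global_param, sizeof(local) - 1);
    local[sizeof(local) - 1] = '\0';

    while (sep_token(&cursor, ','))
        n++;
    return n;
}
```

Calling the in-place variant twice yields four values and then one, which corresponds to the second GPU seeing only "0". The copying variant returns four on every call.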
v2: wrap commit message to 72 columns, add Assisted-by tag.
v3: use stack array with strscpy() instead of kstrdup()/kfree()
to avoid unnecessary heap allocation (Christian).
This patch was developed with assistance from Claude (claude-opus-4-6).
Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 19 Feb 2026 23:20:27 +0000 (18:20 -0500)]
drm/amdgpu/gfx11: look at the right prop for gfx queue priority
Look at hqd_queue_priority rather than hqd_pipe_priority.
In practice, it didn't matter as both were always set for
kernel queues, but that will change in the future.
Fixes: 2e216b1e6ba2 ("drm/amdgpu/gfx11: handle priority setup for gfx pipe1")
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 19 Feb 2026 23:18:28 +0000 (18:18 -0500)]
drm/amdgpu/gfx10: look at the right prop for gfx queue priority
Look at hqd_queue_priority rather than hqd_pipe_priority.
In practice, it didn't matter as both were always set for
kernel queues, but that will change in the future.
Fixes: b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue")
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 17 Mar 2026 20:34:41 +0000 (16:34 -0400)]
drm/amdgpu/pm: drop SMU driver if version not matched messages
It just leads to user confusion.
Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Skip discovery dump when topology is unavailable
When generating a devcoredump, amdgpu_discovery_dump() prints the IP
discovery topology.
The function already needs to handle the case where
adev->discovery.ip_top is NULL to avoid a crash.
Currently, the code prints a section header and an additional message
when the topology is unavailable.
However, for platforms where discovery is not used, this section is not
expected to be present. Printing an extra message adds unnecessary
output.
Simplify this by skipping the entire section when ip_top is NULL.
The NULL check is kept to avoid a crash, but no output is generated when
the discovery topology is unavailable.
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_discovery_dump() uses adev->discovery.ip_top. However,
ip_top may be NULL if the discovery topology was never initialized.
The current code does not check for this before using ip_top. As a
result, when ip_top is NULL, the coredump worker crashes while taking
the spinlock for ip_top->die_kset.
Fix this by checking for a missing ip_top before walking the discovery
topology. If it is unavailable, print a short message in the dump and
return safely.
- If ip_top is NULL, print a message and skip the dump
- Also add the same check in the cleanup path
This makes the coredump and cleanup paths safe even when the
discovery topology is not available.
v2: Updated commit message - Clarified that ip_top is not freed, it can
just be NULL if discovery was not initialized. (Christian/Lijo)
v3: Removed the extra drm_warn() for sysfs init failure as sysfs already
reports errors. (Christian)
Fixes: e81eff80aad6 ("drm/amdgpu: include ip discovery data in devcoredump") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yussuf Khalil [Fri, 6 Mar 2026 12:06:35 +0000 (12:06 +0000)]
drm/amd/display: Do not skip unrelated mode changes in DSC validation
Starting with commit 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in
atomic check"), amdgpu resets the CRTC state mode_changed flag to false when
recomputing the DSC configuration results in no timing change for a particular
stream.
However, this is incorrect in scenarios where a change in MST/DSC configuration
happens in the same KMS commit as another (unrelated) mode change. For example,
the integrated panel of a laptop may be configured differently (e.g., HDR
enabled/disabled) depending on whether external screens are attached. In this
case, plugging in external DP-MST screens may result in the mode_changed flag
being dropped incorrectly for the integrated panel if its DSC configuration
did not change during precomputation in pre_validate_dsc().
At this point, however, dm_update_crtc_state() has already created new streams
for CRTCs with DSC-independent mode changes. In turn,
amdgpu_dm_commit_streams() will never release the old stream, resulting in a
memory leak. amdgpu_dm_atomic_commit_tail() will never acquire a reference to
the new stream either, which manifests as a use-after-free when the stream gets
disabled later on:
BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu]
Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977
Since there is no reliable way of figuring out whether a CRTC has unrelated
mode changes pending at the time of DSC validation, remember the value of the
mode_changed flag from before the point where a CRTC was marked as potentially
affected by a change in DSC configuration. Reset the mode_changed flag to this
earlier value instead in pre_validate_dsc().
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004 Fixes: 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check") Signed-off-by: Yussuf Khalil <dev@pp3345.net> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/ras: Remove redundant NULL check in pending bad-bank list iteration
ras_umc_log_pending_bad_bank() walks through a list of pending ECC
bad-bank entries. These entries are saved when a bad-bank error cannot
be processed immediately, for example during a GPU reset.
Later, this function iterates over the pending list and retries logging
each bad-bank error. If logging succeeds, the entry is removed from the
list and the memory for that node is freed.
The loop uses list_for_each_entry_safe(), which already guarantees that
ecc_node points to a valid list entry while the loop body is executing.
Checking "ecc_node &&" inside the loop is therefore unnecessary and
redundant.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_umc.c:225 ras_umc_log_pending_bad_bank() warn: variable dereferenced before check 'ecc_node' (see line 223)
Fixes: 7a3f9c0992c4 ("drm/amd/ras: Add umc common ras functions") Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: YiPeng Chai <YiPeng.Chai@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 29 Jan 2026 11:58:10 +0000 (12:58 +0100)]
drm/amdgpu: make amdgpu_user_wait_ioctl more resilient v2
When the memory allocated by userspace isn't sufficient for all the
fences then just wait on them instead of returning an error.
v2: use correct variable as pointed out by Sunil
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Fri, 13 Mar 2026 06:17:10 +0000 (14:17 +0800)]
drm/amdgpu: replace WARN with DRM_ERROR for invalid sched priority
amdgpu_sched_ioctl() currently uses WARN(1, ...) when userspace passes
an out-of-range context priority value. WARN(1, ...) is unconditional
and produces a full stack trace, which is disproportionate for a simple
input validation failure -- the invalid value is already rejected with
-EINVAL on the next line.
Replace WARN(1, ...) with DRM_ERROR() to log the invalid value at an
appropriate level without generating a stack dump. The -EINVAL return
to userspace is unchanged.
No functional change for well-formed userspace callers.
v2:
- Reworked commit message to focus on appropriate log level for
parameter validation
- Clarified that -EINVAL behavior is preserved (Vitaly)
v3: completely drop that warning.
Invalid parameters should never clutter the system log. (Christian)
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kexin Sun [Sat, 21 Mar 2026 11:50:18 +0000 (19:50 +0800)]
ASoC: generic: update outdated comment for removed soc_bind_dai_link()
The function soc_bind_dai_link() was first merged into
snd_soc_add_dai_link() by commit 63dc47da1f39 ("ASoC: soc-core: merge
snd_soc_add_dai_link() and soc_bind_dai_link()"), and later renamed to
snd_soc_add_pcm_runtime() by commit 0c04800424c4 ("ASoC: soc-core:
rename snd_soc_add_dai_link() to snd_soc_add_pcm_runtime()").
In simple-card.c, also adjust the wording since snd_soc_add_pcm_runtime()
no longer uses "xxx_of_node" fields but matches components by of_node
through snd_soc_find_dai() and snd_soc_is_matching_component().
In simple-card-utils.c, simply update the function name to its
successor snd_soc_add_pcm_runtime().
Yury Norov [Thu, 19 Mar 2026 00:43:47 +0000 (20:43 -0400)]
bitmap: exclude nbits == 0 cases from bitmap test
The bitmap API handles nbits == 0 correctly in most cases, i.e. it doesn't
dereference the underlying bitmap and returns a sane value where convenient,
or an implementation-defined or undefined one.
Implicitly testing the nbits == 0 case, however, may give the impression that
this is a regular case. This is wrong. In most cases nbits == 0 is a
sign of an error on the client side. The tests should not give such an
impression.
This patch reworks the existing tests to not test nbits == 0. The
following patch adds an explicit test for it with an appropriate
precaution.
Yury Norov [Thu, 19 Mar 2026 00:43:46 +0000 (20:43 -0400)]
bitmap: test bitmap_weight() for more
Test the function for correctness when some bits are set in the last word
of bitmap beyond nbits. This is motivated by commit a9dadc1c512807f9
("powerpc/xive: Fix the size of the cpumask used in
xive_find_target_in_mask()").
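
The property under test can be sketched in plain C: a weight function has to ignore bits stored past nbits in the last word. weight() and popcount_word() below are illustrative userspace helpers, not the kernel's bitmap_weight() implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Counts set bits in one word by clearing the lowest set bit each step. */
static unsigned int popcount_word(unsigned long w)
{
    unsigned int n = 0;

    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/*
 * Counts set bits among the first nbits bits, ignoring any stray bits
 * stored past nbits in the last word.
 */
static unsigned int weight(const unsigned long *map, size_t nbits)
{
    size_t full = nbits / BITS_PER_WORD;
    size_t rem = nbits % BITS_PER_WORD;
    unsigned int n = 0;

    assert(map != NULL || nbits == 0);

    for (size_t i = 0; i < full; i++)
        n += popcount_word(map[i]);
    if (rem)
        n += popcount_word(map[full] & ((1UL << rem) - 1));
    return n;
}
```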
Cheng-Yang Chou [Mon, 23 Mar 2026 10:48:29 +0000 (18:48 +0800)]
sched_ext: Fix invalid kobj cast in scx_uevent()
When scx_alloc_and_add_sched() creates the sub-scheduler kset, it sets
sch->kobj as the parent. Because sch->kobj.kset points to scx_kset,
registering this sub-kset triggers a KOBJ_ADD uevent. The uevent walk
finds scx_kset and calls scx_uevent() with the sub-kset's kobject.
scx_uevent() unconditionally uses container_of() to cast the incoming
kobject to struct scx_sched, producing a wild pointer when the kobject
belongs to the kset itself rather than a scheduler instance. Accessing
sch->ops.name through this pointer causes a KASAN slab-out-of-bounds
read:
BUG: KASAN: slab-out-of-bounds in string+0x3b6/0x4c0
Read of size 1 at addr ffff888004d04348 by task scx_enable_help/748
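
The hazard of an unconditional container_of() can be shown with a small userspace sketch. The guard used here, comparing a type pointer before casting, is only one possible approach and is an assumption for illustration; it is not necessarily how the patch fixes scx_uevent(). All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel macro, for illustration. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct obj_type {
    const char *name;
};

/* Hypothetical generic object that may or may not be embedded. */
struct object {
    const struct obj_type *type;
};

/* Hypothetical scheduler instance that embeds an object. */
struct sched {
    const char *name;
    struct object obj;
};

static const struct obj_type sched_type = { "sched" };
static const struct obj_type set_type = { "set" };

/*
 * Returns the owning sched only when the object really is embedded in
 * one. For any other object, container_of() would compute a pointer
 * into unrelated memory, so the cast is refused.
 */
static struct sched *object_to_sched(struct object *o)
{
    assert(o != NULL);

    if (o->type != &sched_type)
        return NULL;
    return container_of(o, struct sched, obj);
}
```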
Add a transparent compatibility wrapper for the scx_bpf_sub_dispatch()
kfunc in compat.bpf.h. This allows BPF schedulers using the sub-sched
dispatch feature to build and run on older kernels that lack the kfunc.
To avoid requiring code changes in individual schedulers, the
transparent wrapper pattern is used instead of a __COMPAT prefix. The
kfunc is declared with a ___compat suffix, while the static inline
wrapper retains the original scx_bpf_sub_dispatch() name.
When the kfunc is unavailable, the wrapper safely falls back to
returning false. This is acceptable because the dispatch path cannot
do anything useful without underlying sub-sched support anyway.
Yury Norov [Thu, 12 Mar 2026 23:08:16 +0000 (19:08 -0400)]
lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
ffs(x) == __ffs(x) + 1 for nonzero x. Also, ffs(0) == 0, while __ffs(0) is undefined.
The 32-bit behaviour is inconsistent with the function description, so it
needs to get fixed.
There are 9 individual users for the function in 6 different subsystems.
Some arches and drivers are 64-bit only:
- arch/loongarch/kvm/intc/eiointc.c;
- drivers/hv/mshv_vtl_main.c;
- kernel/liveupdate/kexec_handover.c;
The others are:
- ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
- rzv2m_csi_reg_write_bit(): ARCH_RENESAS only, unclear;
- lz77_match_len(): CIFS_COMPRESSION only, unclear, experimental;
IB and CIFS are explicitly OK with the change.
The attached patch gets rid of 32-bit explicit support, so that both
32- and 64-bit versions rely on __ffs().
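
The off-by-one between the two conventions is easy to see in a portable sketch. ffs_one_based() and ffs_zero_based() below are plain C stand-ins for the ffs() and __ffs() conventions, written for illustration only, and trailing_zeros() is a hypothetical name.

```c
#include <assert.h>

/* 1-based index of the lowest set bit; 0 when x == 0 (ffs() convention). */
static int ffs_one_based(unsigned long x)
{
    int pos = 1;

    if (x == 0)
        return 0;
    while (!(x & 1UL)) {
        x >>= 1;
        pos++;
    }
    return pos;
}

/*
 * 0-based index of the lowest set bit (__ffs() convention). The result
 * is undefined for 0 in the kernel, so this sketch rejects it.
 */
static unsigned int ffs_zero_based(unsigned long x)
{
    unsigned int pos = 0;

    assert(x != 0);
    while (!(x & 1UL)) {
        x >>= 1;
        pos++;
    }
    return pos;
}

/* Trailing zero count matches the 0-based convention, not the 1-based one. */
static unsigned int trailing_zeros(unsigned long x)
{
    return ffs_zero_based(x);
}
```

For x = 8 the 1-based helper returns 4 while the number of trailing zeros is 3, which is the inconsistency the 32-bit path had.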
CC: K. Y. Srinivasan <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Mark Brown <broonie@kernel.org> CC: Steve French <sfrench@samba.org> CC: Alexander Graf <graf@amazon.com> CC: Mike Rapoport <rppt@kernel.org> CC: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Yury Norov <ynorov@nvidia.com>
Yury Norov [Tue, 10 Mar 2026 20:53:02 +0000 (16:53 -0400)]
lib: crypto: fix comments for count_leading_zeros()
count_leading_zeros() is based on fls(), which is defined for x == 0,
contrary to __ffs() family. The comment in crypto/mpi erroneously
states that the function may return undef in such case.
Fix the comment together with the outdated function signature, and now
that COUNT_LEADING_ZEROS_0 is not referenced in the codebase, get rid of
it too.
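As a rough user-space illustration of why the zero case is well defined, assume a helper that behaves like fls(), returning the 1-based index of the highest set bit and 0 for x == 0. All sketch_* names below are hypothetical and the loop is a plain reference implementation, not the kernel code:

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* fls()-like helper: 1-based index of the highest set bit, 0 for x == 0. */
static int sketch_fls_long(unsigned long x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	assert(pos <= SKETCH_BITS_PER_LONG);
	return pos;
}

/* Leading zeros derived from the fls()-like helper; defined for x == 0 too. */
static int sketch_count_leading_zeros(unsigned long x)
{
	return SKETCH_BITS_PER_LONG - sketch_fls_long(x);
}
```

With this shape, an input of 0 simply yields the full word width instead of an undefined result.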
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Yury Norov <ynorov@nvidia.com>
Akinobu Mita [Tue, 10 Mar 2026 15:21:26 +0000 (00:21 +0900)]
lib/find_bit_benchmark: avoid clearing randomly filled bitmap in test_find_first_bit()
test_find_first_bit() searches for a set bit from the beginning of the
test bitmap and clears it repeatedly, eventually clearing the entire
bitmap.
After test_find_first_bit() is executed, test_find_first_and_bit() and
test_find_next_and_bit() are executed without randomly reinitializing the
cleared bitmap.
In the first phase (testing find_bit() with a random-filled bitmap),
test_find_first_bit() only operates on 1/10 of the entire size of the
testing bitmap, so this isn't a big problem.
However, in the second phase (testing find_bit() with a sparse bitmap),
test_find_first_bit() clears the entire test bitmap, so the subsequent
test_find_first_and_bit() and test_find_next_and_bit() will not find any
set bits. This is probably not the intended benchmark.
To fix this issue, test_find_first_bit() operates on a duplicated bitmap
and does not clear the original test bitmap.
The same is already done in test_find_first_and_bit().
While we're at it, add const qualifiers to the bitmap pointer arguments
in the test functions.
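The idea of working on a duplicate can be sketched in user space as follows. The sketch_* names and the fixed 256-bit size are assumptions made for illustration; the benchmark itself uses the kernel bitmap API and a much larger bitmap:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SKETCH_NBITS  256
#define SKETCH_BPL    (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NLONGS (SKETCH_NBITS / SKETCH_BPL)

/* Return the index of the first set bit, or SKETCH_NBITS if none is set. */
static size_t sketch_find_first_bit(const unsigned long *map)
{
	for (size_t i = 0; i < SKETCH_NBITS; i++)
		if (map[i / SKETCH_BPL] & (1UL << (i % SKETCH_BPL)))
			return i;
	return SKETCH_NBITS;
}

/* Count set bits by repeatedly finding and clearing the first one in a copy. */
static size_t sketch_test_find_first_bit(const unsigned long *bitmap)
{
	unsigned long cp[SKETCH_NLONGS];
	size_t i, cnt = 0;

	memcpy(cp, bitmap, sizeof(cp));	/* the caller's bitmap stays intact */
	while ((i = sketch_find_first_bit(cp)) < SKETCH_NBITS) {
		cp[i / SKETCH_BPL] &= ~(1UL << (i % SKETCH_BPL));
		cnt++;
	}
	assert(sketch_find_first_bit(cp) == SKETCH_NBITS);
	return cnt;
}
```

Because the argument is const and only the copy is cleared, tests that run afterwards still see the original bit pattern.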
Yury Norov [Thu, 19 Feb 2026 18:13:57 +0000 (13:13 -0500)]
bitmap: switch test to scnprintf("%*pbl")
scnprintf("%*pbl") is more verbose than bitmap_print_to_pagebuf().
Switch the test to using it. This also improves the test output
because bitmap_print_to_pagebuf() adds \n at the end of the printed
bitmap, which breaks the test format.
Kees Cook [Mon, 23 Mar 2026 17:24:52 +0000 (10:24 -0700)]
ACPICA: Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()
Replace the deprecated[1] strncpy() with strscpy_pad() in
acpi_ut_safe_strncpy().
The function is a "safe strncpy" wrapper that does
strncpy(dest, source, dest_size) followed by manual NUL-termination
at dest[dest_size - 1]. strscpy_pad() is a direct replacement: it
NUL-terminates, zero-pads the remainder, and the manual termination
is no longer needed.
All callers pass NUL-terminated source strings (C string literals,
__FILE__ via ACPI_MODULE_NAME, or user-provided filenames that have
already been validated). The destinations are fixed-size char arrays
in ACPICA internal structures (allocation->module, aml_op_name,
acpi_gbl_db_debug_filename), all consumed as C strings.
No behavioral change: strscpy_pad() produces identical output to
strncpy() + manual NUL-termination for NUL-terminated sources that
are shorter than dest_size. For sources longer than dest_size,
strncpy() wrote dest_size non-NUL bytes then the manual termination
overwrote the last byte with NUL; strscpy_pad() writes dest_size-1
bytes plus NUL: same result.
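The equivalence argued above can be checked with a user-space sketch. sketch_new_copy() is a hypothetical approximation of strscpy_pad() written only for this comparison; unlike the kernel helper it calls strlen() on the source and returns nothing:

```c
#include <assert.h>
#include <string.h>

/* Old pattern: strncpy() plus manual NUL-termination of the last byte. */
static void sketch_old_copy(char *dest, const char *src, size_t dest_size)
{
	assert(dest_size > 0);
	strncpy(dest, src, dest_size);
	dest[dest_size - 1] = '\0';
}

/* Approximation of strscpy_pad(): truncate, NUL-terminate, zero-pad the rest. */
static void sketch_new_copy(char *dest, const char *src, size_t dest_size)
{
	size_t len;

	assert(dest_size > 0);
	len = strlen(src);
	if (len > dest_size - 1)
		len = dest_size - 1;
	memcpy(dest, src, len);
	memset(dest + len, 0, dest_size - len);
}
```

Filling two equally sized buffers through both helpers and comparing them with memcmp() should show no difference for short or overlong NUL-terminated sources.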
Cezary Rojewski [Fri, 20 Mar 2026 10:12:17 +0000 (11:12 +0100)]
ASoC: Intel: catpt: Fix the device initialization
The DMA mask shall be coerced before any buffer allocations for the
device are done. At the same time explain why DMA mask of 31 bits is
used in the first place.
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Fixes: 7a10b66a5df9 ("ASoC: Intel: catpt: Device driver lifecycle") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260320101217.1243688-1-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
Eric Biggers [Sat, 21 Mar 2026 23:06:50 +0000 (16:06 -0700)]
dm-crypt: Reimplement elephant diffuser using AES library
Simplify and optimize dm-crypt's implementation of Bitlocker's "elephant
diffuser" to use the AES library instead of an "ecb(aes)"
crypto_skcipher.
Note: struct aes_enckey is fixed-size, so it could be embedded directly
in struct iv_elephant_private. But I kept it as a separate allocation
so that the size of struct crypt_config doesn't increase. The elephant
diffuser is rarely used in dm-crypt.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Eric Biggers [Fri, 20 Mar 2026 21:15:08 +0000 (14:15 -0700)]
dm-verity-fec: warn even when there were no errors
Currently FEC logs a warning message if at least one error was
corrected, or an error message if there were uncorrectable errors.
However, it doesn't log anything if there were no errors.
"No errors" is actually unexpected, though, considering that dm-verity
calls verity_fec_decode() only when a block's digest doesn't match.
If there were to ever be a bug where verity_fec_decode() is called on
blocks with the correct digest, then there would be no indication in the
log that FEC is running and degrading performance.
Therefore, let's log the warning message even when there were no errors.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
SeongJae Park [Mon, 16 Mar 2026 23:51:17 +0000 (16:51 -0700)]
mm/damon/stat: monitor all System RAM resources
DAMON_STAT usage document (Documentation/admin-guide/mm/damon/stat.rst)
says it monitors the system's entire physical memory. But, it is
monitoring only the biggest System RAM resource of the system. When there
are multiple System RAM resources, this results in monitoring only an
unexpectedly small fraction of the physical memory. For example, suppose
the system has a 500 GiB System RAM, 10 MiB non-System RAM, and 500 GiB
System RAM resources in order on the physical address space. DAMON_STAT
will monitor only the first 500 GiB System RAM. This situation is
particularly common on NUMA systems.
Select a physical address range that covers all System RAM areas of the
system, to fix this issue and make it work as documented.
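A minimal user-space sketch of selecting such a covering range, assuming a flat array of resources; struct sketch_res and sketch_covering_range() are hypothetical names, and the real code walks the kernel resource tree instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_res {
	uint64_t start;	/* first byte of the resource */
	uint64_t end;	/* last byte of the resource, inclusive */
	bool is_sysram;
};

/* Find one [start, end] range that covers every System RAM entry. */
static bool sketch_covering_range(const struct sketch_res *res, size_t n,
				  uint64_t *start, uint64_t *end)
{
	bool found = false;

	for (size_t i = 0; i < n; i++) {
		if (!res[i].is_sysram)
			continue;
		if (!found || res[i].start < *start)
			*start = res[i].start;
		if (!found || res[i].end > *end)
			*end = res[i].end;
		found = true;
	}
	assert(!found || *start <= *end);
	return found;
}
```

For the layout in the example above, the result spans both 500 GiB areas and, as a side effect, the 10 MiB non-System RAM hole between them.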
Commit e2c3b6b21c77 ("mm: zswap: use SG list decompression APIs from
zsmalloc") updated zswap_decompress() to use the scatterwalk API to copy
data for uncompressed pages.
In doing so, it mapped kernel memory locally for 32-bit kernels using
kmap_local_folio(), however it never unmapped this memory.
This resulted in the linked syzbot report where a BUG_ON() is triggered
due to leaking the kmap slot.
This patch fixes the issue by explicitly unmapping the established kmap.
Also, add flush_dcache_folio() after the kunmap_local() call.
I had assumed that a new folio here combined with the flush that is done at
the point of setting the PTE would suffice, but it doesn't seem that's
actually the case, as update_mmu_cache() will in many architectures only
actually flush entries where a dcache flush was done on a range previously.
I had also wondered whether kunmap_local() might suffice, but it doesn't
seem to be the case.
Some arches do seem to actually dcache flush on unmap: parisc does it if
CONFIG_HIGHMEM is not set by setting ARCH_HAS_FLUSH_ON_KUNMAP and calling
kunmap_flush_on_unmap() from __kunmap_local(), otherwise non-CONFIG_HIGHMEM
callers do nothing here.
Otherwise arch_kmap_local_pre_unmap() is called which does:
* sparc - flush_cache_all()
* arm - if VIVT, __cpuc_flush_dcache_area()
* otherwise - nothing
Also arch_kmap_local_post_unmap() is called which does:
But this is only if it's high memory, and doesn't cover all architectures,
so is presumably intended to handle other cache consistency concerns.
In any case, VIPT is problematic here whether low or high memory (in spite
of what the documentation claims, see [0] - 'the kernel did write to a page
that is in the page cache page and / or in high memory'), because dirty
cache lines may exist at the set indexed by the kernel direct mapping,
which won't exist in the set indexed by any subsequent userland mapping,
meaning userland might read stale data from L2 cache.
Even if the documentation is correct and low memory is fine not to be
flushed here, we can't be sure as to whether the memory is low or high
(kmap_local_folio() will be a no-op if low), and this call should be
harmless if it is low.
VIVT would require more work if the memory were shared and already mapped,
but this isn't the case here, and would anyway be handled by the dcache
flush call.
In any case, we definitely need this flush as far as I can tell.
And we should probably consider updating the documentation unless it turns
out there's somehow dcache synchronisation that happens for low
memory/64-bit kernels elsewhere?
[ljs@kernel.org: add flush_dcache_folio() after the kunmap_local() call] Link: https://lkml.kernel.org/r/13e09a99-181f-45ac-a18d-057faf94bccb@lucifer.local Link: https://lkml.kernel.org/r/20260316140122.339697-1-ljs@kernel.org Link: https://docs.kernel.org/core-api/cachetlb.html Fixes: e2c3b6b21c77 ("mm: zswap: use SG list decompression APIs from zsmalloc") Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reported-by: syzbot+fe426bef95363177631d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69b75e2c.050a0220.12d28.015a.GAE@google.com Acked-by: Yosry Ahmed <yosry@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: Yosry Ahmed <yosry@kernel.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mailmap: update email address for Muhammad Usama Anjum
Add updated email address.
Link: https://lkml.kernel.org/r/20260310171757.3970390-1-usama.anjum@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Hans Verkuil <hverkuil@kernel.org> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Martin Kepplinger <martink@posteo.de> Cc: Shannon Nelson <sln@onemain.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
dt-bindings: clock: qcom,eliza-dispcc: Add Eliza SoC display CC
Add bindings for Qualcomm Eliza SoC display clock controller (dispcc),
which is very similar to one in SM8750, except new HDMI-related clocks
and additional clock input from HDMI PHY PLL.
Enable the Kaanapali display, video, camera and gpu clock controller
for their respective functionalities on the Qualcomm Kaanapali QRD and
MTP boards.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260224-knp-dts-misc-v6-10-79d20dab8a60@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Koichiro Den [Thu, 5 Mar 2026 15:10:50 +0000 (00:10 +0900)]
PCI: dwc: rcar-gen4: Change EPC BAR alignment to 4K as per the documentation
R-Car S4 Series (R8A779F[4-7]*) EP controller uses a 4K minimum iATU region
size (CX_ATU_MIN_REGION_SIZE = 4K) as per R19UH0161EJ0130 Rev.1.30. Also,
the controller itself can only be configured in the range 4 KB to 64 KB, so
the current 1 MB alignment requirement is incorrect.
Hence, change the alignment to the min size 4K as per the documentation.
This also fixes needless unusability of BAR4 on this platform when the
target address is fixed, such as for doorbell targets.
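The effect on fixed targets can be illustrated with a plain alignment check. The macro and function names are hypothetical, and the address used in the note below is made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_4K 0x1000ULL
#define SKETCH_SZ_1M 0x100000ULL

/* True if addr is a multiple of align; align must be a power of two. */
static bool sketch_is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

A fixed target at 0xfe003000 passes the 4K check but fails the 1M check, which is the kind of address the old requirement rejected needlessly.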
Kexin Sun [Sat, 21 Mar 2026 10:59:27 +0000 (18:59 +0800)]
signal: update outdated comment for removed freezable_schedule()
The function freezable_schedule() was removed in commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"), which
rewrote the freezer to use a dedicated TASK_FROZEN state instead.
do_signal_stop() and ptrace_stop() no longer call
freezable_schedule(); they now set TASK_STOPPED/TASK_TRACED and the
freezer handles those states directly via TASK_FROZEN. Update the
comment to reflect the current mechanism.
Merge patch series "pidfds: add coredump_code field to pidfd_info"
Emanuele Rocca <emanuele.rocca@arm.com> says:
This patch series adds a new field called coredump_code to struct
pidfd_info, as well as the relevant selftests. Note that the coredump
selftests are currently not passing.
* patches from https://patch.msgid.link/acE5fYOgyVUYahIn@NH27D9T0LF:
selftests: check pidfd_info->coredump_code correctness
pidfds: add coredump_code field to pidfd_info
Emanuele Rocca [Mon, 23 Mar 2026 13:02:16 +0000 (14:02 +0100)]
pidfds: add coredump_code field to pidfd_info
The struct pidfd_info currently exposes in a field called coredump_signal the
signal number (si_signo) that triggered the dump (for example, 11 for SIGSEGV).
However, it is also valuable to understand the reason why that signal was sent.
This additional context is provided by the signal code (si_code), such as 2 for
SEGV_ACCERR.
Add a new field to struct pidfd_info called coredump_code with the value of
si_code for the benefit of sysadmins who pipe core dumps to user-space programs
for later analysis. The following snippet illustrates a simplified C program
that consumes coredump_signal and coredump_code, and then logs core dump
signals and codes to a file:
    if (ioctl(pidfd, PIDFD_GET_INFO, &info) == 0)
        if (info.mask & PIDFD_INFO_COREDUMP)
            fprintf(f, "PID=%d, si_signo: %d si_code: %d\n",
                    info.pid, info.coredump_signal, info.coredump_code);
Assuming the program is installed under /usr/local/bin/core-logger, core dump
processing can be enabled by setting /proc/sys/kernel/core_pattern to
'|/usr/local/bin/core-logger %F'.
systemd-coredump(8) already uses pidfds to process core dumps, and it could be
extended to include the values of coredump_code too.
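A consumer might turn the numeric pair into something readable along these lines. This is only a sketch: sketch_code_name() is a hypothetical helper, it covers just two SIGSEGV codes from <signal.h>, and it takes plain integers rather than reading struct pidfd_info:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Map a (si_signo, si_code) pair to a short name; only SIGSEGV is covered. */
static const char *sketch_code_name(int signo, int code)
{
	assert(signo > 0);
	if (signo != SIGSEGV)
		return "unknown";
	switch (code) {
	case SEGV_MAPERR:
		return "SEGV_MAPERR";	/* address not mapped */
	case SEGV_ACCERR:
		return "SEGV_ACCERR";	/* invalid permissions for the mapping */
	default:
		return "unknown";
	}
}
```

A logger could call this with info.coredump_signal and info.coredump_code to tell an unmapped address apart from a permission fault.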
Commit 673a55cc49da replaced the null pointer dereference used in
crashing_child() with __builtin_trap to address the following LLVM warnings:
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
All coredump tests expect crashing_child() to result in a SIGSEGV. However, the
behavior of __builtin_trap is architecture-dependent. On x86 it yields SIGILL,
on aarch64 SIGTRAP. Given that neither of those signals is SIGSEGV, both
coredump_socket_test and coredump_socket_protocol_test are currently failing.
Qualify the pointer with volatile instead of calling __builtin_trap to fix the
tests.
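The volatile approach can be tried outside the selftests with a sketch like the one below. It is not the selftest code: sketch_crash_signal() is a hypothetical helper that forks a child, lets it store through a volatile NULL pointer, and reports the signal that terminated it. It assumes a Linux system where page zero is not mapped:

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes through a volatile NULL pointer; return its fatal signal. */
static int sketch_crash_signal(void)
{
	int status = 0;
	pid_t pid = fork();

	assert(pid >= 0);
	if (pid == 0) {
		volatile int *p = NULL;

		*p = 42;	/* volatile keeps the compiler from dropping the store */
		_exit(0);	/* not reached if the store faults */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On such a system the child is expected to die with SIGSEGV on both x86 and aarch64, unlike the __builtin_trap() variant.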
Signed-off-by: Emanuele Rocca <emanuele.rocca@arm.com> Link: https://patch.msgid.link/ab2kI0PI_Vk6bU88@NH27D9T0LF Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>