]> git.ipfire.org Git - thirdparty/kernel/linux.git/commit
drm/amd/display: Fix ISM teardown crash from NULL dc dereference
authorSrinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Wed, 15 Apr 2026 01:03:33 +0000 (06:33 +0530)
committerAlex Deucher <alexander.deucher@amd.com>
Fri, 17 Apr 2026 19:40:04 +0000 (15:40 -0400)
commit732a8adde033fb084f16409206b7d9ee9c3849c9
tree6b5cba333d83ce13961458768c5d2f0fc4650093
parent8bf0cb97edb697dba2515e6452c17c5245111448
drm/amd/display: Fix ISM teardown crash from NULL dc dereference

The Idle State Manager (ISM) uses delayed work to apply display idle
optimizations later, instead of immediately. This helps avoid rapid idle
transitions that can hurt power or performance.

A crash was seen during driver teardown. The system boots normally and
the driver loads successfully. Later, when the GPU is being stopped, the
log shows:

  amdgpu 0000:0e:00.0: finishing device.
  Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]

After this, delayed ISM work still runs and reaches:

  dm_ism_sso_delayed_work_func()
    -> amdgpu_dm_ism_commit_event()
    -> dm_ism_commit_idle_optimization_state()
    -> dc_allow_idle_optimizations_internal()

The crash report showed:
  KASAN: null-ptr-deref in range [0x690-0x697]

Signature:
[22601.113316] KASAN: null-ptr-deref in range [0x0000000000000690-0x0000000000000697]
...
[22601.113368] Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]
[22601.113930] RIP: 0010:dc_allow_idle_optimizations_internal+0xa6/0xc40 [amdgpu]
...
[22601.114491] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000690
...
[22601.114561] Call Trace:
[22601.114566]  <TASK>
[22601.114572]  ? srso_alias_return_thunk+0x5/0xfbef5
[22601.114582]  ? update_load_avg+0x1b6/0x20b0
[22601.114593]  ? __pfx_dc_allow_idle_optimizations_internal+0x10/0x10 [amdgpu]
[22601.114932]  ? psi_group_change+0x4ed/0x8d0
[22601.114942]  dm_ism_commit_idle_optimization_state+0x214/0x570 [amdgpu]
[22601.115268]  amdgpu_dm_ism_commit_event+0xe1d/0x15a0 [amdgpu]
[22601.115588]  ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115595]  ? __kasan_check_write+0x18/0x20
[22601.115603]  ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115610]  ? mutex_lock+0x83/0xc0
[22601.115620]  dm_ism_sso_delayed_work_func+0x64/0x90 [amdgpu]

GDB resolved dc_allow_idle_optimizations_internal+0xa6 to:

  struct dc_state *context = dc->current_state;

The matching disassembly showed:

  mov %rdi, %r12
  mov 0x690(%r12), %r13

where r12 holds the dc pointer. A GDB layout dump of struct dc showed:

  /* 1680 | 8 */ struct dc_state *current_state;

Since 1680 decimal is 0x690, this confirms that current_state is at
offset 0x690. The faulting access was effectively:

  dc + 0x690

which indicates that dc was NULL at the time of dereference.

This shows that ISM work can still run during teardown after dc has
been cleared.

ISM is not expected to run after dc is destroyed. Fix this by disabling
ISM under dc_lock in amdgpu_dm_fini() before dc_destroy(), ensuring no
further ISM work runs after dc teardown.

Also add ASSERT(dm->dc) in amdgpu_dm_ism_commit_event() to enforce this
invariant, and ASSERT(mutex_is_locked(&dm->dc_lock)) in
amdgpu_dm_ism_disable() to clarify the locking requirement.

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Suggested-by: Leo Li <sunpeng.li@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_ism.c