git.ipfire.org Git - thirdparty/kernel/linux.git/log
3 weeks ago ACPICA: Enhance OEM ID and Table ID validation in acpi_ex_load_table_op()
ikaros [Wed, 27 May 2026 18:06:25 +0000 (20:06 +0200)] 
ACPICA: Enhance OEM ID and Table ID validation in acpi_ex_load_table_op()

Enhance OEM ID and Table ID validation in acpi_ex_load_table_op() to
prevent buffer overflows.

Link: https://github.com/acpica/acpica/commit/f85a43098d65
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2230782.OBFZWjSADL@rafael.j.wysocki
3 weeks ago ACPICA: Fix NULL pointer dereference in acpi_ns_custom_package()
Weiming Shi [Wed, 27 May 2026 18:05:42 +0000 (20:05 +0200)] 
ACPICA: Fix NULL pointer dereference in acpi_ns_custom_package()

acpi_ns_custom_package() unconditionally dereferences the first element
of the package to read the _BIX version number, without checking for
NULL:

    if ((*Elements)->Common.Type != ACPI_TYPE_INTEGER)

When firmware returns a _BIX package whose first element is an
unresolvable reference, ACPICA evaluates that entry to NULL.
acpi_ns_remove_null_elements() does not strip NULL entries for
ACPI_PTYPE_CUSTOM packages (fixed-position format would break if
elements were shifted), so acpi_ns_custom_package() sees the NULL
and causes a crash.

Add a NULL check for the first element (version field) before
dereferencing it. The caller then receives AE_AML_OPERAND_TYPE
instead of crashing.
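
As a rough illustration of the check described above, the following
userspace sketch tests the first package slot for NULL before reading
its type. The types, enum values and the function name
check_version_element() are hypothetical stand-ins, not the ACPICA
definitions; only the order of the two tests is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ACPICA's types; not the real ones. */
enum obj_type { OBJ_TYPE_INTEGER = 1, OBJ_TYPE_STRING = 2 };
enum status { STATUS_OK = 0, STATUS_OPERAND_TYPE = 1 };

struct operand {
    enum obj_type type;
    unsigned long long value;
};

/*
 * Validate the version field (element 0) of a fixed-position package.
 * The slot may be NULL when firmware supplied an unresolvable reference.
 */
static enum status check_version_element(struct operand **elements)
{
    assert(elements != NULL);   /* the element array itself must exist */

    /* Test the slot for NULL before reading ->type through it. */
    if (elements[0] == NULL || elements[0]->type != OBJ_TYPE_INTEGER)
        return STATUS_OPERAND_TYPE;

    return STATUS_OK;
}
```

With this ordering a NULL first element yields an error status instead
of a dereference, which matches the behaviour the commit message
describes for the caller.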

Link: https://github.com/acpica/acpica/commit/f3f111b9013b
Reported-by: Xiang Mei <xmei5@asu.edu>
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5674388.Sb9uPGUboI@rafael.j.wysocki
3 weeks ago ACPICA: Enhance buffer validation in acpi_ut_walk_aml_resources()
ikaros [Wed, 27 May 2026 18:04:46 +0000 (20:04 +0200)] 
ACPICA: Enhance buffer validation in acpi_ut_walk_aml_resources()

Enhance buffer validation in acpi_ut_walk_aml_resources() to prevent
buffer overflows.

Link: https://github.com/acpica/acpica/commit/975cb20c7992
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2481429.NG923GbCHz@rafael.j.wysocki
3 weeks ago ACPICA: Add validation for node in acpi_ns_build_normalized_path()
ikaros [Wed, 27 May 2026 18:04:06 +0000 (20:04 +0200)] 
ACPICA: Add validation for node in acpi_ns_build_normalized_path()

Add validation for node in acpi_ns_build_normalized_path()
to prevent use-after-free vulnerabilities.

Link: https://github.com/acpica/acpica/commit/b35adf49e89a
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/118666237.nniJfEyVGO@rafael.j.wysocki
3 weeks ago ACPICA: validate handler object type in two places
ikaros [Wed, 27 May 2026 18:03:26 +0000 (20:03 +0200)] 
ACPICA: validate handler object type in two places

Validate the handler object type in acpi_ev_has_default_handler()
and acpi_ev_find_region_handler().

Link: https://github.com/acpica/acpica/commit/f6fc648a1389
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/48111441.fMDQidcC6G@rafael.j.wysocki
3 weeks ago ACPICA: Improve argument parsing in acpi_ps_get_next_simple_arg()
ikaros [Wed, 27 May 2026 18:02:49 +0000 (20:02 +0200)] 
ACPICA: Improve argument parsing in acpi_ps_get_next_simple_arg()

Improve argument parsing in acpi_ps_get_next_simple_arg() to handle
remaining AML data safely.

Link: https://github.com/acpica/acpica/commit/ecbb8bcfe301
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2008043.taCxCBeP46@rafael.j.wysocki
3 weeks ago ACPICA: Fix integer overflow in acpi_ex_opcode_3A_1T_1R() (mid_op)
ikaros [Wed, 27 May 2026 18:02:06 +0000 (20:02 +0200)] 
ACPICA: Fix integer overflow in acpi_ex_opcode_3A_1T_1R() (mid_op)

Add overflow check for Index + Length to prevent integer overflow
when calculating the truncation length. This prevents a negative
size parameter from being passed to memcpy().
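
A minimal sketch of such an overflow check, assuming 64-bit unsigned
operands and a Mid-style "copy length bytes starting at index"
operation. The function name mid_copy_length() and its parameters are
illustrative and not taken from the ACPICA sources.

```c
#include <stdint.h>

/*
 * Return how many bytes may be copied out of a source of source_len
 * bytes when the caller asks for `length` bytes starting at `index`.
 */
static uint64_t mid_copy_length(uint64_t source_len, uint64_t index,
                                uint64_t length)
{
    /* Starting at or past the end yields an empty result. */
    if (index >= source_len)
        return 0;

    /*
     * index + length can wrap around for huge operands, so test for the
     * wrap first; the sum is evaluated only when it is known to fit.
     */
    if (length > UINT64_MAX - index || index + length > source_len)
        length = source_len - index;

    return length;
}
```

Because index is already known to be smaller than source_len at the
point of truncation, the subtraction cannot underflow, so the result is
never larger than the source.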

Link: https://github.com/acpica/acpica/commit/d281ec1ac84e
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3760974.R56niFO833@rafael.j.wysocki
3 weeks ago ACPICA: Prevent adding invalid references
ikaros [Wed, 27 May 2026 18:01:21 +0000 (20:01 +0200)] 
ACPICA: Prevent adding invalid references

Prevent adding references for local, argument, and debug objects
in acpi_ut_copy_simple_object().

Link: https://github.com/acpica/acpica/commit/f576898d7814
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4511989.ejJDZkT8p0@rafael.j.wysocki
3 weeks ago ACPICA: add boundary checks in acpi_ps_get_next_field()
ikaros [Wed, 27 May 2026 18:00:39 +0000 (20:00 +0200)] 
ACPICA: add boundary checks in acpi_ps_get_next_field()

Add boundary checks in acpi_ps_get_next_field() to prevent out-of-bounds
access.

Link: https://github.com/acpica/acpica/commit/c39183ea84bc
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/24388159.6Emhk5qWAg@rafael.j.wysocki
3 weeks ago ACPICA: validate byte_count in acpi_ps_get_next_package_length()
ikaros [Wed, 27 May 2026 17:59:57 +0000 (19:59 +0200)] 
ACPICA: validate byte_count in acpi_ps_get_next_package_length()

Validate package length reading in acpi_ps_get_next_package_length().

Link: https://github.com/acpica/acpica/commit/40e03f9941e2
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3616255.QJadu78ljV@rafael.j.wysocki
3 weeks ago ACPICA: Fix use-after-free in acpi_ds_terminate_control_method()
ikaros [Wed, 27 May 2026 17:59:12 +0000 (19:59 +0200)] 
ACPICA: Fix use-after-free in acpi_ds_terminate_control_method()

Fix use-after-free issue in acpi_ds_terminate_control_method() by
clearing references to method locals and arguments.

Link: https://github.com/acpica/acpica/commit/36f22a94cb1b
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8730924.NyiUUSuA9g@rafael.j.wysocki
3 weeks ago ACPICA: fix I2C LVR item count in the conversion table
Akhil R [Wed, 27 May 2026 17:58:29 +0000 (19:58 +0200)] 
ACPICA: fix I2C LVR item count in the conversion table

For ACPI_RSC_MOVE8, the 'Value' field in struct acpi_rsconvert_info
is the item count, not a bit position as it is for the bitflags.
Set 'Value' to '1' to fix this.

Conversion still works coincidentally with '0' because
item_count is not reset between table entries, and the previous
count value was taking effect.

Link: https://github.com/acpica/acpica/commit/70082dc8fc84
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/6164740.MhkbZ0Pkbq@rafael.j.wysocki
3 weeks ago ACPICA: Mention the LVR bits
Akhil R [Wed, 27 May 2026 17:57:52 +0000 (19:57 +0200)] 
ACPICA: Mention the LVR bits

Add a comment mentioning the LVR byte position in the type_specific_flag.

Link: https://github.com/acpica/acpica/commit/014fa9f2dbcc
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/9627007.CDJkKcVGEf@rafael.j.wysocki
3 weeks ago ACPICA: Change LVR to 8 bit value
Akhil R [Wed, 27 May 2026 17:57:13 +0000 (19:57 +0200)] 
ACPICA: Change LVR to 8 bit value

Change LVR to an 8-bit value in the LVR I2C resource entry of
acpi_rs_convert_i2c_serial_bus[].

Link: https://github.com/acpica/acpica/commit/7650d4a889ea
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3952474.kQq0lBPeGt@rafael.j.wysocki
3 weeks ago ACPICA: Fetch LVR I2C resource descriptor
Akhil R [Wed, 27 May 2026 17:56:38 +0000 (19:56 +0200)] 
ACPICA: Fetch LVR I2C resource descriptor

Add LVR I2C resource entry to acpi_rs_convert_i2c_serial_bus[].

Link: https://github.com/acpica/acpica/commit/c40411823510
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/23121545.EfDdHjke4D@rafael.j.wysocki
3 weeks ago ACPICA: Add LVR to acrestyp.h
Akhil R [Wed, 27 May 2026 17:55:57 +0000 (19:55 +0200)] 
ACPICA: Add LVR to acrestyp.h

Add a new field called lvr to struct acpi_resource_i2c_serialbus.

Link: https://github.com/acpica/acpica/commit/e62e74baf7e0
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2354060.iZASKD2KPV@rafael.j.wysocki
3 weeks ago ACPICA: Fix FADT 32/64X length mismatch warning
Abdelkader Boudih [Wed, 27 May 2026 17:55:21 +0000 (19:55 +0200)] 
ACPICA: Fix FADT 32/64X length mismatch warning

When the 64-bit address is set but bit_width is 0, the spec says
the legacy length should be used. That is valid firmware.

Skip the warning if bit_width is 0.

This avoids false warnings like:
  32/64X length mismatch in FADT/gpe0_block: 128/0
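
The decision described above can be sketched as a small predicate. This
is an assumption-laden illustration: the function name
fadt_length_mismatch() and its parameters are hypothetical, and it
reads the sample warning as "128 bits of legacy length against a
bit_width of 0", which the commit message does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a 32/64X length mismatch should be reported.
 * address64:           the 64-bit address field (0 means "not set")
 * bit_width64:         bit_width of the 64-bit address structure
 * legacy_length_bytes: the legacy block length, in bytes
 */
static bool fadt_length_mismatch(uint64_t address64, uint8_t bit_width64,
                                 uint8_t legacy_length_bytes)
{
    /* Without a 64-bit address there is nothing to compare. */
    if (address64 == 0)
        return false;

    /* A zero bit_width means the legacy length is used: not a mismatch. */
    if (bit_width64 == 0)
        return false;

    /* Otherwise compare both lengths in bits. */
    return bit_width64 != (unsigned int)legacy_length_bytes * 8u;
}
```

Under these assumptions, a 16-byte legacy block paired with a zero
bit_width no longer counts as a mismatch, while a genuinely different
non-zero width still does.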

Tested on FreeBSD 16.0-CURRENT.

Link: https://github.com/acpica/acpica/commit/b6387a387c51
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8688362.T7Z3S40VBb@rafael.j.wysocki
3 weeks ago ACPICA: Add modern standby DSM GUIDs
Daniel Schaefer [Wed, 27 May 2026 17:54:43 +0000 (19:54 +0200)] 
ACPICA: Add modern standby DSM GUIDs

Add AMD, Intel and Microsoft GUIDs for Low-power S0 Idle _DSM.

Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications
Link: https://github.com/torvalds/linux/blob/v6.18/drivers/acpi/x86/s2idle.c
Link: https://github.com/acpica/acpica/commit/cae0082158e4
Signed-off-by: Daniel Schaefer <dhs@frame.work>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3415679.aeNJFYEL58@rafael.j.wysocki
3 weeks ago ACPICA: Add alias node support in namespace handling
ikaros [Wed, 27 May 2026 17:53:58 +0000 (19:53 +0200)] 
ACPICA: Add alias node support in namespace handling

 - Mark nodes as alias in ld_namespace2_begin() function.
 - Skip teardown for alias nodes in acpi_ns_detach_object() function.
 - Define ANOBJ_IS_ALIAS flag in aclocal.h.

Link: https://github.com/acpica/acpica/commit/cfcc46c4f717
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/14020896.uLZWGnKmhe@rafael.j.wysocki
3 weeks ago ACPICA: Fix condition check in acpi_ps_parse_loop()
ikaros [Wed, 27 May 2026 17:53:12 +0000 (19:53 +0200)] 
ACPICA: Fix condition check in acpi_ps_parse_loop()

Fix condition check for AML_ELSE_OP in acpi_ps_parse_loop() to prevent
out-of-bounds access.

Link: https://github.com/acpica/acpica/commit/3b537b92336e
Signed-off-by: ikaros <void0red@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1959692.tdWV9SEqCh@rafael.j.wysocki
3 weeks ago ACPICA: actbl2.h: ACPI 6.6: Updates for MADT MPWakeup
Pawel Chmielewski [Wed, 27 May 2026 17:52:32 +0000 (19:52 +0200)] 
ACPICA: actbl2.h: ACPI 6.6: Updates for MADT MPWakeup

ACPI 6.6 introduces a "Test" command for Multiprocessor Wakeup as well
as resetting the Multiprocessor Wakeup Mailbox.

Link: https://github.com/acpica/acpica/commit/a4f629dc90fc
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2414431.ElGaqSPkdT@rafael.j.wysocki
3 weeks ago ACPICA: actypes: Distinguish between D3hot/cold
Aymeric Wibo [Wed, 27 May 2026 17:51:50 +0000 (19:51 +0200)] 
ACPICA: actypes: Distinguish between D3hot/cold

And default `ACPI_STATE_D3` to D3cold.

Link: https://github.com/acpica/acpica/commit/c11cc9c68233
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5105913.31r3eYUQgx@rafael.j.wysocki
3 weeks ago rust: helpers: add is_vmalloc_addr wrapper for NOMMU builds
Shivam Kalra [Fri, 22 May 2026 18:54:32 +0000 (00:24 +0530)] 
rust: helpers: add is_vmalloc_addr wrapper for NOMMU builds

Commit 47ac2a4b5cd8 ("rust: kvec: implement shrink_to for KVVec")
introduced a call to bindings::is_vmalloc_addr(). However, this
fails to compile on architectures where CONFIG_MMU is disabled,
resulting in the following build error:

    error[E0425]: cannot find function `is_vmalloc_addr` in crate `bindings`
       --> rust/kernel/alloc/kvec.rs:781:32
        |
    781 |         if !unsafe { bindings::is_vmalloc_addr(self.ptr.as_ptr().cast()) } {
        |                                ^^^^^^^^^^^^^^^ not found in `bindings`

When CONFIG_MMU is not set, is_vmalloc_addr() is defined as a
static inline function in <linux/mm.h> that unconditionally
returns false. Because bindgen skips static inline functions
when generating bindings, the symbol is completely missing from
the Rust bindings crate.

Fix this by providing a C helper wrapper, rust_helper_is_vmalloc_addr(),
in rust/helpers/vmalloc.c. This ensures the function is reliably
exposed to Rust regardless of the MMU configuration. On NOMMU builds,
this allows KVVec::shrink_to() to successfully compile and correctly
route all allocations through the kmalloc realloc path.
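
The helper pattern can be illustrated outside the kernel tree. In the
sketch below the static inline is_vmalloc_addr() is a userspace
stand-in for the !CONFIG_MMU definition described above, and the
wrapper is a plain C function; the real helper lives in
rust/helpers/vmalloc.c and may carry additional kernel annotations that
are not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for the NOMMU definition: a static inline that always
 * returns false. A static inline function produces no symbol for a
 * bindings generator to pick up.
 */
static inline bool is_vmalloc_addr(const void *addr)
{
    (void)addr;
    return false;
}

/*
 * Out-of-line wrapper with external linkage. It forwards to whichever
 * definition of is_vmalloc_addr() the configuration provides, so the
 * caller sees the same symbol in every configuration.
 */
bool rust_helper_is_vmalloc_addr(const void *addr)
{
    return is_vmalloc_addr(addr);
}
```

The wrapper adds one function call but gives foreign-language callers a
stable symbol, whether the underlying function is a real export or an
inline stub.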

Fixes: 47ac2a4b5cd8 ("rust: kvec: implement shrink_to for KVVec")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605220811.LRplxeBR-lkp@intel.com/
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260523-is-vmalloc-addr-build-fix-v1-1-73c919440c41@zohomail.in
[ Pasted exact compiler output and expanded it. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 weeks ago nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
Mateusz Nowicki [Sat, 23 May 2026 08:28:16 +0000 (08:28 +0000)] 
nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools

nvme_setup_descriptor_pools() indexes dev->descriptor_pools[] using the
numa_node forwarded from hctx->numa_node by its single caller,
nvme_init_hctx_common().  On a non-NUMA kernel hctx->numa_node is
NUMA_NO_NODE (-1).  Because the parameter was declared 'unsigned', the
value becomes UINT_MAX and the index walks off the array (sized to
nr_node_ids), faulting during nvme_alloc_ns() and leaving the namespace
without a /dev node.

Reproduces on any NVMe controller probed by a CONFIG_NUMA=n kernel:

  BUG: unable to handle page fault for address: ffff889101603d38
  RIP: 0010:nvme_init_hctx_common+0x5a/0x190 [nvme]
  Call Trace:
   nvme_init_hctx+0x10/0x20 [nvme]
   nvme_alloc_ns+0x9e/0xa10 [nvme_core]
   nvme_scan_ns+0x301/0x3b0 [nvme_core]
   nvme_scan_ns_async+0x23/0x30 [nvme_core]

Switch the parameter to int and fall back to node 0 when it is
NUMA_NO_NODE; node 0 is always present.
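
The signedness problem and the fallback can be reproduced in a few
lines of plain C. The function names below are illustrative only;
NUMA_NO_NODE is defined locally with the value given in the commit
message.

```c
#include <limits.h>

#define NUMA_NO_NODE (-1)   /* value stated in the commit message */

/*
 * Buggy shape: an unsigned parameter silently converts -1 to UINT_MAX,
 * which is then used as an array index.
 */
static unsigned int index_with_unsigned_param(unsigned int numa_node)
{
    return numa_node;
}

/*
 * Fixed shape: keep the parameter signed so NUMA_NO_NODE stays
 * recognizable, and fall back to node 0 in that case.
 */
static int pool_index(int numa_node)
{
    if (numa_node == NUMA_NO_NODE)
        return 0;
    return numa_node;
}
```

Passing NUMA_NO_NODE to the first function yields UINT_MAX, far beyond
any array sized to the number of nodes, while the second maps it to a
valid index.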

Fixes: d977506f8863 ("nvme-pci: make PRP list DMA pools per-NUMA-node")
Link: https://lore.kernel.org/r/20260309062840.2937858-2-iam@sung-woo.kim
Reported-by: Sung-woo Kim <iam@sung-woo.kim>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks ago pinctrl: qcom: sm6115: Add egpio support
Stanislav Zaikin [Fri, 22 May 2026 14:11:48 +0000 (16:11 +0200)] 
pinctrl: qcom: sm6115: Add egpio support

This mirrors the egpio support added to sc7280/sm8450/sm8250/etc. This change
is necessary for GPIOs 98-112 (15 GPIOs) to be used as normal GPIOs.

Signed-off-by: Stanislav Zaikin <zstaseg@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
3 weeks ago drm/amdgpu: fix calling VM invalidation in amdgpu_hmm_invalidate_gfx
Christian König [Wed, 18 Feb 2026 11:31:29 +0000 (12:31 +0100)] 
drm/amdgpu: fix calling VM invalidation in amdgpu_hmm_invalidate_gfx

Otherwise we don't invalidate page tables on next CS.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b6444d1bcbc34f6f2a31a3aab3059be082f3683e)
Cc: stable@vger.kernel.org
3 weeks ago ALSA: hda/hdmi: Use 'AC_PINSENSE_ELDV' to detect pinsense for Loongson
Huacai Chen [Wed, 27 May 2026 14:08:41 +0000 (22:08 +0800)] 
ALSA: hda/hdmi: Use 'AC_PINSENSE_ELDV' to detect pinsense for Loongson

Due to a hardware defect, for Loongson PCI HDMI devices with a revision
ID of 2, the pin sense status must be determined via the ELD.

Add a codec flag, eld_jack_detect, to indicate this case, and handle it
specially in read_pin_sense().

Cc: stable@vger.kernel.org
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Haowei Zheng <zhenghaowei@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://patch.msgid.link/20260527140841.3407183-1-chenhuacai@loongson.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 weeks ago drm/amdgpu: fix amdgpu_hmm_range_get_pages
Christian König [Wed, 18 Feb 2026 11:53:27 +0000 (12:53 +0100)] 
drm/amdgpu: fix amdgpu_hmm_range_get_pages

The notifier sequence must only be read once or otherwise we could work
with invalid pages.

While at it also fix the coding style, e.g. drop the pre-initialized
return value and use the common define for 2G range.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c08972f555945cda57b0adb72272a37910153390)
Cc: stable@vger.kernel.org
3 weeks ago drm/amdgpu/userq: use array instead of list for userq_vas
Sunil Khatri [Wed, 20 May 2026 11:09:49 +0000 (16:39 +0530)] 
drm/amdgpu/userq: use array instead of list for userq_vas

Use an array instead of a list for userq_vas, since the number of BOs
is fixed. The memory also does not need to be freed separately later,
because the array is freed together with the queue.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ef7dc711a664b0c548ecfdf13a00436b7446b8e7)

3 weeks ago drm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid
Sunil Khatri [Wed, 20 May 2026 10:55:50 +0000 (16:25 +0530)] 
drm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid

mqd_destroy cleans up queue core objects like mqd and fw_object
which are needed for any pending fence to signal properly.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4ad65d610096498c8e265615aba42b3c47441bb5)

3 weeks ago drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger
Eric Huang [Tue, 12 May 2026 14:19:52 +0000 (10:19 -0400)] 
drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger

get_queue_ids() computes array_size = num_queues * sizeof(uint32_t),
which could overflow on builds with a 32-bit size_t. Use array_size()
instead, which saturates to SIZE_MAX on overflow.
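
The saturating behaviour can be sketched in portable C. The helper
below is a userspace approximation written for illustration, with a
hypothetical name; it is not the kernel's array_size() implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply count by elem_size, returning SIZE_MAX instead of a wrapped
 * result when the product does not fit in size_t.
 */
static size_t saturating_array_size(size_t count, size_t elem_size)
{
    /* count * elem_size overflows exactly when count > SIZE_MAX / elem_size. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return SIZE_MAX;

    return count * elem_size;
}
```

A plain multiplication would wrap to a small number and let a short
allocation succeed; a saturated SIZE_MAX is a request no allocator can
satisfy, so the overflow turns into an ordinary allocation failure.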

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2d57a0475f085c08b49312dfd8edcb461845f285)
Cc: stable@vger.kernel.org
3 weeks ago drm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper
Sunil Khatri [Wed, 20 May 2026 10:43:09 +0000 (16:13 +0530)] 
drm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper

Remove the amdgpu_userq_create/destroy_object wrappers and call the
kernel BO allocation function directly, since it already does
everything the wrappers did.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit deb02080ca5d3f015cf71e56067a39ef2f141998)

3 weeks ago drm/amd/pm/si: Disregard vblank time when no displays are connected
Timur Kristóf [Tue, 19 May 2026 08:41:54 +0000 (10:41 +0200)] 
drm/amd/pm/si: Disregard vblank time when no displays are connected

When no displays are connected, there is no vblank
happening so the power management code shouldn't
worry about it.

This fixes a regression that caused the memory clock
to be stuck at maximum when there were no displays
connected to a SI GPU.

Fixes: 9003a0746864 ("drm/amd/pm: Treat zero vblank time as too short in si_dpm (v3)")
Fixes: 9d73b107a61b ("drm/amd/pm: Use pm_display_cfg in legacy DPM (v2)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6d87e0199f7b83735b56e422d59f170a201897a8)
Cc: stable@vger.kernel.org
3 weeks ago drm/amdkfd: Check for pdd drm file first in CRIU restore path
David Francis [Thu, 14 May 2026 14:31:20 +0000 (10:31 -0400)] 
drm/amdkfd: Check for pdd drm file first in CRIU restore path

CRIU restore ioctls are meant to be called by CRIU with no existing
drm file. There is an error path for the case where the drm file
unexpectedly exists, but it was positioned such that it was missing
an fput(drm_file).

Do that check earlier, as soon as we have the pdd.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2bab781dac78916c5cc8de76345a4102449267d7)
Cc: stable@vger.kernel.org
3 weeks ago drm/amdgpu: fix potential overflow in fs_info.debugfs_name
Stanley.Yang [Mon, 11 May 2026 08:49:19 +0000 (16:49 +0800)] 
drm/amdgpu: fix potential overflow in fs_info.debugfs_name

Use snprintf() with sizeof(fs_info.debugfs_name) so a long RAS block
name plus the "_err_inject" suffix cannot overflow the 32-byte buffer.
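
A small userspace sketch of the bounded formatting described above. The
32-byte size and the "_err_inject" suffix come from the commit message;
the function name build_inject_name() and the macro are illustrative.

```c
#include <stdio.h>

#define DEBUGFS_NAME_LEN 32   /* buffer size named in the commit message */

/*
 * Format "<block_name>_err_inject" into out without writing past the
 * buffer. Returns 1 if the full name fit, 0 if it was truncated or an
 * encoding error occurred.
 */
static int build_inject_name(char out[DEBUGFS_NAME_LEN],
                             const char *block_name)
{
    /* snprintf writes at most DEBUGFS_NAME_LEN bytes, terminator included. */
    int n = snprintf(out, DEBUGFS_NAME_LEN, "%s_err_inject", block_name);

    return n >= 0 && n < DEBUGFS_NAME_LEN;
}
```

snprintf() returns the length the full string would have had, so
comparing the result against the buffer size also tells the caller
whether truncation happened.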

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a58070fda26857a8f6acc0ab05428e60d5c6844)

3 weeks ago drm/amdgpu/userq: make sure queue is valid in the hang_detect_work
Sunil Khatri [Mon, 18 May 2026 14:28:08 +0000 (19:58 +0530)] 
drm/amdgpu/userq: make sure queue is valid in the hang_detect_work

Thread 1: Runs amdgpu_userq_destroy, which eventually removes the
queue from the doorbell and sets userq_mgr = NULL.

Thread 2: An interrupt might have scheduled hang_detect_work, which
still needs userq_mgr to be valid but could see a NULL pointer.

To fix that, make sure we cancel hang_detect_work again before
setting userq_mgr to NULL.

In addition, all the queue VAs need to remain valid for as long as
anything could still be running on the queue, so move the userq_va
handling to after the hang_detect handler is cancelled.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a66ceb98b137d18d303b9889f0e7d8c4db73943)

3 weeks ago drm/amdgpu/userq: reserve root bo without interruption
Sunil Khatri [Mon, 18 May 2026 13:25:25 +0000 (18:55 +0530)] 
drm/amdgpu/userq: reserve root bo without interruption

Fix the code to make it an uninterruptible reservation
for root bo.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d409ab4e387d94b2e593d558b54b7bfd315e0e75)

3 weeks ago drm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails
Sunil Khatri [Mon, 18 May 2026 13:03:00 +0000 (18:33 +0530)] 
drm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails

Unpin the wptr_obj->obj when amdgpu_ttm_alloc_gart fails.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d8145c437ccdc2d91c579787290f82788172bea0)

3 weeks ago drm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index
Sunil Khatri [Mon, 18 May 2026 12:12:15 +0000 (17:42 +0530)] 
drm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index

amdgpu_userq_get_doorbell_index returns a uint64 index as well as int
failure values. Simplify this by using an int return value and
returning the index through a uint64 pointer argument.

Also, since it is used in only one place, make it static.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e947ec9d0529d5f93dbdb33cd197347f6a7b2922)

3 weeks ago drm/amdkfd: fix NULL pointer bug in svm_range_set_attr
Eric Huang [Thu, 7 May 2026 19:51:49 +0000 (15:51 -0400)] 
drm/amdkfd: fix NULL pointer bug in svm_range_set_attr

The process_info could be NULL if the user doesn't call
kfd_ioctl_acquire_vm before calling kfd_ioctl_svm.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 83a26c812e0529eb040d31a76f73e33e637243d4)
Cc: stable@vger.kernel.org
3 weeks ago drm/amd/display: Write REFCLK to 48MHz on DCN21
Ivan Lipski [Thu, 14 May 2026 15:53:50 +0000 (11:53 -0400)] 
drm/amd/display: Write REFCLK to 48MHz on DCN21

[Why&How]
dccg21_init() calls dccg2_init() which hardcodes 100MHz refclk values
for MICROSECOND_TIME_BASE_DIV and MILLISECOND_TIME_BASE_DIV. DCN21
uses 48MHz refclk, so the wrong values corrupt DCCG timing and cause eDP
link training failure on cold boot.

Write the correct 48MHz values directly instead of calling dccg2_init().

v2:
Fixed typo

Fixes: e6e2b956fc81 ("drm/amd/display: Add missing DCCG register entries for DCN20-DCN316")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5272
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5311
Reported-by: Max Chernoff <git@maxchernoff.ca>
Tested-by: Max Chernoff <git@maxchernoff.ca>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 08236c3ef284cd2d110e5e3d51fc9615e551f9dc)
Cc: stable@vger.kernel.org
3 weeks ago drm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock
Sunil Khatri [Tue, 19 May 2026 09:42:42 +0000 (15:12 +0530)] 
drm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock

The fence_drv_lock mutex is destroyed in amdgpu_userq_fence_driver_free,
and mutex_destroy is also called on one of the jump paths, leading to a
double mutex_destroy.

Rearrange the code so that amdgpu_userq_fence_driver_free takes care
of the cleanup, including mutex_destroy.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 384dbef269d101e5b671fc7b942c56734cd1d186)

3 weeks ago drm/amdgpu/userq: Fix doorbell object cleanup of queue
Sunil Khatri [Tue, 19 May 2026 09:32:00 +0000 (15:02 +0530)] 
drm/amdgpu/userq: Fix doorbell object cleanup of queue

Unpin and unref the doorbell object if queue creation fails before
initialization is complete.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8c7506f7ba945f21e5abe7f8eac0a3acca6b5330)

3 weeks ago drm/amdgpu: check num_entries in GEM_OP GET_MAPPING_INFO
Ziyi Guo [Sun, 8 Feb 2026 00:02:55 +0000 (00:02 +0000)] 
drm/amdgpu: check num_entries in GEM_OP GET_MAPPING_INFO

kvcalloc(args->num_entries, sizeof(*vm_entries), GFP_KERNEL) at
amdgpu_gem.c:1050 uses the user-supplied num_entries directly without
any upper bounds check. Since num_entries is a __u32 and
sizeof(drm_amdgpu_gem_vm_entry) is 32 bytes, a large num_entries
produces an allocation exceeding INT_MAX. This triggers a WARNING in
__kvmalloc_node_noprof(), which sets TAINT_WARN and panics on
CONFIG_PANIC_ON_WARN=y systems.

Add a size bounds check before invoking kvcalloc() to reject
oversized num_entries early with -EINVAL.
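
One possible shape for such a check, as a userspace sketch. The 32-byte
entry size and the INT_MAX limit come from the commit message; the
exact comparison used by the patch is not shown there, so the form
below and the name check_num_entries() are assumptions.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define VM_ENTRY_SIZE 32u   /* sizeof one entry, per the commit message */

/*
 * Reject entry counts whose total allocation size would exceed INT_MAX.
 * Dividing the limit by the element size avoids multiplying the
 * untrusted count.
 */
static int check_num_entries(uint32_t num_entries)
{
    if (num_entries > INT_MAX / VM_ENTRY_SIZE)
        return -EINVAL;

    return 0;
}
```

With a 32-byte entry the largest accepted count is 67108863; one more
entry would bring the request to 2^31 bytes, just past INT_MAX.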

Fixes: 4d82724f7f2b ("drm/amdgpu: Add mapping info option for GEM_OP ioctl")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1fe7bf5457f6efd7be60b17e23163ba54341d73d)
Cc: stable@vger.kernel.org
3 weeks ago drm/amdgpu: fix lock leak on ENOMEM in AMDGPU_GEM_OP_GET_MAPPING_INFO
Michael Bommarito [Sun, 17 May 2026 13:17:42 +0000 (09:17 -0400)] 
drm/amdgpu: fix lock leak on ENOMEM in AMDGPU_GEM_OP_GET_MAPPING_INFO

The AMDGPU_GEM_OP_GET_MAPPING_INFO branch of amdgpu_gem_op_ioctl()
holds three cleanup-tracked resources before calling kvcalloc():
the drm_gem_object reference from drm_gem_object_lookup(), the
drm_exec lock on the looked-up GEM via drm_exec_lock_obj(), and
the drm_exec lock on the per-process VM root page directory via
amdgpu_vm_lock_pd().  All three are released by the out_exec
label that every other error path in this function jumps to.
The kvcalloc() failure path returns -ENOMEM directly, skipping
out_exec and leaking all three.

The leaked per-process VM root PD dma_resv lock is the
load-bearing leak: any subsequent operation on the same VM
(further GEM ops, command-submission, eviction, TTM shrinker
callbacks) blocks on the held lock.  DRM_IOCTL_AMDGPU_GEM_OP is
DRM_AUTH | DRM_RENDER_ALLOW, so this is an unprivileged-local
denial of service against the caller's GPU context, reachable
by any process with /dev/dri/renderD* access.

Route the failure through out_exec so drm_exec_fini() and
drm_gem_object_put() run.
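
The single-exit cleanup pattern can be modelled in userspace. In the
sketch below two counters stand in for the GEM reference and the two
drm_exec locks, and the label is named out_exec after the one in the
commit message; everything else, including the simulate_alloc_failure
parameter, is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Counters standing in for the resources held across the allocation. */
struct ctx {
    int refs_held;
    int locks_held;
};

static int get_mapping_info(struct ctx *c, size_t n,
                            int simulate_alloc_failure)
{
    void *entries;
    int ret = 0;

    c->refs_held += 1;    /* object reference taken */
    c->locks_held += 2;   /* object lock and VM root lock taken */

    entries = simulate_alloc_failure ? NULL : calloc(n, 32);
    if (!entries) {
        /* Do not return here: jump to the shared cleanup instead. */
        ret = -ENOMEM;
        goto out_exec;
    }

    free(entries);

out_exec:
    c->locks_held -= 2;   /* runs on success and on every error path */
    c->refs_held -= 1;
    return ret;
}
```

Returning directly from the failed allocation would leave both counters
non-zero, which is the userspace analogue of the leaked locks described
above.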

Reproduced on stock 7.0.0-10, Ryzen 7 5700U / Radeon Vega
(Lucienne): the failing ioctl returns -ENOMEM and a second
GET_MAPPING_INFO on the same fd then blocks in
drm_exec_lock_obj() on the leaked dma_resv.  SIGKILL on the
caller does not reap the task; the fd-release path during
process exit goes through amdgpu_gem_object_close() ->
drm_exec_prepare_obj() on the same lock, leaving the task in D
state until the box is rebooted.  The patched kernel was not
rebuilt and re-tested on this hardware; the fix is mechanical.
Tested on a single Lucienne / Vega box only.

Ziyi Guo posted an independent INT_MAX-bound check for
args->num_entries in the same branch [1]; the two patches are
complementary and can land in either order.

Fixes: 4d82724f7f2b ("drm/amdgpu: Add mapping info option for GEM_OP ioctl")
Link: https://lore.kernel.org/all/20260208000255.4073363-1-n7l8m4@u.northwestern.edu/
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b69d3256d79de15f54c322986ff4da68f1d65b0a)
Cc: stable@vger.kernel.org
3 weeks ago nvme: target: rdma: fix ndev refcount leak on queue connect
Wentao Liang [Wed, 27 May 2026 08:45:44 +0000 (08:45 +0000)] 
nvme: target: rdma: fix ndev refcount leak on queue connect

nvmet_rdma_queue_connect() calls nvmet_rdma_find_get_device(), which
acquires a reference on the returned ndev via kref_get(). On the path
where the host queue backlog is exceeded and the function returns
NVME_SC_CONNECT_CTRL_BUSY, the reference on ndev is not released,
leaking the kref.

Fix this by jumping to the existing put_device label instead of
returning early.

Fixes: 31deaeb11ba7 ("nvmet-rdma: avoid circular locking dependency on install_queue()")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agodrm/xe: Restore IDLEDLY register on engine reset
Balasubramani Vivekanandan [Fri, 22 May 2026 16:35:32 +0000 (22:05 +0530)] 
drm/xe: Restore IDLEDLY register on engine reset

Wa_16023105232 programs the register IDLEDLY. The register is reset
whenever the engine is reset. Therefore it should be added to the GuC
save-restore register list for it to be restored after reset.

Fixes: 7c53ff050ba8 ("drm/xe: Apply Wa_16023105232")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260522163531.1365540-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
(cherry picked from commit df1cfe24743a93b71eab27687e148ab8ae9b69e3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 weeks agonvme-multipath: fix flex array size in struct nvme_ns_head
Nilay Shroff [Wed, 27 May 2026 06:20:00 +0000 (11:50 +0530)] 
nvme-multipath: fix flex array size in struct nvme_ns_head

struct nvme_ns_head contains a flexible array member, current_path[],
which is indexed using the NUMA node ID:
head->current_path[numa_node_id()]

The structure is currently allocated as:
size = sizeof(struct nvme_ns_head) +
       (num_possible_nodes() * sizeof(struct nvme_ns *));
head = kzalloc(size, GFP_KERNEL);

This allocation assumes that NUMA node IDs are sequential and densely
packed from 0 .. num_possible_nodes() - 1. While this assumption holds
on many systems, it is not always true on some architectures such as
powerpc.

On some powerpc systems, NUMA node IDs can be sparse. For example:
NUMA:
  NUMA node(s):              6
  NUMA node0 CPU(s):         80-159
  NUMA node8 CPU(s):         0-79
  NUMA node252 CPU(s):
  NUMA node253 CPU(s):
  NUMA node254 CPU(s):
  NUMA node255 CPU(s):

That is, the possible/online NUMA node IDs are: 0, 8, 252, 253, 254, 255
In this case: num_possible_nodes() = 6

So memory is allocated for only 6 entries in current_path[]. However,
the array is later indexed using the actual NUMA node ID. As a result,
accesses such as:
head->current_path[8] or
head->current_path[252]
go out of bounds, leading to the following KASAN splat:

==================================================================
BUG: KASAN: slab-out-of-bounds in nvme_mpath_revalidate_paths+0x22c/0x290 [nvme_core]
Write of size 8 at addr c00020003bda35b8 by task kworker/u641:2/1997

CPU: 1 UID: 0 PID: 1997 Comm: kworker/u641:2 Not tainted 7.1.0-rc5-dirty #14 PREEMPT(lazy)
Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
Workqueue: async async_run_entry_fn
Call Trace:
[c000200037fa7510] [c0000000021c23d4] dump_stack_lvl+0x88/0xdc (unreliable)
[c000200037fa7540] [c0000000009fda90] print_report+0x22c/0x67c
[c000200037fa7630] [c0000000009fd508] kasan_report+0x108/0x220
[c000200037fa7740] [c0000000009fff48] __asan_store8+0xe8/0x120
[c000200037fa7760] [c008000018e76474] nvme_mpath_revalidate_paths+0x22c/0x290 [nvme_core]
[c000200037fa7800] [c008000018e6556c] nvme_update_ns_info+0x4a4/0x5e0 [nvme_core]
[c000200037fa7a50] [c008000018e66270] nvme_alloc_ns+0x6d8/0x1a70 [nvme_core]
[c000200037fa7c20] [c008000018e679fc] nvme_scan_ns+0x3f4/0x630 [nvme_core]
[c000200037fa7d10] [c00000000031f22c] async_run_entry_fn+0x9c/0x3a0
[c000200037fa7db0] [c0000000002fa544] process_one_work+0x414/0xa10
[c000200037fa7ec0] [c0000000002fbf00] worker_thread+0x320/0x640
[c000200037fa7f80] [c00000000030d0f8] kthread+0x278/0x290
[c000200037fa7fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18

Allocated by task 1997 on cpu 1 at 35.928317s:

The buggy address belongs to the object at c00020003bda3000
 which belongs to the cache kmalloc-rnd-15-2k of size 2048
The buggy address is located 16 bytes to the right of
 allocated 1448-byte region [c00020003bda3000, c00020003bda35a8)

The buggy address belongs to the physical page:

Memory state around the buggy address:
 c00020003bda3480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c00020003bda3500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>c00020003bda3580: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
                                        ^
 c00020003bda3600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 c00020003bda3680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix this by allocating the flexible array using nr_node_ids instead
of num_possible_nodes(). Since nr_node_ids is one more than the highest
possible NUMA node ID, indexing current_path[] using numa_node_id()
becomes safe even on systems with sparse node IDs.
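
The difference between the two sizings can be checked with a small userspace sketch. It does not use any kernel API: slots_by_count() plays the role of num_possible_nodes(), slots_by_max_id() plays the role of nr_node_ids (assumed here to be the highest ID plus one), and all three helper names are hypothetical.

```c
#include <stddef.h>

/* Slots allocated when sizing by how many nodes exist (the buggy sizing). */
size_t slots_by_count(const int *node_ids, size_t n)
{
	(void)node_ids;
	return n;
}

/* Slots allocated when sizing by the highest node ID plus one (the safe sizing). */
size_t slots_by_max_id(const int *node_ids, size_t n)
{
	int max_id = -1;

	for (size_t i = 0; i < n; i++)
		if (node_ids[i] > max_id)
			max_id = node_ids[i];
	return (size_t)(max_id + 1);
}

/* Returns 1 if every node ID is a valid index into an array of `slots` entries. */
int all_ids_in_bounds(const int *node_ids, size_t n, size_t slots)
{
	for (size_t i = 0; i < n; i++)
		if (node_ids[i] < 0 || (size_t)node_ids[i] >= slots)
			return 0;
	return 1;
}
```

For the node IDs listed above (0, 8, 252, 253, 254, 255) the count-based sizing gives 6 slots, so index 8 is already out of bounds, while the max-ID sizing gives 256 slots and every ID is a valid index.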

Fixes: f333444708f8 ("nvme: take node locality into account when selecting a path")
Tested-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agoublk: set canceling flag even when disk is not allocated
Ming Lei [Wed, 27 May 2026 14:40:42 +0000 (09:40 -0500)] 
ublk: set canceling flag even when disk is not allocated

ublk_start_cancel() previously bailed out early when ublk_get_disk()
returned NULL, treating it as "our disk has been dead".  That is correct
for the post-teardown case, but it also wrongly covers the pre-start
case: ublk_ctrl_start_dev() has not assigned ub->ub_disk yet, while
io_uring is already tearing down the daemon's uring_cmds via
ublk_uring_cmd_cancel_fn().

In that window, the cancel path skips ublk_set_canceling(), so
ubq->canceling stays false, even though ublk_cancel_cmd() goes on to
NULL out every io->cmd.  ublk_ctrl_start_dev() then proceeds to set
ub->ub_disk, call add_disk(), and schedule partition_scan_work.  When
ublk_partition_scan_work() runs bdev_disk_changed() and the resulting
read reaches ublk_queue_rq() -> ublk_queue_cmd(), the ubq->canceling
check passes and the code dereferences the NULL io->cmd:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  RIP: ublk_queue_cmd drivers/block/ublk_drv.c [inline]
  RIP: ublk_queue_rq+0x73/0x100
  Call Trace:
   blk_mq_dispatch_rq_list+0x1c5/0xca0
   ...
   bdev_disk_changed+0x3d4/0x5e0
   ublk_partition_scan_work+0x89/0xe0
   process_one_work+0x344/0x8a0

Fix it by always setting ub->canceling / ubq->canceling under
cancel_mutex.  When the disk is allocated, keep the existing
quiesce/unquiesce dance so the flag is observed across the
ublk_queue_rq() barrier.  When the disk is not yet allocated, there is
no request_queue and ublk_queue_rq() cannot be running concurrently, so
simply flipping the flag is sufficient: any subsequent I/O - including
the partition scan started by ublk_ctrl_start_dev() - will see
canceling set and be aborted via __ublk_queue_rq_common().

Fixes: 7fc4da6a304b ("ublk: scan partition in async way")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260527144042.2095194-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agonvme: use DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE for multipath_sysfs
John Garry [Wed, 13 May 2026 09:50:30 +0000 (09:50 +0000)] 
nvme: use DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE for multipath_sysfs

Use DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE instead of
DEFINE_SYSFS_GROUP_VISIBLE, which means that we can drop
multipath_sysfs_attr_visible().

Incidentally, multipath_sysfs_attr_visible() should have returned a
umode_t.

This idea was suggested by Ben Marzinski elsewhere.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agodrm/radeon/radeon_connectors: remove radeon_connector_free_edid
Joshua Peisach [Sat, 23 May 2026 14:27:48 +0000 (10:27 -0400)] 
drm/radeon/radeon_connectors: remove radeon_connector_free_edid

Since we are using struct drm_edid, we can call drm_edid_free directly.
Also make sure to set the pointer to NULL afterwards.

Signed-off-by: Joshua Peisach <jpeisach@ubuntu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/radeon/radeon_connectors: use struct drm_edid instead of struct edid
Joshua Peisach [Sat, 23 May 2026 14:27:47 +0000 (10:27 -0400)] 
drm/radeon/radeon_connectors: use struct drm_edid instead of struct edid

This was already done for amdgpu; bring the same change to radeon.

The goal of this is to stop using the deprecated edid functions,
specifically drm_connector_update_edid_property. Switch to struct
drm_edid and the appropriate function replacements for the new type.

Also, for audio, use the raw edid for SADB allocations and for
equivalent drm_edid_is_digital expressions.

Signed-off-by: Joshua Peisach <jpeisach@ubuntu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Initialize dsc_caps to 0
Ivan Lipski [Wed, 13 May 2026 21:53:57 +0000 (17:53 -0400)] 
drm/amd/display: Initialize dsc_caps to 0

[Why&How]
If we don't do that we make DSC decisions based on random
inputs, which might result in disallowing DSC when the
monitor and HW support it.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: fix calling VM invalidation in amdgpu_hmm_invalidate_gfx
Christian König [Wed, 18 Feb 2026 11:31:29 +0000 (12:31 +0100)] 
drm/amdgpu: fix calling VM invalidation in amdgpu_hmm_invalidate_gfx

Otherwise we don't invalidate page tables on next CS.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: fix amdgpu_hmm_range_get_pages
Christian König [Wed, 18 Feb 2026 11:53:27 +0000 (12:53 +0100)] 
drm/amdgpu: fix amdgpu_hmm_range_get_pages

The notifier sequence must only be read once or otherwise we could work
with invalid pages.

While at it also fix the coding style, e.g. drop the pre-initialized
return value and use the common define for 2G range.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/ras: cap pending_ecc_list size
Stanley.Yang [Mon, 11 May 2026 11:44:16 +0000 (19:44 +0800)] 
drm/amd/ras: cap pending_ecc_list size

Drop new entries once pending_ecc_count hits RAS_UMC_PENDING_ECC_MAX
(8192) so an ECC storm or repeated UMC error injection cannot exhaust
kernel memory. Dropped events are counted and reported via a
rate-limited warning.
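
A minimal sketch of the capping logic, assuming a simple counter in place of the real list: struct demo_pending and demo_pending_add() are hypothetical names, and only the 8192 limit is taken from the commit message. The rate-limited warning is left out.

```c
#include <stddef.h>

#define DEMO_PENDING_MAX 8192	/* mirrors the 8192 cap named in the commit */

struct demo_pending {
	size_t count;	/* entries currently queued */
	size_t dropped;	/* entries rejected because the cap was reached */
};

/* Returns 1 if the entry was accepted, 0 if it was dropped. */
int demo_pending_add(struct demo_pending *p)
{
	if (p->count >= DEMO_PENDING_MAX) {
		p->dropped++;	/* count the drop so it can be reported later */
		return 0;
	}
	p->count++;
	return 1;
}
```

Memory use is bounded by the cap regardless of how many events arrive, and the dropped counter preserves the fact that events were lost.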

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agonvmet-tcp: check return value of nvmet_tcp_set_queue_sock
Geliang Tang [Tue, 26 May 2026 09:28:05 +0000 (17:28 +0800)] 
nvmet-tcp: check return value of nvmet_tcp_set_queue_sock

The return value of nvmet_tcp_set_queue_sock() is currently ignored in
nvmet_tcp_tls_handshake_done(). If it fails (e.g., due to the socket
not being in TCP_ESTABLISHED state), the socket callbacks will not be
properly set, leading to queue and socket leakage.

Fix this by capturing the return value and calling
nvmet_tcp_schedule_release_queue() on failure to ensure proper cleanup.

Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall")
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Keith Busch <kbusch@kernel.org>
3 weeks agodrm/amdgpu: init locals in umc_v12_0_convert_error_address
Stanley.Yang [Mon, 11 May 2026 09:27:29 +0000 (17:27 +0800)] 
drm/amdgpu: init locals in umc_v12_0_convert_error_address

row, col, col_lower, row_lower, row_high and bank could be read on
code paths that never assign them. Initialize them to 0.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: use array instead of list for userq_vas
Sunil Khatri [Wed, 20 May 2026 11:09:49 +0000 (16:39 +0530)] 
drm/amdgpu/userq: use array instead of list for userq_vas

Use an array instead of a list for userq_vas since the number of BOs
is fixed. Also, we don't have to worry about freeing that memory later,
since the array is freed along with the queue.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid
Sunil Khatri [Wed, 20 May 2026 10:55:50 +0000 (16:25 +0530)] 
drm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid

mqd_destroy cleans up queue core objects like mqd and fw_object
which are needed for any pending fence to signal properly.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdkfd: fix a vulnerability of integer overflow in kfd debugger
Eric Huang [Tue, 12 May 2026 14:19:52 +0000 (10:19 -0400)] 
drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger

get_queue_ids() computes array_size = num_queues * sizeof(uint32_t),
which could overflow on builds with a 32-bit size_t. Use array_size()
instead, which saturates to SIZE_MAX on overflow.
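
The saturating behaviour can be illustrated in userspace. This is a sketch of the idea only, not the kernel's array_size() implementation; demo_array_size() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two sizes, returning SIZE_MAX instead of wrapping on overflow. */
size_t demo_array_size(size_t a, size_t b)
{
	if (b != 0 && a > SIZE_MAX / b)
		return SIZE_MAX;	/* a * b would not fit in size_t */
	return a * b;
}
```

A saturated SIZE_MAX makes the following allocation fail cleanly, whereas a wrapped product would yield a small buffer that later writes overrun.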

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd: Add dedicated helper for amdgpu_device_find_parent()
Mario Limonciello [Wed, 20 May 2026 15:46:17 +0000 (10:46 -0500)] 
drm/amd: Add dedicated helper for amdgpu_device_find_parent()

There are a few places where the code walks up the topology to find
the link partner of the integrated switch in a dGPU.  Split this out
into a helper and call it in all of those places.

This does introduce a functional change: amdgpu_device_gpu_bandwidth()
now caches only the parent, not the internal link.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper
Sunil Khatri [Wed, 20 May 2026 10:43:09 +0000 (16:13 +0530)] 
drm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper

Remove the amdgpu_userq_create/destroy_object wrappers and directly
use the kernel BO allocation function, which already does everything
the wrappers did.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Fix TOCTOU on UniRAS command response size
Chenglei Xie [Mon, 11 May 2026 18:13:45 +0000 (14:13 -0400)] 
drm/amdgpu: Fix TOCTOU on UniRAS command response size

The guest maps the PF response in shared VRAM (struct ras_cmd_ctx in the
command buffer). After amdgpu_virt_send_remote_ras_cmd() returns, the code
validated rcmd->output_size against the caller buffer, then copied
rcmd->output_buff_raw using rcmd->output_size again. A malicious PF could
change output_size between those reads so the memcpy length exceeds the
caller’s output_size and overflows guest stack or heap buffers.

Snapshot output_size with READ_ONCE() once, assign cmd->output_size from
that value, and use the same snapshot for the bounds check and memcpy.
Also read cmd_res once with READ_ONCE() so the error branch and
cmd->cmd_res assignment do not observe different values from shared memory.
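
The snapshot pattern looks roughly like this in userspace C. The struct layout, the 64-byte buffer and demo_copy_response() are assumptions for illustration, and a volatile read stands in for READ_ONCE().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared-memory response written by an untrusted peer. */
struct demo_shared_resp {
	volatile uint32_t output_size;
	uint8_t output_buf[64];
};

/*
 * Copies the response into dst. Returns the number of bytes copied,
 * or -1 if the advertised size does not fit.
 */
int demo_copy_response(const struct demo_shared_resp *resp,
		       uint8_t *dst, uint32_t dst_size)
{
	/* Read the size exactly once; every later use sees this value. */
	uint32_t size = resp->output_size;

	if (size > dst_size || size > sizeof(resp->output_buf))
		return -1;

	memcpy(dst, resp->output_buf, size);	/* same snapshot as the check */
	return (int)size;
}
```

Because the bounds check and the memcpy() length use the same local variable, a concurrent change to output_size in shared memory cannot make the copy larger than what was validated.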

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: bound SR-IOV RAS CPER dump parsing against used_size
Chenglei Xie [Mon, 11 May 2026 19:24:29 +0000 (15:24 -0400)] 
drm/amdgpu: bound SR-IOV RAS CPER dump parsing against used_size

The VF copies a PF-provided CPER telemetry blob and walks records using
cper_dump->count and each entry's record_length. count is u64 while the
loop used u32, so a large count could loop indefinitely. record_length was
not limited to the kmemdup'd region, so the first iteration could read far
past the allocation; record_length == 0 could spin forever on the same
entry. Together that allowed a malicious hypervisor to leak heap past the
blob into the CPER ring or hang the guest.

Require used_size to cover the fixed header before buf and stay within the
telemetry cap. Track remaining bytes in buf, cap iterations with u64 and
CPER_MAX_ALLOWED_COUNT, and reject record_length outside
[sizeof(cper_hdr), remaining] before writing to the ring.
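
A bounded record walk along these lines might look as follows. The record layout (a 32-bit record_length at the start of each record, in host byte order), DEMO_HDR_SIZE, DEMO_MAX_COUNT and demo_walk_records() are all hypothetical; only the checks themselves follow the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDR_SIZE	4u	/* hypothetical fixed per-record header size */
#define DEMO_MAX_COUNT	16u	/* hypothetical cap on the record count */

/*
 * Walks `count` records packed in buf[0..len). Returns the number of
 * records accepted, or -1 if the blob is malformed.
 */
int demo_walk_records(const uint8_t *buf, size_t len, uint64_t count)
{
	size_t remaining = len;
	uint64_t i;	/* same width as count, so the loop cannot wrap */

	if (count > DEMO_MAX_COUNT)
		return -1;

	for (i = 0; i < count; i++) {
		uint32_t record_length;

		if (remaining < DEMO_HDR_SIZE)
			return -1;
		memcpy(&record_length, buf, sizeof(record_length));

		/* Reject lengths outside [header size, remaining]. */
		if (record_length < DEMO_HDR_SIZE || record_length > remaining)
			return -1;

		buf += record_length;
		remaining -= record_length;
	}
	return (int)i;
}
```

A zero record_length is rejected by the lower bound, so the walk always advances, and the upper bound keeps every read inside the copied region.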

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/si: Notify the SMC when switching to AC
Jeremy Klarenbeek [Tue, 19 May 2026 08:41:58 +0000 (10:41 +0200)] 
drm/amd/pm/si: Notify the SMC when switching to AC

There are some platforms that don't have a dedicated
GPIO line to manage the AC/DC switch. In this case,
the SI SMC automatically notices when switching to DC,
but needs to be notified when switching to AC.

Fix up and use si_notify_hw_of_powersource(), which was
previously hidden behind an "#if 0".

This allows some SI laptop GPUs to use their
performance power states after switching from DC to AC.

Some affected GPUs are:
FirePro W4170M - Dell Precision M2800
Radeon HD 8790M - Dell Latitude E6540

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Co-developed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/si: Fix updating clock limits from power states
Jeremy Klarenbeek [Tue, 19 May 2026 08:41:57 +0000 (10:41 +0200)] 
drm/amd/pm/si: Fix updating clock limits from power states

VBIOS can contain conflicting values between:
- the maximum allowed clocks and voltages on AC or DC
- the clocks and voltages in power states on AC or DC

Update maximum clock (and voltage) limits for both AC/DC
and take the highest value from the VBIOS limits and
the performance/battery power states. Previously this
was only done for AC, but is also needed for DC.
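
The "take the highest value" rule amounts to a per-field maximum. In this sketch struct demo_clocks, its sclk/mclk fields and demo_raise_limits() are hypothetical; the real driver structures are not shown here. The same helper would be applied once for the AC limits and once for the DC limits.

```c
#include <stdint.h>

/* Hypothetical pair of clock limits; field names are assumptions. */
struct demo_clocks {
	uint32_t sclk;	/* engine clock */
	uint32_t mclk;	/* memory clock */
};

/* Raise each limit to the power state's clock if that value is higher. */
void demo_raise_limits(struct demo_clocks *limit, const struct demo_clocks *state)
{
	if (state->sclk > limit->sclk)
		limit->sclk = state->sclk;
	if (state->mclk > limit->mclk)
		limit->mclk = state->mclk;
}
```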

This commit fixes the behaviour on some laptop GPUs,
where the VBIOS limit was set to the lowest possible
clock frequency, so the GPU was stuck on the lowest
possible power level on battery.

Some affected GPUs are:
FirePro W4170M (Dell Precision M2800)
Radeon HD 8790M (Dell Latitude E6540)
and possibly other laptop GPUs.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Co-developed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/smu7: Notify SMU7 of DC->AC switch
Timur Kristóf [Tue, 19 May 2026 08:41:56 +0000 (10:41 +0200)] 
drm/amd/pm/smu7: Notify SMU7 of DC->AC switch

When ATOM_PP_PLATFORM_CAP_HARDWAREDC is set,
the SMU has a GPIO pin for detecting AC/DC switch
and everything works automatically.

Otherwise when there is no GPIO pin, the SMU can
automatically detect switching to DC, but needs
to be notified of switching to AC.

Use PPSMC_MSG_RunningOnAC to notify the SMC
when switching to AC.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Rename enable_bapm() to notify_ac_dc()
Timur Kristóf [Tue, 19 May 2026 08:41:55 +0000 (10:41 +0200)] 
drm/amd/pm: Rename enable_bapm() to notify_ac_dc()

No functional changes, just change the name of this
function pointer to be more generic.

BAPM refers to a specific feature on KV, but other kinds of
ASICs may also need the SMU to be notified on AC/DC changes.

Also remove the argument and use adev->pm.ac_power instead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/si: Disregard vblank time when no displays are connected
Timur Kristóf [Tue, 19 May 2026 08:41:54 +0000 (10:41 +0200)] 
drm/amd/pm/si: Disregard vblank time when no displays are connected

When no displays are connected, there is no vblank
happening so the power management code shouldn't
worry about it.

This fixes a regression that caused the memory clock
to be stuck at maximum when there were no displays
connected to a SI GPU.

Fixes: 9003a0746864 ("drm/amd/pm: Treat zero vblank time as too short in si_dpm (v3)")
Fixes: 9d73b107a61b ("drm/amd/pm: Use pm_display_cfg in legacy DPM (v2)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete PP_DAL_POWERLEVEL
Timur Kristóf [Tue, 19 May 2026 10:21:18 +0000 (12:21 +0200)] 
drm/amd/pm: Delete PP_DAL_POWERLEVEL

Not used and not needed anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete get_dal_power_level
Timur Kristóf [Tue, 19 May 2026 10:21:17 +0000 (12:21 +0200)] 
drm/amd/pm: Delete get_dal_power_level

Not needed anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete vddc_dep_on_dal_pwrl
Timur Kristóf [Tue, 19 May 2026 10:21:16 +0000 (12:21 +0200)] 
drm/amd/pm: Delete vddc_dep_on_dal_pwrl

It was not used by anything anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete non-functional SMU8 get_dal_power_level implementation
Timur Kristóf [Tue, 19 May 2026 10:21:15 +0000 (12:21 +0200)] 
drm/amd/pm: Delete non-functional SMU8 get_dal_power_level implementation

This function was effectively a no-op because it always
returned the maximum possible power level, because the
maximum voltage is in millivolts while the dependency
table didn't contain actual voltages.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete dummy get_dal_power_level implementations
Timur Kristóf [Tue, 19 May 2026 10:21:14 +0000 (12:21 +0200)] 
drm/amd/pm: Delete dummy get_dal_power_level implementations

These implementations did not actually return
the DAL power level, so they were effectively
a no-op.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete unused get_display_power_level() function
Timur Kristóf [Tue, 19 May 2026 10:21:13 +0000 (12:21 +0200)] 
drm/amd/pm: Delete unused get_display_power_level() function

Was not called from anywhere.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete dm_pp_clocks_state
Timur Kristóf [Tue, 19 May 2026 10:21:12 +0000 (12:21 +0200)] 
drm/amd/display: Delete dm_pp_clocks_state

It isn't used by anything anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete disp_clk_voltage from integrated info (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:11 +0000 (12:21 +0200)] 
drm/amd/display: Delete disp_clk_voltage from integrated info (v2)

Only DCE 11.0 relies on this information and even that
didn't use this field, because it queries the information
from the pplib. It also filled the field incorrectly on
that version.

On newer GPUs, the VBIOS integrated info no longer contains
display clock voltage dependencies, so we don't need it.

v2:
- Also delete some code wrapped in #if 0

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete max_clks_by_state from DCE clock manager (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:10 +0000 (12:21 +0200)] 
drm/amd/display: Delete max_clks_by_state from DCE clock manager (v2)

It was not used by anything anymore.

Note that the parts of DC that need this information actually
already query it from the pplib and don't use the hardcoded
information from max_clks_by_state.

v2:
- Also delete state_dependent_clocks

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Set max supported display clock without max_clks_by_state (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:09 +0000 (12:21 +0200)] 
drm/amd/display: Set max supported display clock without max_clks_by_state (v2)

max_clks_by_state was based on hardcoded values, which are
not really used anywhere except to determine the maximum clock.
Just hardcode the same maximum clock for each DCE version.

v2:
- Use previous max display clock for DCE 11.2

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete max_clocks_state
Timur Kristóf [Tue, 19 May 2026 10:21:08 +0000 (12:21 +0200)] 
drm/amd/display: Delete max_clocks_state

It's not used by anything anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Remove min/max clock levels from clk_mgr (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:07 +0000 (12:21 +0200)] 
drm/amd/display: Remove min/max clock levels from clk_mgr (v2)

These fields are not used by anything anymore.

v2:
- Delete dm_pp_get_static_clocks()
- Delete pp_to_dc_powerlevel_state()

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete dce_get_required_clocks_state()
Timur Kristóf [Tue, 19 May 2026 10:21:06 +0000 (12:21 +0200)] 
drm/amd/display: Delete dce_get_required_clocks_state()

It is not called from anywhere anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdkfd: Check for pdd drm file first in CRIU restore path
David Francis [Thu, 14 May 2026 14:31:20 +0000 (10:31 -0400)] 
drm/amdkfd: Check for pdd drm file first in CRIU restore path

CRIU restore ioctls are meant to be called by CRIU with no
existing drm file. There is an error path for the case where
the drm file unexpectedly exists, but it was positioned such
that it skipped the fput(drm_file).

Do that check earlier, as soon as we have the pdd.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: fix potential overflow in fs_info.debugfs_name
Stanley.Yang [Mon, 11 May 2026 08:49:19 +0000 (16:49 +0800)] 
drm/amdgpu: fix potential overflow in fs_info.debugfs_name

Use snprintf() with sizeof(fs_info.debugfs_name) so a long RAS block
name plus the "_err_inject" suffix cannot overflow the 32-byte buffer.
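
A userspace sketch of the bounded formatting: the 32-byte size and the "_err_inject" suffix come from the commit message, while DEMO_NAME_LEN and demo_build_name() are hypothetical names. snprintf() never writes more than the given size and always NUL-terminates when the size is non-zero.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 32	/* matches the 32-byte buffer named in the commit */

/*
 * Builds "<block>_err_inject" into out, truncating if needed.
 * Returns 1 if the full name fit, 0 if it was truncated.
 */
int demo_build_name(char out[DEMO_NAME_LEN], const char *block)
{
	int n = snprintf(out, DEMO_NAME_LEN, "%s_err_inject", block);

	/* snprintf() returns the length it would have written without the limit. */
	return n >= 0 && n < DEMO_NAME_LEN;
}
```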

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: make sure queue is valid in the hang_detect_work
Sunil Khatri [Mon, 18 May 2026 14:28:08 +0000 (19:58 +0530)] 
drm/amdgpu/userq: make sure queue is valid in the hang_detect_work

Thread 1: amdgpu_userq_destroy() runs, eventually removes the
queue from the doorbell and sets userq_mgr = NULL.

Thread 2: An interrupt might have scheduled hang_detect_work,
which still needs userq_mgr to be valid but could see a NULL
pointer.

To fix that, make sure we cancel hang_detect_work again before
setting userq_mgr to NULL.

Along with that, all the queue VAs need to remain valid for as
long as anything could be running on the queue, hence the
userq_va handling is moved to after the hang_detect handler is
cancelled.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: reserve root bo without interruption
Sunil Khatri [Mon, 18 May 2026 13:25:25 +0000 (18:55 +0530)] 
drm/amdgpu/userq: reserve root bo without interruption

Fix the code to make it an uninterruptible reservation
for root bo.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails
Sunil Khatri [Mon, 18 May 2026 13:03:00 +0000 (18:33 +0530)] 
drm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails

Unpin the wptr_obj->obj when amdgpu_ttm_alloc_gart fails.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index
Sunil Khatri [Mon, 18 May 2026 12:12:15 +0000 (17:42 +0530)] 
drm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index

amdgpu_userq_get_doorbell_index() returns a uint64 index on success
but int error values on failure. Simplify this by returning an int
and passing the index back through a uint64 output pointer.

Also, since it is used in only one place, make it static.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
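
The return-value convention described in the amdgpu_userq_get_doorbell_index commit above can be illustrated with a small userspace sketch. This is not the amdgpu code: demo_get_doorbell_index(), DEMO_DOORBELL_SLOTS and the slot-to-index mapping are hypothetical names and values chosen only to show an int status return combined with a uint64 output pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of doorbell slots, for illustration only. */
#define DEMO_DOORBELL_SLOTS 8u

/*
 * Hypothetical stand-in for a doorbell index lookup.
 *
 * Returns 0 on success and stores the index in *index.
 * Returns a negative errno value on failure and leaves *index untouched,
 * so an error code can never be mistaken for a valid 64-bit index.
 */
static int demo_get_doorbell_index(uint32_t slot, uint64_t *index)
{
	assert(index != NULL);

	if (slot >= DEMO_DOORBELL_SLOTS)
		return -EINVAL;

	/* Made-up mapping from slot to index. */
	*index = (uint64_t)slot * 2;
	return 0;
}
```

A caller checks the int status first and reads the index only on success, which avoids squeezing negative error codes into an unsigned 64-bit return type.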
3 weeks agodrm/amdkfd: fix NULL pointer bug in svm_range_set_attr
Eric Huang [Thu, 7 May 2026 19:51:49 +0000 (15:51 -0400)] 
drm/amdkfd: fix NULL pointer bug in svm_range_set_attr

The process_info could be NULL if the user doesn't call
kfd_ioctl_acquire_vm before calling kfd_ioctl_svm.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agoblk-throttle: schedule parent dispatch in tg_flush_bios()
Tao Cui [Fri, 22 May 2026 09:15:30 +0000 (17:15 +0800)] 
blk-throttle: schedule parent dispatch in tg_flush_bios()

tg_flush_bios() schedules pending_timer on the child tg's own
service_queue, which causes throtl_pending_timer_fn() to dispatch from
the child's pending_tree.  For leaf cgroups this tree is empty, so the
timer fires and exits without dispatching the throttled bio.

The throttled bio sits in the parent's pending_tree with disptime set
to jiffies (THROTL_TG_CANCELING zeroes all dispatch times), but the
parent's timer is never explicitly rescheduled.  The bio only gets
dispatched when the parent timer eventually fires at its previously
scheduled expiry.

Fix by calling throtl_schedule_next_dispatch(sq->parent_sq, true)
instead, matching what tg_set_limit() already does.  This forces the
parent's dispatch cycle to run immediately and flush all canceling
bios without waiting for a stale timer.

For the device deletion path (blk_throtl_cancel_bios), directly
complete throttled bios with EIO via bio_io_error() instead of
dispatching them through the timer -> work -> submission chain.
This avoids a race with the SCSI state machine where bios can reach
the SCSI layer while the device is in SDEV_CANCEL state, causing
ENODEV instead of the expected EIO.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/all/ag2owaQQoigp_fSV@shinmob/
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Link: https://patch.msgid.link/20260522091530.1901437-1-cuitao@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agorust: block: mq: align init_request numa_node arg with C signature
Andreas Hindborg [Wed, 27 May 2026 09:18:09 +0000 (11:18 +0200)] 
rust: block: mq: align init_request numa_node arg with C signature

Commit b040a1a4523d ("block: switch numa_node to int in
blk_mq_hw_ctx and init_request") changed the type of the
`numa_node` argument of `blk_mq_ops::init_request` from
`unsigned int` to `int`. Update the Rust callback signature to
match, so that the function item can be coerced to the C fn
pointer type stored in `blk_mq_ops`.

Without this change the Rust block layer fails to build:

  error[E0308]: mismatched types
     --> rust/kernel/block/mq/operations.rs:274:28
      |
  274 |         init_request: Some(Self::init_request_callback),
      |                       ---- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |                       expected fn pointer, found fn item
      |
      = note: expected fn pointer
                `unsafe extern "C" fn(_, _, _, i32) -> _`
                    found fn item
                `unsafe extern "C" fn(_, _, _, u32) -> _ {...}`

The argument is unused on the Rust side, so this is a pure
type-signature change with no functional impact.

Fixes: b040a1a4523d ("block: switch numa_node to int in blk_mq_hw_ctx and init_request")
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260527-block-for-next-2026-05-26-2200-failure-v1-1-4865889e282c@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agodrm/amd/display: Write REFCLK to 48MHz on DCN21
Ivan Lipski [Thu, 14 May 2026 15:53:50 +0000 (11:53 -0400)] 
drm/amd/display: Write REFCLK to 48MHz on DCN21

[Why&How]
dccg21_init() calls dccg2_init() which hardcodes 100MHz refclk values
for MICROSECOND_TIME_BASE_DIV and MILLISECOND_TIME_BASE_DIV. DCN21
uses 48MHz refclk, so the wrong values corrupt DCCG timing and cause eDP
link training failure on cold boot.

Write the correct 48MHz values directly instead of calling dccg2_init().

v2:
Fixed typo

Fixes: e6e2b956fc81 ("drm/amd/display: Add missing DCCG register entries for DCN20-DCN316")
Reported-by: Max Chernoff <git@maxchernoff.ca>
Tested-by: Max Chernoff <git@maxchernoff.ca>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agoblock: partitions: replace __get_free_page() with kmalloc()
Mike Rapoport (Microsoft) [Wed, 27 May 2026 14:33:28 +0000 (17:33 +0300)] 
block: partitions: replace __get_free_page() with kmalloc()

check_partition() allocates a buffer to use as backing memory for
seq_buf.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

For a single allocation on the cold path the performance difference between
kmalloc() and __get_free_pages() is not measurable as both allocators take
an object/page from a per-CPU list for fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Link: https://patch.msgid.link/20260527-block-v2-1-8e06f914c484@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agodrm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock
Sunil Khatri [Tue, 19 May 2026 09:42:42 +0000 (15:12 +0530)] 
drm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock

The fence_drv_lock mutex is destroyed in amdgpu_userq_fence_driver_free(),
but one of the error jump paths also calls mutex_destroy(), leading
to a double mutex_destroy().

Rearrange the code so that amdgpu_userq_fence_driver_free() takes care
of the cleanup, including mutex_destroy().

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
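
The cleanup ownership described in the fence_drv_lock commit above can be modelled in userspace as follows. This is a sketch under assumptions, not the driver code: struct demo_fence_drv, demo_fence_driver_alloc() and demo_fence_driver_free() are hypothetical, and a counter stands in for mutex_init()/mutex_destroy().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a fence driver; the flag stands in for the mutex. */
struct demo_fence_drv {
	int lock_initialized;
	int destroy_calls;
};

/* The only function that tears down the lock. */
static void demo_fence_driver_free(struct demo_fence_drv *drv)
{
	/* A second teardown of the same lock would trip this assertion. */
	assert(drv->lock_initialized);
	drv->lock_initialized = 0;
	drv->destroy_calls++;
}

/*
 * Hypothetical allocation path. On a late failure it jumps to a label
 * that only calls the free helper and does not destroy the lock itself.
 */
static int demo_fence_driver_alloc(struct demo_fence_drv *drv, int fail_late)
{
	drv->lock_initialized = 1;
	drv->destroy_calls = 0;

	if (fail_late)
		goto err_free;

	return 0;

err_free:
	demo_fence_driver_free(drv);
	return -ENOMEM;
}
```

Because the error path delegates entirely to the free helper, the lock is torn down exactly once however the function exits.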
3 weeks agoKVM: arm64: Don't populate TPIDR_EL2 in finalise_el2()
Will Deacon [Mon, 18 May 2026 15:31:26 +0000 (16:31 +0100)] 
KVM: arm64: Don't populate TPIDR_EL2 in finalise_el2()

Currently, it is not necessary for __finalise_el2() to configure
TPIDR_EL2:

* The hyp stub code does not consume the value of TPIDR_EL2.

* On the boot cpu, TPIDR_EL1 is used for the percpu offset until the
  ARM64_HAS_VIRT_HOST_EXTN cpucap is detected and boot alternatives
  are patched. Before boot alternatives are patched,
  cpu_copy_el2regs() will copy TPIDR_EL1 into TPIDR_EL2. It is not
  necessary for __finalise_el2() to initialise TPIDR_EL2 before this.

* Secondary CPUs are brought up after boot alternatives have been
  patched, and __secondary_switched() will initialize TPIDR_EL2 in
  'init_cpu_task', after finalise_el2() calls __finalise_el2().

* KVM hyp code which may consume TPIDR_EL2 is brought up after all
  secondaries have been booted, once TPIDR_EL2 has been configured on
  all CPUs.

Remove the redundant initialisation from __finalise_el2().

Cc: Oliver Upton <oupton@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20260518153127.6078-1-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
3 weeks agosamples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
Danilo Krummrich [Mon, 25 May 2026 20:21:11 +0000 (22:21 +0200)] 
samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data

Make the Data struct lifetime-parameterized, storing a reference to the
parent pci::Device<Bound>. This demonstrates that registration data can
hold device resources tied to the parent driver's lifetime.

In connect(), retrieve the parent PCI device from the registration data
rather than casting through adev.parent().

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260525202921.124698-25-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
3 weeks agodrm/amd/display: Delete unimplemented dm_pp_apply_power_level_change_request() (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:05 +0000 (12:21 +0200)] 
drm/amd/display: Delete unimplemented dm_pp_apply_power_level_change_request() (v2)

dm_pp_apply_power_level_change_request() was called from old
DCE clock manager implementations on DCE6, 8, 10, 11.2
but has not been implemented ever since the beginning of DC.

Affected GPUs have been working fine without that implementation
for many years. Let's delete it now.

v2:
- Delete dm_pp_apply_power_level_change_request too

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: Fix doorbell object cleanup of queue
Sunil Khatri [Tue, 19 May 2026 09:32:00 +0000 (15:02 +0530)] 
drm/amdgpu/userq: Fix doorbell object cleanup of queue

Unpin and unref the doorbell object if queue creation fails before
initialization is complete.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agorust: auxiliary: generalize Registration over ForLt
Danilo Krummrich [Mon, 25 May 2026 20:21:10 +0000 (22:21 +0200)] 
rust: auxiliary: generalize Registration over ForLt

Generalize Registration<T> to Registration<F: ForLt> and
Device::registration_data<F: ForLt>() to return Pin<&F::Of<'_>>.

The stored 'static lifetime is shortened to the borrow lifetime of &self
via ForLt::cast_ref; ForLt's covariance guarantee makes this sound.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260525202921.124698-24-dakr@kernel.org
[ Use PhantomData<F::Of<'a>> instead of
  PhantomData<(fn(&'a ()) -> &'a (), F)>, which also gets rid of
  #[allow(clippy::type_complexity)]. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>