drm/amdgpu: Allow more flags to be set on gem create.

The GEM create flag AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE
specifies that gem memory contains sensitive information and
should be cleared to prevent snooping.

The COHERENT and UNCACHED gem create flags enable memory
features related to sharing memory across devices.

For CRIU we need to re-create KFD BOs through the
GEM_CREATE IOCTL, so allow those KFD specific flags here as well.
This will also help in the future by allowing the KFD components
to move over to using the render node for allocations.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge tag 'drm-intel-gt-next-2025-09-01' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

Driver Changes:

- Apply multiple JSL/EHL/Gen7/Gen6 workarounds properly at context level (Sebastian)
- Protect against overflow in active_engine() (Krzysztof)
- Use try_cmpxchg64() in __active_lookup() (Uros)

- Enable GuC CT_DEAD output in regular debug builds (John)
- Static checker and style fixes (Sebastian)
- Selftest improvements (Krzysztof)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://lore.kernel.org/r/aLWZoEZVlBj2d8J9@jlahtine-mobl

Merge tag 'drm-xe-next-2025-08-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
- Add madvise interface (Himal Prasad Ghimiray)
- Add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count and
   memory attributes (Himal Prasad Ghimiray)
- Handle Firmware reported Hardware Errors notifying userspace with
   device wedged uevent (Riana Tauro)

Cross-subsystem Changes:

- Add a vendor-specific recovery method to drm device wedged uevent
   (Riana Tauro)

Driver Changes:
- Use same directory structure in debugfs as in sysfs (Michal Wajdeczko)
- Cleanup and future-proof VRAM region initialization (Piotr Piórkowski)
- Add G-states and PCIe link states to debugfs (Soham Purkait)
- Cleanup eustall debug messages (Harish Chegondi)
- Add SR-IOV support to restore Compression Control Surface (CCS) to
   Xe2 and later (Satyanarayana K V P)
- Enable SR-IOV PF mode by default on supported platforms without
   needing CONFIG_DRM_XE_DEBUG and mark some platforms behind
   force_probe as supported (Michal Wajdeczko)
- More targeted log messages (Michal Wajdeczko)
- Cleanup STEER_SEMAPHORE/MCFG_MCR_SELECTOR usage (Nitin Gote)
- Use common code to emit flush (Tvrtko Ursulin)
- Add/extend more HW workarounds and tunings for Xe2 and Xe3
   (Sk Anirban, Tangudu Tilak Tirumalesh, Nitin Gote, Chaitanya Kumar Borah)
- Add a generic dependency scheduler to help with TLB invalidations
   and future scenarios (Matthew Brost)
- Use DRM scheduler for delayed GT TLB invalidations (Matthew Brost)
- Error out on incorrect device use in configfs
   (Michal Wajdeczko, Lucas De Marchi)
- Refactor configfs attributes (Michal Wajdeczko / Lucas De Marchi)
- Allow configuring future VF devices via configfs (Michal Wajdeczko)
- Implement some missing XeLP workarounds (Tvrtko Ursulin)
- Generalize WA BB setup/emission and add support for
   mid context restore BB, aka indirect context (Tvrtko Ursulin)
- Prepare the driver to expose mmio regions to userspace
   in future (Ilia Levi)
- Add more GuC load error status codes (John Harrison)
- Document DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING (Priyanka Dandamudi)
- Disable CSC and RPM on VFs (Lukasz Laguna, Satyanarayana K V P)
- Fix oops in xe_gem_fault with PREEMPT_RT (Maarten Lankhorst)
- Skip LMTT update if no LMEM was provisioned (Michal Wajdeczko)
- Add support to VF migration (Tomasz Lis)
- Use a helper for guc_waklv_enable functions (Jonathan Cavitt)
- Prepare GPU SVM for migration of THP (Francois Dugast)
- Program LMTT directory pointer on all GTs within a tile
   (Piotr Piórkowski)
- Rename XE_WA to XE_GT_WA to better convey its scope vs the device WAs
   (Matt Atwood)
- Allow to match devices on PCI devid/vendorid only (Lucas De Marchi)
- Improve PDE PAT index selection (Matthew Brost)
- Consolidate ASID allocation in xe_vm_create() vs
   xe_vm_create_ioctl() (Piotr Piórkowski)
- Resize VF BARS to max possible size according to number of VFs
   (Michał Winiarski)
- Untangle vm_bind_ioctl cleanup order (Christoph Manszewski)
- Start fixing usage of XE_PAGE_SIZE vs PAGE_SIZE to improve
   compatibility with non-x86 arch (Simon Richter)
- Improve tile vs gt initialization order and accounting
   (Gustavo Sousa)
- Extend WA kunit test to PTL
- Ensure data is initialized before transferring to pcode
   (Stuart Summers)
- Add PSMI support for HW validation (Lucas De Marchi,
   Vinay Belgaumkar, Badal Nilawar)
- Improve xe_dma_buf test (Thomas Hellström, Marcin Bernatowicz)
- Fix basename() usage in generator with !glibc (Carlos Llamas)
- Ensure GT is in C0 during resumes (Xin Wang)
- Add TLB invalidation abstraction (Matt Brost, Stuart Summers)
- Make MI_TLB_INVALIDATE conditional on migrate (Matthew Auld)
- Prepare xe_nvm to be initialized early for future use cases
   (Riana Tauro)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/nuejxdhnalyok7tzwkrj67dwjgdafwp4mhdejpyyqnrh4f2epq@nlldovuflnbx

Merge tag 'amd-drm-next-6.18-2025-08-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.18-2025-08-29:

amdgpu:
- Replay fixes
- RAS updates
- VCN SRAM load fixes
- EDID read fixes
- eDP ALPM support
- AUX fixes
- Documentation updates
- Rework how PTE flags are generated
- DCE6 fixes
- VCN devcoredump cleanup
- MMHUB client id fixes
- SR-IOV fixes
- VRR fixes
- VCN 5.0.1 RAS support
- Backlight fixes
- UserQ fixes
- Misc code cleanups
- SMU 13.0.12 updates
- Expanded PCIe DPC support
- Expanded VCN reset support
- SMU 13.0.x Updates
- VPE per queue reset support
- Cursor rotation fix
- DSC fixes
- GC 12 MES TLB invalidation update
- Cursor fixes
- Non-DC TMDS clock validation fix

amdkfd:
- debugfs fixes
- Misc code cleanups
- Page migration fixes
- Partition fixes
- SVM fixes

radeon:
- Misc code cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250829190848.1921648-1-alexander.deucher@amd.com

drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)

Update the legacy (non-DC) display code to respect the maximum
pixel clock for HDMI and DVI-D. Reject modes that would require
a higher pixel clock than can be supported.

Also update the maximum supported HDMI clock value depending on
the ASIC type.

For reference, see the DC code:
check max_hdmi_pixel_clock in dce*_resource.c

v2:
Fix maximum clocks for DVI-D and DVI/HDMI adapters.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.348

Summary:

* Refactor bounding box values handling
* Fix incorrect condition to fail dto clk calculation
* Skip checking downlink setting for a certain MST branch device
* Fix double cursor issue on dcn314

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.25.0

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Skip Check Runtime Link Setting for Specific Branch Device

[why]
The link setting read during mode validation is not always the final downlink
setting. This has been observed with a Synaptics branch device.

At bootup, the preferred mode is set right after 1080p is set. This happens
before graphics load. Two modesets in such a short period of time make
the branch device switch back and forth between lower and higher link rates,
as observed with the Synaptics branch device.
A DP2 RTK hub, on the other hand, sticks to the highest available downlink rate
after bootup.

The existing check of the runtime downlink setting in mode validation is
asynchronous with the branch device link switch, i.e., the downlink switch to
the higher link rate is not yet complete when mode validation probes the
downlink setting. As a result, mode validation makes the wrong decision and
prunes modes that would pass validation once the downlink switch is complete.

[how]
If Synaptics is found at the last branch, skip checking downlink setting
at mode validation.

Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Clear the CUR_ENABLE register on DCN314 w/out DPP PG

[Why&How]
On DCN314, clearing the DPP SW structure without power gating it can cause a
double cursor in full screen with non-native scaling.

Add a workaround that clears the CURSOR0_CONTROL cursor_enable flag if
dcn10_plane_atomic_power_down is called and DPP power gating is disabled.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4168
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: incorrect conditions for failing dto calculations

[Why & How]
Previously, when calculating dto phase, we would incorrectly fail when phase
<=0 without additionally checking for the integer value. This meant that
calculations would incorrectly fail when the desired pixel clock was an exact
multiple of the reference clock.
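
A minimal sketch of the corrected condition, assuming the DTO ratio is split
into an integer part and a fractional phase. The function and parameter names
are hypothetical and are not taken from the DC code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical illustration only: split pixclk/refclk into an integer part
 * and a remainder ("phase"). A zero phase is valid when the integer part is
 * non-zero, i.e. when pixclk is an exact multiple of refclk.
 */
bool dto_params_valid(uint64_t pixclk_hz, uint64_t refclk_hz,
                      uint64_t *integer, uint64_t *phase)
{
    if (refclk_hz == 0)
        return false;

    *integer = pixclk_hz / refclk_hz;
    *phase = pixclk_hz % refclk_hz;

    /* Fail only if both parts are zero; a zero phase alone is not an error. */
    return !(*phase == 0 && *integer == 0);
}
```

With a 100 MHz reference, a 200 MHz pixel clock gives integer 2 and phase 0,
which the old phase-only check would have rejected.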

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add Component To Handle Bounding Box Values and IP Caps

[Why]
Bounding box values can be stored in multiple locations (e.g. PMFW, VBIOS, DMUB).
The source and interpretation of these values can vary with DCN revision,
so there should be a component that can gather these values and translate
them accordingly.

[How]
Have the component start with the statically defined values as a base,
then update them as needed with DCN-specific logic.
Guard this component with FPU flags since the values need to be in floating point.

Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Optimize custom brightness curve interpolation

[Why]
Custom brightness curve works by walking through all data points one
by one. When the brightness value is at either extreme this is a lot
of data points to walk. This is especially noticeable as lag when
moving a brightness slider around.

[How]
Bisect the data points to find the closest for interpolation.
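
The idea can be sketched as follows, assuming the data points are sorted by
ascending input value. The struct and function names are hypothetical and do
not reflect the driver's actual data layout:

```c
#include <stddef.h>

/* Hypothetical data point: input level -> output level. */
struct curve_point {
    unsigned int input;
    unsigned int output;
};

/*
 * Map `value` through a curve whose points are sorted by ascending `input`.
 * A binary search finds the bracketing pair in O(log n) steps instead of
 * walking every point.
 */
unsigned int curve_interpolate(const struct curve_point *pts, size_t n,
                               unsigned int value)
{
    size_t lo = 0, hi;
    long long span_in, span_out, off;

    if (n == 0)
        return value;   /* no curve: pass the value through unchanged */
    if (value <= pts[0].input)
        return pts[0].output;
    if (value >= pts[n - 1].input)
        return pts[n - 1].output;

    /* Invariant: pts[lo].input <= value < pts[hi].input */
    hi = n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (pts[mid].input <= value)
            lo = mid;
        else
            hi = mid;
    }

    /* Linear interpolation between the two bracketing points. */
    span_in = (long long)pts[hi].input - (long long)pts[lo].input;
    span_out = (long long)pts[hi].output - (long long)pts[lo].output;
    off = (long long)value - (long long)pts[lo].input;

    return (unsigned int)((long long)pts[lo].output + off * span_out / span_in);
}
```

The invariant guarantees that the two bracketing inputs differ, so the
division in the interpolation step cannot be by zero.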

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Increase minimum clock for TMDS 420 with pipe splitting

[Why]
-Pipe splitting allows for clocks to be reduced, but when using TMDS 420,
reduced clocks lead to missed clock cycles on clock resyncing.

[How]
-Impose a minimum clock when using TMDS 420

Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop hw access in non-DC audio fini

We already disable the audio pins in hw_fini so
there is no need to do it again in sw_fini.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4481
Cc: oushixiong <oushixiong1025@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Re-enable common modes for eDP and LVDS

[Why]
Although compositors will add their own modes, Xorg won't use its own
modes and will only stick to modes advertised by the driver. This means a
user that used to pick 1024x768 could no longer access it unless the
panel's native resolution was 1024x768.

[How]
Revert commit 6d396e7ac1ce3 ("drm/amd/display: Disable common modes for
LVDS") and commit 7948afb46af92 ("drm/amd/display: Disable common modes
for eDP").

The panel will still use scaling for any non-native modes due to
commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native
resolutions on eDP")

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250828140856.2887993-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mes11: make MES_MISC_OP_CHANGE_CONFIG failure non-fatal

If the firmware is too old, just warn and return success.

Fixes: 27b791514789 ("drm/amdgpu/mes: keep enforce isolation up to date")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4414
Cc: shaoyun.Liu@amd.com
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Make use of __free for cleanup

Use __free(kfree) for memory alloc cleanups in SMUv13.0.6

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Check vcn state before profile switch

The patch uses power state of VCN instances for requesting video
profile.

In idle worker of a vcn instance, when there is no outstanding
submission or fence, the instance is put to power gated state. When
all instances are powered off that means video profile is no longer
required. A request is made to turn off video profile.

A job submission starts with begin_use of ring, and at that time
vcn instance state is changed to power on. Subsequently a check is
made for active video profile, and if not active, a request is made.

Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Avoid vcn v5.0.1 poison irq call trace on sriov guest

The SR-IOV guest side doesn't initialize the RAS feature, hence the poison
IRQ shouldn't be put during hw fini.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Avoid jpeg v5.0.1 poison irq call trace on sriov guest

The SR-IOV guest side doesn't initialize the RAS feature, hence the poison
IRQ shouldn't be put during hw fini.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: unified amdgpu ip block name

v1:
1. Unified amdgpu ip block name print with format
"{ip_type}_v{major}_{minor}_{rev}"

2. Avoid IP block name conflicts for SMU/PSP ip block

v2:
Update IP block print format to keep legacy IP block name (Alex)
"{ip_type}_v{major}_{minor}_{rev} ({funcs->name})"

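A minimal sketch of the v2 print format, using a hypothetical helper and
example values that are not taken from the driver:

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "{ip_type}_v{major}_{minor}_{rev} ({legacy})".
 * Returns the number of characters that would have been written,
 * like snprintf().
 */
int format_ip_block_name(char *buf, size_t len, const char *ip_type,
                         unsigned int major, unsigned int minor,
                         unsigned int rev, const char *legacy_name)
{
    return snprintf(buf, len, "%s_v%u_%u_%u (%s)",
                    ip_type, major, minor, rev, legacy_name);
}
```

Keeping the legacy name in parentheses lets existing log searches keep
matching while the versioned prefix stays uniform across IP blocks.
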
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sdma: bump firmware version checks for user queue support

Using the previous firmware could lead to problems with
PROTECTED_FENCE_SIGNAL commands, specifically causing register
conflicts between MCU_DBG0 and MCU_DBG1.

The updated firmware versions ensure proper alignment
and unification of the SDMA_SUBOP_PROTECTED_FENCE_SIGNAL value with SDMA 7.x,
resolving these hardware coordination issues

Fixes: e8cca30d8b34 ("drm/amdgpu/sdma6: add ucode version checks for userq support")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Notify pmfw bad page threshold exceeded

Notify pmfw when the bad page threshold is exceeded, regardless of whether
the module parameter 'bad_page_threshold' is set.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: add instance number to VCN version message

With multiple VCN instances we get multiple lines of the same
message, as shown below:

amdgpu 0000:43:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 11
amdgpu 0000:43:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 11

By adding instance number to the log message for multiple VCN instances,
each line will clearly indicate which VCN instance it refers to.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: remove unused code in vcn_v4_0.c

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use max() to improve code

Use max() to reduce the code and improve readability.

No functional changes.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Document num_rmcm_3dluts in mpc_color_caps

Fix a kernel-doc warning by documenting the num_rmcm_3dluts member of struct mpc_color_caps.

v2: improve comment (Melissa)

Signed-off-by: Kavithesh A.S <kavitheshnitt@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: atomfirmware.h: fix multiple spelling mistakes

This patch corrects several typographical errors in atomfirmware.h.
The fixes improve readability and maintain consistency in the codebase.
No functional changes are introduced.

Corrected terms include:
- aligment    → alignment
- Offest      → Offset
- defintion   → definition
- swithing    → switching
- calcualted  → calculated
- compability → compatibility
- intenal     → internal
- sequece     → sequence
- indiate     → indicate
- stucture    → structure
- regiser     → register

Signed-off-by: Yugansh Mittal <mittalyugansh1@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/xe: Fix incorrect migration of backed-up object to VRAM

If an object is backed up to shmem it is incorrectly identified
as not having valid data by the move code. This means moving
to VRAM skips the -EMULTIHOP step and the bo is cleared. This
causes all sorts of weird behaviour on DGFX if an already evicted
object is targeted by the shrinker.

Fix this by using ttm_tt_is_swapped() to identify backed-up
objects.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5996
Fixes: 00c8efc3180f ("drm/xe: Add a shrinker for xe bos")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250828134837.5709-1-thomas.hellstrom@linux.intel.com

Merge tag 'drm-misc-next-2025-08-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.18:

UAPI Changes:

atomic:
- Reallow no-op async page flips

Cross-subsystem Changes:

hid:
- i2c-hid: Make elan touch controllers power on after panel is enabled

video:
- Improve pixel-format handling for struct screen_info

Core Changes:

display:
- dp: Fix command length

Driver Changes:

amdxdna:
- Fixes

bridge:
- Add support for Radxa Ra620 plus DT bindings

msm:
- Fix VMA allocation

panel:
- ilitek-ili9881c: Refactor mode setting; Add support for Bestar
BSD1218-A101KL68 LCD plus DT bindings
- lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings

rockchip:
- dsi2: Add support for RK3576 plus DT bindings

stm:
- Clean up logging

vesadrm:
- Support 8-bit palette mode

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250828065714.GA11906@linux.fritz.box

drm/xe/uapi: Fix kernel-doc formatting for madvise and vma_query

Correct kernel-doc formatting issues in the UAPI definitions for
madvise and VMA query interfaces to resolve docutils warnings during
documentation build.

Fixes: 418807860e94 ("drm/xe/uapi: Add UAPI for querying VMA count and memory attributes")
Fixes: 231bb0ee7aa5 ("drm/xe/uapi: Add madvise interface")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250828071516.3838110-1-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/nvm: Use root tile mmio

To allow initialization of nvm during early probe for future usecases,
use root tile instead of root gt to access mmios, as gt is not
yet initialized at early probe.

v2: fix commit message (Lucas)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250825103537.2551837-1-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/tests: Make cross-device dma-buf BOs CPU-visible on small BAR

Small-BAR systems (e.g., SR-IOV VFs in VMs) expose only a subset of
VRAM via PCI/BAR. Exporting a BO outside that window fails, and the
selftests also do CPU fill/verify.

Set XE_BO_FLAG_NEEDS_CPU_ACCESS for cross-device variants to force
CPU-mappable placement and keep tests reliable. Large-BAR/P2P setups
are unaffected.

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250814145950.430231-1-marcin.bernatowicz@linux.intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/migrate: make MI_TLB_INVALIDATE conditional

When clearing VRAM we should be able to skip invalidating the TLBs if we
are only using the identity map to access VRAM (which is the common
case), since no modifications are made to PTEs on the fly. Also since we
use huge 1G entries within the identity map, there should be a pretty
decent chance that the next packet(s) (if also clears) can avoid a tree
walk if we don't shoot down the TLBs, like if we have to process a long
stream of clears.

For normal moves/copies, we usually always end up with the src or dst
being system memory, meaning we can't only rely on the identity map and
will also need to emit PTEs and so will always require a TLB flush.

v2:
- Update commit to explain the situation for normal copies (Matt B)
- Rebase on latest changes

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250808110452.467513-2-matthew.auld@intel.com

HID: i2c-hid: Fix test in i2c_hid_core_register_panel_follower()

Bitwise AND was intended instead of OR. With the current code the
condition is always true.
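
The class of bug can be illustrated in isolation. The flag names below are
hypothetical and are not the ones used in i2c-hid:

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define QUIRK_A (1u << 0)
#define QUIRK_B (1u << 1)

/* Buggy form: OR-ing in a non-zero constant is non-zero for any input. */
bool has_quirk_b_buggy(unsigned int quirks)
{
    return (quirks | QUIRK_B) != 0;
}

/* Intended form: AND masks out everything except the bit being tested. */
bool has_quirk_b(unsigned int quirks)
{
    return (quirks & QUIRK_B) != 0;
}
```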

Fixes: cbdd16b818ee ("HID: i2c-hid: Make elan touch controllers power on after panel is enabled")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Pin-yen Lin <treapking@chromium.org>
Acked-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/aK8Au3CgZSTvfEJ6@stanley.mountain

drm/xe: Split TLB invalidation code in frontend and backend

The frontend exposes an API to the driver to send invalidations, handles
sequence number assignment, synchronization (fences), and provides a
timeout mechanism. The backend issues the actual invalidation to the
hardware (or firmware).

The new layering easily allows issuing TLB invalidations to different
hardware or firmware interfaces.
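
A rough sketch of this kind of layering, reduced to sequence number
assignment and a backend callback. All names are hypothetical, and fences,
locking and timeouts are left out:

```c
#include <stddef.h>

/* All names below are hypothetical; they only mirror the layering described. */
struct inval_backend_ops {
    /* Issue one invalidation tagged with seqno; return 0 on success. */
    int (*send)(void *priv, int seqno);
};

struct inval_frontend {
    const struct inval_backend_ops *ops;
    void *priv;      /* backend-private data */
    int next_seqno;  /* sequence number for the next request */
};

/* Frontend: assign a sequence number, then delegate to the backend. */
int inval_issue(struct inval_frontend *fe, int *seqno_out)
{
    int seqno = fe->next_seqno++;
    int err = fe->ops->send(fe->priv, seqno);

    if (!err && seqno_out)
        *seqno_out = seqno;
    return err;
}

/* Example backend that only records the last seqno it was asked to send. */
int recording_send(void *priv, int seqno)
{
    *(int *)priv = seqno;
    return 0;
}
```

Swapping in a different hardware or firmware interface then only means
providing another ops table; the frontend code is unchanged.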

Normalize some naming while here too.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-10-stuart.summers@intel.com

drm/xe: Add helpers to send TLB invalidations

Break out the GuC specific code into helpers as part of the process to
decouple frontend TLB invalidation code from the backend.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-9-stuart.summers@intel.com

drm/xe: Prep TLB invalidation fence before sending

It is a bit backwards to add a TLB invalidation fence to the pending
list after issuing the invalidation. Perform this step before issuing
the TLB invalidation in a helper function.

v2: Make sure the seqno_lock mutex covers the send as well (Matt)

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-8-stuart.summers@intel.com

drm/xe: Decouple TLB invalidations from GT

Decouple TLB invalidations from the GT by updating the TLB invalidation
layer to accept a `struct xe_tlb_inval` instead of a `struct xe_gt`.
Also, rename *gt_tlb* to *tlb*. The internals of the TLB invalidation
code still operate on a GT, but this is now hidden from the rest of the
driver.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-7-stuart.summers@intel.com

drm/xe: Add xe_gt_tlb_invalidation_done_handler

Decouple GT TLB seqno handling from G2H handler.

v2:
- Add kernel doc

Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-6-stuart.summers@intel.com

drm/xe: Add xe_tlb_inval structure

Extract TLB invalidation state into a structure to decouple TLB
invalidations from the GT, allowing the structure to be embedded
anywhere in the driver.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-5-stuart.summers@intel.com

drm/xe: s/tlb_invalidation/tlb_inval

tlb_invalidation is a bit verbose leading to ugly wraps in the code,
shorten to tlb_inval.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-4-stuart.summers@intel.com

drm/xe: Cancel pending TLB inval workers on teardown

Add a new _fini() routine on the GT TLB invalidation
side to handle this worker cleanup on driver teardown.

v2: Move the TLB teardown to the gt fini() routine called during
gt_init rather than in gt_alloc. This way the GT structure stays
alive while we reset the TLB state.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-3-stuart.summers@intel.com

drm/xe: Move explicit CT lock in TLB invalidation sequence

Currently the CT lock is used to cover TLB invalidation
sequence number updates. In an effort to separate the GuC
back end tracking of communication with the firmware from
the front end TLB sequence number tracking, add a new lock
here to specifically track those sequence number updates
coming in from the user.

Apart from the CT lock, we also have a pending lock to
cover both pending fences and sequence numbers received
from the back end. Those cover interrupt cases and so
it makes sense not to overload those with sequence numbers
coming in from new transactions. In that way, we'll employ
a mutex here.

v2: Actually add the correct lock rather than just dropping
it... (Matt)

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-2-stuart.summers@intel.com

drm/xe/configfs: Block runtime attribute changes

Although it's possible to change the attributes at runtime, they have no
effect after the driver is already bound to the device. Check for that
and return -EBUSY in that case.

This should help users understand what's going on when the behavior is
not changing even if the value from the configfs is "right", but it got
to that state too late.
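
The check can be sketched as below. The struct and function are hypothetical
stand-ins, not the xe configfs code:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a configfs-backed device entry. */
struct dev_config {
    bool driver_bound;   /* set once the driver has bound to the device */
    unsigned int value;  /* some tunable attribute */
};

/* Store handler sketch: refuse changes that could no longer take effect. */
int dev_config_store(struct dev_config *cfg, unsigned int new_value)
{
    if (cfg->driver_bound)
        return -EBUSY;

    cfg->value = new_value;
    return 0;
}
```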

Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250826153210.3068808-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/amdgpu/userq: fix error handling of invalid doorbell

If the doorbell is invalid, be sure to set r to an error
state so the function returns an error.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update firmware version checks for user queue support

The minimum firmware versions required for user queue functionality
have been increased to address an issue where the queue privilege
state was lost during queue connect operations.

The problem occurred because the privilege state was being restored
to its initial value at the beginning of the function, overwriting
the state that was properly set during the queue connect case.

This commit updates the minimum version requirements:
- ME firmware from 2390 to 2420
- PFP firmware from 2530 to 2580
- MEC firmware from 2600 to 2650
- MES firmware remains at 120
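
As a sketch, the gate could look like the following, assuming all four
minimums must be met together. The struct and function names are
hypothetical; only the version numbers come from the list above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the firmware versions being compared. */
struct fw_versions {
    uint32_t me;
    uint32_t pfp;
    uint32_t mec;
    uint32_t mes;
};

/* User queues are only offered when every firmware meets its minimum. */
bool userq_fw_supported(const struct fw_versions *fw)
{
    return fw->me >= 2420 && fw->pfp >= 2580 &&
           fw->mec >= 2650 && fw->mes >= 120;
}
```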

These updated firmware versions contain the necessary fixes to
properly maintain queue privilege state throughout connect operations.

Fixes: 61ca97e9590c ("drm/amdgpu: Add fw minimum version check for usermode queue")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: disable hwmon power1_cap* for gfx 11.0.3 on vf mode

The PPSMC_MSG_GetPptLimit message is not valid for gfx 11.0.3 in VF mode,
so skip creating the power1_cap* hwmon sysfs nodes.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vpe: cancel delayed work in hw_fini

We need to cancel any outstanding work at both suspend
and driver teardown. Move the cancel to hw_fini which
gets called in both cases.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: remove unused code in vcn_v1_0.c

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu : Use the MES INV_TLBS API for tlb invalidation on gfx12

Starting with MES version 0x81, the firmware provides the new INV_TLBS
API, which supports invalidating TLBs by PASID.

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/include : Update MES v12 API header(INV_TLBS)

The driver needs an API that can invalidate TLBs for a dedicated PASID,
since the driver does not know the mapping between vmid and process.
Make the API generic so that it supports different TLB invalidation
requests. The driver can specify the pasid, vmid, hub_id and VM address
range that need to be invalidated.
With this API, the old INV_GART in MISC Op can be deprecated.

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set

Fix a UBSAN shift-out-of-bounds warning in amdgpu_debugfs_jpeg_sched_mask_set
when the shift exponent reaches or exceeds 32 bits. The issue occurred because
a 32-bit integer '1' was being shifted by up to 32 bits, which is undefined
behavior.

Replace '1' with '1ULL' to ensure 64-bit arithmetic, matching the u64 type of
'val' and preventing the shift overflow. This is consistent with the existing
mask calculation that already uses 1ULL.

The error manifested as:
UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c:373:17
shift exponent 32 is too large for 32-bit type 'int'
v2: remove debug log
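
A standalone sketch of the fix, using a hypothetical helper name. The
literal 1 has type int, so shifting it by 32 or more is undefined; 1ULL
is at least 64 bits wide, so exponents up to 63 are well defined.

```c
#include <stdint.h>

/*
 * Return a 64-bit value with only bit i set.
 * The caller must keep i below 64; (1 << i) with a plain int literal
 * would already be undefined for i >= 32 on common platforms.
 */
uint64_t ring_mask_bit(unsigned int i)
{
	return 1ULL << i;
}
```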

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: DC v3.2.347

DC Release v3.2.347

* Firmware releases for multiple asics
* CodeQL fixes
* Fix for double cursor with 180 degree rotation on large resolutions
* Misc bug fixes for DSC, PSR/Replay, DPIA etc.

Signed-off-by: Nicholas Carbones <ncarbone@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.24.0

Add two new IPS residency data modes.

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Reapply "drm/amdgpu: fix incorrect vm flags to map bo"

It should use vm flags instead of pte flags
to specify bo vm attributes.

This reverts commit 1263ceea2a1327014d9de2858a122f3c27dfa4dd.

Reapply this patch with the proper fixes tag.

Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: fix incorrect vm flags to map bo"

This reverts commit b08425fa77ad2f305fe57a33dceb456be03b653f.

Revert this to align with 6.17 because the fixes tag
was wrong on this commit.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Consider sink max slice width limitation for dsc

[WHY&HOW]
The sink max slice width limitation should be considered for DSC, but
was removed in "refactor DSC cap calculations".
This patch adds it back and takes the valid minimum between the sink and
source.
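
A rough sketch of taking the valid minimum. The function name is
hypothetical, and treating 0 as "no limit reported" is an assumption made
for this example, not something stated in the patch.

```c
#include <stdint.h>

/*
 * Pick the effective maximum DSC slice width.
 * Assumption for this sketch: a value of 0 means the side reports no
 * limit, so it is ignored instead of winning the minimum.
 */
uint32_t dsc_max_slice_width(uint32_t src_max, uint32_t sink_max)
{
	if (sink_max == 0)
		return src_max;
	if (src_max == 0)
		return sink_max;
	return src_max < sink_max ? src_max : sink_max;
}
```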

Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Array offset used before range check

Consolidating multiple CodeQL Fixes for alerts with rule id: cpp/offset-use-before-range-check

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: wait for otg update pending latch before clock optimization

[WHY & HOW]
If an OTG update is still pending and has not latched, the system can
fail. Wait until the OTG is fully disabled before clock optimization to
avoid this error.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Incorrect 'not' operator usage

Consolidating multiple CodeQL Fixes for alerts with rule id: cpp/incorrect-not-operator-usage

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Multiplication result converted to larger type

Consolidating multiple CodeQL Fixes for alerts with rule id: cpp/integer-multiplication-cast-to-long

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Support HW cursor 180 rot for any number of pipe splits

[Why]
For the HW cursor, its current position in the pipe_ctx->stream struct is
not affected by the 180 rotation, i.e. the top left corner is still at
0,0. However, the DPP & HUBP set_cursor_position functions require the
rotated position.

The current approach is hard-coded for ODM 2:1, thus it's failing for
ODM 4:1, resulting in a double cursor.

[How]
Instead of calculating the new cursor position relative to the
viewports, we calculate it using the viewable clip_rect of each plane.

The clip_rects are first offset and scaled to the same space as the
src_rect, i.e. Stream space -> Plane space.

In case of a pipe split, which divides the plane into 2 or more viewports,
the clip_rect is the union of all the viewports of the given plane.

With the assumption that the viewports in HUBP's set_cursor_position are
in the Plane space as well, it should produce a correct cursor position
for any number of pipe splits.
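
The geometry behind this can be sketched as follows. This is only the
generic 180 degree reflection of a cursor rectangle inside a clip_rect,
with hypothetical type and function names; it ignores the hotspot and
scaling and is not the DC implementation.

```c
/* Hypothetical rectangle type; all values are in the same (plane) space. */
struct rect {
	int x;
	int y;
	int width;
	int height;
};

/*
 * Rotate the cursor rectangle by 180 degrees around the centre of clip.
 * The distance from the cursor to the left/top edge of clip becomes the
 * distance from the rotated cursor to the right/bottom edge.
 */
struct rect rotate_cursor_180(struct rect clip, struct rect cursor)
{
	struct rect out = cursor;

	out.x = clip.x + clip.width - (cursor.x - clip.x) - cursor.width;
	out.y = clip.y + clip.height - (cursor.y - clip.y) - cursor.height;
	return out;
}
```

Because the result depends only on the plane's clip_rect, the same
position comes out regardless of how many viewports the plane is split
into.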

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Decrease stack size in logging path

[why]
Reducing stack size can avoid stack overflow.

[how]
Make local variables const and static so they are not
on the stack.
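
A small illustration of the technique with a hypothetical lookup table,
not the DC logging code. A static const local is placed in read-only data
once, so it does not take space in the function's stack frame.

```c
#include <stddef.h>

/* Map a numeric log level to a name; the table itself is hypothetical. */
const char *log_level_name(unsigned int level)
{
	/* static const: stored in read-only data, not in this stack frame. */
	static const char *const names[] = {
		"error", "warning", "info", "debug",
	};

	if (level >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[level];
}
```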

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Reza Amini <reza.amini@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: track dpia support

[why&how]
initialize a flag to track if we previously
supported dpia and write that to boot options

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Reserve instance index notified by DMUB

[Why]
Reserve instance index notified by DMUB.

[How]
Add new variable for instance index.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add control flags to force PSR / replay

To change PSR/Replay behavior based on OS preferences, add some
config options.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vpe: add ring reset support

Implement ring reset for VPE. Similar to VCN and JPEG,
just powergate the IP to reset it.

v2: Properly set per queue reset flag

Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: drop extra cancel_delayed_work_sync()

We already call this in the hw_fini() methods for all
VCN instances, so no need to call it again in
amdgpu_vcn_suspend().

Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption

When KFD asks CP to preempt queues, CP preempts its own queues and also
requests SDMA to preempt SDMA queues with the UNMAP_LATENCY timeout.
Currently queue_preemption_timeout_ms is 9000 ms by default but can be
configured via a module parameter. KFD_UNMAP_LATENCY_MS is hard coded as
4000 ms though. This patch ties KFD_UNMAP_LATENCY_MS to
queue_preemption_timeout_ms so that on a slow system such as an emulator,
both CP and SDMA slowness are taken into account.
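
One possible way to tie the two values together is sketched below. The
commit message does not state the scaling that is used, so keeping the
default 4000:9000 ratio is purely an assumption of this example, as are
the macro and function names.

```c
#include <stdint.h>

/* Defaults quoted in the commit message. */
#define DEFAULT_PREEMPTION_TIMEOUT_MS 9000u
#define DEFAULT_UNMAP_LATENCY_MS      4000u

/*
 * Assumed scaling: keep the default 4000:9000 ratio, so raising the
 * preemption timeout raises the SDMA unmap latency proportionally.
 * The 64-bit intermediate avoids overflow for large timeouts.
 */
uint32_t unmap_latency_ms(uint32_t queue_preemption_timeout_ms)
{
	return (uint32_t)((uint64_t)queue_preemption_timeout_ms *
			  DEFAULT_UNMAP_LATENCY_MS /
			  DEFAULT_PREEMPTION_TIMEOUT_MS);
}
```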

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update SMU v13.0.6 PPT caps initialization

Update the conditions for setting the SMU vcn reset caps in the SMU v13.0.6 PPT
initialization function. Specifically:

- Add support for VCN reset capability for firmware versions 0x00558200 and
above when the program version is 0.
- Add support for VCN reset capability for firmware versions 0x05551800 and
above when the program version is 5.
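
The two conditions can be sketched as a small predicate. The function
name is hypothetical, and returning false for every other program version
is an assumption of this example; the driver may apply other rules there.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the VCN reset capability should be advertised, based on
 * the program version and the SMU firmware version.
 */
bool vcn_reset_supported(uint32_t pgm, uint32_t fw_ver)
{
	switch (pgm) {
	case 0:
		return fw_ver >= 0x00558200;
	case 5:
		return fw_ver >= 0x05551800;
	default:
		/* Assumption: programs not listed above are not covered. */
		return false;
	}
}
```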

v2: correct the smu mp1 version for program 5 (Lijo)

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix vram allocation failure for a special case

When only VRAM is allocated, without a VA (so the VA is 0), and an
allocated SVM range overlaps this range, the VRAM allocation fails.
This case should be skipped in the SVM usage check.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Allow device error to be logged

The addition of a WARN_ON() check in order to return early in the
kq_initialize function retroactively causes the default case in the
following switch statement to never be executed, preventing dev_err
from logging device errors in the kernel. Both logs are now checked
in the default case.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

docs: gpu: amdgpu: Fix spelling in amdgpu documentation

Fixed following typos reported by Codespell

1. propogated ==> propagated
aperatures ==> apertures
In Documentation/gpu/amdgpu/debugfs.rst

2. parition ==> partition
In Documentation/gpu/amdgpu/process-isolation.rst

3. conections ==> connections
In Documentation/gpu/amdgpu/display/programming-model-dcn.rst

In addition to above,
Fixed wrong bit-partition naming in gpu/amdgpu/process-isolation.rst
from "fourth" partition to "third" partition.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: clean-up dead code in dml2_mall_phantom

pipe_idx in function dml2_svp_validate_static_schedulabilit, although set,
is never actually used. While building with GCC 16 this gives a warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_mall_phantom.c: In function ‘set_phantom_stream_timing’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_mall_phantom.c:657:25: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=]
657 | unsigned int i, pipe_idx;
| ^~~~~~~~

Signed-off-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add support for dpc to the product

Add support for dpc to the product

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove redundant AMDGPU_HAS_VRAM

AMDGPU_HAS_VRAM is redundant with is_app_apu, as both refer to
APUs with no carve-out. Since AMDGPU_HAS_VRAM only occurs once,
remove AMDGPU_HAS_VRAM definition. The tmr allocation can be covered
with AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM in both vram and
non vram ASICs.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add support for dpc to a series of products

Add support for dpc to a series of products

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Documentation/gpu/amdgpu: Fix duplicate word in driver-core.rst

Remove duplicate word 'and' in driver-core.rst.

Signed-off-by: Kathara Sasikumar <katharasasikumar007@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Correct the loss of aca bank reg info

Poll the ACA bank count to ensure that valid
ACA bank reg info can be obtained.

v2: add corresponding delay before send msg to SMU to query mca bank info
(Stanley)

v3: the loop cannot exit. (Thomas)

v4: remove amdgpu_aca_clear_bank_count. (Kevin)

v5: continuously inject ce. If a creation interruption
occurs at this time, bank reg info will be lost. (Thomas)
v5: each cycle is delayed by 100ms. (Tao)
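
A bounded polling loop of this kind might look like the sketch below. All
names are hypothetical, the retry limit is an assumption, and fake_reader()
is only a stand-in for the real bank count query; the 100 ms delay per
cycle is the one detail taken from the notes above.

```c
#include <stddef.h>

typedef unsigned int (*read_count_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int ms);

/*
 * Poll until the bank count becomes non-zero or max_tries is exhausted.
 * Bounding the loop ensures it always exits. Returns the last count read.
 */
unsigned int poll_bank_count(read_count_fn read_count, void *ctx,
			     delay_fn delay, unsigned int max_tries)
{
	unsigned int i, count = 0;

	for (i = 0; i < max_tries; i++) {
		count = read_count(ctx);
		if (count)
			break;
		if (delay)
			delay(100); /* each cycle is delayed by 100 ms */
	}
	return count;
}

/* Stand-in reader: reports 0 until the countdown in ctx reaches zero. */
unsigned int fake_reader(void *ctx)
{
	unsigned int *countdown = ctx;

	if (*countdown > 0) {
		(*countdown)--;
		return 0;
	}
	return 3;
}
```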

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add a mutex lock to protect poison injection

When poison is triggered multiple times, a race condition occurs.
Add a mutex lock to protect poison injection.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Correct the counts of nr_banks and nr_errors

Correct the counts of nr_banks and nr_errors

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove redundant header files

The header file "dc_stream.h" is already included on line 1507. Remove the
redundant include.

This is because the header file was initially included towards the latter
part of the code. Subsequent commits had to include the header file again
earlier in the code. In my opinion, this doesn't count as a fix; it just
requires removing the redundant header inclusion.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/fence: Remove redundant 0 value initialization

The amdgpu_fence struct is already zeroed by kzalloc(). It's redundant to
initialize am_fence->context to 0.
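
A userspace analogue of the same point, using calloc() in place of
kzalloc() and a hypothetical struct: a zeroing allocator already leaves
every field at 0, so no explicit assignment is needed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fence structure. */
struct fence {
	uint64_t context;
	uint64_t seqno;
};

struct fence *fence_alloc(void)
{
	/* calloc() zero-fills the allocation, as kzalloc() does in the kernel. */
	return calloc(1, sizeof(struct fence));
}
```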

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Allocate psp fw private buffer in vram

It's not necessary to allocate the psp firmware private
buffer in different memory domains in sriov and bare
metal environments.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx12: set MQD as appropriate for queue types

Set the MQD as appropriate for the kernel vs user queues.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx11: set MQD as appropriate for queue types

Set the MQD as appropriate for the kernel vs user queues.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/xe: Ensure GT is in C0 during resumes

This patch ensures the gt will be awake for the entire duration
of the resume sequences until GuCRC takes over and GT-C6 gets
re-enabled.

Before suspending GT-C6 is kept enabled, but upon resume, GuCRC
is not yet alive to properly control the exits and some cases of
instability and corruption related to GT-C6 can be observed.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250827000633.1369890-3-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally

Move forcewake_get() into xe_gt_idle_disable_c6() to streamline the
code and make it easier to use.

Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250827000633.1369890-2-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/stm: ltdc: unify log system

DRM_ERROR and similar are deprecated. Use drm_dev based logging.

Link: https://lore.kernel.org/r/20250821130356.883553-1-raphael.gallais-pou@foss.st.com
Acked-by: Yannick Fertre <yannick.fertre@foss.st.com>
Link: https://lore.kernel.org/r/20250825132951.547899-1-raphael.gallais-pou@foss.st.com
Signed-off-by: Raphael Gallais-Pou <raphael.gallais-pou@foss.st.com>

dt-bindings: panel: lvds: Append ampire,amp19201200b5tzqw-t03 in panel-lvds

List Ampire AMP19201200B5TZQW-T03 in the LVDS panel enumeration.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20250826-drm-misc-next-v1-1-980d0a0592b9@foss.st.com
Signed-off-by: Raphael Gallais-Pou <raphael.gallais-pou@foss.st.com>

drm/sysfb: Do not deref unexisting CRTC state in atomic_disable

Do not access CRTC state in drm_sysfb_plane_helper_atomic_disable().
Use format from sysfb device for clearing scanout buffer. This is
the behavior from before commit 061963cd9e5b ("drm/sysfb: Blit to
CRTC destination format").

When being disabled, the plane has no associated CRTC. Trying to deref
the format pointer results in a segmentation fault. An example stack
trace is shown below.

[   58.948915] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000023: 0000 [#1] SMP KASAN PTI
[   58.959971] KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f]
[...]
[   58.979308] RIP: 0010:drm_sysfb_plane_helper_atomic_disable+0x1af/0x520
[...]
[   59.084227] Call Trace:
[   59.086682]  <TASK>
[   59.088793]  ? __pfx_drm_sysfb_plane_helper_atomic_disable+0x10/0x10
[   59.095155]  ? crtc_disable+0xf2/0x5a0
[   59.098920]  drm_atomic_helper_commit_planes+0x848/0x1030
[   59.104336]  drm_atomic_helper_commit_tail+0x41/0xb0
[   59.109316]  commit_tail+0x204/0x330
[   59.112903]  drm_atomic_helper_commit+0x242/0x2e0
[   59.117618]  ? __pfx_drm_atomic_helper_commit+0x10/0x10
[   59.122851]  drm_atomic_commit+0x1e1/0x290
[   59.126957]  ? drm_atomic_add_affected_connectors+0x266/0x330
[   59.132714]  ? __pfx_drm_atomic_commit+0x10/0x10
[   59.137343]  ? __pfx___drm_printfn_info+0x10/0x10
[   59.142058]  ? drm_atomic_set_crtc_for_connector+0x436/0x630
[   59.147729]  atomic_remove_fb+0x631/0x920
[   59.151751]  ? save_trace+0xcf/0x180
[   59.155343]  ? __pfx_atomic_remove_fb+0x10/0x10
[   59.159890]  ? __pfx___drm_dev_dbg+0x10/0x10
[   59.164173]  drm_framebuffer_remove+0x19a/0x710

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 061963cd9e5b ("drm/sysfb: Blit to CRTC destination format")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14874
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250826145044.954396-1-tzimmermann@suse.de

drm/xe/wcl: Extend L3bank mask workaround

The commit 9ab440a9d042 ("drm/xe/ptl: L3bank mask is not
available on the media GT") added a workaround to ignore
the fuse register that reports L3 bank availability, as it did
not contain valid values. The same is true for WCL, so extend
the workaround to cover it.

Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://lore.kernel.org/r/20250822002512.1129144-1-chaitanya.kumar.borah@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>

accel/amdxdna: Fix incorrect type used for a local variable

drivers/accel/amdxdna/aie2_pci.c:794:13: sparse: sparse: incorrect type in assignment (different address spaces)

Fixes: c8cea4371e5e ("accel/amdxdna: Add a function to walk hardware contexts")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508230855.0b9efFl6-lkp@intel.com/
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250826171951.801585-1-lizhi.hou@amd.com

drm/xe/xe_hw_error: Add fault injection to trigger csc error handler

Add a debugfs fault handler to trigger csc error handler that
wedges the device and enables runtime survivability mode.

v2: add debugfs only for bmg (Umesh)
v3: do not use csc_fault attribute if debugfs is not enabled
v4: rebase

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-11-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors

Add support to handle CSC firmware reported errors. When CSC firmware
errors are encountered, an error interrupt is received by the GFX device
as an MSI interrupt.

The Device Source control registers indicate the source of the error as
CSC. The HEC error status register indicates that the error is firmware
reported. Depending on the type of error, the error cause is written to
the HEC Firmware error register.

On encountering such CSC firmware errors, the graphics device is
non-recoverable from driver context. The only way to recover from these
errors is firmware flash.

System admin/userspace is notified of the necessity of firmware flash
with a combination of vendor-specific drm device wedged uevent, dmesg logs
and runtime survivability sysfs. It is the responsibility of the consumer
to verify all the actions and then trigger a firmware flash using tools
like fwupd.

$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[754.709341] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=vendor-specific
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5973
MAJOR=226
MINOR=0
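
A consumer of this uevent might check the WEDGED property roughly as
follows. The helper name is hypothetical, and handling a comma-separated
list of methods is an assumption based on the general DRM wedged uevent
format rather than on the output shown here.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if line is a "WEDGED=..." property whose value lists the
 * given recovery method, e.g. "vendor-specific".
 */
bool wedged_has_method(const char *line, const char *method)
{
	static const char key[] = "WEDGED=";
	size_t mlen = strlen(method);
	const char *p;

	if (mlen == 0 || strncmp(line, key, sizeof(key) - 1) != 0)
		return false;

	p = line + sizeof(key) - 1;
	while (*p) {
		/* Length of the current entry, up to ',' or end of line. */
		size_t len = strcspn(p, ",\n");

		if (len == mlen && strncmp(p, method, mlen) == 0)
			return true;
		p += len;
		if (*p != ',')
			break;
		p++;
	}
	return false;
}
```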

Logs

xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: Tile0 reported NONFATAL error 0x20000
xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: NONFATAL: HEC Uncorrected FW FD Corruption error reported, bit[2] is set
xe 0000:03:00.0: Runtime Survivability mode enabled
xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
               IOCTLs and executions are blocked. Only a rebind may clear the failure
               Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
xe 0000:03:00.0: [drm] device wedged, needs recovery
xe 0000:03:00.0: Firmware flash required, Please refer to the userspace documentation for more details!

Runtime survivability Sysfs:

/sys/bus/pci/devices/<device>/survivability_mode

v2: use vendor recovery method with
    runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-10-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add support to handle hardware errors

The Gfx device reports two classes of errors: uncorrectable and
correctable. Depending on the severity, uncorrectable errors are further
classified as Non-Fatal and Fatal.

Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in
the Master Interrupt Register indicate the class of the error.
The source of the error is then read from the Device Error Source
Register.

Fatal errors: These are reported as PCIe errors.
When a PCIe error is asserted, the OS will perform an SBR (Secondary
Bus Reset), which causes the driver to reload. The error registers are
sticky and the values are maintained through SBR.
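
The classification above can be summarised in a short sketch. The enum
and function names are hypothetical and only encode which reporting path
each class uses according to this description.

```c
/* Error classes as described above. */
enum hw_error_class {
	HW_ERROR_CORRECTABLE,
	HW_ERROR_NONFATAL,
	HW_ERROR_FATAL,
};

/* How an error class reaches the driver. */
enum hw_error_path {
	HW_ERROR_PATH_MSI,  /* signalled via MSI, class read from registers */
	HW_ERROR_PATH_PCIE, /* signalled as a PCIe error, followed by an SBR */
};

enum hw_error_path hw_error_report_path(enum hw_error_class cls)
{
	return cls == HW_ERROR_FATAL ? HW_ERROR_PATH_PCIE : HW_ERROR_PATH_MSI;
}
```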

Add basic support to handle these errors.

Bspec: 50875, 53073, 53074, 53075, 53076

v2: Format commit message (Umesh)
v3: fix documentation (Stuart)

Cc: Stuart Summers <stuart.summers@intel.com>
Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-9-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/doc: Document device wedged and runtime survivability

Add documentation for vendor specific device wedged recovery method
and runtime survivability.

v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
v5: add more documentation

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/xe_survivability: Add support for Runtime survivability mode

Certain runtime firmware errors can cause the device to be in an unusable
state requiring a firmware flash to restore normal operation.
Runtime Survivability Mode indicates that a firmware flash is necessary by
wedging the device and exposing the survivability mode sysfs.

The sysfs entry below indicates that the device is in survivability mode:

/sys/bus/pci/devices/<device>/survivability_mode
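
From userspace, the presence of this entry could be checked with a helper
like the one below. The function name is hypothetical, and the sketch
assumes that the existence of the file alone is the signal; it does not
read the file's contents.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if the survivability_mode sysfs entry exists for the PCI
 * device with the given address, for example "0000:03:00.0".
 */
bool in_survivability_mode(const char *pci_addr)
{
	char path[256];
	int n = snprintf(path, sizeof(path),
			 "/sys/bus/pci/devices/%s/survivability_mode",
			 pci_addr);

	/* Treat an encoding error or a truncated path as "not present". */
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;

	return access(path, F_OK) == 0;
}
```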

v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-7-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/xe_survivability: Refactor survivability mode

Refactor survivability mode code to support both boot
and runtime survivability.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>