Brian Nguyen [Thu, 5 Mar 2026 17:15:49 +0000 (17:15 +0000)]
drm/xe: Move page reclaim done_handler to own func
Originally, page reclamation was handled by the same fence as tlb
invalidation and used its seqno, so there was no reason to separate out
the handlers. However, in hindsight, for readability and possible
future changes, it seems more beneficial to move this into its own
function.
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-7-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Brian Nguyen [Thu, 5 Mar 2026 17:15:48 +0000 (17:15 +0000)]
drm/xe: Skip over non leaf pte for PRL generation
The check using xe_child->base.children was insufficient in determining
if a pte was a leaf node. So explicitly skip over every non-leaf pt and
conditionally abort if there is a scenario where a non-leaf pt is
interleaved between leaf pt, which results in the page walker skipping
over some leaf pt.
Note that the layout being targeted for abort is
PD[0] = 2M PTE
PD[1] = PT -> 512 4K PTEs
PD[2] = 2M PTE
which results in an abort because the page walker won't descend into PD[1].
With the new abort, ensure the PRL is valid before handling a second abort.
v2:
- Revert to previous assert.
- Revised non-leaf handling for interleaf child pt and leaf pte.
- Update comments to specifications. (Stuart)
- Remove unnecessary XE_PTE_PS64. (Matthew B)
v3:
- Modify secondary abort to only check non-leaf PTEs. (Matthew B)
Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-6-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Matt Roper [Wed, 11 Mar 2026 23:04:58 +0000 (16:04 -0700)]
drm/xe: Include running dword offset in default_lrc dumps
Printing a running dword offset in the default_lrc_* debugfs entries
makes it easier for developers to find the right offsets to use in
regs/xe_lrc_layout.h and/or compare the default LRC contents against the
bspec-documented LRC layout.
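As an illustration of the output shape only, a minimal sketch of a dump routine that prefixes each dword with its running dword offset. The helper name dump_dwords() and the line format are assumptions made for this example; they are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the driver's code): write one line per dword
 * into buf, each prefixed with the running dword offset. Returns the
 * number of characters written.
 */
static int dump_dwords(char *buf, size_t len, const uint32_t *dw, size_t count)
{
	int pos = 0;

	assert(buf && dw);
	for (size_t i = 0; i < count; i++) {
		int n = snprintf(buf + pos, len - pos, "[0x%04zx] 0x%08x\n",
				 i, (unsigned int)dw[i]);

		if (n < 0 || (size_t)n >= len - pos)
			break;	/* stop once the buffer is full */
		pos += n;
	}
	return pos;
}
```

With the offset in front of every value, a developer can read the index of a register's dword straight from the dump instead of counting lines.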
Francois Dugast [Thu, 12 Mar 2026 19:20:14 +0000 (20:20 +0100)]
drm/pagemap: Enable THP support for GPU memory migration
This enables support for Transparent Huge Pages (THP) for device pages by
using MIGRATE_VMA_SELECT_COMPOUND during migration. It removes the need to
split folios and loop multiple times over all pages to perform required
operations at page level. Instead, we rely on newly introduced support for
higher orders in drm_pagemap and folio-level API.
In Xe, this drastically improves performance when using SVM. The GT stats
below collected after a 2MB page fault show overall servicing is more than
7 times faster, and thanks to reduced CPU overhead the time spent on the
actual copy goes from 23% without THP to 80% with THP:
v2:
- Fix one occurrence of drm_pagemap_get_devmem_page() (Matthew Brost)
v3:
- Remove migrate_device_split_page() and folio_split_lock, instead rely on
free_zone_device_folio() to split folios before freeing (Matthew Brost)
- Assert folio order is HPAGE_PMD_ORDER (Matthew Brost)
- Always use folio_set_zone_device_data() in split (Matthew Brost)
Matthew Brost [Thu, 12 Mar 2026 19:20:13 +0000 (20:20 +0100)]
drm/pagemap: Correct cpages calculation for migrate_vma_setup
cpages returned from migrate_vma_setup represents the total number of
individual pages found, not the number of 4K pages. The math in
drm_pagemap_migrate_to_devmem for npages is based on the number of 4K
pages, so cpages != npages can fail even if the entire memory range is
found in migrate_vma_setup (e.g., when a single 2M page is found).
Add drm_pagemap_cpages, which converts cpages to the number of 4K pages
found.
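The arithmetic behind the conversion can be sketched as follows. This is a simplified model, not the drm_pagemap implementation: struct found_entry and count_4k_pages() are hypothetical names, and each found entry is assumed to carry its page order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry found by the migration setup step. */
struct found_entry {
	unsigned int order;	/* 0 = one 4K page, 9 = one 2M page (512 x 4K) */
};

/* Count the 4K pages covered by the found entries. */
static unsigned long count_4k_pages(const struct found_entry *e, size_t n)
{
	unsigned long total = 0;

	assert(e || n == 0);
	for (size_t i = 0; i < n; i++)
		total += 1UL << e[i].order;
	return total;
}
```

A single 2M entry counts as one found page but covers 512 4K pages, which is why comparing the raw count against npages can fail even when the whole range was found.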
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linux-mm@kvack.org Reviewed-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260312192126.2024853-4-francois.dugast@intel.com
Francois Dugast [Thu, 12 Mar 2026 19:20:11 +0000 (20:20 +0100)]
drm/pagemap: Unlock and put folios when possible
If the page is part of a folio, unlock and put the whole folio at once
instead of individual pages one after the other. This will reduce the
number of operations once device THP is in use.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linux-mm@kvack.org Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260312192126.2024853-2-francois.dugast@intel.com
Matthew Brost [Tue, 10 Mar 2026 22:50:39 +0000 (18:50 -0400)]
drm/xe: Open-code GGTT MMIO access protection
GGTT MMIO access is currently protected by hotplug (drm_dev_enter),
which works correctly when the driver loads successfully and is later
unbound or unloaded. However, if driver load fails, this protection is
insufficient because drm_dev_unplug() is never called.
Additionally, devm release functions cannot guarantee that all BOs with
GGTT mappings are destroyed before the GGTT MMIO region is removed, as
some BOs may be freed asynchronously by worker threads.
To address this, introduce an open-coded flag, protected by the GGTT
lock, that guards GGTT MMIO access. The flag is cleared during the
dev_fini_ggtt devm release function to ensure MMIO access is disabled
once teardown begins.
Zhanjun Dong [Tue, 10 Mar 2026 22:50:37 +0000 (18:50 -0400)]
drm/xe/guc: Ensure CT state transitions via STOP before DISABLED
The GuC CT state transition requires moving to the STOP state before
entering the DISABLED state. Update the driver teardown sequence to make
the proper state machine transitions.
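A minimal sketch of the ordering rule, assuming a three-state model. The enum values and helper names below are hypothetical and only illustrate that DISABLED is reached through the stopped state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; the driver's own enum may differ. */
enum ct_state { CT_ENABLED, CT_STOPPED, CT_DISABLED };

/* DISABLED may only be entered from STOPPED (or when already DISABLED). */
static bool ct_transition_ok(enum ct_state from, enum ct_state to)
{
	if (to == CT_DISABLED)
		return from == CT_STOPPED || from == CT_DISABLED;
	return true;
}

/* Teardown: step through STOPPED before DISABLED. */
static enum ct_state ct_teardown(enum ct_state cur)
{
	if (cur == CT_ENABLED) {
		assert(ct_transition_ok(cur, CT_STOPPED));
		cur = CT_STOPPED;
	}
	assert(ct_transition_ok(cur, CT_DISABLED));
	return CT_DISABLED;
}
```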
Fixes: ee4b32220a6b ("drm/xe/guc: Add devm release action to safely tear down CT") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-6-zhanjun.dong@intel.com
Matthew Brost [Tue, 10 Mar 2026 22:50:35 +0000 (18:50 -0400)]
drm/xe: Trigger queue cleanup if not in wedged mode 2
The intent of wedging a device is to allow queues to continue running
only in wedged mode 2. In other modes, queues should initiate cleanup
and signal all remaining fences. Fix xe_guc_submit_wedge to correctly
clean up queues when wedge mode != 2.
Matthew Brost [Tue, 10 Mar 2026 22:50:34 +0000 (18:50 -0400)]
drm/xe: Forcefully tear down exec queues in GuC submit fini
In GuC submit fini, forcefully tear down any exec queues by disabling
CTs, stopping the scheduler (which cleans up lost G2H), killing all
remaining queues, and resuming scheduling to allow any remaining cleanup
actions to complete and signal any remaining fences.
Split guc_submit_fini into device-related and software-only parts. Using
device-managed and drm-managed actions guarantees the correct ordering of
cleanup.
Matthew Brost [Tue, 10 Mar 2026 22:50:33 +0000 (18:50 -0400)]
drm/xe: Always kill exec queues in xe_guc_submit_pause_abort
xe_guc_submit_pause_abort is intended to be called after something
disastrous occurs (e.g., VF migration fails, device wedging, or driver
unload) and should immediately trigger the teardown of remaining
submission state. With that, kill any remaining queues in this function.
Fixes: 7c4b7e34c83b ("drm/xe/vf: Abort VF post migration recovery on failure") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-2-zhanjun.dong@intel.com
By using the same variable for both the return of poll_timeout_us and
the return of the polled function guc_wait_ucode, the return value of
the latter is overwritten and lost after exiting the polling loop. Since
guc_wait_ucode returns -1 on GuC load failure, we lose that information
and always continue as if the GuC had been loaded correctly.
This is fixed by simply using 2 separate variables.
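The two-variable pattern can be sketched in plain C as below. This is a user-space model under stated assumptions: load_with_poll() and the script array are hypothetical, the loop stands in for poll_timeout_us(), and mapping a load failure to -EIO is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Sketch only: script[i] stands in for the i-th return value of the polled
 * function (0 = keep polling, 1 = loaded, -1 = load failure).
 */
static int load_with_poll(const int *script, size_t tries)
{
	int poll_ret = -ETIMEDOUT;	/* outcome of the polling loop itself */
	int op_ret = 0;			/* last value returned by the polled function */

	assert(script || tries == 0);
	for (size_t i = 0; i < tries; i++) {
		op_ret = script[i];
		if (op_ret != 0) {	/* polled condition met: stop polling */
			poll_ret = 0;
			break;
		}
	}

	if (poll_ret)
		return poll_ret;	/* timed out before any result */

	/* op_ret was never overwritten by poll_ret, so a failure stays visible. */
	return op_ret < 0 ? -EIO : 0;
}
```

Had both results shared one variable, the assignment of the loop outcome (0 on exit) would have replaced the -1 from the polled function, and the failure case would have looked like success.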
Fixes: a4916b4da448 ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com
Matt Roper [Thu, 12 Mar 2026 16:29:23 +0000 (09:29 -0700)]
drm/xe/wa: Drop redundant entries for Wa_16021867713 & Wa_14019449301
The Xe2_HPM-specific RTP table entries for Wa_16021867713 and
Wa_14019449301 were removed by commit 941f538b0af8 ("drm/xe: Consolidate
workaround entries for Wa_16021867713") and commit aa0f0a678370
("drm/xe: Consolidate workaround entries for Wa_14019449301") in favor
of alternate entries earlier in the table that cover a wider range of IP
versions. However these Xe2_HPM-specific entries were accidentally
resurrected during a backmerge, which causes the Xe driver to complain
on probe about two entries trying to program the same registers+bits:
Mika Kuoppala [Wed, 4 Mar 2026 21:17:28 +0000 (23:17 +0200)]
drm/xe: Fix overflow in guc_ct_snapshot_capture
snapshot->ctb is u32*, so pointer arithmetic on it scales
the byte offset from xe_bo_size() by 4, overshooting the
intended start of the g2h portion and writing past the
allocated buffer.
Fix this by using void * to get the arithmetic right and
prevent future mishaps.
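The scaling effect is easy to reproduce outside the kernel. The sketch below uses hypothetical helpers and unsigned char * for the byte-wise form, since arithmetic on void * is a GNU extension that the kernel relies on but ISO C does not define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Distance in bytes when a byte offset is (wrongly) added to a u32 pointer.
 * The caller must keep base + byte_off within, or one past, the array.
 */
static size_t step_as_u32(uint32_t *base, size_t byte_off)
{
	assert(base);
	return (size_t)((unsigned char *)(base + byte_off) - (unsigned char *)base);
}

/* Distance in bytes when the same offset is added to a byte pointer. */
static size_t step_as_bytes(uint32_t *base, size_t byte_off)
{
	assert(base);
	return (size_t)(((unsigned char *)base + byte_off) - (unsigned char *)base);
}
```

For a 16-byte offset the u32 form lands 64 bytes in, four times further than intended, which matches the overshoot described above.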
v2: s/u8/void for memcpy and iosys_map consistency (Matt)
Fixes: af3de6cf06f9 ("drm/xe: Split H2G and G2H into separate buffer objects") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260304211728.249104-1-mika.kuoppala@linux.intel.com
Nitin Gote [Wed, 4 Mar 2026 12:38:02 +0000 (18:08 +0530)]
drm/xe: implement VM_BIND decompression in vm_bind_ioctl
Implement handling of VM_BIND(..., DECOMPRESS) in xe_vm_bind_ioctl.
Key changes:
- Parse and record per-op intent (op->map.request_decompress) when the
DECOMPRESS flag is present.
- Use xe_pat_index_get_comp_en() helper to check if a PAT index
has compression enabled via the XE2_COMP_EN bit.
- Validate DECOMPRESS preconditions in the ioctl path:
- Only valid for MAP ops.
- The provided pat_index must select the device's "no-compression" PAT.
- Only meaningful on devices with flat CCS and the required XE2+
generation; otherwise return -EOPNOTSUPP.
- Use XE_IOCTL_DBG for uAPI sanity checks.
- Implement xe_bo_decompress():
For VRAM BOs run xe_bo_move_notify(), reserve one fence slot,
schedule xe_migrate_resolve(), and attach the returned fence
with DMA_RESV_USAGE_KERNEL. Non-VRAM cases are silent no-ops.
- Wire scheduling into vma_lock_and_validate() so VM_BIND will schedule
decompression when request_decompress is set.
- Handle fault-mode VMs by performing decompression synchronously during
the bind process, ensuring that the resolve is completed before the bind
finishes.
This schedules an in-place GPU resolve (xe_migrate_resolve) for
decompression.
v7: Rebase on latest drm-tip and add compute and igt pr info
v6: (Matt Auld)
- Rebase as xe_pat_index_get_comp_en() is added in separate
patch
- Drop vm param from xe_bo_decompress(); instead
extract tile from bo
- Reject decompression on igpu instead of silently skipping it,
to avoid any failure on Xe2+ igpu, as xe_device_has_flat_ccs()
can sometimes be false on igpu due to some setting in the BIOS
that turns off compression on igpu.
- Nits
v5: (Matt)
- Correct the condition check of xe_pat_index_get_comp_en
v4: (Matt)
- Introduce xe_pat_index_get_comp_en(), which checks
XE2_COMP_EN for the pat_index
- .interruptible should be true, everything else false
v3: (Matt)
- s/xe_bo_schedule_decompress/xe_bo_decompress
- skip the decompress step if the BO isn't in VRAM
- start/size not required in xe_bo_schedule_decompress
- Use xe_bo_move_notify instead of xe_vm_invalidate_vma
with respect to invalidation.
- Nits
v2:
- Move decompression work out of vm_bind ioctl. (Matt)
- Put that work in a small helper at the BO/migrate layer and invoke it
from vma_lock_and_validate which already runs under drm_exec.
- Move lightweight checks to vm_bind_ioctl_check_args (Matthew Auld)
Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260304123758.3050386-8-nitin.r.gote@intel.com
Nitin Gote [Wed, 4 Mar 2026 12:38:01 +0000 (18:08 +0530)]
drm/xe: add xe_migrate_resolve wrapper and is_vram_resolve support
Introduce an internal __xe_migrate_copy(..., is_vram_resolve) path and
expose a small wrapper xe_migrate_resolve() that calls it with
is_vram_resolve=true.
For resolve/decompression operations we must ensure the copy code uses
the compression PAT index when appropriate; this change centralizes that
behavior and allows callers to schedule a resolve (decompress) operation
via the migrate API.
v3: Fix kernel-doc warnings
v2: (Matt)
- Simplify xe_migrate_resolve(), use single BO/resource;
remove copy_only_ccs argument as it's always false.
Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260304123758.3050386-7-nitin.r.gote@intel.com
Nitin Gote [Wed, 4 Mar 2026 12:38:00 +0000 (18:08 +0530)]
drm/xe: add VM_BIND DECOMPRESS uapi flag
Add a new VM_BIND flag, DRM_XE_VM_BIND_FLAG_DECOMPRESS, that lets userspace
express intent for the driver to perform on-device in-place decompression
for the GPU mapping created by a MAP bind operation.
This flag is used by subsequent driver changes to trigger scheduling of
GPU work that resolves compressed VRAM pages into an uncompressed PAT
VM mapping.
Behavior and semantics:
- Valid only for DRM_XE_VM_BIND_OP_MAP. IOCTLs using this flag on other ops
are rejected (-EINVAL).
- The bind's pat_index must select the device "no-compression" PAT entry;
otherwise the ioctl is rejected (-EINVAL).
- Only meaningful for VRAM-backed BOs on devices that support Flat CCS and
the required hardware generation (driver will return -EOPNOTSUPP if not).
- On success the driver schedules a migrate/resolve and installs the
returned dma_fence into the BO's kernel reservation
(DMA_RESV_USAGE_KERNEL).
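The validation rules listed above can be sketched as a small check function. Everything here is a simplified model: struct bind_req, struct dev_caps and check_decompress() are hypothetical, the real ioctl operates on the uAPI structures, and the order of the checks is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one bind operation. */
struct bind_req {
	bool is_map_op;			/* op is a MAP bind */
	bool decompress;		/* DECOMPRESS flag is set */
	bool pat_no_compression;	/* pat_index selects the no-compression PAT */
};

/* Hypothetical device capabilities. */
struct dev_caps {
	bool has_flat_ccs;
	bool hw_gen_ok;			/* required hardware generation present */
};

static int check_decompress(const struct bind_req *req, const struct dev_caps *caps)
{
	assert(req && caps);
	if (!req->decompress)
		return 0;		/* nothing to validate */
	if (!req->is_map_op)
		return -EINVAL;		/* flag is valid only for MAP ops */
	if (!req->pat_no_compression)
		return -EINVAL;		/* must map through the uncompressed PAT */
	if (!caps->has_flat_ccs || !caps->hw_gen_ok)
		return -EOPNOTSUPP;	/* device cannot resolve in place */
	return 0;
}
```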
v3: Rebase on latest drm-tip and add compute pr info
v2: Add kernel doc (Matt)
Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mrozek, Michal <michal.mrozek@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260304123758.3050386-6-nitin.r.gote@intel.com
Gustavo Sousa [Tue, 3 Mar 2026 20:46:17 +0000 (17:46 -0300)]
drm/xe/pat: Extract gt_pta_entry()
Avoid code duplication by extracting the logic for selection of the
correct PAT_PTA entry for a GT into function gt_pta_entry() and using
that function whenever necessary.
drm/xe/userptr: Defer Waiting for TLB invalidation to the second pass if possible
Now that the two-pass notifier flow uses xe_vma_userptr_do_inval() for
the fence-wait + TLB-invalidate work, extend it to support a further
deferred TLB wait:
- xe_vma_userptr_do_inval(): when the embedded finish handle is free,
submit the TLB invalidation asynchronously (xe_vm_invalidate_vma_submit)
and return &userptr->finish so the mmu_notifier core schedules a third
pass. When the handle is occupied by a concurrent invalidation, fall
back to the synchronous xe_vm_invalidate_vma() path.
- xe_vma_userptr_complete_tlb_inval(): new helper called from
invalidate_finish when tlb_inval_submitted is set. Waits for the
previously submitted batch and unmaps the gpusvm pages.
xe_vma_userptr_invalidate_finish() dispatches between the two helpers
via tlb_inval_submitted, making the three possible flows explicit:
In multi-GPU scenarios this allows TLB flushes to be submitted on all
GPUs in one pass before any of them are waited on.
Also adds xe_vm_invalidate_vma_submit() which submits the TLB range
invalidation without blocking, populating a xe_tlb_inval_batch that
the caller waits on separately.
v3:
- Add locking asserts and notifier state asserts (Matt Brost)
- Update the locking documentation of the notifier
state members (Matt Brost)
- Remove unrelated code formatting changes (Matt Brost)
drm/xe: Split TLB invalidation into submit and wait steps
xe_vm_range_tilemask_tlb_inval() submits TLB invalidation requests to
all GTs in a tile mask and then immediately waits for them to complete
before returning. This is fine for the existing callers, but a
subsequent patch will need to defer the wait in order to overlap TLB
invalidations across multiple VMAs.
Introduce xe_tlb_inval_range_tilemask_submit() and
xe_tlb_inval_batch_wait() in xe_tlb_inval.c as the submit and wait
halves respectively. The batch of fences is carried in the new
xe_tlb_inval_batch structure. Remove xe_vm_range_tilemask_tlb_inval()
and convert all three call sites to the new API.
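A rough model of the submit/wait split, assuming a batch that simply records one fence per GT. The types and helpers below are hypothetical stand-ins, not the xe_tlb_inval API, and "waiting" is reduced to marking a fence done.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GTS 4

/* Hypothetical stand-ins: a "fence" here is just a pair of flags. */
struct fake_fence { bool submitted; bool done; };
struct inval_batch { struct fake_fence fence[MAX_GTS]; size_t count; };

/* Submit half: queue one invalidation per GT bit set in gt_mask; never waits. */
static int batch_submit(struct inval_batch *b, unsigned int gt_mask)
{
	assert(b);
	b->count = 0;
	if (gt_mask >> MAX_GTS)
		return -EINVAL;		/* mask names a GT that does not exist */

	for (unsigned int gt = 0; gt < MAX_GTS; gt++) {
		if (!(gt_mask & (1u << gt)))
			continue;
		b->fence[b->count].submitted = true;
		b->fence[b->count].done = false;
		b->count++;
	}
	return 0;
}

/* Wait half: block (here: mark done) on every fence in the batch. */
static void batch_wait(struct inval_batch *b)
{
	assert(b);
	for (size_t i = 0; i < b->count; i++)
		b->fence[i].done = true;
}
```

A caller that handles several VMAs can call batch_submit() for each of them first and only then call batch_wait(), so the invalidations overlap instead of running back to back. A batch whose submit failed has a count of zero, so there is nothing to wait on.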
v3:
- Don't wait on TLB invalidation batches if the corresponding batch
submit returns an error. (Matt Brost)
- s/_batch/batch/ (Matt Brost)
drm/xe/userptr: Convert invalidation to two-pass MMU notifier
In multi-GPU scenarios, asynchronous GPU job latency is a bottleneck if
each notifier waits for its own GPU before returning. The two-pass
mmu_interval_notifier infrastructure allows deferring the wait to a
second pass, so all GPUs can be signalled in the first pass before
any of them are waited on.
Convert the userptr invalidation to use the two-pass model:
Use invalidate_start as the first pass to mark the VMA for repin and
enable software signalling on the VM reservation fences to start any
gpu work needed for signaling. Fall back to completing the work
synchronously if all fences are already signalled, or if a concurrent
invalidation is already using the embedded finish structure.
Use invalidate_finish as the second pass to wait for the reservation
fences to complete, invalidate the GPU TLB in fault mode, and unmap
the gpusvm pages.
Embed a struct mmu_interval_notifier_finish in struct xe_userptr to
avoid dynamic allocation in the notifier callback. Use a finish_inuse
flag to prevent two concurrent invalidations from using it
simultaneously; fall back to the synchronous path for the second caller.
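The embedded-slot idea with its in-use flag can be sketched as follows. The struct and function names are hypothetical, and the sketch is single-threaded; it assumes the real flag is serialized by the driver's own locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the embedded finish structure. */
struct finish_slot { int pending; };

struct userptr_sketch {
	bool finish_inuse;		/* set while the embedded slot is claimed */
	struct finish_slot finish;	/* embedded, so no allocation in the callback */
	int sync_fallbacks;		/* how often the synchronous path was taken */
};

/* First pass: claim the slot and defer, or fall back to synchronous work. */
static struct finish_slot *inval_start(struct userptr_sketch *u)
{
	assert(u);
	if (u->finish_inuse) {
		u->sync_fallbacks++;	/* a concurrent invalidation owns the slot */
		return NULL;		/* no second pass requested */
	}
	u->finish_inuse = true;
	u->finish.pending = 1;
	return &u->finish;
}

/* Second pass: complete the deferred work and release the slot. */
static void inval_finish(struct userptr_sketch *u)
{
	assert(u && u->finish_inuse);
	u->finish.pending = 0;
	u->finish_inuse = false;
}
```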
v3:
- Add locking asserts in notifier components (Matt Brost)
- Clean up newlines (Matt Brost)
- Update the userptr notifier state member locking documentation
(Matt Brost)
GPU use-cases for mmu_interval_notifiers with hmm often involve
starting a gpu operation and then waiting for it to complete.
These operations are typically context preemption or TLB flushing.
With single-pass notifiers per GPU this doesn't scale in
multi-gpu scenarios. In those scenarios we'd want to first start
preemption- or TLB flushing on all GPUs and as a second pass wait
for them to complete.
One can do this on per-driver basis multiplexing per-driver
notifiers but that would mean sharing the notifier "user" lock
across all GPUs and that doesn't scale well either, so adding support
for multi-pass in the core appears to be the right choice.
Implement two-pass capability in the mmu_interval_notifier. Use a
linked list for the final passes to minimize the impact for
use-cases that don't need the multi-pass functionality by avoiding
a second interval tree walk, and to be able to easily pass data
between the two passes.
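A user-space sketch of the two-pass idea, assuming a flat array in place of the interval tree and a plain singly linked list for the final pass. All names are hypothetical; the seq counter exists only so the ordering can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct notifier;

/* Item handed from the first pass to the second pass. */
struct finish_item {
	struct finish_item *next;
	struct notifier *owner;
};

struct notifier {
	/* Pass 1: start work; return an item to request a second pass, or NULL. */
	struct finish_item *(*start)(struct notifier *n);
	/* Pass 2: wait for the work started in pass 1. */
	void (*finish)(struct finish_item *item);
	struct finish_item item;	/* embedded, avoids allocation */
	int started;			/* event stamp from pass 1 */
	int finished;			/* event stamp from pass 2 */
};

static int seq;	/* sketch-only event counter used to observe ordering */

static struct finish_item *gpu_start(struct notifier *n)
{
	n->started = ++seq;
	n->item.owner = n;
	return &n->item;
}

static void gpu_finish(struct finish_item *item)
{
	item->owner->finished = ++seq;
}

static void invalidate_two_pass(struct notifier *ns, size_t count)
{
	struct finish_item *list = NULL;

	assert(ns || count == 0);
	/* Pass 1: kick off work everywhere, collecting second-pass requests. */
	for (size_t i = 0; i < count; i++) {
		struct finish_item *item = ns[i].start(&ns[i]);

		if (item) {
			item->next = list;
			list = item;
		}
	}
	/* Pass 2: walk only the collected list, not the notifiers again. */
	while (list) {
		struct finish_item *item = list;

		list = item->next;
		item->owner->finish(item);
	}
}
```

Every start callback runs before any finish callback, so work on all devices is in flight before the first wait begins. Notifiers that return NULL from start never appear on the list and pay no cost for the second pass.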
v1:
- Restrict to two passes (Jason Gunthorpe)
- Improve on documentation (Jason Gunthorpe)
- Improve on function naming (Alistair Popple)
v2:
- Include the invalidate_finish() callback in the
struct mmu_interval_notifier_ops.
- Update documentation (GitHub Copilot:claude-sonnet-4.6)
- Use lockless list for list management.
v3:
- Update kerneldoc for the struct mmu_interval_notifier_finish::list member
(Matthew Brost)
- Add a WARN_ON_ONCE() checking for NULL invalidate_finish() op if
invalidate_start() is non-NULL. (Matthew Brost)
v4:
- Addressed documentation review comments by David Hildenbrand.
Cc: Matthew Brost <matthew.brost@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <dri-devel@lists.freedesktop.org> Cc: <linux-mm@kvack.org> Cc: <linux-kernel@vger.kernel.org> Assisted-by: GitHub Copilot:claude-sonnet-4.6 # Documentation only. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260305093909.43623-2-thomas.hellstrom@linux.intel.com
Gustavo Sousa [Tue, 10 Mar 2026 00:42:12 +0000 (21:42 -0300)]
drm/xe: Translate C-state "reset value" into RC6
There are higher level sleep states that will cause RC6 state readout to
come back with an "in-reset" value. That is the case with NVL-P. As
those states are only possible if the GT is already in C6, let's just
translate the "reset value" into C6 when doing the readout.
Gustavo Sousa [Tue, 10 Mar 2026 00:42:11 +0000 (21:42 -0300)]
drm/xe/xe3p: Drop Wa_16028780921
Wa_16028780921 involves writing to a register that is locked by firmware
prior to driver loading and doesn't have any effect if implemented by
the KMD. Since the implementation of the workaround actually belongs to
the firmware, just drop the ineffective implementation by the KMD.
Gustavo Sousa [Tue, 10 Mar 2026 00:42:10 +0000 (21:42 -0300)]
drm/xe/nvlp: Implement Wa_14026539277
Implement the KMD part of Wa_14026539277, which applies to NVL-P A0.
The KMD implementation is just one component of the workaround, which
also depends on Pcode to implement its part in order to be complete.
v2:
- Add FUNC(xe_rtp_match_not_sriov_vf) to skip applying the workaround
to SRIOV VFs. (Matt)
v3:
- Make Wa_14026539277 a device workaround instead of a GT workaround.
(Matt)
v4:
- Drop FUNC(xe_rtp_match_not_sriov_vf) and use a direct check with
IS_SRIOV_VF() in the workaround implementation. (Matt)
Gustavo Sousa [Tue, 10 Mar 2026 00:42:09 +0000 (21:42 -0300)]
drm/xe/rtp: Add support for matching platform-level stepping
Add support for matching platform-level stepping, which will be used for
an upcoming NVL-P workaround.
As support for reading platform-level stepping information is added only
as needed in the driver, add a warning when the rule finds a STEP_NONE
value, which is an indication that the driver is missing such support.
Gustavo Sousa [Tue, 10 Mar 2026 00:42:08 +0000 (21:42 -0300)]
drm/xe/nvlp: Read platform-level stepping info
There will be a NVL-P workaround for which we will need to know the
platform-level stepping information in order to decide whether to apply
it or not.
While NVL-P has a nice mapping between the PCI revid and our symbolic
stepping enumeration, not all platforms are like that: (i) some
platforms will have a single PCI revid used for a set of platform-level
steppings, and (ii) some might even require specific mappings.
To make things simpler, let's include stepping information in the device
info only on demand, for those platforms where it is needed for
workaround checks.
v2:
- Call xe_step_platform_get() very early, to allow device workarounds
to use it in early stages of device initialization. (Matt)
Gustavo Sousa [Tue, 10 Mar 2026 00:42:07 +0000 (21:42 -0300)]
drm/xe: Drop unused IS_PLATFORM_STEP() and IS_SUBPLATFORM_STEP()
The macros IS_PLATFORM_STEP() and IS_SUBPLATFORM_STEP() are unused since
commit 87c299fa3a97 ("drm/xe/guc: Port Wa_14014475959 to xe_wa and fix
it") and commit 63bbd800ff01 ("drm/xe/guc: Port
Wa_22012727170/Wa_22012727685 to xe_wa"), respectively, and we can drop
them now. Furthermore, in upcoming changes we will add logic to read
platform-level step information from PCI RevID and keeping those macros
around would potentially cause confusion.
v2:
- Cite commits that made the macros unused. (Matt)
Gustavo Sousa [Tue, 10 Mar 2026 00:42:06 +0000 (21:42 -0300)]
drm/xe: Modify stepping info directly in xe_step_*_get()
In an upcoming change, we will add a member to struct xe_step_info to
represent the platform-level stepping. As such, we should stop assigning
the value returned by functions xe_step_pre_gmdid_get() and
xe_step_gmdid_get() directly to xe->info.step.
Since there are no other users for those functions, let's simply update
them to modify xe->info.step directly.
drm/xe: Allow per queue programming of COMMON_SLICE_CHICKEN3 bit13
Similar to i915's commit cebc13de7e704b1355bea208a9f9cdb042c74588
("drm/i915: Whitelist COMMON_SLICE_CHICKEN3 for UMD access"), except
that instead of putting the register on the allowlist for UMD to
program, the KMD is doing the programming at context initialization
based on a queue creation flag.
This is a recommended tuning setting for both gen12 and Xe_HP
platforms.
If a render queue is created with
DRM_XE_EXEC_QUEUE_SET_STATE_CACHE_PERF_FIX, COMMON_SLICE_CHICKEN3 will
be programmed at initialization to enable the render color cache to
key with BTP+BTI (binding table pool + binding table entry) instead of
just BTI (binding table entry). This enables the UMD to avoid emitting
render-target-cache-flush + stall-at-pixel-scoreboard every time a
binding table entry pointing to a render target is changed.
Matt Roper [Thu, 5 Mar 2026 22:59:27 +0000 (14:59 -0800)]
drm/xe: Add for_each_gt_with_type() iterator
There are a couple places in the driver today that have GT loops that
only need to operate on a specific type of GT. E.g.,
    for_each_gt(...) {
            if (xe_gt_is_media_type(gt))
                    continue;
            ...
    }
Some upcoming development is expected to utilize this pattern a bit more
widely, so add a dedicated iterator that allows looping over specific GT
type(s).
Note that this iterator uses a mask for the "type" parameter rather than
a direct value match. That's probably a bit overkill for now given that
there are only two possible types of GTs, but if additional types of GTs
ever show up in the future, this approach will fit more naturally and
allow cases where we might want to loop over a subset of the possible
types, or specifically mask off one single type.
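A standalone sketch of a mask-based iterator in the same spirit. The macro name, the enum values and struct gt below are hypothetical and much simpler than the driver's types; the point is only that a bitmask lets one loop select one type, several types, or all of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types expressed as bits so several can be combined in a mask. */
enum gt_type { GT_TYPE_MAIN = 1u << 0, GT_TYPE_MEDIA = 1u << 1 };

struct gt { int id; enum gt_type type; };

/* Visit only the GTs whose type bit is set in mask_. */
#define for_each_gt_with_type_sketch(gt_, gts_, count_, idx_, mask_)		\
	for ((idx_) = 0;							\
	     (idx_) < (count_) && ((gt_) = &(gts_)[(idx_)], 1);		\
	     (idx_)++)								\
		if (!((unsigned int)(gt_)->type & (mask_))) {} else

/* Example user: count the GTs that match type_mask. */
static size_t count_gts(struct gt *gts, size_t count, unsigned int type_mask)
{
	struct gt *gt;
	size_t idx, n = 0;

	assert(gts || count == 0);
	for_each_gt_with_type_sketch(gt, gts, count, idx, type_mask)
		n++;
	return n;
}
```

The trailing "if (...) {} else" lets the macro take a normal loop body while skipping non-matching entries, which replaces the open-coded "continue" check shown earlier.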
Dave Airlie [Sun, 8 Mar 2026 20:04:15 +0000 (06:04 +1000)]
Merge tag 'amd-drm-next-7.1-2026-03-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-7.1-2026-03-04:
amdgpu:
- FAMS2 updates
- Refactor DC I2C
- Rework ttm handling to allow for multiple engines
- UserQ updates
- Ring reset improvements
- DC DCE 6.x cleanups
- DC support for NUTMEG and TRAVIS DP bridges
- Enable DC by default on CIK APUs
- Add DCN 4.2 support
- IPS fixes
- Overlay fixes for DCN4
- SDMA Limit updates
- Misc fixes
- RAS updates
- Register access callback rework
- GC 12.1 updates
amdkfd:
- Misc cleanups
UAPI:
- UserQ fence IOCTL parameter size fixes. The change is backwards compatible on LE, but not BE.
UserQs are still not considered stable and are disabled by default.
Linus Torvalds [Sun, 8 Mar 2026 19:13:09 +0000 (12:13 -0700)]
Merge tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
"Fix for the x86 EFI workaround keeping boot services code and data
regions reserved until after SetVirtualAddressMap() completes:
deferred struct page initialization may result in some of this memory
being lost permanently"
* tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efi: defer freeing of boot services memory
Linus Torvalds [Sun, 8 Mar 2026 01:12:06 +0000 (17:12 -0800)]
Merge tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix SEV guest boot failures in certain circumstances, due to
very early code relying on a BSS-zeroed variable that isn't
actually zeroed yet and may contain non-zero bootup values
Move the variable into the .data section to gain even earlier
zeroing
- Expose & allow the IBPB-on-Entry feature on SNP guests, which
was not properly exposed to guests due to initial implementational
caution
- Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
file paths
- Fix the various SNC (Sub-NUMA Clustering) topology enumeration
bugs/artifacts (sched-domain build errors mostly).
SNC enumeration data got more complicated with Granite Rapids X
(GNR) and Clearwater Forest X (CWF), which exposed these bugs
and made their effects more serious
- Also use the now sane(r) SNC code to fix resctrl SNC detection bugs
- Work around a historic libgcc unwinder bug in the vdso32 sigreturn
code (again), which regressed during an overly aggressive recent
cleanup of DWARF annotations
* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/vdso32: Work around libgcc unwinder bug
x86/resctrl: Fix SNC detection
x86/topo: Fix SNC topology mess
x86/topo: Replace x86_has_numa_in_package
x86/topo: Add topology_num_nodes_per_package()
x86/numa: Store extra copy of numa_nodes_parsed
x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
x86/sev: Allow IBPB-on-Entry feature for SNP guests
x86/boot/sev: Move SEV decompressor variables into the .data section
Linus Torvalds [Sun, 8 Mar 2026 01:09:15 +0000 (17:09 -0800)]
Merge tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Make clock_adjtime() syscall timex validation slightly more permissive
for auxiliary clocks, to not reject syscalls based on the status field
that do not try to modify the status field.
This makes the ABI behavior in clock_adjtime() consistent with
CLOCK_REALTIME"
* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix timex status validation for auxiliary clocks
Linus Torvalds [Sun, 8 Mar 2026 01:07:13 +0000 (17:07 -0800)]
Merge tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a DL scheduler bug that may corrupt internal metrics during PI and
setscheduler() syscalls, resulting in kernel warnings and misbehavior.
Found during stress-testing"
* tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
Linus Torvalds [Sat, 7 Mar 2026 22:04:50 +0000 (14:04 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two core changes and the rest in drivers, one core change to quirk the
behaviour of the Iomega Zip drive and one to fix a hang caused by tag
reallocation problems, which has mostly been seen by the iscsi client.
Note the latter fixes the problem but still has a slight sysfs memory
leak, so will be amended in the next pull request (once we've run the
fix for the fix through our testing)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: Fix recursive locking in __configfs_open_file()
scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP
scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
scsi: core: Fix refcount leak for tagset_refcnt
Linus Torvalds [Sat, 7 Mar 2026 20:38:16 +0000 (12:38 -0800)]
Merge tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"While testing Sasha Levin's 'kallsyms: embed source file:line info in
kernel stack traces' patch series, which increases the typical kernel
image size, I found some issues with the parisc initial kernel mapping
which may prevent the kernel from booting.
The three small patches here fix this"
* tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix initial page table creation for boot
parisc: Check kernel mapping earlier at bootup
parisc: Increase initial mapping to 64 MB with KALLSYMS
- Fix precision backtracking with linked registers (Eduard Zingerman)
- Fix linker flags detection for resolve_btfids (Ihor Solodrai)
- Fix race in update_ftrace_direct_add/del (Jiri Olsa)
- Fix UAF in bpf_trampoline_link_cgroup_shim (Lang Xu)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
resolve_btfids: Fix linker flags detection
selftests/bpf: add reproducer for spurious precision propagation through calls
bpf: collect only live registers in linked regs
Revert "selftests/bpf: Update reg_bound range refinement logic"
selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary
bpf: Fix u32/s32 bounds when ranges cross min/max boundary
bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim
ftrace: Add missing ftrace_lock to update_ftrace_direct_add/del
Linus Torvalds [Sat, 7 Mar 2026 19:56:55 +0000 (11:56 -0800)]
Merge tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU selftest fixes from Boqun Feng:
"Fix a regression in RCU torture test pre-defined scenarios caused by
commit 7dadeaa6e851 ("sched: Further restrict the preemption modes")
which limits PREEMPT_NONE to architectures that do not support
preemption at all and PREEMPT_VOLUNTARY to those architectures that do
not yet have PREEMPT_LAZY support.
Since major architectures (e.g. x86 and arm64) no longer support
CONFIG_PREEMPT_NONE and CONFIG_PREEMPT_VOLUNTARY, using them in
rcutorture, rcuscale, refscale, and scftorture pre-defined scenarios
causes config checking errors.
Switch these kconfigs to PREEMPT_LAZY"
* tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
scftorture: Update due to x86 not supporting none/voluntary preemption
refscale: Update due to x86 not supporting none/voluntary preemption
rcuscale: Update due to x86 not supporting none/voluntary preemption
rcutorture: Update due to x86 not supporting none/voluntary preemption
Linus Torvalds [Sat, 7 Mar 2026 17:50:54 +0000 (09:50 -0800)]
Merge tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix possible NULL pointer dereference in trigger_data_alloc()
On the trigger_data_alloc() error path, it can call trigger_data_free()
with a NULL pointer. This used to be a kfree() but was changed to
trigger_data_free() to clean up any partial initialization. The issue
is that trigger_data_free() does not expect a NULL pointer. Have
trigger_data_free() return safely on NULL pointer.
- Fix multiple events on the command line and bootconfig
If multiple events are enabled on the command line separately and not
grouped, only the last event gets enabled. That is:
trace_event=sched_switch trace_event=sched_waking
will only enable sched_waking whereas:
trace_event=sched_switch,sched_waking
will enable both.
The bootconfig makes it even worse as the second way is the more
common method.
The issue is that a temporary buffer is used to store the events to
enable later in boot. Each time the cmdline callback is called, it
overwrites what was previously there.
Have the callback append the next value (delimited by a comma) if the
temporary buffer already has content.
- Fix command line trace_buffer_size if >= 2G
The logic to allocate the trace buffer uses "int" for the size
parameter in the command line code, causing overflow issues if more
than 2G is specified.
* tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G
tracing: Fix enabling multiple events on the kernel command line and bootconfig
tracing: Add NULL pointer check to trigger_data_free()
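The append behaviour described above for repeated trace_event= options can be sketched in plain C. This is a minimal userspace illustration, not the kernel's code: the function name append_event_list() and the EVENT_BUF_SIZE constant are hypothetical, and the real temporary buffer and its size are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size; the kernel's real buffer is not shown here. */
#define EVENT_BUF_SIZE 256

/*
 * Append one trace_event= value to buf, inserting a comma first when buf
 * already holds earlier values. Returns 0 on success, or -1 (leaving buf
 * untouched) when the result would not fit.
 */
int append_event_list(char *buf, size_t size, const char *value)
{
	size_t used, len, need;

	assert(buf != NULL && value != NULL);

	used = strlen(buf);
	len = strlen(value);
	need = len + (used ? 1 : 0);	/* one extra byte for the comma */
	if (used + need + 1 > size)	/* one extra byte for the NUL */
		return -1;

	if (used)
		buf[used++] = ',';
	memcpy(buf + used, value, len + 1);	/* copies the NUL as well */
	return 0;
}
```

Calling it once with "sched_switch" and once with "sched_waking" on an empty buffer leaves "sched_switch,sched_waking", which is the grouped form that already worked before the fix.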
Ihor Solodrai [Thu, 5 Mar 2026 01:47:30 +0000 (17:47 -0800)]
resolve_btfids: Fix linker flags detection
The "|| echo -lzstd" default makes zstd an unconditional link
dependency of resolve_btfids. On systems where libzstd-dev is not
installed and pkg-config fails, the linker fails:
ld: cannot find -lzstd: No such file or directory
libzstd is a transitive dependency of libelf, so the -lzstd flag is
strictly necessary only for static builds [1].
Remove ZSTD_LIBS variable, and instead set LIBELF_LIBS depending on
whether the build is static or not. Use $(HOSTPKG_CONFIG) as primary
source of the flags list.
Also add a default value for HOSTPKG_CONFIG in case it's not built via
the toplevel Makefile. Pass it from selftests/bpf too.
Linus Torvalds [Sat, 7 Mar 2026 16:16:48 +0000 (08:16 -0800)]
Merge tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fix from Danilo Krummrich:
- Revert "driver core: enforce device_lock for driver_match_device()":
When a device is already present in the system and a driver is
registered on the same bus, we iterate over all devices registered on
this bus to see if one of them matches. If we come across an already
bound one where the corresponding driver crashed while holding the
device lock (e.g. in probe()) we can't make any progress anymore.
Thus, revert and clarify that an implementer of struct bus_type must
not expect match() to be called with the device lock held.
* tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Revert "driver core: enforce device_lock for driver_match_device()"
====================
bpf: Fix precision backtracking bug with linked registers
Emil Tsalapatis reported a verifier bug hit by the scx_lavd sched_ext
scheduler. The essential part of the verifier log looks as follows:
436: ...
// checkpoint hit for 438: (1d) if r7 == r8 goto ...
frame 3: propagating r2,r7,r8
frame 2: propagating r6
mark_precise: frame3: last_idx ...
mark_precise: frame3: regs=r2,r7,r8 stack= before 436: ...
mark_precise: frame3: regs=r2,r7 stack= before 435: ...
mark_precise: frame3: regs=r2,r7 stack= before 434: (85) call bpf_trace_vprintk#177
verifier bug: backtracking call unexpected regs 84
The log complains that registers r2 and r7 are tracked as precise
while processing the bpf_trace_vprintk() call in precision backtracking.
This can't be right, as r2 is reset by the call and there is nothing
to backtrack it to. The precision propagation is triggered when
a checkpoint is hit at instruction 438, r2 is dead at that instruction.
This happens because of the following sequence of events:
- Instruction 438 is first reached with registers r2 and r7 having
the same id via a path that does not call bpf_trace_vprintk():
- Checkpoint is created at 438.
- The jump at 438 is predicted, hence r7 and registers linked to it
(r2) are propagated as precise, marking r2 and r7 precise in the
checkpoint.
- Instruction 438 is reached a second time with r2 undefined and via
a path that calls bpf_trace_vprintk():
- Checkpoint is hit.
- propagate_precision() picks registers r2 and r7 and propagates
precision marks for those up to the helper call.
The root cause is the fact that states_equal() and
propagate_precision() assume that the precision flag can't be set for a
dead register (as computed by compute_live_registers()).
However, this is not the case when linked registers are at play.
Fix this by accounting for live register flags in
collect_linked_regs().
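The idea of only collecting live registers when gathering linked registers can be sketched as follows. This is a simplified illustration under stated assumptions, not the verifier's implementation: struct reg_sketch, collect_live_linked() and the bitmask representation of liveness are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 11	/* r0..r10 */

/* Hypothetical, simplified register state: only the scalar id is kept. */
struct reg_sketch {
	uint32_t id;	/* 0 means "not linked to any other register" */
};

/*
 * Return a bitmask of the registers that share `id` and are live
 * according to `live_mask` (bit N set means rN is live). Registers with
 * the same id that are dead are left out, so a later precision mark on
 * one linked register cannot spread to a dead one.
 */
uint16_t collect_live_linked(const struct reg_sketch regs[NUM_REGS],
			     uint32_t id, uint16_t live_mask)
{
	uint16_t linked = 0;
	int i;

	assert(id != 0);

	for (i = 0; i < NUM_REGS; i++) {
		if (regs[i].id != id)
			continue;
		if (!(live_mask & (1u << i)))
			continue;	/* same id, but dead here */
		linked |= (uint16_t)(1u << i);
	}
	return linked;
}
```

With r2 and r7 sharing an id and only r7 live, the result contains r7 alone, which mirrors the situation in the log above where r2 is dead at instruction 438.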
---
====================
selftests/bpf: add reproducer for spurious precision propagation through calls
Add a test for the scenario described in the previous commit:
an iterator loop with two paths where one ties r2/r7 via
shared scalar id and skips a call, while the other goes
through the call. Precision marks from the linked registers
get spuriously propagated to the call path via
propagate_precision(), hitting "backtracking call unexpected
regs" in backtrack_insn().
Fix an inconsistency between func_states_equal() and
collect_linked_regs():
- regsafe() uses check_ids() to verify that cached and current states
have identical register id mapping.
- func_states_equal() calls regsafe() only for registers computed as
live by compute_live_registers().
- clean_live_states() is supposed to remove dead registers from cached
states, but it can skip states belonging to an iterator-based loop.
- collect_linked_regs() collects all registers sharing the same id,
ignoring the marks computed by compute_live_registers().
Linked registers are stored in the state's jump history.
- backtrack_insn() marks all linked registers for an instruction
as precise whenever one of the linked registers is precise.
The above might lead to a scenario:
- There is an instruction I with register rY known to be dead at I.
- Instruction I is reached via two paths: first A, then B.
- On path A:
- There is an id link between registers rX and rY.
- Checkpoint C is created at I.
- Linked register set {rX, rY} is saved to the jump history.
- rX is marked as precise at I, causing both rX and rY
to be marked precise at C.
- On path B:
- There is no id link between registers rX and rY,
otherwise register states are sub-states of those in C.
- Because rY is dead at I, check_ids() returns true.
- Current state is considered equal to checkpoint C,
propagate_precision() propagates spurious precision
mark for register rY along the path B.
- Depending on a program, this might hit verifier_bug()
in the backtrack_insn(), e.g. if rY ∈ [r1..r5]
and backtrack_insn() spots a function call.
The reproducer program is in the next patch.
This was hit by sched_ext scx_lavd scheduler code.
Changes in tests:
- verifier_scalar_ids.c selftests need modification to preserve
some registers as live for __msg() checks.
- exceptions_assert.c adjusted to match changes in the verifier log,
R0 is dead after conditional instruction and thus does not get
range.
- precise.c adjusted to match changes in the verifier log, register r9
is dead after comparison and its range is not important for the test.
Linus Torvalds [Sat, 7 Mar 2026 04:27:13 +0000 (20:27 -0800)]
Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Split out .modinfo section from ELF_DETAILS macro, as that macro may
be used in other areas that expect to discard .modinfo, breaking
certain image layouts
- Adjust genksyms parser to handle optional attributes in certain
declarations, necessary after commit 07919126ecfc ("netfilter:
annotate NAT helper hook pointers with __rcu")
- Include resolve_btfids in external module build created by
scripts/package/install-extmod-build when it may be run on external
modules
- Avoid removing objtool binary with 'make clean', as it is required
for external module builds
* tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Leave objtool binary around with 'make clean'
kbuild: install-extmod-build: Package resolve_btfids if necessary
genksyms: Fix parsing a declarator with a preceding attribute
kbuild: Split .modinfo out from ELF_DETAILS
Linus Torvalds [Sat, 7 Mar 2026 03:57:03 +0000 (19:57 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main changes are a fix to the way in which we manage the access
flag setting for mappings using the contiguous bit and a fix for a
hang on the kexec/hibernation path.
Summary:
- Fix kexec/hibernation hang due to bogus read-only mappings
- Fix sparse warnings in our cmpxchg() implementation
- Prevent runtime-const being used in modules, just like x86
- Fix broken elision of access flag modifications for contiguous
entries on systems without support for hardware updates
- Fix a broken SVE selftest that was testing the wrong instruction"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
selftest/arm64: Fix sve2p1_sigill() to hwcap test
arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
arm64: make runtime const not usable by modules
arm64: mm: Add PTE_DIRTY back to PAGE_KERNEL* to fix kexec/hibernation
arm64: Silence sparse warnings caused by the type casting in (cmp)xchg
Calvin Owens [Sat, 7 Mar 2026 03:19:25 +0000 (19:19 -0800)]
tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G
Some of the sizing logic through tracer_alloc_buffers() uses int
internally, causing unexpected behavior if the user passes a value that
does not fit in an int (on my x86 machine, the result is uselessly tiny
buffers).
Fix by plumbing the parameter's real type (unsigned long) through to the
ring buffer allocation functions, which already use unsigned long.
It has always been possible to create larger ring buffers via the sysfs
interface: this only affects the cmdline parameter.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org Fixes: 73c5162aa362 ("tracing: keep ring buffer to minimum size till used") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
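Why the size has to stay in an unsigned long all the way down can be shown with a small sketch. The helper names size_survives_int() and buffer_pages() are hypothetical and do not correspond to functions in the tracing code; they only illustrate that any value of 2G or more cannot pass through an int unchanged.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if `size` would keep its value when passed through an int. */
bool size_survives_int(unsigned long size)
{
	return size <= (unsigned long)INT_MAX;
}

/*
 * Hypothetical sizing helper that keeps the full width of the
 * parameter: number of pages needed to hold size_bytes, rounded up.
 */
unsigned long buffer_pages(unsigned long size_bytes, unsigned long page_size)
{
	assert(page_size != 0);
	return size_bytes / page_size + (size_bytes % page_size != 0);
}
```

For a 2G request with 4096-byte pages, buffer_pages() yields 524288 pages, while size_survives_int() reports that the same value does not fit in an int.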
====================
bpf: Fix u32/s32 bounds when ranges cross min/max boundary
Cover the following cases in range refinement logic for 32-bit ranges:
- s32 range crosses U32_MAX/0 boundary, positive part of the s32 range
overlaps with u32 range.
- s32 range crosses U32_MAX/0 boundary, negative part of the s32 range
overlaps with u32 range.
These cases are already handled for 64-bit range refinement.
Without the fix the test in patch 2 is rejected by the verifier.
The test was reduced from sched-ext program.
Changelog:
- v2 -> v3:
- Reverted da653de268d3 (Paul)
- Removed !BPF_F_TEST_REG_INVARIANTS flag from
crossing_32_bit_signed_boundary_2() (Paul)
- v1 -> v2:
- Extended commit message and comments (Emil)
- Targeting 'bpf' tree instead of bpf-next (Alexei)
Revert "selftests/bpf: Update reg_bound range refinement logic"
This reverts commit da653de268d32a80e135c9eb960a8147c186f1bc.
Removed logic is now covered by range_refine_in_halves()
which handles both 32-bit and 64-bit refinements.
selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary
Two test cases for signed/unsigned 32-bit bounds refinement
when s32 range crosses the sign boundary:
- s32 range [S32_MIN..1] overlapping with u32 range [3..U32_MAX],
s32 range tail before sign boundary overlaps with u32 range.
- s32 range [-3..5] overlapping with u32 range [0..S32_MIN+3],
s32 range head after the sign boundary overlaps with u32 range.
This covers both branches added in the __reg32_deduce_bounds().
Also, crossing_32_bit_signed_boundary_2() no longer triggers invariant
violations.
bpf: Fix u32/s32 bounds when ranges cross min/max boundary
Same as in __reg64_deduce_bounds(), refine s32/u32 ranges
in __reg32_deduce_bounds() in the following situations:
- s32 range crosses U32_MAX/0 boundary, positive part of the s32 range
overlaps with u32 range:
0 U32_MAX
| [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxx s32 range xxxxxxxxx] [xxxxxxx|
0 S32_MAX S32_MIN -1
- s32 range crosses U32_MAX/0 boundary, negative part of the s32 range
overlaps with u32 range:
0 U32_MAX
| [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxxxxxx] [xxxxxxxxxxxx s32 range |
0 S32_MAX S32_MIN -1
- No refinement if ranges overlap in two intervals.
This helps, e.g., with the following program:
call %[bpf_get_prandom_u32];
w0 &= 0xffffffff;
if w0 < 0x3 goto 1f; // on fall-through u32 range [3..U32_MAX]
if w0 s> 0x1 goto 1f; // on fall-through s32 range [S32_MIN..1]
if w0 s< 0x0 goto 1f; // range can be narrowed to [S32_MIN..-1]
r10 = 0;
1: ...;
The reg_bounds.c selftest is updated to incorporate identical logic,
refinement based on non-overflowing range halves:
Linus Torvalds [Sat, 7 Mar 2026 00:07:22 +0000 (16:07 -0800)]
Merge tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix potential oops on open failure
- Fix unmount to better free deferred closes
- Use proper constant-time MAC comparison function
- Two buffer allocation size fixes
- Two minor cleanups
- make SMB2 kunit tests a distinct module
* tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix oops due to uninitialised var in smb2_unlink()
cifs: open files should not hold ref on superblock
smb: client: Compare MACs in constant time
smb/client: remove unused SMB311_posix_query_info()
smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info()
smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op()
smb: update some doc references
smb/client: make SMB2 maperror KUnit tests a separate module
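For readers unfamiliar with the "constant-time MAC comparison" item in the pull above, the general technique can be sketched as below. This is a generic userspace illustration, not the function the SMB client patch uses; the name ct_not_equal() is hypothetical. The point is that the loop always visits every byte, so the running time does not reveal the position of the first mismatch the way an early-exit memcmp() can.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare two buffers of equal length without an early exit.
 * Returns 0 if they are identical and 1 otherwise.
 */
int ct_not_equal(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	assert(a != NULL && b != NULL);

	/* Accumulate all differences; never branch on secret data. */
	for (i = 0; i < len; i++)
		diff |= (unsigned char)(pa[i] ^ pb[i]);

	return diff != 0;
}
```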
tracing: Fix enabling multiple events on the kernel command line and bootconfig
Multiple events can be enabled on the kernel command line via a comma
separator. But if they are specified one at a time, then only the last
event is enabled. This is because the event names are saved in a temporary
buffer, and each call by the init cmdline code will reset that buffer.
This also affects names in the boot config file, as it may call the
callback multiple times with an example of:
Linus Torvalds [Fri, 6 Mar 2026 21:37:52 +0000 (13:37 -0800)]
Merge tag 'pci-v7.0-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Initialize msi_addr_mask for OF-created PCI devices to fix sparc and
powerpc probe regressions (Nilay Shroff)
- Orphan the Altera PCIe controller driver (Dave Hansen)
* tag 'pci-v7.0-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Orphan Altera PCIe controller driver
sparc/PCI: Initialize msi_addr_mask for OF-created PCI devices
powerpc/pci: Initialize msi_addr_mask for OF-created PCI devices
Linus Torvalds [Fri, 6 Mar 2026 21:29:12 +0000 (13:29 -0800)]
Merge tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
There is one mm fix in here for a HMM livelock triggered by the xe
driver tests. Otherwise it's a pretty wide range of fixes across the
board, ttm UAF regression fix, amdgpu fixes, nouveau doesn't crash my
laptop anymore fix, and a fair bit of misc.
Seems about right for rc3.
mm:
- mm: Fix a hmm_range_fault() livelock / starvation problem
amdxdna:
- fix invalid payload for failed command
- fix NULL ptr dereference
- fix major fw version check
- avoid inconsistent fw state on error
i915/display:
- Fix for Lenovo T14 G7 display not refreshing
xe:
- Do not preempt fence signaling CS instructions
- Some leak and finalization fixes
- Workaround fix
nouveau:
- avoid runtime suspend oops when using dp aux
panthor:
- fix gem_sync argument ordering
solomon:
- fix incorrect display output
renesas:
- fix DSI divider programming
ethosu:
- fix job submit error clean-up refcount
- fix NPU_OP_ELEMENTWISE validation
- handle possible underflows in IFM size calcs"
* tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel: (38 commits)
accel: ethosu: Handle possible underflow in IFM size calculations
accel: ethosu: Fix NPU_OP_ELEMENTWISE validation with scalar
accel: ethosu: Fix job submit error clean-up refcount underflows
accel/amdxdna: Split mailbox channel create function
drm/panthor: Correct the order of arguments passed to gem_sync
Revert "drm/syncobj: Fix handle <-> fd ioctls with dirty stack"
drm/ttm: Fix bo resource use-after-free
nouveau/dpcd: return EBUSY for aux xfer if the device is asleep
accel/amdxdna: Fix major version check on NPU1 platform
drm/amdgpu/userq: refcount userqueues to avoid any race conditions
drm/amdgpu/userq: Consolidate wait ioctl exit path
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
drm/amdgpu: Fix use-after-free race in VM acquire
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
drm/xe: Fix memory leak in xe_vm_madvise_ioctl
drm/xe/reg_sr: Fix leak on xa_store failure
drm/xe/xe2_hpg: Correct implementation of Wa_16025250150
drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
Revert "drm/pagemap: Disable device-to-device migration"
drm/i915/psr: Fix for Panel Replay X granularity DPCD register handling
...
Linus Torvalds [Fri, 6 Mar 2026 20:34:49 +0000 (12:34 -0800)]
Merge tag 'linux_kselftest-kunit-fixes-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
- Fix rust warnings when CONFIG_PRINTK is disabled
- Reduce stack usage in kunit_run_tests() to fix warnings when
CONFIG_FRAME_WARN is set to a relatively low value
- Update email address for David Gow
- Copy caller args in kunit tool in run_kernel to prevent mutation
* tag 'linux_kselftest-kunit-fixes-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: reduce stack usage in kunit_run_tests()
kunit: tool: copy caller args in run_kernel to prevent mutation
rust: kunit: fix warning when !CONFIG_PRINTK
MAINTAINERS: Update email address for David Gow
Linus Torvalds [Fri, 6 Mar 2026 18:33:32 +0000 (10:33 -0800)]
Merge tag 'spi-fix-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One device specific fix here, it was possible we might end up trying
to dereference an invalid pointer while reporting a transfer timeout
on DesignWare controllers"
* tag 'spi-fix-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-dw-dma: fix print error log when wait finish transaction
Linus Torvalds [Fri, 6 Mar 2026 18:27:45 +0000 (10:27 -0800)]
Merge tag 'regulator-fix-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small, driver specific fixes which might not even have
much impact if you have the affected devices depending on your setup"
* tag 'regulator-fix-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pf9453: Respect IRQ trigger settings from firmware
regulator: mt6363: Fix incorrect and redundant IRQ disposal in probe
Linus Torvalds [Fri, 6 Mar 2026 18:06:04 +0000 (10:06 -0800)]
Merge tag 'sound-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again a collection of device-specific fixes. Most of changes are
fairly small device-specific quirks or fixes for HD- and USB-audio,
ASoC Intel, AMD, fsl, Cirrus and co.
The only large LOC is for plumbing ASoC ACP driver to add the Cirrus
Logic codec support, so this one is also just adding some tables"
* tag 'sound-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: us122l: drop redundant interface references
ASoC: amd: yc: Add DMI quirk for ASUS EXPERTBOOK PM1503CDA
ASoC: dt-bindings: renesas,rz-ssi: Document RZ/G3L SoC
ASoC: SDCA: Add allocation failure check for Entity name
ALSA: hda/senary: Ensure EAPD is enabled during init
ALSA: hda/senary: Use codec->core.afg for GPIO access
ALSA: doc: usb-audio: Add doc for QUIRK_FLAG_SKIP_IFACE_SETUP
ASoC: dt-bindings: tegra: Add compatible for Tegra238 sound card
ALSA: hda/hdmi: Add Tegra238 HDA codec device ID
ASoC: cs35l56: Suppress pointless warning about number of GPIO pulls
ASoC: amd: acp: Add ACP6.3 match entries for Cirrus Logic parts
ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCD SKU
ASoC: rt1321: fix DMIC ch2/3 mask issue
ASoC: cs35l56: Only patch ASP registers if the DAI is part of a DAIlink
ASoC: fsl_easrc: Fix event generation in fsl_easrc_iec958_set_reg()
ASoC: fsl_easrc: Fix event generation in fsl_easrc_iec958_put_bits()
ALSA: firewire: dice: Fix printf warning with W=1
ALSA: hda/tas2781: A workaround solution to lower-vol issue among lower calibrated-impedance micro-speaker on TAS2781
ALSA: hda/realtek: Add quirk for HP Pavilion 15-eh1xxx to enable mute LED
ALSA: usb-audio: Add iface reset and delay quirk for AB13X USB Audio
...
Guenter Roeck [Thu, 5 Mar 2026 19:33:39 +0000 (11:33 -0800)]
tracing: Add NULL pointer check to trigger_data_free()
If trigger_data_alloc() fails and returns NULL, event_hist_trigger_parse()
jumps to the out_free error path. While kfree() safely handles a NULL
pointer, trigger_data_free() does not. This causes a NULL pointer
dereference in trigger_data_free() when evaluating
data->cmd_ops->set_filter.
Fix the problem by adding a NULL pointer check to trigger_data_free().
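The shape of such a NULL-tolerant free routine can be sketched in userspace C. The struct and function names below (trigger_ops_sketch, trigger_data_sketch, trigger_data_free_sketch) are hypothetical stand-ins, and free() replaces the kernel allocator; only the early return on NULL reflects what the message describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real trigger structures. */
struct trigger_ops_sketch {
	void (*set_filter)(void *data);	/* optional cleanup hook */
};

struct trigger_data_sketch {
	const struct trigger_ops_sketch *cmd_ops;
};

void trigger_data_free_sketch(struct trigger_data_sketch *data)
{
	/* Tolerate NULL, as free() and kfree() do. */
	if (!data)
		return;

	/* From here on data is valid, so cmd_ops may be dereferenced. */
	assert(data->cmd_ops != NULL);
	if (data->cmd_ops->set_filter)
		data->cmd_ops->set_filter(data);

	free(data);
}
```

With the check in place, an error path can call the free routine unconditionally, whether or not the allocation succeeded.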
The problem was found by an experimental code review agent based on
gemini-3.1-pro while reviewing backports into v6.18.y.
Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260305193339.2810953-1-linux@roeck-us.net Fixes: 0550069cc25f ("tracing: Properly process error handling in event_hist_trigger_parse()") Assisted-by: Gemini:gemini-3.1-pro Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Fri, 6 Mar 2026 18:00:58 +0000 (10:00 -0800)]
Merge tag 'hid-for-linus-2026030601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- fix a few memory leaks (Günther Noack)
- fix potential kernel crashes in cmedia, creative-sb0540 and zydacron
(Greg Kroah-Hartman)
- fix NULL pointer dereference in pidff (Tomasz Pakuła)
- fix battery reporting for Apple Magic Trackpad 2 (Julius Lehmann)
- mcp2221 proper handling of failed read operation (Romain Sioen)
- various device quirks / device ID additions
* tag 'hid-for-linus-2026030601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: mcp2221: cancel last I2C command on read error
HID: asus: add xg mobile 2023 external hardware support
HID: multitouch: Keep latency normal on deactivate for reactivation gesture
HID: apple: Add EPOMAKER TH87 to the non-apple keyboards list
HID: intel-ish-hid: ipc: Add Nova Lake-H/S PCI device IDs
selftests: hid: tests: test_wacom_generic: add tests for display devices and opaque devices
HID: multitouch: new class MT_CLS_EGALAX_P80H84
HID: magicmouse: fix battery reporting for Apple Magic Trackpad 2
HID: pidff: Fix condition effect bit clearing
HID: Add HID_CLAIMED_INPUT guards in raw_event callbacks missing them
HID: asus: avoid memory leak in asus_report_fixup()
HID: magicmouse: avoid memory leak in magicmouse_report_fixup()
HID: apple: avoid memory leak in apple_report_fixup()
HID: Document memory allocation properties of report_fixup()
- touchscreen_dmi: Add quirk for y-inverted Goodix touchscreen on SUPI
S10
- uniwill-laptop:
- FN lock/super key lock attributes rename
- Fix crash on unexpected battery event
- A special key combination can alter FN lock status so mark it
volatile
- Handle FN lock event
* tag 'platform-drivers-x86-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (27 commits)
platform/x86: dell-wmi-sysman: Don't hex dump plaintext password data
platform_data/mlxreg: mlxreg.h: fix all kernel-doc warnings
platform/x86: asus-armoury: add support for FA401UM
platform/x86: asus-armoury: add support for GX650RX
platform/x86: hp-bioscfg: Support allocations of larger data
platform/x86: oxpec: Add support for Aokzoe A2 Pro
platform/x86: oxpec: Add support for OneXPlayer X1 Air
platform/x86: oxpec: Add support for OneXPlayer X1z
platform/x86: oxpec: Add support for OneXPlayer APEX
platform/x86: uniwill-laptop: Handle FN lock event
platform/x86: uniwill-laptop: Mark FN lock status as being volatile
platform/x86: uniwill-laptop: Fix crash on unexpected battery event
platform/x86: uniwill-laptop: Rename FN lock and super key lock attrs
platform/x86: redmi-wmi: Add more hotkey mappings
platform/x86: alienware-wmi-wmax: Add G-Mode support to m18 laptops
platform/x86: hp-wmi: add Omen 14-fb1xxx (board 8E41) support
platform/x86: dell-wmi: Add audio/mic mute key codes
platform/x86: hp-wmi: Add Victus 16-d0xxx support
platform/x86: intel-hid: Enable 5-button array on ThinkPad X1 Fold 16 Gen 1
platform/x86: int3472: Handle GPIO type 0x10 (DOVDD)
...
Linus Torvalds [Fri, 6 Mar 2026 17:22:51 +0000 (09:22 -0800)]
Merge tag 'slab-for-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Fix for slab->stride truncation on 64k page systems due to short
type. It was not due to races and lack of barriers in the end. (Harry
Yoo)
- Fix for severe performance regression due to unnecessary sheaf refill
restrictions exposed by mempool allocation strategy. (Vlastimil
Babka)
- Stable fix for potential silent percpu sheaf flushing failures on
PREEMPT_RT. (Vlastimil Babka)
* tag 'slab-for-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: change stride type from unsigned short to unsigned int
mm/slab: allow sheaf refill if blocking is not allowed
slab: distinguish lock and trylock for sheaf_flush_main()
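The kind of truncation that the stride type change addresses can be illustrated with a few lines of C. The helper names are hypothetical, and the value 65536 is used in the usage note only because it equals a 64 KiB page size; the strides the allocator actually computes are not given in this log.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The sketch only makes sense where unsigned int is wider than unsigned short. */
static_assert(USHRT_MAX < UINT_MAX, "unsigned short must be narrower than unsigned int");

/* What a field of type unsigned short would end up holding. */
unsigned short store_in_ushort(unsigned int value)
{
	return (unsigned short)value;	/* keeps only the low bits */
}

/* True if `value` would change when stored in an unsigned short field. */
bool truncated_in_ushort(unsigned int value)
{
	return value > USHRT_MAX;
}
```

On a platform with a 16-bit unsigned short, store_in_ushort(65536) returns 0, whereas an unsigned int field keeps the value intact.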
Linus Torvalds [Fri, 6 Mar 2026 17:16:39 +0000 (09:16 -0800)]
Merge tag 'pmdomain-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- rockchip: Fix PD_VCODEC for RK3588
- bcm: Fix broken reset status read for bcm2835
* tag 'pmdomain-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: rockchip: Fix PD_VCODEC for RK3588
pmdomain: bcm: bcm2835-power: Fix broken reset status read
Linus Torvalds [Fri, 6 Mar 2026 17:10:36 +0000 (09:10 -0800)]
Require (reasonably) normal mappings for MADV_DOFORK
This came up as a result of the tracing fix pull request, and commit e39bb9e02b68 ("tracing: Fix WARN_ON in tracing_buffers_mmap_close") in
particular.
The use of MADV_DOFORK confused the ring buffer mapping reference
counting just because it was unexpected, since the mapping was
originally done with VM_DONTCOPY.
The tracing code may well be the only case of this (and fixed it all by
just using the mmap open callback to unconfuse itself), but it's just
strange that we allow MADV_DOFORK on special mappings where the kernel
has set the "don't copy this" bit.
The code already disallowed it for VM_IO mappings (going back to the
original commit f822566165dd: "madvise MADV_DONTFORK/MADV_DOFORK"), so
just extend it to any of the VM_SPECIAL cases (which includes
VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP in addition to VM_IO).
We could also allow MADV_DOFORK only on mappings that had been marked
DONTFORK by the user. But that would require us to track that
(presumably with another VM_xyz bit), so let's just do this trivial and
straightforward modification.
If anybody notices, Lorenzo will be boarding Flying Pig Airlines.
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Link: https://lore.kernel.org/all/a8907468-d7e9-4727-af28-66d905093230@kernel.org/
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
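The rule change described above can be modelled in user space roughly as follows. This is a sketch, not the kernel code: the flag bit values are placeholders rather than the kernel's real values, the function names are hypothetical, and -EINVAL is assumed as the rejection code.

```c
#include <errno.h>

/* Placeholder bit values for illustration only; not the kernel's. */
#define VM_IO         (1ul << 0)
#define VM_DONTEXPAND (1ul << 1)
#define VM_PFNMAP     (1ul << 2)
#define VM_MIXEDMAP   (1ul << 3)
#define VM_DONTCOPY   (1ul << 4)

/* VM_SPECIAL as described in the commit message. */
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/* Old rule: only VM_IO mappings refuse MADV_DOFORK. */
int dofork_check_old(unsigned long vm_flags)
{
	return (vm_flags & VM_IO) ? -EINVAL : 0;
}

/* New rule: any VM_SPECIAL mapping refuses MADV_DOFORK. */
int dofork_check_new(unsigned long vm_flags)
{
	return (vm_flags & VM_SPECIAL) ? -EINVAL : 0;
}
```

Under the old check, a VM_PFNMAP | VM_DONTCOPY mapping would be allowed to clear VM_DONTCOPY; under the new check it is rejected, while an ordinary mapping that only carries VM_DONTCOPY is still accepted.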
Linus Torvalds [Fri, 6 Mar 2026 16:44:20 +0000 (08:44 -0800)]
Merge tag 'v7.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Fix use-after-free in ccp
- Fix bug when SEV is disabled in ccp
- Fix tfm_count leak in atmel-sha204a
* tag 'v7.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: atmel-sha204a - Fix OOM ->tfm_count leak
crypto: ccp - Fix use-after-free on error path
crypto: ccp - allow callers to use HV-Fixed page API when SEV is disabled
Linus Torvalds [Fri, 6 Mar 2026 16:41:20 +0000 (08:41 -0800)]
Merge tag 'ata-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Fix a problem where the deferred non-NCQ command would incorrectly
get completed as a failed command, if there was another command that
timed out. Found by Gemini (Guenter)
- The deferred non-NCQ command work is only supposed to run after the
last NCQ command finishes. However, because the work was never
canceled on error (e.g. a timeout), the work could incorrectly run
when commands were still in flight. Found by syzbot (me)
- Add a quirk to make sure that QEMU hard drives can potentially use up
to 32 MiB I/Os (Pedro)
- Add a quirk to disable LPM on Seagate ST1000DM010-2EP102 (Maximilian)
* tag 'ata-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-eh: Fix detection of deferred qc timeouts
ata: libata-core: Add BRIDGE_OK quirk for QEMU drives
ata: libata: cancel pending work after clearing deferred_qc
ata: libata-core: Disable LPM on ST1000DM010-2EP102
Linus Torvalds [Fri, 6 Mar 2026 16:36:18 +0000 (08:36 -0800)]
Merge tag 'block-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Improve quirk visibility and configurability (Maurizio)
- Fix runtime user modification to queue setup (Keith)
- Fix multipath leak on try_module_get failure (Keith)
- Ignore ambiguous spec definitions for better atomics support
(John)
- Fix admin queue leak on controller reset (Ming)
- Fix large allocation in persistent reservation read keys
(Sungwoo Kim)
- Fix fcloop callback handling (Justin)
- Securely free DHCHAP secrets (Daniel)
- Various cleanups and typo fixes (John, Wilfred)
- Avoid a circular lock dependency issue in the sysfs nr_requests or
scheduler store handling
- Fix a circular lock dependency with the pcpu mutex and the queue
freeze lock
- Cleanup for bio_copy_kern(), using __bio_add_page() rather than
bio_add_page(), as adding a page here cannot fail. The existing code
had broken cleanup for the error condition, so make it clear that the
error condition cannot happen
- Fix for a __this_cpu_read() in preemptible context splat
* tag 'block-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: use trylock to avoid lockdep circular dependency in sysfs
nvme: fix memory allocation in nvme_pr_read_keys()
block: use __bio_add_page in bio_copy_kern
block: break pcpu_alloc_mutex dependency on freeze_lock
blktrace: fix __this_cpu_read/write in preemptible context
nvme-multipath: fix leak on try_module_get failure
nvmet-fcloop: Check remoteport port_state before calling done callback
nvme-pci: do not try to add queue maps at runtime
nvme-pci: cap queue creation to used queues
nvme-pci: ensure we're polling a polled queue
nvme: fix memory leak in quirks_param_set()
nvme: correct comment about nvme_ns_remove()
nvme: stop setting namespace gendisk device driver data
nvme: add support for dynamic quirk configuration via module parameter
nvme: fix admin queue leak on controller reset
nvme-fabrics: use kfree_sensitive() for DHCHAP secrets
nvme: stop using AWUPF
nvme: expose active quirks in sysfs
nvme/host: fixup some typos
Linus Torvalds [Fri, 6 Mar 2026 16:31:36 +0000 (08:31 -0800)]
Merge tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Fix a typo in the mock_file help text
- Fix a comment regarding IORING_SETUP_TASKRUN_FLAG in the
io_uring.h UAPI header
- Use READ_ONCE() for reading refill queue entries
- Reject SEND_VECTORIZED for fixed buffer sends, as it isn't
implemented. Currently this flag is silently ignored.
This is in preparation for making these work, but first we
need a fixup so that older kernels will correctly reject them
- Ensure "0" means default for the rx page size
* tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/zcrx: use READ_ONCE with user shared RQEs
io_uring/mock: Fix typo in help text
io_uring/net: reject SEND_VECTORIZED when unsupported
io_uring: correct comment for IORING_SETUP_TASKRUN_FLAG
io_uring/zcrx: don't set rx_page_size when not requested
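The READ_ONCE() item above addresses the usual hazard of memory shared with user space: a value that is validated and then read again may have changed in between. A minimal user-space sketch of the pattern follows; the struct layout, field names, and helper names are hypothetical, and read_once_u32() is only a stand-in for the kernel's READ_ONCE().

```c
#include <stdint.h>

/* Hypothetical layout of an entry living in memory shared with user space. */
struct shared_rqe {
	uint64_t off;
	uint32_t len;
};

/*
 * Stand-in for READ_ONCE() on a 32-bit field: the volatile access keeps
 * the compiler from dropping or repeating the load.
 */
static inline uint32_t read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/*
 * Snapshot the shared length once, then validate and use only the local
 * copy. Returns the validated length, or 0 if it exceeds max_len.
 */
uint32_t rqe_checked_len(const struct shared_rqe *rqe, uint32_t max_len)
{
	uint32_t len = read_once_u32(&rqe->len);

	if (len > max_len)
		return 0;
	return len;
}
```

The point of the design is that every later use goes through the local variable, so a concurrent writer cannot change the value between the check and the use.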
kthread_exit became a macro to do_exit in commit 28aaa9c39945
("kthread: consolidate kthread exit paths to prevent use-after-free"),
so there is no kthread_exit function BTF ID to resolve. Remove it from
noreturn_deny to avoid resolve_btfids unresolved symbol warnings.
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yifan Wu [Thu, 5 Mar 2026 01:36:37 +0000 (09:36 +0800)]
selftest/arm64: Fix sve2p1_sigill() to hwcap test
FEAT_SVE2p1 is indicated by ID_AA64ZFR0_EL1.SVEver. However,
BFADD requires FEAT_SVE_B16B16, which is indicated by
ID_AA64ZFR0_EL1.B16B16. This could cause the test to incorrectly
fail on a CPU that supports FEAT_SVE2p1 but not FEAT_SVE_B16B16.
Use LD1Q (gather load quadwords) instead, which is decoded from the
SVE encodings and implied by FEAT_SVE2p1.
Fixes: c5195b027d29 ("kselftest/arm64: Add SVE 2.1 to hwcap test")
Signed-off-by: Yifan Wu <wuyifan50@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
contpte_ptep_set_access_flags() compared the gathered ptep_get() value
against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
from all sub-PTEs in the CONT block, so a dirty sibling can make the
target appear already-dirty. When the gathered value matches entry, the
function returns 0 even though the target sub-PTE still has PTE_RDONLY
set in hardware.
For a CPU with FEAT_HAFDBS this gathered view is fine, since hardware may
set AF/dirty on any sub-PTE and CPU TLB behavior is effectively gathered
across the CONT range. But page-table walkers that evaluate each
descriptor individually (e.g. a CPU without DBM support, or an SMMU
without HTTU, or with HA/HD disabled in CD.TCR) can keep faulting on the
unchanged target sub-PTE, causing an infinite fault loop.
Gathering can therefore cause false no-ops when only a sibling has been
updated:
- write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
- read faults: target still lacks PTE_AF
Fix by checking each sub-PTE against the requested AF/dirty/write state
(the same bits consumed by __ptep_set_access_flags()), using raw
per-PTE values rather than the gathered ptep_get() view, before
returning no-op. Keep using the raw target PTE for the write-bit unfold
decision.
Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
range may become the effective cached translation and software must
maintain consistent attributes across the range.
Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
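The false no-op described in this commit can be reproduced with a simplified user-space model. Everything here is an assumption made for illustration: the two-bit PTE encoding, the bit values, the block size of 16, and the function names. Only the idea (a gathered view versus raw per-PTE values) is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define CONT_PTES   16u          /* assumed number of sub-PTEs per block */
#define PTE_AF      (1ul << 0)   /* placeholder bit: accessed */
#define PTE_RDONLY  (1ul << 1)   /* placeholder bit: read-only in hardware */
#define ACCESS_MASK (PTE_AF | PTE_RDONLY)

typedef unsigned long pte_t;

/* Gathered view: the target PTE with AF/dirty folded in from all siblings. */
pte_t gathered_get(const pte_t *block, size_t target)
{
	pte_t pte = block[target];

	for (size_t i = 0; i < CONT_PTES; i++) {
		if (block[i] & PTE_AF)
			pte |= PTE_AF;
		if (!(block[i] & PTE_RDONLY))   /* a writable (dirty) sibling */
			pte &= ~PTE_RDONLY;
	}
	return pte;
}

/* Old no-op test: compares the gathered view against the requested entry. */
bool noop_gathered(const pte_t *block, size_t target, pte_t entry)
{
	return gathered_get(block, target) == entry;
}

/* New no-op test: every raw sub-PTE must already carry the requested bits. */
bool noop_per_pte(const pte_t *block, pte_t entry)
{
	for (size_t i = 0; i < CONT_PTES; i++) {
		if ((block[i] & ACCESS_MASK) != (entry & ACCESS_MASK))
			return false;
	}
	return true;
}
```

With fifteen sub-PTEs set to PTE_AF | PTE_RDONLY and one sibling set to PTE_AF only, a write fault requesting PTE_AF on another sub-PTE looks like a no-op to noop_gathered(), although the target is still read-only; noop_per_pte() reports that an update is needed.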
Helge Deller [Wed, 4 Mar 2026 21:24:18 +0000 (22:24 +0100)]
parisc: Fix initial page table creation for boot
The KERNEL_INITIAL_ORDER value defines the initial size (usually 32 or
64 MB) of the page table during bootup. Up until now the whole area was
initialized with PTE entries, but there was no check if we filled too
many entries. Change the code to fill in only as many entries as are
needed for the kernel to reach the "_end" symbol, but never more
entries than actually fit into the initial PTE tables.
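The bound described above amounts to "entries needed to reach _end, capped at the table capacity". A minimal arithmetic sketch in C, assuming the mapping starts at address 0 and using a hypothetical function name and parameters (the actual boot code is not shown in this log):

```c
/*
 * Number of initial PTEs to fill: enough pages to cover [0, end_addr),
 * rounded up, but never more than the initial tables can hold.
 * All names and the zero base address are assumptions for illustration.
 */
unsigned long initial_pte_count(unsigned long end_addr,
				unsigned int page_shift,
				unsigned long capacity)
{
	unsigned long page_mask = (1ul << page_shift) - 1;
	unsigned long needed = (end_addr >> page_shift) +
			       ((end_addr & page_mask) != 0);

	return needed < capacity ? needed : capacity;
}
```

For example, with 4 KiB pages an end address of 0x1001 needs two entries, while an end address of 1 MiB with room for only 16 entries is capped at 16.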
Helge Deller [Tue, 3 Mar 2026 22:36:11 +0000 (23:36 +0100)]
parisc: Check kernel mapping earlier at bootup
The check if the initial mapping is sufficient needs to happen much
earlier during bootup. Move this test directly to the start_parisc()
function and use native PDC iodc functions to print the warning, because
panic() and printk() are not functional yet.
This fixes boot when enabling various KALLSYMS options which need
much more space.
Dave Airlie [Fri, 6 Mar 2026 09:40:23 +0000 (19:40 +1000)]
Merge tag 'drm-misc-fixes-2026-03-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Another early drm-misc-fixes PR to revert the previous uapi fix sent in
drm-misc-fixes-2026-03-05, together with a UAF fix in TTM, an argument
order fix for panthor, a fix for the firmware getting stuck on
resource allocation error handling for amdxdna, and a few fixes for
ethosu (size calculation and reference underflows, and a validation
fix).
Dave Airlie [Fri, 6 Mar 2026 07:45:47 +0000 (17:45 +1000)]
Merge tag 'drm-misc-fixes-2026-03-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A return type fix for ttm, a display fix for solomon, several misc fixes
for amdxdna, a DSI clock rate fix for rz-du, a uapi fix for syncobj, a
possible build failure fix for dma-buf, a doc warning fix for sched, a
build failure fix for ttm tests, and a crash fix when suspended for
nouveau.
Guenter Roeck [Fri, 6 Mar 2026 02:48:05 +0000 (18:48 -0800)]
ata: libata-eh: Fix detection of deferred qc timeouts
If the ata_qc_for_each_raw() loop finishes without finding a matching SCSI
command for any QC, the variable qc will hold a pointer to the last element
examined, which has the tag i == ATA_MAX_QUEUE - 1. This qc can match the
port deferred QC (ap->deferred_qc).
If that happens, the condition qc == ap->deferred_qc evaluates to true
despite the loop not breaking with a match on the SCSI command for this QC.
In that case, the error handler mistakenly intercepts a command that has
not been issued yet and that has not timed out, and thus erroneously
returns a timeout error.
Fix the problem by checking for i < ATA_MAX_QUEUE in addition to
qc == ap->deferred_qc.
The problem was found by an experimental code review agent based on
gemini-3.1-pro while reviewing backports into v6.18.y.
Assisted-by: Gemini:gemini-3.1-pro
Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[cassel: modified commit log as suggested by Damien]
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
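The loop hazard described in this commit can be modelled in plain C. The structures, the queue size of 4, and the function names are simplified stand-ins rather than the libata definitions; only the shape of the bug (a cursor left pointing at the last element after an unsuccessful search) follows the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE 4 /* small stand-in for ATA_MAX_QUEUE */

struct qc {
	const void *scmd;       /* associated SCSI command, if any */
};

struct port {
	struct qc qcmd[MAX_QUEUE];
	struct qc *deferred_qc; /* queued but not yet issued */
};

/* Old check: ignores whether the search loop actually found a match. */
bool is_deferred_timeout_old(struct port *ap, const void *scmd)
{
	struct qc *qc = NULL;
	int i;

	for (i = 0; i < MAX_QUEUE; i++) {
		qc = &ap->qcmd[i];
		if (qc->scmd == scmd)
			break;
	}
	/* Without a match, qc still points at the last element. */
	return qc == ap->deferred_qc;
}

/* Fixed check: i < MAX_QUEUE holds only if the loop ended via break. */
bool is_deferred_timeout_fixed(struct port *ap, const void *scmd)
{
	struct qc *qc = NULL;
	int i;

	for (i = 0; i < MAX_QUEUE; i++) {
		qc = &ap->qcmd[i];
		if (qc->scmd == scmd)
			break;
	}
	return i < MAX_QUEUE && qc == ap->deferred_qc;
}
```

If the deferred command sits in the last slot and the timed-out SCSI command matches no slot, the old check reports a deferred timeout while the fixed check does not; when the SCSI command really belongs to the deferred slot, both agree.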