git.ipfire.org Git - thirdparty/kernel/stable.git/log

drm/xe/nvm: fix writable override for CRI

The writable override should be set when the FDO_MODE bit is enabled.
Fix the comparison to distinguish this case from legacy systems,
where the bit should be disabled to have the override.
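
The flipped comparison can be sketched as below. This is only an
illustration, not the driver code: the bit position of FDO_MODE, the
is_cri flag and the function name are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real register layout is not shown here. */
#define FDO_MODE_BIT (1u << 0)

/*
 * On CRI the override is active when the FDO_MODE bit is set;
 * on legacy platforms it is active when the bit is clear.
 */
static bool nvm_writable_override(uint32_t reg, bool is_cri)
{
	bool fdo_mode = (reg & FDO_MODE_BIT) != 0;

	return is_cri ? fdo_mode : !fdo_mode;
}
```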

Cc: stable@vger.kernel.org
Fixes: 9dde74fd9e65 ("drm/xe/nvm: enable cri platform")
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260714-cri_nvm_fdo_flip-v2-1-14580e71b58e@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 2007be18d2318a59748da5da1b8968042213d5f1)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Hold a dma-buf reference for imported BOs

An imported dma-buf BO is created as a ttm_bo_type_sg BO whose
reservation object is the exporter's dma_buf->resv. The importer,
however, only takes a dma-buf reference after a successful
dma_buf_dynamic_attach(). Until then nothing keeps the exporter alive,
so if the exporter is freed while the BO still references its resv, a
later access to that resv is a use-after-free:

  Oops: general protection fault, probably for non-canonical address
        0x6b6b6b6b6b6b6b9c
  Workqueue: ttm ttm_bo_delayed_delete [ttm]
  RIP: 0010:mutex_can_spin_on_owner+0x3f/0xc0

This can be reached on two paths:

- dma_buf_dynamic_attach() fails, or
- ttm_bo_init_reserved() fails during BO creation.

In both cases the BO already has bo->base.resv pointing at the exporter
resv, and sg BOs are always torn down via ttm_bo_delayed_delete(), which
locks bo->base.resv asynchronously - potentially after the exporter has
been freed.

Take the dma-buf reference in xe_bo_init_locked(), before
ttm_bo_init_reserved(), so it also covers a creation failure there, and
release it in xe_ttm_bo_destroy(). The reference is held for the whole
BO lifetime, keeping the shared resv alive on every path.
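
The lifetime rule can be modelled in plain C as below. This is a userspace
sketch under stated assumptions, not the xe code: struct exporter,
struct imported_bo and all helper names are hypothetical stand-ins for the
dma-buf, the imported BO and their get/put operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the exporter's dma-buf, which owns the shared resv. */
struct exporter {
	int refcount;
	bool freed;
};

/* Stand-in for an imported BO that points at the exporter's resv. */
struct imported_bo {
	struct exporter *exp;
};

static void exporter_get(struct exporter *e)
{
	e->refcount++;
}

static void exporter_put(struct exporter *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;	/* a real exporter would be released here */
}

/* Take the reference first, so any later creation failure is covered. */
static void bo_init(struct imported_bo *bo, struct exporter *e)
{
	exporter_get(e);
	bo->exp = e;
}

/* Destroy drops the reference, whether or not creation fully succeeded. */
static void bo_destroy(struct imported_bo *bo)
{
	exporter_put(bo->exp);
	bo->exp = NULL;
}
```

In this model the exporter cannot reach refcount zero while a BO still
points at it, whichever side drops its reference first.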

v2:
  - Reworked the fix to avoid creating the imported sg BO before
    dma_buf_dynamic_attach() succeeds.
  - Attach with importer_priv == NULL and make invalidate_mappings ignore
    incomplete imports.

v3:
  - Dropped the xe-side reordering approach since importer_priv must be
    valid when dma_buf_dynamic_attach() publishes the attachment.
  - Per Christian's suggestion on the v1 thread, keyed the check on
    import_attach rather than removing the sg guard entirely.
  - Fixes both xe and amdgpu in a single TTM patch.

v4:
  - Moved import_attach check to after dma_resv_copy_fences() so fences
    are copied before returning for successful imports (Thomas).
  - Removed exporter-alive claim from commit message (Thomas).

v5:
  - Add drm/xe patch to keep imported sg BOs off the LRU before attach
    succeeds; the TTM fix alone is not sufficient for xe if the BO is
    already LRU-visible. (Thomas)
    v4 patch:
    https://patchwork.freedesktop.org/patch/736663/?series=169129&rev=2
  - Patch 1 (drm/ttm) carries Christian's Reviewed-by from v4.

v6:
  - Reworked the fix based on Thomas' suggestion. Instead of the TTM resv
    individualization (v1-v5) plus the xe off-LRU/placement handling (v5),
    just hold a dma-buf reference for the imported BO lifetime so the
    shared resv can never be freed while the BO still references it.
    Single xe patch, no TTM change. (Thomas)
  - Take the reference in xe_bo_init_locked() before ttm_bo_init_reserved()
    so a TTM creation failure is covered too (Thomas).
  - Dropped the v5 series (drm/ttm + drm/xe off-LRU); the off-LRU approach
    also regressed in CI BAT via ttm_bo_pipeline_gutting() creating a ghost
    BO that outlived the exporter.
    Link to v5: https://patchwork.freedesktop.org/series/169984/

v7:
  - Move changelog above --- so it stays in the commit message.
  - Reorder changelog entries oldest-to-newest. (Thomas)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8023
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Christian Konig <christian.koenig@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260710191027.260160-2-nitin.r.gote@intel.com
(cherry picked from commit 3516f3fae6be35642f8f06f8a218da6425c0306a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/vm: Fix BO prefetch with CONSULT_MEM_ADVISE_PREF_LOC

When prefetch region is DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC for a BO VMA,
the code used it as an index into region_to_mem_type[], causing an
out-of-bounds access since the value is -1.

Resolve the preferred location for BO VMAs directly: local VRAM on dGFX
(using the BO's tile placement) or system memory on iGPU.
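
A minimal sketch of resolving the sentinel before any table lookup. The
table contents, the enum and the function name are assumptions for
illustration; only the sentinel value of -1 comes from the description
above.

```c
#include <stdbool.h>

#define CONSULT_PREF_LOC (-1)	/* sentinel, must never be used as an index */

enum mem_type { MEM_SYSTEM, MEM_VRAM0, MEM_VRAM1 };

/* Hypothetical table; the real region_to_mem_type[] contents are not shown. */
static const enum mem_type region_to_mem_type[] = {
	MEM_SYSTEM, MEM_VRAM0, MEM_VRAM1,
};

/* bo_tile is assumed to be 0 or 1 in this sketch. */
static enum mem_type resolve_bo_prefetch(int region, bool is_dgfx, int bo_tile)
{
	if (region == CONSULT_PREF_LOC) {
		if (!is_dgfx)
			return MEM_SYSTEM;	/* iGPU: system memory */
		return bo_tile == 0 ? MEM_VRAM0 : MEM_VRAM1;	/* dGFX: local VRAM */
	}
	return region_to_mem_type[region];
}
```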

Discovered using AI-assisted static analysis confirmed by Intel Product
Security.

v2:
- Fix null dereference

Reported-by: Martin Hodo <martin.hodo@intel.com>
Fixes: c1bb69a2e8e2 ("drm/xe/svm: Consult madvise preferred location in prefetch")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20260624174943.2808767-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
(cherry picked from commit d9a4906ac03be9f6ed3f3b45c56c866b867fd75b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"arm64:

   - Fix an accounting buglet when reclaiming pages from a protected
     guest

   - Fix a bunch of architectural compliance issues when injecting a
     synthesised exception, most of which were missing the PSTATE.IL bit
     indicating a 32bit-wide instruction

   - Another set of fixes addressing issues with translation of
     VNCR_EL2, including corner cases where the guest points that
     register at a RO page...

   - Don't warn when trapping accesses to ZCR_EL2 from an L2 guest, as
     that's not unexpected at all

   - Address a bunch of races with LPI migration vs LPIs being disabled

   - Fix a total howler of a bug combining FEAT_MOPS and NV, resulting
     in exception returning in the wrong place...

   - Move locking for kvm_io_bus_get_dev() into the caller, ensuring
     race-free checks that the returned object is of the correct type

   - Fix initialisation of the page-table walk level when relaxing
     permissions

   - Correctly update the XN attribute when relaxing permissions

   - Fix the sign extension of loads from emulated MMIO regions

   - Assorted collection of fixes for pKVM's FFA proxy, together with a
     couple of FFA driver adjustments

   - Coerce Fuad Tabba into a reviewer role, and may his Inbox catch
     fire!

  s390:

   - more gmap KVM memory management fixes

   - PCI passthru fixes

  x86:

   - Fix a bug where KVM will trigger a UAF if updating IOMMU IRTEs
     fails when registering an IRQ-bypass producer

   - Ignore pending PV EOI instead of BUG()ing the host if the feature
     was disabled by the guest

   - Fix nVMX bugs where KVM would run L1 with an L1-controlled CR3
     after a failed "late" consistency check when KVM is NOT using EPT

   - Disallow intra-host migration/mirroring of SNP VMs as KVM doesn't
     yet support moving/mirroring SNP state

   - Fix a TOCTOU bug in KVM's handling of the "trusted" CPUID for TDX
     guests

   - Fix a NULL pointer deref in trace_kvm_inj_exception() where a
     change to the core infrastructure missed KVM's unique (ab)use of
     __print_symbolic()

   - Put vmcs12 pages if nested VM-Enter fails due to invalid guest
     state

   - Fix TLB conflicts between two VMs if one of them is run on a CPU
     before and after it is hotplugged"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (48 commits)
  KVM: SVM: Bump asid_generation on CPU online to avoid ASID collision after hotplug
  KVM: nVMX: Put vmcs12 pages if nested VM-Enter fails due to invalid guest state
  KVM: x86: Fix null pointer deref due to dummy array in trace_kvm_inj_exception()
  KVM: TDX: Reject concurrent change to CPUID entry count
  KVM: selftests: Verify SNP VMs are rejected from migration and mirroring
  KVM: SEV: Do not allow intra-host migration/mirroring of SNP VMs
  KVM: s390: pci: Fix handling of AIF enable without AISB
  KVM: s390: Improve kvm_s390_vm_stop_migration()
  KVM: s390: Fix dat_crste_walk_range() early return
  KVM: s390: vsie: Avoid potential deadlock with real spaces
  KVM: s390: pci: Fix GISC refcount leak on AIF enable failure
  KVM: nVMX: Don't use vmcs01.GUEST_CR3 to snapshot L1's CR3 when EPT is disabled
  KVM: nVMX: Move vTPR vs. TPR Threshold consistency check into "normal" checks
  KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs
  KVM: x86: Nullify irqfd->producer if updating IRTE for bypass fails
  KVM: arm64: Fix propagation of TLBI level in kvm_pgtable_stage2_relax_perms()
  firmware: arm_ffa: Fix Endpoint Memory Access Descriptor offset calculation
  firmware: arm_ffa: Fix out-of-bound writes in ffa_setup_and_transmit()
  KVM: arm64: Zero out the stack initialized data in the FFA handler
  KVM: arm64: Ensure FFA ranges are page aligned
  ...

Merge tag 'renesas-fixes-for-v7.2-tag1' of https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/fixes

Renesas fixes for v7.2

- Fix lock-ups on the Ironhide development board.

* tag 'renesas-fixes-for-v7.2-tag1' of https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
arm64: dts: renesas: ironhide: Describe inline ECC carveouts

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

block: fix aligning of bounced dio read bios

bio_iov_iter_align_down expects the "normal" biovec layout from vector 0,
while bio_iov_iter_bounce_read abuses vector 0 for a bounce buffer
allocation. Pass an explicit bvec to bio_iov_iter_align_down to deal
with this case to avoid a double unpin.

Additionally we need to free the folio if no bio_vec could be added,
and adjust the size of the first bio_vec that contains the bounce buffer
when the I/O size is aligned down.

Fixes: e7b8b3c5b2a6 ("block: align down bounces bios")
Reported-by: 0wnerD1ed <l7z@0b1t.tech>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: 0wnerD1ed <l7z@0b1t.tech>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://patch.msgid.link/20260716091306.316625-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: handle huge zero folios in bio_free_folios

When CONFIG_PERSISTENT_HUGE_ZERO_FOLIO is enabled, iomap_dio_zero() can
add a huge zero folio to a zeroing bio, which needs special treatment
in bio_free_folios by also checking is_huge_zero_folio() in addition to
is_zero_folio().

Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Link: https://patch.msgid.link/20260716091306.316625-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: try slab allocation in bio_alloc_bioset() before mempool

When the per-CPU bio cache is enabled but empty, bio_alloc_percpu_cache()
returns NULL and bio_alloc_bioset() falls straight through to the mempool
fallback:

    if (unlikely(!bio)) {
        if (!(saved_gfp & __GFP_DIRECT_RECLAIM))
            return NULL;
        ...
    }

For non-sleeping allocations (no __GFP_DIRECT_RECLAIM) this returns NULL
without ever attempting a slab allocation, even when there is plenty of
free memory.

Commit b520c4eef83d ("block: split bio_alloc_bioset more clearly into a
fast and slowpath") introduced this. Before it, a percpu cache miss fell
through to mempool_alloc(), which attempted the underlying slab allocation
first and only failed when that slab allocation failed. The restructuring
dropped the slab attempt that non-sleeping callers of a cache-enabled
bioset (such as the default fs_bio_set used by bio_alloc()) relied on.

Try a slab allocation with optimistic GFP_ flags before falling back to
the mempool whenever the bio is still NULL, so both the cache-empty and
non-cache paths share the same slab attempt. This restores the previous
behavior for non-sleeping allocations.
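
The resulting order of attempts can be modelled as below. This is a
decision-logic sketch only: struct alloc_env, enum bio_source and
alloc_bio() are hypothetical names, and no real allocation takes place.

```c
#include <stdbool.h>

struct alloc_env {
	bool cache_has_bio;	/* per-CPU cache has an entry */
	bool slab_has_memory;	/* a non-blocking slab allocation would succeed */
};

enum bio_source { BIO_NONE, BIO_FROM_CACHE, BIO_FROM_SLAB, BIO_FROM_MEMPOOL };

static enum bio_source alloc_bio(const struct alloc_env *env, bool can_sleep)
{
	if (env->cache_has_bio)
		return BIO_FROM_CACHE;
	/* Restored step: try the slab before giving up or sleeping. */
	if (env->slab_has_memory)
		return BIO_FROM_SLAB;
	/* Only callers that may sleep fall back to the mempool. */
	if (!can_sleep)
		return BIO_NONE;
	return BIO_FROM_MEMPOOL;
}
```

With the slab step in place, a non-sleeping caller gets NULL only when
the slab allocation itself fails, not merely because the cache is empty.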

Fixes: b520c4eef83d ("block: split bio_alloc_bioset more clearly into a fast and slowpath")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260709020145.4011533-1-joseph.qi@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: show operation in error injection rules

Rules listed through the error_injection debugfs file omit the block
operation they match. As a result, rules that differ only in operation
are indistinguishable even though op is mandatory when adding a rule.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260715073341.95129-1-liu.yun@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: serialize elevator changes for the same queue using a writer lock

When elevator_change() is called concurrently for the same queue, the
elevator_change_done() function runs concurrently as well. This function
adds or deletes kobjects for the debugfs entry of the queue. Then the
concurrent calls cause memory corruption of the kobjects and result in a
process hang. The core part of the elevator switch is protected by queue
freeze and q->elevator_lock. However, since the commit 559dc11143eb
("block: move elv_register[unregister]_queue out of elevator_lock"), the
elevator_change_done() is not serialized. Hence the memory corruption
and the hang.

The failures are observed when udev-worker writes to a sysfs
queue/scheduler attribute file while the blktests test case block/005
writes to the same attribute file. The failure also can be recreated by
running two processes that write to the same queue/scheduler file
concurrently. The failure is observed since another commit 370ac285f23a
("block: avoid cpu_hotplug_lock depedency on freeze_lock"). This commit
changed the behavior of queue freeze and it unveiled the failure.

Fix the failure by changing elv_iosched_store() to acquire
update_nr_hwq_lock as the writer lock instead of the reader lock. This
serializes the whole elevator switch steps, including the
elevator_change_done() call.

Fixes: 559dc11143eb ("block: move elv_register[unregister]_queue out of elevator_lock")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260716092237.1305030-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: free copied pages when blk_rq_map_kern() fails

bio_copy_kern() allocates pages that are normally freed by the bio
completion callback. If blk_rq_append_bio() rejects the bio, however,
blk_rq_map_kern() only drops the bio reference. Since bio_put() does not
free pages referenced by the bio vectors, those pages leak.

This can happen when the bio exceeds the queue segment constraints or
when a later mapping cannot be merged into a request built by earlier
calls. Track whether the buffer was copied and free those pages before
dropping the rejected bio.
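
The error path can be sketched in userspace as below. All names are
hypothetical, malloc()/free() stand in for page allocation, and the
pages_live counter exists only to make a leak observable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Returns 0 on success, -1 on failure; *pages_live counts unfreed pages. */
static int map_kern(bool need_copy, bool append_ok, int *pages_live)
{
	void *page = NULL;
	bool copied = false;	/* remember whether we own a bounce page */

	if (need_copy) {
		page = malloc(4096);
		if (!page)
			return -1;
		(*pages_live)++;
		copied = true;
	}

	if (!append_ok) {
		/* The completion callback will never run, so free here. */
		if (copied) {
			free(page);
			(*pages_live)--;
		}
		return -1;
	}

	/* Success: the completion path owns the page; modelled as an immediate free. */
	if (copied) {
		free(page);
		(*pages_live)--;
	}
	return 0;
}
```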

Fixes: 3a5a39276d2a ("block: allow blk_rq_map_kern to append to requests")
Assisted-by: Codex:gpt-5.6-sol
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260715073518.96042-1-liu.yun@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge patch series "net: can: isotp-fixes"

Oliver Hartkopp <socketcan@hartkopp.net> says:

As sashiko-bot was not able to check the second patch this bundle is
re-posted with b4 preparation.

Link: https://patch.msgid.link/20260712-isotp-fixes-v10-0-793a1b1ce17f@hartkopp.net
[mkl: added stable@k.o on Cc, converted Link: -> Closes:]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: serialize TX state transitions under so->rx_lock

The TX state machine (so->tx.state) is driven from three contexts:
sendmsg() claiming and progressing a transfer, the RX path consuming
Flow Control/echo frames, and two hrtimers timing out a stalled
transfer. Mixing a lock-free cmpxchg() claim in sendmsg() with
hrtimer_cancel() calls made under so->rx_lock elsewhere left windows
where a frame or timer callback could act on a state that had already
moved on, corrupting an unrelated transfer.

so->rx_lock now covers the full lifecycle of a TX claim: sendmsg()
takes it to check so->tx.state is ISOTP_IDLE, switch it to
ISOTP_SENDING, bump so->tx_gen and drain the previous transfer's
timers - all as one critical section. isotp_rcv_fc()/isotp_rcv_cf()
already run under this lock via isotp_rcv(), and isotp_rcv_echo() now
takes it itself, so none of them can ever observe a transfer mid-claim.
This also means a transfer can no longer be handed to sendmsg()'s
cleanup paths (signal or send error) while another thread is
concurrently claiming or finishing it, so those paths can cancel
timers and reset the state unconditionally.

isotp_release() claims the socket the same way, so a racing sendmsg()
sees a consistent ISOTP_SHUTDOWN and skips arming its timer or sending.

Only the hrtimer callbacks stay outside so->rx_lock, since they are
cancelled with so->rx_lock held elsewhere and taking it themselves
would deadlock. so->tx_gen lets them recognize whether the transfer
they timed out is still the one currently active, so they don't
report an error against a transfer that has since completed or been
superseded.
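
The generation check can be sketched as below. This is a single-threaded
model with all locking omitted; the struct layout and helper names are
assumptions, only the idea of bumping a generation on each claim is taken
from the description above.

```c
#include <stdbool.h>

enum tx_state { TX_IDLE, TX_SENDING };

struct tx_sock {
	enum tx_state state;
	unsigned int tx_gen;	/* bumped on every new claim */
};

/* Claim a transfer; *gen is the generation a timer should remember. */
static bool tx_claim(struct tx_sock *so, unsigned int *gen)
{
	if (so->state != TX_IDLE)
		return false;
	so->state = TX_SENDING;
	*gen = ++so->tx_gen;
	return true;
}

static void tx_complete(struct tx_sock *so)
{
	so->state = TX_IDLE;
}

/* Timer callback: act only on the transfer the timer was armed for. */
static bool tx_timeout_applies(const struct tx_sock *so, unsigned int armed_gen)
{
	return so->state == TX_SENDING && so->tx_gen == armed_gen;
}
```

A timer armed for an earlier transfer then sees a mismatching generation
and stays silent, even if a new transfer is already in flight.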

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260710142146.BDAE61F000E9@smtp.kernel.org/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260712-isotp-fixes-v10-3-793a1b1ce17f@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: fix use-after-free race with concurrent NETDEV_UNREGISTER

isotp_release() looked up the bound network device via dev_get_by_index()
using the stored ifindex. During device unregistration the device is
unlisted from the ifindex hash before the NETDEV_UNREGISTER notifier
chain runs, so a concurrent isotp_release() could find no device, skip
can_rx_unregister() entirely, and still proceed to free the socket.
Since isotp_release() had already removed itself from the isotp
notifier list at that point, isotp_notify() would never get a chance to
clean up either, leaving a stale CAN filter that keeps pointing at the
freed socket.

Fix this the same way raw.c already does: hold a tracked reference to
the bound net_device in the socket (so->dev/so->dev_tracker) from
bind() onward instead of re-resolving it from the ifindex, and
serialize bind()/release() with rtnl_lock() so that so->dev is always
consistent with what the NETDEV_UNREGISTER notifier sees. so->dev
stays valid regardless of ifindex-hash unlisting, and is only ever
cleared by whichever of isotp_release()/isotp_notify() gets there
first, so the filter is always removed exactly once.

isotp_bind() now rejects a (re)bind with -EAGAIN while so->[tx|rx].state
isn't ISOTP_IDLE yet, so a timer left running by a prior
NETDEV_UNREGISTER can't act on a newly bound so->ifindex. Both checks
share the same lock_sock() section, so there is no window in which a
concurrent isotp_notify() clearing so->bound could be missed.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260707101420.47F261F000E9@smtp.kernel.org/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260712-isotp-fixes-v10-2-793a1b1ce17f@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: use unconditional synchronize_rcu() in isotp_release()

isotp_notify() unregisters the (RCU) CAN filters via can_rx_unregister()
and clears so->bound without waiting for a grace period. isotp_release()
uses so->bound to decide whether it needs to call synchronize_rcu()
before cancelling so->rxtimer, so when NETDEV_UNREGISTER runs first it
skips that synchronize_rcu() and can cancel the timer while an
in-flight isotp_rcv() is still executing and about to re-arm it via
isotp_send_fc(), leading to a use-after-free timer callback on the
freed socket.

sashiko-bot reported a problem with rtnl_lock held in isotp_notify(),
therefore make isotp_release() always call synchronize_rcu() before
cancelling the timers, regardless of so->bound. This still closes the
original race (isotp_notify() clearing so->bound without waiting for
in-flight isotp_rcv() callers before isotp_release() cancels the RX
timer) without adding any RCU wait to the netdevice notifier path.

Fixes: 14a4696bc311 ("can: isotp: isotp_release(): omit unintended hrtimer restart on socket release")
Closes: https://lore.kernel.org/linux-can/20260707085210.6B6C01F000E9@smtp.kernel.org/
Reported-by: Nico Yip <zdi-disclosures@trendmicro.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260712-isotp-fixes-v10-1-793a1b1ce17f@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: bcm: collected fixes"

Oliver Hartkopp <socketcan@hartkopp.net> says:

As there were different patches flying around to fix CAN_BCM issues and AI
assisted stuff pops up again and again, I've created this collection to be
applied.

Link: https://patch.msgid.link/20260714-bcm_fixes-v15-0-562f7e3e42da@hartkopp.net
[mkl: added stable@k.o on Cc, converted Link: -> Closes:]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: track a single source interface for ANYDEV timeout/throttle ops

An ANYDEV rx op (ifindex == 0) with an active RX timeout and/or
throttle timer has no defined semantics when matching frames arrive
from several interfaces: bcm_rx_handler() can run concurrently for
the same op on different CPUs, racing hrtimer_cancel()/
bcm_rx_starttimer() against bcm_rx_timeout_handler() and causing
spurious RX_TIMEOUT notifications and last_frames corruption. The
same concurrency lets throttled multiplex frames from different
interfaces clobber the single rx_ifindex/rx_stamp fields shared by
the op.

Add op->if_detected to track the first interface that delivers a
matching frame while a timeout/throttle timer is configured, and
reject frames from any other interface for that op. The claim is
decided in bcm_rx_handler() before hrtimer_cancel() touches
op->timer, so a rejected frame can never disturb the claimed
interface's watchdog. RTR-mode ops are excluded via RX_RTR_FRAME,
independent of kt_ival1/kt_ival2, since those may briefly hold a
stale value from an earlier non-RTR configuration.

The claim is released in bcm_notify() on NETDEV_UNREGISTER and in
bcm_rx_setup() when SETTIMER reconfigures the timer values.

A (re-)claim is only possible on CAN devices in NETREG_REGISTERED
dev->reg_state to cover the release in bcm_notify() where reg_state
becomes NETREG_UNREGISTERING until synchronize_net().
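
The claim logic can be sketched as below. This is a simplified model: it
assumes if_detected stores the claiming ifindex with 0 meaning unclaimed,
which may differ from the actual field semantics, and it leaves out the
timers, locking and the reg_state check.

```c
#include <stdbool.h>

struct rx_op {
	bool timer_configured;	/* timeout and/or throttle timer is set */
	int if_detected;	/* assumed: 0 = unclaimed, else claiming ifindex */
};

/* Returns true if a frame from ifindex may be processed for this op. */
static bool rx_accept(struct rx_op *op, int ifindex)
{
	if (!op->timer_configured)
		return true;	/* no timers: every interface is fine */
	if (op->if_detected == 0)
		op->if_detected = ifindex;	/* first interface claims the op */
	return op->if_detected == ifindex;
}

/* Release the claim, e.g. on device removal or timer reconfiguration. */
static void rx_release_claim(struct rx_op *op)
{
	op->if_detected = 0;
}
```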

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260709105031.1A39C1F000E9@smtp.kernel.org/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-11-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix data race on rx_stamp/rx_ifindex in bcm_rx_handler()

For an rx op subscribed on all interfaces (ifindex == 0), the same op
is registered once in the shared per-netns wildcard filter list, so
bcm_rx_handler() can run concurrently on different CPUs for frames
arriving on different net devices.

op->rx_stamp and op->rx_ifindex were written before bcm_rx_update_lock was
taken, allowing concurrent writers to race each other - including a torn
store of the 64-bit rx_stamp on 32-bit platforms.

Beyond a torn store, bcm_send_to_user() must report the timestamp/ifindex
of the very same frame whose content it is delivering. So the assignment
is placed in the same unbroken bcm_rx_update_lock section as the content
comparison.

As a side effect, RTR-request frames (which never reach
bcm_send_to_user()) no longer update rx_stamp/rx_ifindex, since only
the notification path needs them.

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260707145135.5BC831F00A3A@smtp.kernel.org/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-10-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix stale rx/tx ops after device removal

RX: an RX_SETUP update(!) for an existing op skipped can_rx_register()
unconditionally, even when a concurrent NETDEV_UNREGISTER had already
torn down its registration (op->rx_reg_dev == NULL). This silently
did not re-enable frame delivery for that updated filter. bcm_rx_setup()
now re-registers in that case, while rx_ops with ifindex == 0 (all CAN
devices), which never carry a tracked rx_reg_dev, are left registered as-is.

TX: bcm_notify() only handled bo->rx_ops on NETDEV_UNREGISTER, leaving
tx_ops with an active cyclic transmission re-arming its hrtimer
indefinitely to execute bcm_tx_timeout_handler(). Cancelling the hrtimer
prevents the runaway timer and any injection into a later reused ifindex,
since nothing else calls bcm_can_tx() for the op until an explicit
TX_SETUP update re-arms it.

Unlike bcm_rx_unreg(), which clears the tracked rx_reg_dev for rx_ops,
the ifindex is intentionally left unchanged for tx_ops. bcm_tx_setup()
always rejects ifindex 0, so clearing it would strand the op: neither a
later TX_SETUP (bcm_find_op()) nor TX_DELETE (bcm_delete_tx_op()) could
ever find it again, since both require an exact ifindex match.

Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260708094536.DDF821F00A3A@smtp.kernel.org/
Closes: https://lore.kernel.org/linux-can/20260708154039.347ED1F000E9@smtp.kernel.org/
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-9-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: add missing device refcount for CAN filter removal

sashiko-bot reported a problem with concurrent device unregistration
in isotp.c that is also present in the bcm.c code. An earlier fix for raw.c,
commit c275a176e4b6 ("can: raw: add missing refcount for memory leak fix"),
introduced a netdevice_tracker, which solves the issue for bcm.c too.

bcm_release(), bcm_delete_rx_op() and bcm_notifier() relied on
dev_get_by_index(ifindex) to re-find the device for an rx_op before
unregistering its filter. If a concurrent NETDEV_UNREGISTER has already
unlisted the device from the ifindex table, that lookup fails and
can_rx_unregister() is silently skipped, leaving a stale CAN filter
pointing at the soon-to-be-freed bcm_op/socket.

Hold a netdev_hold()/netdev_put() tracked reference on op->rx_reg_dev
from the moment the rx filter is registered in bcm_rx_setup() until it
is unregistered in bcm_rx_unreg(), and use that reference directly in
bcm_release() and bcm_delete_rx_op() instead of re-looking the device
up by ifindex.

Reported-by: sashiko-bot@kernel.org
Closes: https://sashiko.dev/#/patchset/20260707094716.63578-1-socketcan@hartkopp.net
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-8-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: validate frame length in bcm_rx_setup() for RTR replies

bcm_tx_setup() validates cf->len against the CAN/CAN FD DLC limits
before installing frames for TX_SETUP, but bcm_rx_setup() never did
the same for the RTR-reply frame configured via RX_SETUP with
RX_RTR_FRAME.
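
The kind of check involved can be sketched as below. The macro and
function names are made up for the example; the limits are the standard
payload sizes of 8 bytes for Classical CAN and 64 bytes for CAN FD.

```c
#include <stdbool.h>
#include <stddef.h>

#define CLASSIC_CAN_MAX_LEN 8	/* payload limit of a Classical CAN frame */
#define CAN_FD_MAX_LEN      64	/* payload limit of a CAN FD frame */

/* Reject a user-supplied frame whose length exceeds the frame type's limit. */
static bool frame_len_valid(size_t len, bool is_canfd)
{
	return len <= (is_canfd ? CAN_FD_MAX_LEN : CLASSIC_CAN_MAX_LEN);
}
```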

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-7-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: extend bcm_tx_lock usage for data and timer updates

Stage new CAN frame content for an existing tx op into a kmalloc()'d
buffer and validate it there, mirroring the approach already used in
bcm_rx_setup(). Only copy the validated data into op->frames while
holding op->bcm_tx_lock, so bcm_can_tx() and bcm_tx_timeout_handler()
can no longer observe a partially updated or unvalidated frame.

Add a missing error path for memcpy_from_msg() when copying CAN frame
data from userspace.

Also move the kt_ival1/kt_ival2/ival1/ival2 updates in bcm_tx_setup()
under op->bcm_tx_lock, and read kt_ival1/kt_ival2/count under the same
lock in bcm_tx_set_expiry() and bcm_tx_timeout_handler(), closing the
torn 64-bit ktime_t read on 32-bit platforms.

Fixes: c2aba69d0c36 ("can: bcm: add locking for bcm_op runtime updates")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-6-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: add missing rcu list annotations and operations

sashiko-bot pointed out that bcm_[rx|tx]_setup() do not use list_add_rcu(),
which is needed so that bcm_proc_show() sees a properly initialized bcm_op
structure when it traverses the bcm_op's under rcu_read_lock().

To cover all initial settings of the bcm_op's the list_add_rcu() calls
are moved to the end of the setup code.

While at it, also fix the mirroring removal side: bcm_release() called
bcm_remove_op() - which frees the op via call_rcu() - on ops that were
still linked in bo->tx_ops/bo->rx_ops, without list_del_rcu() first.
Unlink each op with list_del_rcu() before handing it to bcm_remove_op(),
matching the existing pattern in bcm_delete_tx_op()/bcm_delete_rx_op().

Reported-by: sashiko-reviews@lists.linux.dev
Closes: https://lore.kernel.org/linux-can/20260610094654.A1FFE1F00893@smtp.kernel.org/
Fixes: dac5e6249159 ("can: bcm: add missing rcu read protection for procfs content")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-5-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix CAN frame rx/tx statistics

KCSAN detected a data race within the bcm_rx_handler() when two CAN frames
have been simultaneously received and processed in a single rx op by two
different CPUs.

Use atomic operations with (signed) long data types to access the
statistics in the hot path to fix the KCSAN complaint.

Additionally simplify the update and check of statistics overflow by
using the atomic operations in separate bcm_update_[rx|tx]_stats()
functions. The rx variant runs under bcm_rx_update_lock to prevent
races when resetting the two rx counters; the tx variant runs under
bcm_tx_lock and only needs to guard its own counter's overflow.

As the rx path resets its values already at LONG_MAX / 100, there is
no conflict between the two locking domains (bcm_rx_update_lock vs.
bcm_tx_lock) even for ops that use both paths.

The rx statistics update and the frames_filtered update in
bcm_rx_changed() were previously performed in two separate
bcm_rx_update_lock sections. For an rx op subscribed on all interfaces
(ifindex == 0), bcm_rx_handler() can run concurrently on different
CPUs, so a counter reset by one CPU between these two sections could
leave frames_filtered larger than frames_abs on another CPU, producing
a bogus (even negative) reduction percentage in procfs. Update the
statistics in the same critical section as bcm_rx_changed() to close
this gap, which also removes the now unneeded extra lock/unlock pair
around the traffic_flags calculation.

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-4-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
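
The statistics problem described in the commit above can be sketched in plain C. This is only an illustration under stated assumptions: the struct and helper names are hypothetical, and the reduction formula (100 - filtered * 100 / abs) is assumed to match what procfs prints. Updating both counters in one step stands in for the single critical section.

```c
#include <limits.h>

/* Hypothetical stand-in for the two rx counters of one bcm_op. */
struct rx_stats {
	long frames_abs;      /* all frames seen by the rx op */
	long frames_filtered; /* frames that passed the content filter */
};

/*
 * Count one frame. Both counters are reset together, well before
 * frames_filtered * 100 could overflow a long.
 */
void rx_stats_update(struct rx_stats *s, int passed_filter)
{
	s->frames_abs++;
	if (passed_filter)
		s->frames_filtered++;
	if (s->frames_abs > LONG_MAX / 100) {
		s->frames_abs = 0;
		s->frames_filtered = 0;
	}
}

/* Reduction percentage, as assumed for the procfs output. */
long rx_reduction(const struct rx_stats *s)
{
	if (s->frames_abs == 0)
		return 0;
	return 100 - (s->frames_filtered * 100) / s->frames_abs;
}
```

If the two counters are updated in separate steps and a reset lands in between, frames_filtered can exceed frames_abs and rx_reduction() goes negative, which is the bogus value the commit message mentions.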

can: bcm: add locking when updating filter and timer values

KCSAN detected a simultaneous access to timer values that can be
overwritten in bcm_rx_setup() when updating timer and filter content
while bcm_rx_handler(), bcm_rx_timeout_handler() or bcm_rx_thr_handler()
run concurrently on incoming CAN traffic.

Protect the timer (ival1/ival2/kt_ival1/kt_ival2/kt_lastmsg) and filter
(nframes/flags/frames/last_frames) updates in bcm_rx_setup() with a new
per-op bcm_rx_update_lock, taken with the matching scope in the RX
handlers. memcpy_from_msg() is staged into a temporary buffer before the
lock is taken, since it can sleep and must not run under a spinlock.

hrtimer_cancel() is always called without bcm_rx_update_lock held, since
bcm_rx_timeout_handler()/bcm_rx_thr_handler() take the same lock and a
running callback would otherwise deadlock against the canceller.

Also close a related race: bcm_rx_setup() cleared the RTR flag in the
stored reply frame's can_id as a separate, unprotected step after the
frame content was already installed, so a concurrent bcm_rx_handler()
could transmit a stale reply with CAN_RTR_FLAG still set. Fold that
normalization into the initial frame preparation instead (on the staged
buffer for updates, directly on op->frames pre-registration for new
ops), so the installed frame is always atomically self-consistent.

bcm_rx_handler()'s RX_RTR_FRAME check now takes a lock-protected
snapshot of op->flags before deciding whether to call bcm_can_tx(),
but does not hold the lock across that call.

Also take a lock-protected snapshot of the currframe in bcm_can_tx()
to avoid partial overwrites by content updates in bcm_tx_setup().
Finally, check whether a TX_RESET_MULTI_IDX/SETTIMER might have reset
op->currframe between the two locked sections in bcm_can_tx().

Omit calling hrtimer_forward() with zero interval in bcm_rx_thr_handler().
kt_ival2 may have been concurrently cleared by bcm_rx_setup() before it
cancels this timer, so check kt_ival2 inside the bcm_rx_update_lock.

Fixes: c2aba69d0c36 ("can: bcm: add locking for bcm_op runtime updates")
Reported-by: syzbot+75e5e4ae00c3b4bb544e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-can/6975d5cf.a00a0220.33ccc7.0022.GAE@google.com/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-3-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix lockless bound/ifindex race and silent RX_SETUP failure

bcm_sendmsg() reads bo->ifindex and checks bo->bound before taking
lock_sock(), while bcm_notify(), bcm_connect() and bcm_release() all
mutate both fields under that same lock. Because the lockless reads
and the locked writes are unordered with respect to each other, a
racing bcm_notify() (device unregister) or bcm_connect() (concurrent
bind on another thread sharing the socket) can make bcm_sendmsg()
observe an inconsistent combination, e.g. a stale bound=1 together
with the now-cleared ifindex=0, silently turning a socket bound to a
specific CAN interface into one that also matches "any" interface.

Keep the lockless bo->bound check purely as a fast-path reject, and
move the ifindex read (and a bo->bound re-check) into the locked
section, where every writer already serializes. This removes the
possibility of observing the two fields torn against each other,
rather than trying to fix it with more READ_ONCE()/WRITE_ONCE() pairs
on two independently updated fields. Annotate the now-purely-lockless
bo->bound accesses consistently across all its write sites.

Also fix bcm_rx_setup() silently returning success when the target
device disappears concurrently instead of reporting -ENODEV, so a
broken RX op is no longer left registered as if it had succeeded.

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: Ginger <ginger.jzllee@gmail.com>
Closes: https://lore.kernel.org/linux-can/CAGp+u1aBK8QVjsvAxM2Ldzep4rEbsP9x_pV3At4g=h1kVEtyhA@mail.gmail.com/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-2-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: defer rx_op deallocation to workqueue to fix thrtimer UAF

Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
synchronize_rcu()") replaced synchronize_rcu() in bcm_delete_rx_op()
with call_rcu() and introduced the RX_NO_AUTOTIMER flag.

However, this flag check was omitted for thrtimer in the packet rx
fast-path. During BCM RX operation teardown, a concurrent RCU reader
(bcm_rx_handler) can race and re-arm thrtimer via
bcm_rx_update_and_send() after call_rcu() has been scheduled.  Once
the RCU grace period elapses, bcm_op is freed.  The subsequently
firing thrtimer then dereferences the deallocated op, causing a UAF.

Adding flag checks to the rx fast-path (bcm_rx_update_and_send) does not
fully close the TOCTOU race and introduces latency for every CAN frame.
Conversely, calling hrtimer_cancel() directly inside the RCU callback
(softirq context) is fatal as hrtimer_cancel() can sleep, triggering
a "scheduling while atomic" panic.

Resolve this by deferring the timer cancellation and memory free to a
dedicated unbound workqueue (bcm_wq).  The RCU callback now queues a
work item to bcm_wq, which safely cancels both timers and deallocates
memory in sleepable process context.  A dedicated workqueue is used to
prevent system-wide WQ saturation and is cleanly flushed/destroyed
on module unload to avoid rmmod page faults.

Since the deferred work can now outlive the calling context by an
unbounded amount, also take a reference on op->sk when it is assigned
and drop it only once the deferred work has cancelled both timers, so a
socket can no longer be freed out from under a still-armed timer whose
callback (bcm_send_to_user()) dereferences op->sk.

Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
Tested-by: Feng Xue <feng.xue@outlook.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-1-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak: Modification of references to email accounts being deleted

Following the sale of PEAK-System France by HMS-Networks, this update is
intended to change all my @hms-networks.com email addresses to my new
@peak-system.fr address.

Signed-off-by: Stéphane Grosjean <s.grosjean@peak-system.fr>
Link: https://patch.msgid.link/20260410124251.40506-1-stephane.grosjean@free.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: j1939: fix lockless local-destination check

j1939_priv.ents[].nusers is documented as protected by priv->lock, and
its updates already happen under that lock. j1939_can_recv() also reads
it under read_lock_bh(). However, j1939_session_skb_queue() and
j1939_tp_send() still read priv->ents[da].nusers without taking the
lock.

Those transport-side checks decide whether to set J1939_ECU_LOCAL_DST, so
they can race with j1939_local_ecu_get() and j1939_local_ecu_put() while
userspace is binding or releasing sockets concurrently with TP traffic.
This can misclassify TP/ETP sessions as local or remote and take the wrong
transport path.

Fix both transport paths by routing the destination-locality check through
a helper that reads ents[].nusers under read_lock_bh(&priv->lock).

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260419140614.GA4041240@chcpu16
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

KVM: SVM: Bump asid_generation on CPU online to avoid ASID collision after hotplug

If a vCPU stays scheduled out (or blocked) while the last pCPU it ran
on goes through a hotplug cycle (online->offline->online), and the vCPU
then resumes execution on the same pCPU, then it is possible for it to
run with an ASID that has now been assigned to a different vCPU,
resulting in stale TLB translations being used.

svm_enable_virtualization_cpu() resets asid_generation to 1 and sets
next_asid to max_asid + 1 on every CPU online event, including hotplug
cycles.  Because next_asid starts beyond the pool boundary, the first
call to new_asid() after an online event always wraps the pool,
incrementing asid_generation to 2 and assigning ASIDs starting from
min_asid.

Consider two vCPUs from different VMs, vCPU-A pinned to CPU-X holding
asid_generation=2 and ASID=N from before the hotplug event:

  1. CPU-X goes offline and back online: asid_generation resets to 1,
     next_asid = max_asid + 1.

  2. One or more vCPUs migrate to CPU-X and call new_asid(), wrapping
     the pool and consuming ASIDs starting from min_asid.  Eventually
     vCPU-B from a different VM is assigned asid_generation=2, ASID=N
     — the same ASID that vCPU-A held before the hotplug.

  3. vCPU-A enters pre_svm_run() on CPU-X: current_vmcb->cpu is
     unchanged so the migration branch is skipped.  Its saved
     asid_generation=2 matches sd->asid_generation=2, so the generation
     check silently passes and vCPU-A continues running with ASID=N —
     the same ASID just freshly assigned to vCPU-B.

Both vCPUs from different VMs now run on CPU-X with the same ASID,
causing them to share NPT TLB entries and producing stale translations.

The collision manifests as a KVM internal error (Suberror: 1, emulation
failure).  The NPT page fault reports a faulting GPA far outside the
VM's physical memory range — a sign of stale TLB translations being
used.  KVM falls back to instruction emulation, which fails on
FPU/XSave instructions (XRSTOR, STMXCSR) that the emulator does not
implement.

Fix this by incrementing asid_generation instead of resetting it to 1
in svm_enable_virtualization_cpu().  On module load, asid_generation
starts at 0 (memset) and the increment produces 1, identical to the
old behaviour.  On subsequent hotplug cycles the generation advances
beyond any value a vCPU previously observed on this CPU, so the
generation check in pre_svm_run() reliably forces new_asid() on every
vCPU after every hotplug cycle.

Fixes: 774c47f1d78e ("[PATCH] KVM: cpu hotplug support")
Reported-by: Chandrakanth Silveru <Chandrakanth.Silveru@amd.com>
Tested-by: Srikanth Aithal <Srikanth.Aithal@amd.com>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Message-ID: <20260715063506.672432-1-nikunj@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
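
The hotplug scenario from the commit above can be replayed with a small userspace model. This is a sketch, not the KVM code: the structures and function names are hypothetical, the pool is shrunk to 8 ASIDs, and the `bump` parameter merely selects between the old behaviour (reset to 1) and the fixed one (increment).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU ASID state, loosely following the description. */
struct cpu_asid {
	uint64_t generation;
	uint32_t next_asid, min_asid, max_asid;
};

struct vcpu_asid {
	uint64_t generation;
	uint32_t asid;
};

/* CPU online: next_asid deliberately starts past the end of the pool. */
void cpu_online(struct cpu_asid *c, bool bump)
{
	c->generation = bump ? c->generation + 1 : 1;
	c->next_asid = c->max_asid + 1;
}

/* Hand out an ASID, wrapping the pool and bumping the generation. */
void new_asid(struct cpu_asid *c, struct vcpu_asid *v)
{
	if (c->next_asid > c->max_asid) {
		c->generation++;
		c->next_asid = c->min_asid;
	}
	v->generation = c->generation;
	v->asid = c->next_asid++;
}

/* Generation check before running; true if the old ASID is kept. */
bool pre_run_keeps_asid(struct cpu_asid *c, struct vcpu_asid *v)
{
	if (v->generation == c->generation)
		return true;
	new_asid(c, v);
	return false;
}

/* Returns true if vCPU-A and vCPU-B end up sharing an ASID. */
bool hotplug_collides(bool bump)
{
	struct cpu_asid cpu = { .generation = 0, .min_asid = 1, .max_asid = 8 };
	struct vcpu_asid a, b;

	cpu_online(&cpu, bump);       /* initial bring-up */
	new_asid(&cpu, &a);           /* vCPU-A: generation 2, first ASID */
	cpu_online(&cpu, bump);       /* hotplug cycle */
	new_asid(&cpu, &b);           /* vCPU-B takes the first ASID again */
	pre_run_keeps_asid(&cpu, &a); /* vCPU-A resumes on the same CPU */
	return a.asid == b.asid;
}
```

With `bump` false the saved generation 2 matches the freshly wrapped generation 2 and both vCPUs keep the same ASID; with `bump` true the generations differ and vCPU-A is forced through new_asid().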

Merge tag 'linux_kselftest-kunit-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fix from Shuah Khan:
"Fix warning suppressions with kunit built as module:

  CONFIG_KUNIT is a tristate symbol but the warning suppression code in
  lib/bug.c is only built if it's built-in due to it using a plain
  #ifdef, rendering warning suppressions broken for kunit built as a
  loadable module.

  kunit_is_suppressed_warning() already has a stub for when kunit is
  disabled so drop that guard entirely"

* tag 'linux_kselftest-kunit-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  bug: fix warning suppressions with kunit built as module

Merge tag 'linux_kselftest-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

- Fix ftrace reading enabled_func test in add_remove_fprobe_module test

- Fix tracing trigger-hist-poll.tc to use sched_process_exit

* tag 'linux_kselftest-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Have trigger-hist-poll.tc use sched_process_exit
  selftests/ftrace: Fix reading enabled_functions in add_remove_fprobe_module test

block: do not warn when doing greedy allocation in folio_alloc_greedy()

During one of my local btrfs fstests runs, folio_alloc() inside
folio_alloc_greedy() triggered an allocation failure report when trying
to allocate an order-4 folio.

The kernel is from the latest development branch, which is utilizing
the IOMAP_DIO_BOUNCE flag for direct writes when the inode requires
checksums.

Unfortunately I didn't save the full log, only the function and the
order.

When the IOMAP_DIO_BOUNCE flag is utilized, we will hit the following call
chain:

  bio_iov_iter_bounce_write()
  |- folio_alloc_greedy()
     |- folio_alloc(gfp | __GFP_NORETRY, get_order(*size));

However __GFP_NORETRY will still emit an allocation failure report
when it fails.

Since folio_alloc_greedy() retries with a smaller order anyway, there
is no point in emitting that allocation failure report.

Append the __GFP_NOWARN flag to folio_alloc() for the larger-order folio
attempts.

Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Link: https://patch.msgid.link/d10571445ee505d95ba6eaad7558fc1f556d2921.1784020005.git.wqu@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

partitions: aix: bound the lvd scan to one sector

aix_partition() reads the logical-volume descriptor array as a single
sector and then scans it:

	if (numlvs && (d = read_part_sector(state, vgda_sector + 1, &sect))) {
		struct lvd *p = (struct lvd *)d;
		...
		for (i = 0; foundlvs < numlvs && i < state->limit; i++) {
			lvip[i].pps_per_lv = be16_to_cpu(p[i].num_lps);

p points at a single 512-byte sector, which holds SECTOR_SIZE /
sizeof(struct lvd) = 16 entries, but the loop runs until foundlvs reaches
the on-disk numlvs or i reaches state->limit (DISK_MAX_PARTS, 256).
numlvs is an on-disk __be16 read straight from the volume group
descriptor and is not validated, so a crafted AIX image with numlvs
larger than 16 and lvd entries whose num_lps fields are zero (so foundlvs
never advances) drives the loop to read p[i] well past the end of the
read sector buffer.

Commit d97a86c170b4 ("partitions: aix.c: off by one bug") hardened the
matching write of lvip[lv_ix] in 2014 but left this read loop unbounded.

Bound the scan to the number of struct lvd entries that fit in the
sector that was actually read.

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260714114806.3761553-1-michael.bommarito@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
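
The bounded scan from the AIX commit above might look roughly like this in isolation. It is a sketch under assumptions: `struct lvd_sketch` is a hypothetical 32-byte descriptor (the real struct lvd layout is not reproduced), endianness conversion is left out, and the extra `i < per_sector` term stands for the bound the commit describes.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical 32-byte descriptor; only num_lps matters here. */
struct lvd_sketch {
	uint16_t lv_ix;
	uint16_t num_lps;
	uint8_t  pad[28];
};

/*
 * Scan one sector of descriptors. Returns how many entries were
 * visited; *found counts entries with a non-zero num_lps.
 */
size_t scan_lvds(const struct lvd_sketch *p, unsigned int numlvs,
		 size_t limit, unsigned int *found)
{
	const size_t per_sector = SECTOR_SIZE / sizeof(*p);
	size_t i;

	*found = 0;
	for (i = 0; *found < numlvs && i < limit && i < per_sector; i++) {
		if (p[i].num_lps)
			(*found)++;
	}
	return i;
}
```

Even with an on-disk numlvs of 200 and all num_lps fields zero, the loop stops after the 16 entries that fit in the sector instead of running to the 256-entry limit.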

blk-cgroup: fix leaks and online flag on radix_tree_insert failure

When radix_tree_insert() fails in blkg_create(), the error path has two
issues:

1. blkg->online is set to true unconditionally, even when the blkg was
   never fully inserted.  Move the assignment inside the success block.

2. The error path calls blkg_put() without first calling
   percpu_ref_kill().  Because the refcount is still in percpu mode,
   percpu_ref_put() only does this_cpu_sub() without checking for zero,
   so blkg_release() is never triggered.  This permanently leaks the
   blkg memory, its percpu iostat, policy data, the parent blkg
   reference, and the cgroup css reference — the latter preventing the
   cgroup from ever being destroyed.

Fix by replacing blkg_put() with percpu_ref_kill(), matching the pattern
used in blkg_destroy().

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Link: https://patch.msgid.link/20260715132407.1469777-1-cui.tao@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
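
Why a plain put never releases the blkg in the commit above can be shown with a toy model of a percpu refcount. Everything here is hypothetical and heavily simplified (one counter instead of per-CPU counters, no RCU); it only illustrates that the zero check happens after the switch to atomic mode.

```c
#include <stdbool.h>

/* Toy model: in percpu mode a put only adjusts the counter. */
struct ref_sketch {
	long count;        /* starts at 1, the initial reference */
	bool percpu_mode;
	bool released;
};

void ref_init(struct ref_sketch *r)
{
	r->count = 1;
	r->percpu_mode = true;
	r->released = false;
}

/* The zero check is only performed once the ref is in atomic mode. */
void ref_put(struct ref_sketch *r)
{
	r->count--;
	if (!r->percpu_mode && r->count == 0)
		r->released = true;
}

/* Switch to atomic mode, then drop the initial reference. */
void ref_kill(struct ref_sketch *r)
{
	r->percpu_mode = false;
	ref_put(r);
}
```

Calling ref_put() on a freshly initialised ref leaves `released` false, which corresponds to the leak; ref_kill() reaches the release path.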

loop: remove manually added partitions on detach

Commit 267ec4d7223a ("loop: fix partition scan race between udev and
loop_reread_partitions()") stopped disk_force_media_change() from
setting GD_NEED_PART_SCAN because loop devices with LO_FLAGS_PARTSCAN
rescan partitions explicitly. However, partitions can also be added
manually with BLKPG while LO_FLAGS_PARTSCAN is clear.

When such a loop device is detached, __loop_clr_fd() skips
bdev_disk_changed(). Without GD_NEED_PART_SCAN, reopening the unbound
device no longer performs the previous lazy cleanup, leaving dead
partition devices behind. A subsequent LOOP_CONFIGURE can then fail its
partition scan with -EBUSY, as seen in blktests loop/009 after loop/008.

Call bdev_disk_changed() unconditionally during __loop_clr_fd(). The
disk capacity is already zero and the release path holds open_mutex, so
this drops all partitions without rescanning the detached backing file.

The new blktests loop/013 case covers this sequence by adding a partition
with BLKPG without LO_FLAGS_PARTSCAN, detaching the loop device, and
checking that the partition is gone when the device is reopened.

Fixes: 267ec4d7223a ("loop: fix partition scan race between udev and loop_reread_partitions()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202607150754.b660f5b9-lkp@intel.com
Signed-off-by: Daan De Meyer <daan@amutable.com>
Link: https://patch.msgid.link/20260715-b4-loop-partition-cleanup-v1-1-b9f59910cd1e@amutable.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix race in blk_time_get_ns() returning 0

blk_time_get_ns() populates the per-plug cached timestamp and then
returns it by re-reading the field:

	if (!plug->cur_ktime) {
		plug->cur_ktime = ktime_get_ns();
		current->flags |= PF_BLOCK_TS;
	}
	return plug->cur_ktime;

This is problematic when the compiler emits the final
"return plug->cur_ktime" as a reload from memory, after PF_BLOCK_TS has
already been set.

Since the cached timestamp is now invalidated from finish_task_switch()
(fad156c2af22 "block: invalidate cached plug timestamp after task
switch"), a task preempted between setting PF_BLOCK_TS and that reload
has plug->cur_ktime zeroed by blk_plug_invalidate_ts() when it is
scheduled back in.  The reload then returns 0.

A 0 handed back here is stored as a start timestamp -- e.g.
blk_account_io_start() writes it to rq->start_time_ns -- and later
subtracted from "now".  blk_account_io_done() then adds (now - 0), i.e.
roughly the system uptime, to the per-group nsecs[] counters.  On an
otherwise idle, healthy device this appears as sudden ~uptime-sized jumps
in the diskstats time fields (write_ticks/discard_ticks/time_in_queue).

The solution is to be explicit in our reads and writes to this field,
which can change across a preemption.  We also add a barrier() to ensure
that any setting of PF_BLOCK_TS is ordered after the cur_ktime update.

This issue was discovered using AI-assisted kprobes looking for paths
that were leaking zeroed timestamps in a live system, based on the
observation that we were sometimes seeing uptime-sized jumps in kernel
exported counters. This was flagged by NodeDiskIOSaturation
Prometheus alerts that started firing on all hosts after the 7.1.3
kernel upgrade, due to node-exporter now exporting a nonsensical
node_disk_io_time_weighted_seconds_total.

Fixes: fad156c2af22 ("block: invalidate cached plug timestamp after task switch")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Waychison <mike@waychison.com>
Assisted-by: Claude:claude-opus-4.8
Link: https://patch.msgid.link/20260715192950.2488921-1-mike@waychison.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
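
The reload problem in the blk_time_get_ns() commit above can be reproduced deterministically in userspace by modelling the preemption as a callback. This is a sketch with hypothetical names; the volatile accesses stand in for the explicit reads and writes the commit describes, and the task flag and barrier() are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the plug's cached timestamp. */
struct plug_sketch {
	uint64_t cur_ktime;
};

/* Called at the point where a preemption could occur. */
typedef void (*preempt_hook)(struct plug_sketch *plug);

/* Models the scheduler invalidating the cached timestamp. */
void invalidate_ts(struct plug_sketch *plug)
{
	plug->cur_ktime = 0;
}

/* Racy shape: the return value is re-read from the field. */
uint64_t get_ns_reload(struct plug_sketch *plug, uint64_t now,
		       preempt_hook hook)
{
	if (!plug->cur_ktime)
		plug->cur_ktime = now;
	hook(plug);              /* preempted after the cache was filled */
	return plug->cur_ktime;  /* reload observes the zeroed field */
}

/* Fixed shape: read once into a local and return the local. */
uint64_t get_ns_local(struct plug_sketch *plug, uint64_t now,
		      preempt_hook hook)
{
	uint64_t ts = *(volatile uint64_t *)&plug->cur_ktime;

	if (!ts) {
		ts = now;
		*(volatile uint64_t *)&plug->cur_ktime = ts;
	}
	hook(plug);
	return ts;
}
```

Passing invalidate_ts as the hook makes get_ns_reload() return 0 while get_ns_local() still returns the timestamp it sampled.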

Merge tag 'scmi-ffa-fixes-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes

Arm SCMI/FF-A fixes for v7.2

Fix two runtime issues in the SCMI framework. Use full 64-bit division
when rounding range-based clock rates, avoiding divisor truncation and
a possible divide-by-zero on 32-bit systems. Rate-limit notification
queue-full warnings emitted from interrupt context to prevent printk
floods and prolonged system stalls during notification bursts. Also
correct a grammar error in the ARM_SCMI_POWER_CONTROL Kconfig help
text.

Fix the FF-A driver RX/TX buffer sizing logic to respect the maximum
buffer size advertised by firmware, while retaining compatibility with
older implementations that may reject PAGE_SIZE-rounded buffers.
Also fix a NULL pointer dereference in ffa_partition_info_get() by
rejecting NULL UUID strings before passing them to uuid_parse().

* tag 'scmi-ffa-fixes-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_scmi: Rate-limit queue-full warnings in IRQ context
  firmware: arm_scmi: Use 64-bit division for clock rate rounding
  firmware: arm_scmi: Grammar s/may needed/may be needed/
  firmware: arm_ffa: Fix NULL dereference in ffa_partition_info_get()
  firmware: arm_ffa: Respect firmware advertised RX/TX buffer size limits

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

riscv: hwprobe: Avoid uninitialized read in hwprobe_get_cpus()

When cpusetsize < cpumask_size(), hwprobe_get_cpus() did not fully
initialize its copy of the cpu mask, which could cause non-deterministic
results from the riscv_hwprobe syscall on a system with more than 8 CPUs
when the supplied cpu mask is empty. Address this by fully initializing
the cpu mask.

Fixes: e178bf146e4b ("RISC-V: hwprobe: Introduce which-cpus flag")
Signed-off-by: Mark Harris <mark.hsj@gmail.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Michael Ellerman <mpe@kernel.org>
Link: https://patch.msgid.link/20260714003056.73707-1-mark.hsj@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
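
The hwprobe fix above amounts to zeroing the whole kernel-side mask before copying a shorter user-supplied one into it. A minimal userspace sketch, assuming a hypothetical 16-byte mask and hypothetical helper names:

```c
#include <stddef.h>
#include <string.h>

#define MASK_BYTES 16   /* hypothetical kernel-side mask size */

/*
 * Copy a caller-supplied mask that may be shorter than the kernel's.
 * Zeroing first means the tail never holds stale contents.
 */
void mask_from_user(unsigned char dst[MASK_BYTES],
		    const unsigned char *src, size_t srcsize)
{
	if (srcsize > MASK_BYTES)
		srcsize = MASK_BYTES;
	memset(dst, 0, MASK_BYTES);
	memcpy(dst, src, srcsize);
}

/* An all-zero mask is treated as empty. */
int mask_empty(const unsigned char mask[MASK_BYTES])
{
	for (size_t i = 0; i < MASK_BYTES; i++)
		if (mask[i])
			return 0;
	return 1;
}
```

A one-byte user mask covers only 8 CPUs, so without the memset() the emptiness test would depend on whatever the remaining bytes happened to contain.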

s390/perf_cpum_cf: Add missing array_index_nospec() to __hw_perf_event_init()

ev variable is userspace controlled via event->attr.config and used
as an array index after bounds checking, but without speculation
barriers.

Add the missing array_index_nospec() call to prevent speculative
execution.

Cc: stable@vger.kernel.org
Fixes: 212188a596d1 ("[S390] perf: add support for s390x CPU counters")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
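
The idea behind array_index_nospec() in the commit above is to clamp the index with a branch-free mask after the ordinary bounds check. The sketch below is modelled on the kernel's generic fallback, with hypothetical names; it relies on arithmetic right shift of negative values, and a real implementation must also keep the compiler from optimizing the mask away, which this version does not attempt.

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * All ones when index < size, zero otherwise, computed without a
 * branch so that a mispredicted bounds check cannot bypass it.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG_SKETCH - 1));
}

/* Clamp the index: out-of-range values collapse to 0. */
unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

In-range indices pass through unchanged; anything at or beyond size becomes 0, so even speculative execution stays inside the array.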

s390/checksum: Fix csum_partial() without vector facility

Currently csum_partial() calls csum_copy() with copy=false and dst=NULL.
On machines without the vector facility, csum_copy() falls back to
cksm(dst, ...), causing the checksum to be calculated from address zero
instead of the source buffer.

The VX implementation already checksums data loaded from src. Make the
fallback do the same by passing src to cksm().

Fixes: dcd3e1de9d17 ("s390/checksum: provide csum_partial_copy_nocheck()")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

accel/ivpu: Reject firmware log with size smaller than header

fw_log_from_bo() validates the tracing buffer header_size and that the
log fits within the BO, but never checks that log->size is at least
log->header_size. fw_log_print_buffer() then computes:

	u32 data_size = log->size - log->header_size;

which underflows to a near-U32_MAX value when firmware reports a log whose
size is smaller than its header. That huge data_size defeats the
log_start/log_end bounds clamps added by commit dd1311bcf0e6 ("accel/ivpu:
Add bounds checks for firmware log indices"), so fw_log_print_lines() reads
far past the small real data region of the BO. A size of 0 also makes
fw_log_from_bo() advance the offset by 0, causing the callers to loop
forever on the same header.

Reject logs whose size is smaller than the header (which also rejects
size == 0).

Fixes: d4e4257afa6e ("accel/ivpu: Add firmware tracing support")
Cc: stable@vger.kernel.org
Signed-off-by: Jhonraushan <raushan.jhon@gmail.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260715074206.867712-1-raushan.jhon@gmail.com
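
The unsigned underflow in the accel/ivpu commit above is easy to demonstrate outside the driver. The struct and helpers here are hypothetical and only carry the two fields named in the message:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the log header fields named above. */
struct log_hdr_sketch {
	uint32_t size;
	uint32_t header_size;
};

/* Unchecked: wraps to a near-UINT32_MAX value if size < header_size. */
uint32_t data_size_unchecked(const struct log_hdr_sketch *log)
{
	return log->size - log->header_size;
}

/*
 * Checked: refuse logs smaller than their header, which also covers
 * size == 0 as long as header_size is non-zero.
 */
bool data_size_checked(const struct log_hdr_sketch *log, uint32_t *out)
{
	if (log->size < log->header_size)
		return false;
	*out = log->size - log->header_size;
	return true;
}
```

For size 16 and header_size 32 the unchecked subtraction yields 0xFFFFFFF0, which is the huge data_size that defeats later clamps.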

drm/panthor: Check debugfs GEM lock initialization

drmm_mutex_init() can fail while registering the managed cleanup action.
When that happens, drmm_add_action_or_reset() destroys the mutex before
returning the error. Continuing initialization would therefore leave the
debugfs GEM object list with an unusable lock.

Propagate the error as is already done for the other managed mutexes in
panthor_device_init().

Fixes: a3707f53eb3f ("drm/panthor: show device-wide list of DRM GEM objects over DebugFS")
Signed-off-by: Linmao Li <lilinmao@kylinos.cn>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patch.msgid.link/20260713082912.321021-1-lilinmao@kylinos.cn
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>

drm/panthor: return error on truncated firmware

panthor_fw_load() detects truncated firmware images, but jumps to the
common cleanup path without setting ret. If no previous error was recorded,
the function can return 0 and treat the invalid firmware as successfully
loaded.

Set ret to -EINVAL before leaving the truncated-image path.

Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260714163056.22329-1-osama.abdelkader@gmail.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
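
The bug class fixed in the panthor firmware commit above, a failure branch that jumps to common cleanup without setting the return value, can be sketched with a hypothetical loader skeleton:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical loader skeleton: every exit funnels through 'out', so
 * each failure branch has to set ret before jumping.
 */
int load_image(size_t image_size, size_t needed)
{
	int ret = 0;

	if (image_size < needed) {
		ret = -EINVAL;  /* without this, the function returns 0 */
		goto out;
	}
	/* ... parse sections here ... */
out:
	/* common cleanup would run here */
	return ret;
}
```

A truncated image now reports -EINVAL instead of being treated as successfully loaded.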

KVM: nVMX: Put vmcs12 pages if nested VM-Enter fails due to invalid guest state

Put all vmcs12 pages if KVM synthesizes a nested VM-Exit due to invalid
guest state while emulating VMLAUNCH or VMRESUME. The invalid guest state path
doesn't use nested_vmx_vmexit() as that API is intended to be used if and
only if L2 is active, and the open coded equivalent neglects to put the
vmcs12 pages. Failure to put the vmcs12 pages leaks any pinned pages
(and/or mappings) if L1 retries VMLAUNCH/VMRESUME.

Note, the !from_vmenter scenario doesn't suffer the same problem, as
vmx_get_nested_state_pages() only gets/pins/maps the vmcs12 pages if L2 is
active, i.e. if a "full" VM-Exit is guaranteed before KVM will retry
getting vmcs12 pages.

Fixes: 96c66e87deee ("KVM/nVMX: Use kvm_vcpu_map when mapping the virtual APIC page")
Fixes: 3278e0492554 ("KVM/nVMX: Use kvm_vcpu_map when mapping the posted interrupt descriptor table")
Fixes: fe1911aa443e ("KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page")
Reported-by: Minh Nguyen <minhnguyen.080505@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-s390-master-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 7.2

- more gmap KVM memory management fixes
- PCI passthru fixes

Merge tag 'kvm-x86-fixes-7.2-rc4' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 7.2-rcN

- Fix a bug where KVM will trigger a UAF if updating IOMMU IRTEs fails when
   registering an IRQ-bypass producer.

- Ignore pending PV EOI instead of BUG()ing the host if the feature was
   disabled by the guest.

- Fix nVMX bugs where KVM would run L1 with an L1-controlled CR3 after a
   failed "late" consistency check when KVM is NOT using EPT.

- Disallow intra-host migration/mirroring of SNP VMs as KVM doesn't yet
   support moving/mirroring SNP state.

- Fix a TOCTOU bug in KVM's handling of the "trusted" CPUID for TDX guests.

- Fix a NULL pointer deref in trace_kvm_inj_exception() where a change to the
   core infrastructure missed KVM's unique (ab)use of __print_symbolic().

Merge tag 'kvmarm-fixes-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.2, take #2

- Move locking for kvm_io_bus_get_dev() into the caller, ensuring
  race-free checks that the returned object is of the correct type

- Fix initialisation of the page-table walk level when relaxing
  permissions

- Correctly update the XN attribute when relaxing permissions

- Fix the sign extension of loads from emulated MMIO regions

- Assorted collection of fixes for pKVM's FFA proxy, together with a
  couple of FFA driver adjustments

Merge tag 'kvmarm-fixes-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.2, take #1

- Fix an accounting buglet when reclaiming pages from a protected
  guest

- Fix a bunch of architectural compliance issues when injecting a
  synthesised exception, most of which were missing the PSTATE.IL bit
  indicating a 32bit-wide instruction

- Another set of fixes addressing issues with translation of VNCR_EL2,
  including corner cases where the guest points that register at a RO
  page...

- Don't warn when trapping accesses to ZCR_EL2 from an L2 guest, as
  that's not unexpected at all

- Address a bunch of races with LPI migration vs LPIs being disabled

- Fix a total howler of a bug combining FEAT_MOPS and NV, resulting in
  exception returning in the wrong place...

- Coerce Fuad Tabba into a reviewer role, and may his Inbox catch
  fire!

Merge branch 'bpf-reject-negative-const-offsets-for-buffer-pointers'

Sun Jian says:

====================
bpf: Reject negative const offsets for buffer pointers

Reject negative effective offsets for PTR_TO_TP_BUFFER and PTR_TO_BUF
accesses. Calculate the effective access start using signed arithmetic
to prevent unsigned access-end accounting from wrapping, and cover both
load-time rejection and the raw tracepoint writable attach-time path.
---

Changes in v5:

- Simplify __check_buffer_access() to reject a negative effective start
  after confirming that var_off is constant. Validate the combined
  offset instead of rejecting negative instruction offsets separately.
  Drop the duplicate BPF_MAX_VAR_OFF check because pointer arithmetic
  already bounds constant offsets, and remove the redundant size < 0
  check.
- Switch the raw tracepoint writable attach tests from nbd_send_request
  to bpf_testmod_test_writable_bare_tp, avoiding the NBD configuration
  dependency and its false-pass condition.
- Split the attach coverage into named subtests and require
  bpf_raw_tracepoint_open() to return -EINVAL.
- Add verifier coverage for a negative constant PTR_TO_BUF offset.

Changes in v4:

- Correct the Fixes tag to point to 022ac0750883, where pointer offsets
  were folded into reg->var_off.
- Drop the end > U32_MAX check, which is unreachable after bounding const
  var_off with BPF_MAX_VAR_OFF while keeping instruction offsets and
  access sizes bounded.

Changes in v3:

- Check constant var_off against +/-BPF_MAX_VAR_OFF before computing
  the effective access range, matching the existing verifier pointer
  offset convention.
- Keep explicit rejection of negative instruction offsets and keep
  bounded negative constant var_off valid when the effective offset is
  non-negative.

Changes in v2:

- Split the kernel fix and selftests into separate patches.
- Add an attach-time raw tracepoint writable test that exercises
  max_tp_access against nbd_send_request's writable size.
- Adjust selftest formatting to use the 100 character line width.

Tested:

- ./test_progs -v -t verifier_raw_tp_writable
- ./test_progs -v -t verifier_ptr_to_buf
- ./test_progs -v -t raw_tp_writable_reject_bad_access
- ./test_progs -v -t raw_tp_writable_test_run

v4: https://lore.kernel.org/bpf/20260708090151.151729-1-sun.jian.kdev@gmail.com/
v3: https://lore.kernel.org/bpf/20260708040715.116680-1-sun.jian.kdev@gmail.com/
v2: https://lore.kernel.org/bpf/20260707060804.93561-1-sun.jian.kdev@gmail.com/
v1: https://lore.kernel.org/bpf/20260703035137.109608-1-sun.jian.kdev@gmail.com/
====================

Link: https://patch.msgid.link/20260714093846.18159-1-sun.jian.kdev@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>

selftests/bpf: Cover negative buffer pointer offsets

Add verifier coverage for constant negative offsets on PTR_TO_TP_BUFFER
and PTR_TO_BUF pointers. Both programs adjust the buffer pointer by -8
and access it at offset zero, so the negative effective start must be
rejected at load time.

Switch the raw tracepoint writable attach checks from nbd_send_request
to bpf_testmod_test_writable_bare_tp, avoiding a dependency on the NBD
tracepoint. Keep the existing past-end case and add a case with a
negative var_off compensated by a positive instruction offset. The
effective start remains non-negative, so the program loads, but its
access end exceeds the writable context size and
bpf_raw_tracepoint_open() must return -EINVAL.

Cc: stable@vger.kernel.org # 5.2.0
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://patch.msgid.link/20260714093846.18159-3-sun.jian.kdev@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>

bpf: Reject negative const offsets for buffer pointers

The verifier rejects variable offsets for PTR_TO_TP_BUFFER and PTR_TO_BUF
accesses, but it currently accepts a constant negative offset produced by
pointer arithmetic.

Commit 022ac0750883 ("bpf: use reg->var_off instead of reg->off for
pointers") moved constant pointer offsets from reg->off to reg->var_off.
However, __check_buffer_access() continued to check only the instruction
offset. An access with reg->var_off equal to -8 and an instruction offset
of zero therefore passes verification.

For writable raw tracepoints, the access end is also calculated from the
unsigned reg->var_off.value. An eight-byte access starting at -8 wraps
the calculated end to zero, allowing the program to load and attach
without increasing max_tp_access.

After ensuring that reg->var_off is constant, calculate the effective
access start using signed arithmetic and reject it when it is negative.
Use the validated start to calculate the access end for both
PTR_TO_TP_BUFFER and PTR_TO_BUF.
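
The wrap and the signed check can be illustrated outside the kernel. The
following is only a minimal user-space sketch: the toy_* helpers are
hypothetical names and not the verifier code, and the signed variant
assumes the constant offset has already been bounded so the addition
cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned end computation: a constant offset of -8 is seen as 2^64 - 8. */
static uint64_t toy_access_end_unsigned(uint64_t var_off_value,
					int32_t insn_off, uint32_t size)
{
	return var_off_value + (uint64_t)(int64_t)insn_off + size;
}

/*
 * Signed variant: compute the effective start first, reject it when it
 * is negative, and only then derive the access end from it.
 */
static bool toy_check_buffer_access(int64_t var_off_value, int32_t insn_off,
				    uint32_t size, uint64_t *end)
{
	int64_t start = var_off_value + insn_off;

	if (start < 0)
		return false;

	*end = (uint64_t)start + size;
	return true;
}
```

In this sketch toy_access_end_unsigned((uint64_t)-8, 0, 8) wraps to 0,
while toy_check_buffer_access(-8, 0, 8, &end) rejects the access and
toy_check_buffer_access(-8, 16, 8, &end) accepts it with an end of 16.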

Fixes: 022ac0750883 ("bpf: use reg->var_off instead of reg->off for pointers")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: stable@vger.kernel.org # 5.2.0
Link: https://patch.msgid.link/20260714093846.18159-2-sun.jian.kdev@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>

Merge branch 'bpf-sockmap-fix-fionread-for-sockets-without-a-verdict-program'

Mattia Meleleo says:

====================
bpf, sockmap: Fix FIONREAD for sockets without a verdict program

Sockets added to a sockmap/sockhash with no stream/skb verdict program
attached answer FIONREAD with 0 even when unread data is pending in
sk_receive_queue. Fix tcp_bpf_ioctl() to account for the receive queue
in that case, and add a selftest.

Changes in v3:
- Remove unused sk_psock_msg_inq()
- Link to v2: https://patch.msgid.link/20260708-fionread-no-verdict-v2-0-29dd293621c7@coralogix.com

Changes in v2:
- Split the fix and the selftest into separate patches
- Use READ_ONCE() to read the verdict program pointers
- Link to v1: https://patch.msgid.link/20260707-fionread-no-verdict-v1-1-ce94a72357ec@coralogix.com

Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
---
====================

Link: https://patch.msgid.link/20260708-fionread-no-verdict-v3-0-b4ee31b3af53@coralogix.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Test FIONREAD on a sockmap socket without a verdict program

Add a test validating that FIONREAD on a TCP socket in a sockmap
without a verdict program reports data pending in sk_receive_queue.

Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20260708-fionread-no-verdict-v3-2-b4ee31b3af53@coralogix.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

bpf, sockmap: Account for receive queue in FIONREAD without a verdict program

tcp_bpf_ioctl() answers SIOCINQ from psock->msg_tot_len, which only
counts bytes in ingress_msg. Without a stream/skb verdict program
nothing is diverted there: data stays in sk_receive_queue, so FIONREAD
returns 0 even though read() returns data.

Add tcp_inq() to the reported value when the psock has no verdict
program. The two queues are disjoint, so bytes redirected into
ingress_msg from other sockets stay correctly accounted through
msg_tot_len.
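
A rough model of the accounting described above, not the tcp_bpf_ioctl()
code itself: struct toy_psock and toy_fionread() are hypothetical names,
and receive_queue_bytes stands in for whatever tcp_inq() would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the psock state relevant to FIONREAD. */
struct toy_psock {
	size_t msg_tot_len;	/* bytes queued in ingress_msg */
	bool has_verdict_prog;	/* stream/skb verdict program attached? */
};

/*
 * Without a verdict program nothing is diverted into ingress_msg, so
 * the bytes still sitting in the receive queue are added on top.
 */
static size_t toy_fionread(const struct toy_psock *psock,
			   size_t receive_queue_bytes)
{
	size_t inq = psock->msg_tot_len;

	if (!psock->has_verdict_prog)
		inq += receive_queue_bytes;

	return inq;
}
```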

Remove unused sk_psock_msg_inq().

Fixes: 929e30f93125 ("bpf, sockmap: Fix FIONREAD for sockmap")
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20260708-fionread-no-verdict-v3-1-b4ee31b3af53@coralogix.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

mmc: sdhci-esdhc-imx: fix resume error handling

Check pm_runtime_force_resume() return value in resume. If it fails
(clock enable failure), return immediately since accessing hardware
registers on an unclocked device would cause a kernel panic.

The early return intentionally skips enable_irq() and
sdhci_disable_irq_wakeups() because the IRQ handler reads
SDHCI_INT_STATUS, which would also fault without clocks. The PM runtime
usage counter leak only affects this already-broken device instance and
is an acceptable tradeoff to preserve system stability.

Remove the return value check for mmc_gpio_set_cd_wake(host->mmc, false)
since disable_irq_wake() called internally always returns 0.

Also return 0 explicitly on the success path instead of propagating
stale return values.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: make non-fatal errors non-blocking in suspend

Make pinctrl_pm_select_sleep_state() and mmc_gpio_set_cd_wake() failures
non-fatal in the suspend path. These failures only mean slightly higher
power consumption or missing CD wakeup capability, but should not block
system suspend.

Also change the function to always return 0 on the success path instead
of propagating non-fatal warning return values.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: use pm_runtime_resume_and_get() in suspend

Replace pm_runtime_get_sync() with pm_runtime_resume_and_get() to
simplify error handling. pm_runtime_resume_and_get() automatically
drops the usage counter on failure, avoiding the need for a separate
pm_runtime_put_noidle() call. If it fails, the device is unclocked and
accessing hardware registers would cause a kernel panic, so return the
error immediately.
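
The usage-counter difference between the two helpers can be modelled
with a toy device. This is a sketch under stated assumptions: struct
toy_dev and the toy_* functions are hypothetical, and only the counter
behaviour is modelled, not the runtime PM core.

```c
/* Hypothetical device: a usage counter plus the result a resume gives. */
struct toy_dev {
	int usage;
	int resume_err;	/* 0 on success, negative errno on failure */
};

/* Models pm_runtime_get_sync(): the counter is bumped even on failure. */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage++;
	return dev->resume_err;
}

/* Models pm_runtime_resume_and_get(): the counter is dropped on failure. */
static int toy_resume_and_get(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0) {
		dev->usage--;
		return ret;
	}

	return 0;
}
```

With a failing resume, toy_get_sync() leaves the counter elevated and
the caller has to drop it, whereas toy_resume_and_get() returns with the
counter unchanged.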

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: disable irq during suspend to fix unhandled interrupt

When using WIFI out-of-band wakeup, an "irq xxx: nobody cared" warning
occurs. This happens because the usdhc interrupt is not disabled during
system suspend when device_may_wakeup() returns false.

The sequence of events leading to this issue:
1. System enters suspend without disabling usdhc interrupt
   (because device_may_wakeup() returns false for usdhc device)
2. WIFI out-of-band wakeup triggers system resume via GPIO interrupt
3. WIFI sends a Card interrupt before usdhc has fully resumed
4. usdhc is still in runtime suspend state and cannot handle the
   interrupt properly
5. The unhandled interrupt triggers "nobody cared" warning

Fix this by unconditionally disabling the usdhc interrupt during suspend
and re-enabling it during resume, regardless of the wakeup capability.
This ensures no interrupts are processed during the suspend/resume
transition.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: restore pinctrl before restoring ios timing on resume

SDIO devices such as WiFi may keep power during suspend, so the MMC
core skips full card re-initialization on resume and directly restores
the host controller's ios timing to match the card. For DDR mode,
pm_runtime_force_resume() sets DDR_EN before the pin configuration is
restored from sleep state.

This is related to the SoC IP integration: switching pinctrl setting
(changing alt from GPIO to USDHC) impacts the internal loopback path.
If pinctrl configures the pad to GPIO function, once DDR_EN is set, the
DLL delay will be fixed based on the GPIO function loopback path. When
the pinctrl is later changed to USDHC function, the internal loopback
path changes, making the original fixed sample point no longer suitable
for the current loopback path. This causes persistent read CRC errors on
subsequent data transfers.

SD/eMMC running in DDR mode are unaffected as they are fully
re-initialized from legacy timing after resume.

Fix this by restoring the pinctrl state based on current timing mode
using esdhc_change_pinstate() before pm_runtime_force_resume(). This
ensures the correct pin configuration (e.g., 100/200MHz for UHS modes)
is applied before DDR_EN is set. Only restore for non-wakeup devices
since wakeup devices kept their active pin state during suspend.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: fix esdhc_change_pinstate() to allow default state restore

esdhc_change_pinstate() checks for pins_100mhz and pins_200mhz at the
top of the function and returns -EINVAL if either is not defined. This
prevents the default case from ever being reached, which means devices
with a sleep pinctrl state but without high-speed pin states (100mhz/
200mhz) can never restore their default pin configuration.

Move the IS_ERR checks for pins_100mhz and pins_200mhz into their
respective switch cases.
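
The effect of moving the checks can be sketched with a toy model. The
types and names below are hypothetical and only mirror the control flow
described above, with a missing pin state modelled as a boolean rather
than an IS_ERR() pointer.

```c
#include <errno.h>
#include <stdbool.h>

enum toy_timing { TOY_TIMING_OTHER, TOY_TIMING_100MHZ, TOY_TIMING_200MHZ };
enum toy_pinstate { TOY_PINS_DEFAULT, TOY_PINS_100MHZ, TOY_PINS_200MHZ };

/* Which optional high-speed pin states the device tree provides. */
struct toy_pins {
	bool have_100mhz;
	bool have_200mhz;
};

/*
 * Returns the selected pin state, or -EINVAL when the requested
 * high-speed state is missing. The checks live inside the cases, so the
 * default case stays reachable for devices without high-speed states.
 */
static int toy_change_pinstate(const struct toy_pins *pins,
			       enum toy_timing timing)
{
	switch (timing) {
	case TOY_TIMING_100MHZ:
		if (!pins->have_100mhz)
			return -EINVAL;
		return TOY_PINS_100MHZ;
	case TOY_TIMING_200MHZ:
		if (!pins->have_200mhz)
			return -EINVAL;
		return TOY_PINS_200MHZ;
	default:
		return TOY_PINS_DEFAULT;
	}
}
```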

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: restore DLL override for DDR modes on resume

sdhci_esdhc_imx_hwinit() unconditionally clears ESDHC_DLL_CTRL by
writing zero. For SDIO devices that keep power during system suspend
and operate in DDR mode, the card remains in DDR timing while the host
DLL override configuration is lost.

Extract the DLL override setup from esdhc_set_uhs_signaling() into
a helper esdhc_set_dll_override(), and call it on the resume path
when the card kept power and is using a DDR timing mode.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: remove unnecessary mmc_card_wake_sdio_irq check for tuning save/restore

The tuning save/restore during system PM is conditioned on
mmc_card_wake_sdio_irq(), but this check is unrelated to whether
tuning values need to be preserved. The actual requirement is that
the card keeps power during suspend and the controller is a uSDHC.

SDIO devices using out-of-band GPIO wakeup maintain power during
suspend but do not set the SDIO IRQ wake flag. In this case the
tuning delay values are not saved/restored.

Remove the unnecessary mmc_card_wake_sdio_irq() condition from both
the suspend save and resume restore paths.

Fixes: c63d25cdc59a ("mmc: sdhci-esdhc-imx: Save tuning value when card stays powered in suspend")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

Merge branch 'bpf-sockmap-fix-sockmap-leaking-udp-socks'

Michal Luczaj says:

====================
bpf, sockmap: Fix sockmap leaking UDP socks

Fix for UDP sockets getting leaked during sockmap lookup/release.
Accompanied by selftests updates.

Two of Sashiko's concerns are to be addressed separately:
https://lore.kernel.org/bpf/20260626205814.BAC3C1F000E9@smtp.kernel.org/

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Changes in v4:
- selftest: drop redundant `if (err)` [Sashiko]
- Link to v3: https://patch.msgid.link/20260702-sockmap-lookup-udp-leak-v3-0-ff8de8782468@rbox.co

Changes in v3:
- selftest: better error handling, ASSERT_*() macros [Sashiko]
- selftest: fix grammar, reorder patches [Kuniyuki]
- Link to v2: https://patch.msgid.link/20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co

Changes in v2:
- selftest: drop the original, adapt old tests
- fix: change approach to rejecting unbound UDP [Kuniyuki]
- Link to v1: https://patch.msgid.link/20260623-sockmap-lookup-udp-leak-v1-0-05804f9308e4@rbox.co

To: Alexei Starovoitov <ast@kernel.org>
To: Daniel Borkmann <daniel@iogearbox.net>
To: Andrii Nakryiko <andrii@kernel.org>
To: Eduard Zingerman <eddyz87@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
To: Song Liu <song@kernel.org>
To: Yonghong Song <yonghong.song@linux.dev>
To: Jiri Olsa <jolsa@kernel.org>
To: Emil Tsalapatis <emil@etsalapatis.com>
To: Shuah Khan <shuah@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
To: Jakub Sitnicki <jakub@cloudflare.com>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Eric Dumazet <edumazet@google.com>
To: Kuniyuki Iwashima <kuniyu@google.com>
To: Paolo Abeni <pabeni@redhat.com>
To: Willem de Bruijn <willemb@google.com>
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
To: Simon Horman <horms@kernel.org>
To: Cong Wang <cong.wang@bytedance.com>
Cc: bpf@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
---
====================

Link: https://patch.msgid.link/20260707-sockmap-lookup-udp-leak-v4-0-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Fail unbound UDP on sockmap update

sockmap now rejects unbound UDP sockets. Adjust test_maps. While at it,
check socket()'s return value.

This effectively reverts commit c39aa2159974 ("bpf, selftests: Fix
test_maps now that sockmap supports UDP").

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-4-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Adapt sockmap update error handling

Update sockmap_listen to accommodate the recent change in sockmap that
rejects unbound UDP sockets.

TCP: Reject unbound and bound (unless established or listening).
UDP: Accept only bound sockets.

While at it, migrate to ASSERT_* and enforce reverse xmas tree.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-3-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

bpf, sockmap: Reject unhashed UDP sockets on sockmap update

UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.

Because sockmap accepts unbound UDP sockets, a BPF program can increment a
socket's refcount via lookup. If the socket is subsequently bound, the
transition from unbound to bound causes bpf_sk_release() to skip the
decrement of the refcount, causing a memory leak.

unreferenced object 0xffff88810bc2eb40 (size 1984):
  comm "test_progs", pid 2451, jiffies 4295320596
  hex dump (first 32 bytes):
    7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
    02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
  backtrace (crc bdee079d):
    kmem_cache_alloc_noprof+0x557/0x660
    sk_prot_alloc+0x69/0x240
    sk_alloc+0x30/0x460
    inet_create+0x2ce/0xf80
    __sock_create+0x25b/0x5c0
    __sys_socket+0x119/0x1d0
    __x64_sys_socket+0x72/0xd0
    do_syscall_64+0xa1/0x5f0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
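
The leak sequence described above can be modelled in a few lines. This
is a toy model with hypothetical names: struct toy_sock tracks only a
reference count and a flag standing in for SOCK_RCU_FREE, and the
lookup and release helpers are simplified to the refcount decision.

```c
#include <stdbool.h>

/* Hypothetical socket: a refcount and a stand-in for SOCK_RCU_FREE. */
struct toy_sock {
	int refcnt;
	bool rcu_free;
};

static bool toy_sk_is_refcounted(const struct toy_sock *sk)
{
	return !sk->rcu_free;
}

/* Lookup takes a reference only for refcounted sockets. */
static void toy_lookup(struct toy_sock *sk)
{
	if (toy_sk_is_refcounted(sk))
		sk->refcnt++;
}

/* Release drops a reference only for refcounted sockets. */
static void toy_release(struct toy_sock *sk)
{
	if (toy_sk_is_refcounted(sk))
		sk->refcnt--;
}

/* Binding flips the socket to RCU-freed, i.e. no longer refcounted. */
static void toy_bind(struct toy_sock *sk)
{
	sk->rcu_free = true;
}
```

Starting from a count of 1, the sequence toy_lookup(), toy_bind(),
toy_release() ends with a count of 2: the reference taken while unbound
is never dropped.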

Instead of special-casing for refcounted sockets, reject unhashed UDP
sockets during sockmap updates, as there is no benefit to supporting those.
This effectively reverts the commit under Fixes, with two exceptions:

1. sock_map_sk_state_allowed() maintains a fall-through `return true`.
2. In the spirit of commit b8b8315e39ff ("bpf, sockmap: Remove unhash
   handler for BPF sockmap usage"), the proto::unhash BPF handler is not
   reintroduced.

Historical note: this issue is related to commit 67312adc96b5 ("bpf: reject
unhashed sockets in bpf_sk_assign").

Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-2-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Ensure UDP sockets are bound

Update sockmap_basic tests to bind sockets before they are used. This
accommodates the recent change in sockmap that rejects unbound UDP sockets.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-1-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

drm/ttm: Account for NULL and handle pages in ttm_pool_backup

Pages in ttm_pool_backup can be NULL or backup handles
(ttm_backup_page_ptr_is_handle()), neither of which can be passed to
set_pages_array_wb() or freed. Add a dedicated WB pass before the
dma/purge loop that walks allocations using the same i += num_pages
stride, skipping NULL and handle entries, and calls set_pages_array_wb()
once per contiguous run of real pages. Apply the same NULL/handle guard
to the dma/purge loop.
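
The run-splitting idea can be sketched in isolation. The model below is
deliberately simplified and uses hypothetical names: entries are tagged
with an enum rather than encoded in page pointers, the allocation-order
stride is ignored, and the write-back call is replaced by counting.

```c
#include <stddef.h>

enum toy_entry { TOY_NULL, TOY_HANDLE, TOY_PAGE };

/*
 * Walk the array, skip NULL and handle entries, and treat every
 * contiguous run of real pages as one batch. Returns the number of
 * batches and stores the total number of pages covered.
 */
static size_t toy_wb_pass(const enum toy_entry *entries, size_t n,
			  size_t *pages_covered)
{
	size_t runs = 0;
	size_t i = 0;

	*pages_covered = 0;
	while (i < n) {
		size_t start;

		if (entries[i] != TOY_PAGE) {
			i++;
			continue;
		}

		start = i;
		while (i < n && entries[i] == TOY_PAGE)
			i++;

		/* One batch per run; a real pass would change caching here. */
		runs++;
		*pages_covered += i - start;
	}

	return runs;
}
```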

Fixes the following oops:

Oops: general protection fault, kernel NULL pointer dereference 0x0: 0000 [#1] SMP NOPTI
RIP: 0010:__cpa_process_fault+0xf8/0x770
RSP: 0018:ffffc90000a87718 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffffc90000a87868 RCX: 0000000000000000
RDX: 0000000000001000 RSI: 0005088000000000 RDI: ffffffff827c5f34
RBP: 0005088000000000 R08: ffffc90000a877cb R09: ffffc90000a877d0
R10: 0000000000000000 R11: 000000000000001b R12: 000ffffffffff000
R13: ffffc90000a87868 R14: ffffc90000a87868 R15: ffff88815b882ae0
FS: 0000000000000000(0000) GS:ffff8884ec840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f930b844000 CR3: 000000000262e003 CR4: 0000000008f70ef0
PKRU: 55555554
Call Trace:
<TASK>
__change_page_attr_set_clr+0x989/0xe90
? __purge_vmap_area_lazy+0x6c/0x3a0
? _vm_unmap_aliases+0x250/0x2a0
set_pages_array_wb+0x7f/0x120
ttm_pool_backup+0x4c9/0x5b0 [ttm]
? dma_resv_wait_timeout+0x3b/0xf0
ttm_tt_backup+0x32/0x60 [ttm]
ttm_bo_shrink+0x66/0x110 [ttm]
xe_bo_shrink_purge+0x12b/0x1b0 [xe]
xe_bo_shrink+0xbb/0x270 [xe]
__xe_shrinker_walk+0xf7/0x160 [xe]
xe_shrinker_walk+0x9d/0xc0 [xe]
xe_shrinker_scan+0x11f/0x210 [xe]
do_shrink_slab+0x13b/0x270
shrink_slab+0xf1/0x400
shrink_node+0x352/0x8a0
balance_pgdat+0x32c/0x700
kswapd+0x205/0x2f0
? __pfx_autoremove_wake_function+0x10/0x10
? __pfx_kswapd+0x10/0x10
kthread+0xd1/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x1b1/0x200
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pages")
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260702214815.4009271-1-matthew.brost@intel.com

smb/client: flush dirty data before punching a hole

Punching a hole after a large buffered write may leave the range
reported as data. Reproduce it with:

  xfs_io -f \
    -c "pwrite -b 3m -S 0x61 0 3m" \
    -c "fpunch 1m 1m" \
    -c "seek -h 0" \
    -c "seek -d 1m" \
    /mnt/test/repro

Punching 1 MiB at offset 1 MiB should produce:

  0          1 MiB       2 MiB       3 MiB
  |  DATA    |   HOLE    |   DATA    | EOF

Instead, the entire file is reported as data. SEEK_HOLE(0) returns EOF,
and SEEK_DATA(1M) returns 1M.

This happens because a dirty folio spanning the punched range can be
written back after the punch and refill the hole.

Fix this by flushing and waiting for dirty data in the punched range
before invalidating the page cache and issuing FSCTL_SET_ZERO_DATA.

xfstests generic/539 passes against Samba/ksmbd with this change.

Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: Use EXPORT_SYMBOL_IF_KUNIT() to export symbols in SMB2

Replace EXPORT_SYMBOL_FOR_MODULES() with EXPORT_SYMBOL_IF_KUNIT()
to mark the symbols as visible only if CONFIG_KUNIT is enabled.

KUnit tests should import the EXPORTED_FOR_KUNIT_TESTING namespace to
use these marked symbols. This is the standard way for all KUnit
tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: Use EXPORT_SYMBOL_IF_KUNIT() to export symbols

Replace EXPORT_SYMBOL_FOR_MODULES() with EXPORT_SYMBOL_IF_KUNIT()
to mark the symbols as visible only if CONFIG_KUNIT is enabled.

KUnit tests should import the EXPORTED_FOR_KUNIT_TESTING namespace to
use these marked symbols. This is the standard way for all KUnit
tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

powerpc: Remove dead non-preemption code

Since commit 7dadeaa6e851 ("sched: Further restrict the preemption
modes"), powerpc always has CONFIG_PREEMPTION because only
CONFIG_PREEMPT and CONFIG_PREEMPT_LAZY are possible, even in
dynamic preemption mode (see sched_dynamic_mode).

As a consequence, need_irq_preemption() is always true and can be
removed.

And because commit bee25f97ad24 ("powerpc: Enable GENERIC_ENTRY
feature") includes linux/irq-entry-common.h, which already declares
the sk_dynamic_irqentry_exit_cond_resched static key, asm/preempt.h
becomes useless and can be removed.

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2bf10a0afffefb6aca44bf2f864cc17471a80e31.1781870889.git.chleroy@kernel.org

powerpc/dt_cpu_ftrs: Set CPU_FTR_P11_PVR for Power11 and later processors

When using device tree CPU features (dt-cpu-ftrs), the kernel bypasses
the traditional cputable-based CPU identification and instead derives
CPU features from the device tree's "ibm,powerpc-cpu-features" node
provided by firmware.

However, CPU_FTR_P11_PVR is a kernel-internal feature flag used to
identify Power11 and later processors, and is not represented in the
device tree's ISA feature set. While ISA v3.1 support (indicated by
CPU_FTR_ARCH_31) is present on both Power10 and Power11, the
CPU_FTR_P11_PVR flag is specifically needed by code that must
distinguish between Power10 and Power11 processors.

Without this flag set, code that checks for Power11 using
cpu_has_feature(CPU_FTR_P11_PVR) will incorrectly return false on
Power11+ systems using dt-cpu-ftrs, leading to incorrect behavior.

This issue manifests specifically in powernv environments (bare-metal
or QEMU TCG with powernv machine type), where skiboot/OPAL firmware
provides the "ibm,powerpc-cpu-features" node, causing the kernel to
use dt-cpu-ftrs. The issue does not affect pseries guests, where SLOF
firmware does not provide this node, causing the kernel to fall back
to the traditional cputable path (identify_cpu) which correctly sets
CPU_FTR_P11_PVR during PVR-based CPU identification.

In powernv TCG guests, the missing flag causes KVM code to trigger
warnings when attempting to create KVM guests, as cpu_features shows
0x000c00eb8f4fb187 (missing bit 53) instead of the correct
0x002c00eb8f4fb187 (with bit 53 set).
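
The two cpu_features values quoted above can be compared mechanically.
The helper below is a hypothetical illustration, not kernel code: it
returns the index of the differing bit when exactly one bit differs.

```c
#include <stdint.h>

/*
 * Returns the index of the single bit in which a and b differ, or -1
 * when they are equal or differ in more than one bit.
 */
static int toy_single_bit_diff(uint64_t a, uint64_t b)
{
	uint64_t diff = a ^ b;
	int bit = 0;

	/* A power of two has exactly one bit set. */
	if (diff == 0 || (diff & (diff - 1)) != 0)
		return -1;

	while ((diff >> bit) != 1)
		bit++;

	return bit;
}
```

Called with 0x000c00eb8f4fb187 and 0x002c00eb8f4fb187, it returns 53.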

Fix this by setting CPU_FTR_P11_PVR for all processors with
PVR >= PVR_POWER11 when ISA v3.1 support is detected in
cpufeatures_setup_start(). This approach ensures forward
compatibility with future processor generations.

Fixes: 96e266e3bcd6 ("KVM: PPC: Book3S HV: Add Power11 capability support for Nested PAPR guests")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260614173437.26352-1-amachhiw@linux.ibm.com

powerpc/pseries: fix memory leak on krealloc failure in papr_init

When krealloc() fails, free the original esi_buf before returning to
avoid a memory leak.
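
The same pattern exists in user space with realloc(), which likewise
leaves the original buffer allocated on failure. A minimal sketch,
assuming a hypothetical toy_grow() helper rather than the papr_init()
code:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Grow *buf to new_size. On failure the original buffer is still
 * allocated, so it is freed here before the error is returned.
 */
static int toy_grow(char **buf, size_t new_size)
{
	char *tmp = realloc(*buf, new_size);

	if (!tmp) {
		free(*buf);
		*buf = NULL;
		return -ENOMEM;
	}

	*buf = tmp;
	return 0;
}
```

Assigning the result to a temporary first is what keeps the original
pointer available for the free on the error path.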

Fixes: 3c14b73454cf ("powerpc/pseries: Interface to represent PAPR firmware attributes")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260614142356.658212-2-thorsten.blum@linux.dev

powerpc/uaccess: correct check for CONFIG_PPC_E500 in mask_user_address()

mask_user_address() incorrectly checks for CONFIG_E500 instead of
CONFIG_PPC_E500, causing mask_user_address_isel() to not be used on
E500 hardware. Fix the check to use the correct name.

Fixes: 861574d51bbd ("powerpc/uaccess: Implement masked user access")
Cc: stable@vger.kernel.org # 7.0+
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260615233729.29386-1-enelsonmoore@gmail.com

powerpc/vtime: Initialize starttime at boot for native accounting

It was observed that /proc/stat had a very large value for one or more
CPUs. It was more visible after recent code simplifications around
cpustats.

System has 240 CPUs.

cat /proc/uptime;
194.18 46500.55
cat /proc/stat
cpu  5966 39 837032887 4650070 164 185 100 0 0 0
cpu0 108 0 837030890 19109 24 4 23 0 0 0

Since uptime is 194s, the system time of each CPU can't be more than 19400.
The sum of system time of all CPUs can't be more than 19400*240 = 4656000.
In fact, the huge value is close to mftb(). Note that mftb doesn't reset on
PowerVM when the LPAR restarts; it only resets when the whole system resets.
The same issue exists for kexec too.

This happens because starttime is not set up at init time. Once it is set,
subsequent vtime_delta calls return the right delta.

Fix it by initializing the starttime during CPU initialization. This
fixes the large times seen.
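
Why an unset starttime produces one huge sample can be shown with a toy
delta function. The names and units below are hypothetical and only
illustrate the arithmetic, not the powerpc accounting code.

```c
#include <stdint.h>

/* Hypothetical per-CPU accounting state. */
struct toy_acct {
	uint64_t starttime;
};

/* Seed starttime with the current timebase, as done at CPU init. */
static void toy_acct_init(struct toy_acct *acct, uint64_t now)
{
	acct->starttime = now;
}

/* Time elapsed since the previous sample; advances starttime. */
static uint64_t toy_vtime_delta(struct toy_acct *acct, uint64_t now)
{
	uint64_t delta = now - acct->starttime;

	acct->starttime = now;
	return delta;
}
```

With starttime left at zero, the first delta equals the whole timebase
value. After toy_acct_init(), the first delta covers only the time since
initialization.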

cat /proc/uptime; cat /proc/stat
15.78 3694.63
cpu  6035 35 1347 369479 23 144 49 0 0 0
cpu0 19 0 38 1508 0 1 14 0 0 0

Now, system time is reported as expected.

Fixes: cf9efce0ce31 ("powerpc: Account time using timebase rather than PURR")
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260605124329.377533-1-sshegde@linux.ibm.com

powerpc/85xx: Add fsl,ifc to common device ids

Add fsl,ifc to mpc85xx_common_ids so that of_platform_bus_probe
creates a platform device for the IFC node even without 'simple-bus'
in its compatible property. On P1010 and similar platforms the IFC
node is a direct child of the root, so it must be explicitly matched
to be populated.

Fixes: 0bf51cc9e9e5 ("powerpc: dts: mpc85xx: remove "simple-bus" compatible from ifc node")
Assisted-by: opencode:big-pickle
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260604043309.91280-1-rosenp@gmail.com

arch/riscv: vdso: remove CFI landing pad from rt_sigreturn

When CONFIG_RISCV_USER_CFI is enabled, the CFI version of the vDSO has
a CFI landing pad instruction at the start of __vdso_rt_sigreturn. This
breaks libgcc's unwinding code which matches on the first two
instructions. Other unwinders that rely on similar instruction matching
may also be affected.

Since __vdso_rt_sigreturn is reached as part of signal-return handling
rather than via an indirect call/jump from userspace, it does not need a
CFI landing pad. Remove it and restore the instruction sequence expected
by existing unwinding code.

This matches what was done on arm64 in commit 9a964285572b ("arm64:
vdso: Don't prefix sigreturn trampoline with a BTI C instruction") for a
similar issue.

Cc: stable@vger.kernel.org
Fixes: 37f57bd3faea ("arch/riscv: compile vdso with landing pad and shadow stack note")
Co-authored-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://patch.msgid.link/20260623204058.498120-1-aurelien@aurel32.net
[pjw@kernel.org: fixed comment style]
Signed-off-by: Paul Walmsley <pjw@kernel.org>

accel/amdxdna: reject command submission on devices without a submit op

amdxdna_cmd_submit() calls xdna->dev_info->ops->cmd_submit()
unconditionally, but only aie2_dev_ops defines that callback.
aie4_vf_ops (the AIE4 SR-IOV virtual function) does not, so a user
AMDXDNA_EXEC_CMD ioctl on an AIE4 device reaches a NULL function-pointer
call and oopses the kernel. AIE4 submits work through a mapped user queue
and doorbell, not this ioctl path.

Reject the submission early with -EOPNOTSUPP when the device provides no
cmd_submit op, so the shared EXEC ioctl fails cleanly on such devices.
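
The guard can be pictured with a minimal userspace sketch. The types and
function names below are hypothetical simplifications, not the driver's
actual structures; the point is only that an optional callback is tested
before it is called.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified ops table: cmd_submit is optional. */
struct sketch_ops {
	int (*cmd_submit)(void *job);
};

struct sketch_dev {
	const struct sketch_ops *ops;
};

/* Reject early instead of calling through a NULL function pointer. */
static int sketch_cmd_submit(const struct sketch_dev *dev, void *job)
{
	assert(dev && dev->ops);

	if (!dev->ops->cmd_submit)
		return -EOPNOTSUPP;

	return dev->ops->cmd_submit(job);
}

/* Trivial callback standing in for a device that supports submission. */
static int sketch_submit_ok(void *job)
{
	(void)job;
	return 0;
}
```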

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Cc: stable@vger.kernel.org
Found by 0sec automated security-research tooling (https://0sec.ai).
Assisted-by: 0sec:claude-opus-4-8
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260713173030.87541-3-doruk@0sec.ai

accel/amdxdna: reject user command submission without a command BO

amdxdna_drm_submit_execbuf() passes the user-supplied command BO handle
straight into amdxdna_cmd_submit() with drv_cmd == NULL. When the handle
is AMDXDNA_INVALID_BO_HANDLE (0), the block that fetches job->cmd_bo is
skipped, leaving it NULL, and no check rejects it on the user path (the
!job->cmd_bo guard lives inside the != INVALID branch).

The job is then armed and pushed to the DRM scheduler.
aie2_sched_job_run() takes the drv_cmd == NULL path and calls
amdxdna_cmd_set_state(job->cmd_bo) -> amdxdna_gem_vmap(NULL) ->
to_gobj(NULL)->dev, a NULL pointer dereference in the drm_sched worker.
A process with access to the accel node on a system with a probed AMD NPU
can trigger a kernel oops with a single AMDXDNA_EXEC_CMD ioctl
(cmd_handles = 0).

Only internal driver commands (SYNC_DEBUG_BO / ATTACH_DEBUG_BO)
legitimately pass AMDXDNA_INVALID_BO_HANDLE, and they always set drv_cmd.
Reject the invalid handle for user submissions (drv_cmd == NULL) at the
submit choke point so every user path is covered.
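
A minimal sketch of the rule described above, assuming a helper name and an
error code (-EINVAL) that are illustrative only; the commit text does not
say which errno the driver returns. The handle value 0 is taken from the
description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INVALID_BO_HANDLE 0u	/* value 0, per the description above */

/*
 * Only internal driver commands may omit the command BO. A user
 * submission (no driver command) with the invalid handle is rejected.
 * The -EINVAL return value is an assumption made for this sketch.
 */
static int sketch_check_cmd_handle(uint32_t cmd_bo_hdl, bool has_drv_cmd)
{
	if (cmd_bo_hdl == SKETCH_INVALID_BO_HANDLE && !has_drv_cmd)
		return -EINVAL;

	return 0;
}
```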

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Cc: stable@vger.kernel.org
Found by 0sec automated security-research tooling (https://0sec.ai).
Assisted-by: 0sec:claude-opus-4-8
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260713173030.87541-2-doruk@0sec.ai

i2c: mediatek: fix WRRD for SoCs without auto_restart option

MediaTek mt65xx family SoCs have no auto restart, however, they still
support the WRRD mode in the hardware. Because auto_restart is set to 0,
the WRRD mode will never be enabled, leading to read errors.

Fix this by removing auto_restart check from the WRRD enable path.

Fixes: b49218365280 ("i2c: mediatek: fix potential incorrect use of I2C_MASTER_WRRD")
Signed-off-by: Roman Vivchar <rva333@protonmail.com>
Cc: <stable@vger.kernel.org> # v6.18+
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260709-6572-6595-i2c-v2-1-b2fb8510d1d3@protonmail.com

selinux: fix incorrect execmem checks on overlayfs

The commit fixing the overlayfs mmap() and mprotect() access checks
failed to skip the execmem check in __file_map_prot_check() for the case
where the "mounter check" is being performed. This check should be
performed only against the credentials of the task that is calling
mmap()/mprotect(), since it doesn't pertain to the file itself, but
rather just gates the ability of the calling task to get an executable
memory mapping in general.

The purpose of the "mounter check" is to guard against using an
overlayfs mount to gain file access that would otherwise be denied to
the mounter. For execmem this is not relevant, as there is no further
file access granted based on it (notice that the file's context is not
used as the target in the check), so checking it also against the
mounter credentials would be incorrect.

Fix this by passing a boolean to [__]file_map_prot_check() and
selinux_mmap_file_common() that indicates if we are doing the "mounter
check" and skipping the execmem check in that case. Since this boolean
also indicates if we use current_cred() or the mounter cred as the
subject, also remove the "cred" argument from these functions and
determine it based on the boolean and the file struct.
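
The effect of the boolean can be summarized in a small sketch. All names
here are hypothetical and the real functions perform actual permission
checks; this only shows how one flag would select both the subject
credentials and whether execmem is evaluated.

```c
#include <stdbool.h>

/* Hypothetical: which credentials act as the subject of the check. */
enum sketch_subject {
	SKETCH_SUBJ_CURRENT,	/* the task calling mmap()/mprotect() */
	SKETCH_SUBJ_MOUNTER,	/* the overlayfs mounter */
};

struct sketch_plan {
	enum sketch_subject subject;
	bool check_execmem;
};

/*
 * One boolean decides both the subject and whether execmem applies:
 * execmem is about the calling task only, so it is skipped when the
 * mounter check is being performed.
 */
static struct sketch_plan sketch_plan_checks(bool mounter_check,
					     bool wants_execmem)
{
	struct sketch_plan p = {
		mounter_check ? SKETCH_SUBJ_MOUNTER : SKETCH_SUBJ_CURRENT,
		wants_execmem && !mounter_check,
	};

	return p;
}
```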

Cc: stable@vger.kernel.org
Fixes: 82544d36b172 ("selinux: fix overlayfs mmap() and mprotect() access checks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

i2c: mlxbf: Fix use-after-free in mlxbf_i2c_init_resource()

If devm_platform_get_and_ioremap_resource() returns an error,
mlxbf_i2c_init_resource() frees tmp_res before reading tmp_res->io to
get the error code. This results in a use-after-free.

Save the error code before freeing tmp_res.
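
The ordering issue is easy to reproduce outside the kernel. The sketch below
uses malloc()/free() and a plain integer field in place of the driver's
allocation helpers and error pointer, so every name in it is an assumption;
it only shows the error value being copied out before the object is freed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource: io holds a mapping cookie or a negative errno. */
struct sketch_res {
	long io;
};

static int sketch_init_resource(long map_result, struct sketch_res **out)
{
	struct sketch_res *tmp;
	int err;

	assert(out);

	tmp = malloc(sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;

	tmp->io = map_result;
	if (tmp->io < 0) {
		err = (int)tmp->io;	/* read while tmp is still valid */
		free(tmp);
		return err;		/* never touches freed memory */
	}

	*out = tmp;
	return 0;
}
```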

Fixes: b5b5b32081cd ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260714150808.85045-1-xuanqiang.luo@linux.dev

io_uring/fs: check unused sqe fields for unlinkat

Zero check unused SQE fields addr3 and pad2 for unlinkat. They're
not needed now, but could be used sometime in the future.
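
A minimal sketch of such a check, assuming a simplified SQE layout and an
-EINVAL return; neither is copied from the io_uring sources. Rejecting
non-zero values now is what would let the fields be given a meaning later
without breaking callers that happened to pass garbage.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced view of the two fields named above. */
struct sketch_sqe {
	uint64_t addr3;
	uint64_t pad2;
};

/* Fail the prep step if any currently unused field is non-zero. */
static int sketch_unlinkat_check_unused(const struct sketch_sqe *sqe)
{
	assert(sqe);

	if (sqe->addr3 || sqe->pad2)
		return -EINVAL;

	return 0;
}
```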

Signed-off-by: Yi Xie <xieyi@kylinos.cn>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260714030306.64820-1-xieyi@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: free the replaced iovec after a successful grow

The provided-buffer validation fix deferred freeing a cached iovec
until validation completed. However, the deferred free uses arg->iovs.
After a grow, that points to the newly allocated array. Without a grow,
it points to the cached array that remains in use.

This leaves the caller with a dangling iovec in both cases and can
result in repeated frees. Only free org_iovs when arg->iovs actually
replaced it.
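
The condition can be sketched as follows. The struct and helper names are
hypothetical and the arrays are plain malloc() memory; the sketch only
illustrates comparing the saved pointer with the current one before freeing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the iovec array and the argument struct. */
struct sketch_iov {
	void *base;
	size_t len;
};

struct sketch_arg {
	struct sketch_iov *iovs;	/* current array, possibly regrown */
};

/*
 * Free the original array only if a grow replaced it. Returns 1 when
 * org_iovs was freed and 0 when it is still the array in use.
 */
static int sketch_free_replaced(struct sketch_arg *arg,
				struct sketch_iov *org_iovs)
{
	assert(arg);

	if (org_iovs && arg->iovs != org_iovs) {
		free(org_iovs);
		return 1;
	}
	return 0;
}
```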

Fixes: cd053d788c3f ("io_uring: fix dangling iovec after provided-buffer bundle grow failure")
Assisted-by: Codex:gpt-5.3-codex-spark
Signed-off-by: Jaeyeong Lee <iostreampy@proton.me>
Link: https://patch.msgid.link/20260712142612.188695595-iostreampy@proton.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drm/gpusvm: publish dpagemap early to avoid device mapping leak on error

drm_gpusvm_get_pages() only stored the local dpagemap into
svm_pages->dpagemap on the success path. If a later page failed (e.g.
-EOPNOTSUPP when ctx->allow_mixed is false) and jumped to err_unmap,
svm_pages->dpagemap was still NULL, so __drm_gpusvm_unmap_pages() skipped
device_unmap() and leaked the device mappings already created.

Assign svm_pages->dpagemap when the first device page is mapped so the
err_unmap path can device_unmap() those mappings.

This issue was found by Sashiko AI review.

Fixes: f70da6f99d4f ("drm/gpusvm: pull out drm_gpusvm_pages substructure")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260701062800.409248-4-honghuan@amd.com

drm/gpusvm: do not route system pages to device_unmap() on IOVA unmap

In a mixed range (ctx->allow_mixed), dpagemap is not NULL while some
entries are system pages. The unmap loop used:

        dma_unmap_page(...);
    else if (dpagemap && dpagemap->ops->device_unmap)
        dpagemap->ops->device_unmap(...);

When use_iova is true the first condition is false for system pages,
so they fall through to device_unmap() and a system DMA address is
handed to the device-specific unmap callback, risking invalid accesses
or state corruption.

Key the branch off addr->proto instead: system pages only need an explicit
dma_unmap_page() in the non-IOVA case, IOVA system pages are already torn
down by the single dma_iova_destroy(), and only genuine device pages
reach device_unmap().
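
The resulting decision table can be written down as a small pure function.
The enums and the helper are hypothetical names chosen for this sketch, not
the identifiers used in drm_gpusvm; the three outcomes mirror the three
cases listed above.

```c
#include <stdbool.h>

/* Hypothetical page kinds and teardown actions for this sketch. */
enum sketch_proto { SKETCH_PROTO_SYSTEM, SKETCH_PROTO_DEVICE };

enum sketch_action {
	SKETCH_ACT_NONE,		/* nothing to do per page */
	SKETCH_ACT_DMA_UNMAP_PAGE,	/* explicit system-page unmap */
	SKETCH_ACT_DEVICE_UNMAP,	/* device-specific callback */
};

/* Decide per page by its protocol, never by falling through. */
static enum sketch_action sketch_unmap_action(enum sketch_proto proto,
					      bool use_iova,
					      bool has_device_unmap)
{
	if (proto == SKETCH_PROTO_SYSTEM)
		/* IOVA system pages go away with the single IOVA destroy. */
		return use_iova ? SKETCH_ACT_NONE : SKETCH_ACT_DMA_UNMAP_PAGE;

	return has_device_unmap ? SKETCH_ACT_DEVICE_UNMAP : SKETCH_ACT_NONE;
}
```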

This issue was found by Sashiko AI review.

Fixes: 37ad039fb367 ("drm/gpusvm: Use dma-map IOVA alloc, link, and sync API in GPU SVM")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260701062800.409248-3-honghuan@amd.com

drm/gpusvm: free the whole IOVA reservation on unmap

dma_iova_try_alloc() reserves IOVA for the entire range, but in a mixed
range only the system pages are linked (their total size is state_offset)
while device pages never touch the IOVA state. dma_iova_destroy() with
state_offset only frees the linked part, permanently leaking the IOVA
reserved for the device pages and eventually exhausting the IOVA space.

Unlink the linked system-page portion and free the whole reserved IOVA
instead. On the get_pages() error path state_offset is 0 (no page linked,
dma_addr[0] unpopulated), so skip the unlink and just free the reservation;
this also avoids reading the uninitialized dma_addr[0].dir there.

Allocate the dma_addr array with the zeroing kvzalloc_objs() so every entry
has a well-defined value.

This issue was found by Sashiko AI review.

Fixes: 37ad039fb367 ("drm/gpusvm: Use dma-map IOVA alloc, link, and sync API in GPU SVM")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260701062800.409248-2-honghuan@amd.com

Merge tag 'sound-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All are device-specific fixes (including
  regression fixes) or quirks accumulated since the last update. Some
  highlights:

  USB-audio:
   - Fix per-channel volume imbalance regression for sticky mixers
   - Validate input packet length in caiaq driver
   - Quirks for iBasso DC-Elite, Musical Fidelity M6s DAC, and Redragon
     H510-PRO Wireless headset

  HD-audio:
   - Fix a long-standing bug of cached processing coefficient verbs
   - Make cs35l56 driver fail when firmware is missing
   - Fix cirrus codec Kconfig dependency, update MAINTAINERS
   - Remove unneeded mic bias threshold override on Conexant
   - Realtek codec quirks for ASUS ROG Ally X (headphone & mic), Dell
     QCM1255, Legion Pro 7, HP/Victus laptops, Framework, and TongFang
     laptops

  ASoC:
   - Add Eliza audio support on Qualcomm sc8280xp/sm8250 SoCs
   - Fix SDCA linker error with ACP on AMD
   - A few fixes for AMD ACP PCI driver
   - Add TAS2783 support on AMD ACP 7.0 platforms
   - Reset RT712-SDCA codec to fix silent headphone issue
   - Soft reset S/PDIF datapath on Meson AIU FIFO
   - Jack report fix for cs42l43
   - TAS2562 shutdown GPIO clearing fix
   - Sidecar amps quirk for Lenovo laptop in SOF SDW driver

  Misc:
   - Drop redundant mod_devicetable.h includes from FireWire drivers
   - Fix memory leak and format mismatch in mixer kselftest"

* tag 'sound-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (36 commits)
  ALSA: usb-audio: Add delay quirk for iBasso DC-Elite
  ALSA: hda: conexant: Remove mic bias threshold override
  ALSA: hda/realtek: Fix speakers on Legion Pro 7 16ARX8H with codec SSID 17aa:38a7
  ALSA: hda/realtek: Fix speakers on MECHREVO WUJIE Series
  ALSA: hda: cs35l56: Fail if wmfw file is missing
  ALSA: usb-audio: Skip DSD quirk for Musical Fidelity M6s DAC
  ALSA: hda: MAINTAINERS: Fix missing cirrus* file reference
  ALSA: hda/cirrus_scodec: Make Kconfig visible if KUNIT
  ALSA: hda/realtek: Add quirk for TongFang X6xx45xU
  ALSA: hda/realtek - Fixed Headphone noise issue for Dell QCM1255
  ASoC: tas2562: fix deprecated 'shut-down' GPIO always cleared after lookup
  ASoC: cs42l43: Correct report for forced microphone jack
  ASoC: qcom: sc8280xp: Add support for Eliza
  ASoC: dt-bindings: qcom,sm8250: Add Eliza sound card
  ASoC: dt-bindings: qcom: Add Eliza LPASS macro codecs
  ALSA: hda/realtek: Add mic mute LED quirk for HP Laptop 15-fd0xxx
  ALSA: hda/realtek - Add quirk for HP Victus 15-fa0xxx (MB 8A50)
  ALSA: usb-audio: Add quirk for Redragon H510-PRO Wireless headset
  ASoC: amd: ps: replace bitwise OR with logical OR in IRQ return check
  ASoC: amd: ps: fix wrong ACP version string in pci_request_regions()
  ...

Merge tag 'for-7.2-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- fix root structure leak after relocation error

- fix optimization when checksums are read from commit root, fall back
   to checksum root during relocation

- in tree-checker, validate length of inode reference in items

- validate properties before setting them

- validate free space cache entries on load

- transaction abort fixes

- fix printing of internal trees as signed numbers

- add error messages after critical lzo compression errors

* tag 'for-7.2-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: print-tree: print header owner as signed
  btrfs: decentralize transaction aborts in create_reloc_root()
  btrfs: tree-checker: validate INODE_REF's namelen
  btrfs: lzo: add error message for invalid headers
  btrfs: fallback to transaction csum tree on a commit root csum miss
  btrfs: fix root leak if its reloc root is unexpected in merge_reloc_roots()
  btrfs: reject free space cache with more entries than pages
  btrfs: fix transaction abort logic in btrfs_fileattr_set()
  btrfs: validate properties before setting them

i2c: spacemit: fix spurious IRQ handling returning IRQ_HANDLED

When the interrupt status register reads zero, the handler should
return IRQ_NONE instead of IRQ_HANDLED. What the return value
actually feeds into is the spurious interrupt accounting in
note_interrupt(): falsely claiming IRQ_HANDLED defeats the "irq XX:
nobody cared" detection, so a stuck interrupt source would never be
caught.
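
The handler shape being described looks roughly like the sketch below. The
enum is a local stand-in for the kernel's irqreturn_t values and the status
word is passed in as a parameter rather than read from a register, so the
names are assumptions for illustration.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED values. */
enum sketch_irqreturn {
	SKETCH_IRQ_NONE = 0,
	SKETCH_IRQ_HANDLED = 1,
};

/* Claim the interrupt only when the device actually raised something. */
static enum sketch_irqreturn sketch_irq_handler(uint32_t status)
{
	if (!status)
		return SKETCH_IRQ_NONE;	/* lets spurious-IRQ accounting work */

	/* ... acknowledge and service the bits set in status ... */
	return SKETCH_IRQ_HANDLED;
}
```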

Fixes: 5ea558473fa3 ("i2c: spacemit: add support for SpacemiT K1 SoC")
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Cc: <stable@vger.kernel.org> # v6.15+
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Mukesh Savaliya <mukesh.savaliya@oss.qualcomm.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/ef8b623f45d4e430721e46572c2598d882044aed.1783667875.git.xiaopei01@kylinos.cn

i2c: imx: fix locked bus on SMBus block-read of 0 (IRQ)

SMBus 3.1 6.5.7 allows a Block Read byte count of 0, but the
interrupt-driven block-read state machine rejects it as -EPROTO. Worse,
it returns without a NACK+STOP: the next receive cycle has already
started, so the target keeps holding SDA and the bus stays stuck until a
power cycle of this i2c controller.

Accept count=0: NACK the in-flight dummy byte (TXAK) and set msg->len to
2 so i2c_imx_isr_read_continue() emits STOP via its normal last-byte
path. The dummy byte is discarded; block-read callers only consume
buf[0..count-1].

Reading I2DR has likewise already armed the next byte on the
count > I2C_SMBUS_BLOCK_MAX error path, so NACK it (TXAK) before aborting
with -EPROTO; otherwise the failing transfer's STOP cannot complete and
the bus stays held.

The atomic path regressed earlier (v3.16) and is fixed separately; this
patch covers only the v6.13 state-machine rework.

Fixes: 5f5c2d4579ca ("i2c: imx: prevent rescheduling in non dma mode")
Signed-off-by: Vincent Jardin <vjardin@free.fr>
Cc: <stable@vger.kernel.org> # v6.13+
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260713-for-upstream-i2c-lx2160-fix-v1-v3-2-073ac9e103a5@free.fr

i2c: imx: fix locked bus on SMBus block-read of 0 (atomic)

SMBus 3.1 6.5.7 allows a Block Read byte count of 0, but the atomic
(polling) path rejects it as -EPROTO. Worse, it returns without a
NACK+STOP: the next receive cycle has already started, so the target
keeps holding SDA and the bus stays stuck until a power cycle for
this i2c controller.

Reading I2DR to obtain the count likewise arms the next byte on the
count > I2C_SMBUS_BLOCK_MAX path, which also returned -EPROTO directly
and left the bus held.

Handle both: NACK the in-flight dummy byte (TXAK) and extend msgs->len so
the existing last-byte handling emits STOP; the dummy byte is discarded.
A count of 0 is a valid empty block read; a count above
I2C_SMBUS_BLOCK_MAX is still reported as -EPROTO, but only after the bus
has been released.

The interrupt-driven path has the same flaw from a later commit and is
fixed separately, as it carries a different Fixes: tag and stable range.

Fixes: 8e8782c71595 ("i2c: imx: add SMBus block read support")
Signed-off-by: Vincent Jardin <vjardin@free.fr>
Cc: <stable@vger.kernel.org> # v3.16+
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260713-for-upstream-i2c-lx2160-fix-v1-v3-1-073ac9e103a5@free.fr

CREDITS: Add Wolfram Sang

Wolfram Sang has decided that a decade-plus of I2C was enough and
is moving on to new things.

Thank you, Wolfram, for your years of dedication and for keeping
the bus in line. Your legacy is now officially cemented in the
CREDITS file.

Suggested-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Cc: Wolfram Sang <wsa@kernel.org>
Reviewed-by: Wolfram Sang <wsa@kernel.org>
Link: https://lore.kernel.org/r/20260625163221.183414-1-andi.shyti@kernel.org

drm/i915/display: Fix NV12 ceiling division for bigjoiner case

Commit 16df4cc63c58 ("drm/i915/display: Use ceiling division for NV12
UV surface offset calculation") computes the UV (chroma) surface
start/size as ceiling(half of Y plane start/size) directly from the
U16.16 fixed-point source rectangle:

        x = fp_16_16_to_int_ceil(fp_16_16_div2(src.x1));

For a single pipe the source coordinates are integers, so this is
correct (UV start = ceiling(half of Y plane start)).

With bigjoiner + a plane scaler the picture changes. The pipe boundary
is a fixed integer destination pixel, but the plane's position and the
scaler ratio are arbitrary, so drm_rect_clip_scaled() maps the seam back
to a *fractional* per-pipe source. For a 1280->2407 upscaled NV12 plane
crossing the seam:

        master src: width = 1204 * 1280/2407 = 640.265899, x1 = 0
        joiner src: width = 1203 * 1280/2407 = 639.734115, x1 = 640.265884

The luma path floors this to an integer (src.x1 >> 16 = 640), but the
UV path takes ceiling(640.265884 / 2) = ceil(320.13) = 321. The Y plane
then starts at column 640 while the UV plane starts at 321*2 = 642,
pushing the chroma read one column past the 640-wide chroma surface on
the joiner secondary:

        [CRTC:382:pipe C] PLANE ATS fault
        [CRTC:382:pipe C][PLANE:267:plane 1C] fault (CTL=0x81009400, ...)

The spec "Y plane start" is the integer pixel the luma surface actually
programs (640), not the pre-floor fixed-point value (640.27). Convert
the Y plane start/size to integer first - matching skl_check_main_surface()
- and then apply the ceiling. This is a no-op for the integer (non-joiner)
case and yields the correct, in-bounds chroma offset for the fractional
joiner seam:

                     before fix      after fix
        master 1B:   x=0   w=321     x=0   w=320   -> [0, 320)
        slave  1C:   x=321 w=320     x=320 w=320   -> [320, 640)

The two halves now tile the 640-wide chroma plane exactly and the ATS
fault is gone.
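
The arithmetic can be checked with a short userspace sketch. The helper
names below are hypothetical and are not the i915 fixed-point helpers; the
U16.16 inputs are the values quoted above (640.265884 is 640 plus
17425/65536, and 640.265899 is 640 plus 17426/65536).

```c
#include <stdint.h>

#define SKETCH_FP_ONE 65536u	/* 1.0 in U16.16 */

/* Build a U16.16 value from an integer part and a raw 16-bit fraction. */
static uint32_t sketch_fp(uint32_t whole, uint32_t frac)
{
	return (whole << 16) | frac;
}

/* Before: halve the fixed-point value, then round up to an integer. */
static uint32_t sketch_uv_before(uint32_t y_fp)
{
	uint32_t half = y_fp / 2;

	return (half + SKETCH_FP_ONE - 1) >> 16;
}

/* After: floor Y to an integer first, then take the ceiling of half. */
static uint32_t sketch_uv_after(uint32_t y_fp)
{
	uint32_t y = y_fp >> 16;

	return (y + 1) / 2;
}
```

With the joiner x1 of 640.265884 the first helper yields 321 and the second
yields 320; for an integer input such as 640.0 both yield 320, which is the
no-op behaviour described for the non-joiner case.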

Assisted-by: GitHub-Copilot:Claude-Opus-4.8
Fixes: 16df4cc63c58 ("drm/i915/display: Use ceiling division for NV12 UV surface offset calculation")
Signed-off-by: Vidya Srinivas <vidya.srinivas@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patch.msgid.link/20260618181837.687302-1-vidya.srinivas@intel.com
(cherry picked from commit 0c59cc78241c10e5f02d92b28d811b0435e706a7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

spi: cadence-quadspi: Fix indirect write timeout when DMA read mode is enabled

When use_dma_read is enabled, the IRQ handler unconditionally overwrites
irq_status with the return value of get_dma_status(). For write operations,
DMA status returns 0 since no DMA read is in progress, causing irq_status
to become 0. The subsequent completion signal is never triggered and the
write operation times out with -ETIMEDOUT:

cadence-qspi f1010000.spi: Indirect write timeout
spi-nor spi0.1: operation failed with -110

Fix this by separating the DMA completion path from the write interrupt
path. If get_dma_status() indicates DMA read completion, signal completion
and return immediately. Otherwise, preserve the original irq_status so that
write completion interrupts are correctly recognized and signalled.

Fixes: aac733a96636 ("spi: cadence-qspi: Fix style and improve readability")
Signed-off-by: Srikanth Boyapally <srikanth.boyapally@amd.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20260708045148.2993313-1-srikanth.boyapally@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: dw-dma: Wait for controller idle before completing Tx

dw_spi_dma_wait_tx_done() polls dw_spi_dma_tx_busy(), which only checks
DW_SPI_SR_TF_EMPT. An empty TX FIFO merely means the last data word has
been moved into the shift register; the transfer is not complete on the
bus until DW_SPI_SR_BUSY is also cleared. As a result the wait can
return while the controller is still shifting out the final word.

Any caller that tears down or reconfigures the controller right after
the transfer can then lose the tail of the transfer.

The memory-operation path in spi-dw-core.c already waits for both
DW_SPI_SR_BUSY == 0 and DW_SPI_SR_TF_EMPT == 1. Use the same completion
condition in the DMA path so the transfer is guaranteed to be finished
on the bus before the wait returns.
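
The difference between the two completion conditions fits in a few lines.
The macro names and bit positions below are assumptions made for this
sketch, and the status value is passed in instead of being read from the
controller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_SR_BUSY		(1u << 0)
#define SKETCH_SR_TF_EMPT	(1u << 2)

/* Old condition: only the TX FIFO state is considered. */
static bool sketch_tx_busy_old(uint32_t sr)
{
	return !(sr & SKETCH_SR_TF_EMPT);
}

/* New condition: FIFO must be empty and the controller idle. */
static bool sketch_tx_busy_new(uint32_t sr)
{
	return (sr & SKETCH_SR_BUSY) || !(sr & SKETCH_SR_TF_EMPT);
}
```

The interesting case is an empty FIFO with the busy bit still set: the old
predicate reports the transfer as done while the last word is still being
shifted out, and the new one keeps waiting.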

Signed-off-by: Wang YuWei <1973615295@qq.com>
Link: https://patch.msgid.link/tencent_4EA7B5C94669ED4C38A5F6C1C9126E5D9106@qq.com
Signed-off-by: Mark Brown <broonie@kernel.org>

can: raw: add locking for raw flags bitfield

With commit 890e5198a6e5 ("can: raw: use bitfields to store flags in
struct raw_sock") the formerly separate integer values have been integrated
into a single bitfield. This led to a read-modify-write operation when
changing a flag in raw_setsockopt() which now needs locking to prevent
concurrent access.

Instead of adding a lock/unlock hell in each of the flag manipulations this
patch introduces a wrapper for a new raw_setsockopt_locked() function
analogous to the isotp_setsockopt[_locked]() approach in net/can/isotp.c.
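
The wrapper pattern can be sketched in userspace with a pthread mutex in
place of the socket lock. The struct, option numbers and function names are
hypothetical; the sketch only shows one lock/unlock pair around a helper
that performs the bitfield read-modify-write.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical socket with flags packed into one bitfield word. */
struct sketch_sock {
	pthread_mutex_t lock;
	unsigned int loopback:1;
	unsigned int recv_own_msgs:1;
};

enum { SKETCH_OPT_LOOPBACK, SKETCH_OPT_RECV_OWN_MSGS };

/* Caller must hold so->lock: each store rewrites the shared word. */
static int sketch_setsockopt_locked(struct sketch_sock *so, int opt, int val)
{
	switch (opt) {
	case SKETCH_OPT_LOOPBACK:
		so->loopback = !!val;
		return 0;
	case SKETCH_OPT_RECV_OWN_MSGS:
		so->recv_own_msgs = !!val;
		return 0;
	default:
		return -ENOPROTOOPT;
	}
}

/* Single wrapper takes the lock, so no option branch has to. */
static int sketch_setsockopt(struct sketch_sock *so, int opt, int val)
{
	int ret;

	assert(so);

	pthread_mutex_lock(&so->lock);
	ret = sketch_setsockopt_locked(so, opt, val);
	pthread_mutex_unlock(&so->lock);

	return ret;
}
```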

Fixes: 890e5198a6e5 ("can: raw: use bitfields to store flags in struct raw_sock")
Reported-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Closes: https://lore.kernel.org/linux-can/20260503112200.22727-1-eulgyukim@snu.ac.kr/
Tested-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Tested-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20260504111928.41856-1-socketcan@hartkopp.net
[mkl: use Closes tag instead of Link]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

xfs: don't zap bmbt forks if they are MAXLEVELS tall

LOLLM noticed a discrepancy between the bmbt level checks in the libxfs
bmbt code vs. the inode repair code. We do actually allow a bmbt root
that proclaims to have a height of XFS_BM_MAXLEVELS.

Cc: stable@vger.kernel.org # v6.8
Fixes: e744cef2060559 ("xfs: zap broken inode forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Assisted-by: LOLLM # finding obvious bugs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>