git.ipfire.org Git - thirdparty/linux.git/log

can: isotp: use unconditional synchronize_rcu() in isotp_release()

isotp_notify() unregisters the (RCU) CAN filters via can_rx_unregister()
and clears so->bound without waiting for a grace period. isotp_release()
uses so->bound to decide whether it needs to call synchronize_rcu()
before cancelling so->rxtimer, so when NETDEV_UNREGISTER runs first it
skips that synchronize_rcu() and can cancel the timer while an
in-flight isotp_rcv() is still executing and about to re-arm it via
isotp_send_fc(), leading to a use-after-free timer callback on the
freed socket.

sashiko-bot remarked a problem with rtnl_lock held in isotp_notify(),
therefore make isotp_release() always call synchronize_rcu() before
cancelling the timers, regardless of so->bound. This still closes the
original race (isotp_notify() clearing so->bound without waiting for
in-flight isotp_rcv() callers before isotp_release() cancels the RX
timer) without adding any RCU wait to the netdevice notifier path.

Fixes: 14a4696bc311 ("can: isotp: isotp_release(): omit unintended hrtimer restart on socket release")
Closes: https://lore.kernel.org/linux-can/20260707085210.6B6C01F000E9@smtp.kernel.org/
Reported-by: Nico Yip <zdi-disclosures@trendmicro.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260712-isotp-fixes-v10-1-793a1b1ce17f@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: bcm: collected fixes"

Oliver Hartkopp <socketcan@hartkopp.net> says:

as there were different patches flying around to fix CAN_BCM issues and AI
assisted stuff pops up again and again, I've created this collection to be
applied.

Link: https://patch.msgid.link/20260714-bcm_fixes-v15-0-562f7e3e42da@hartkopp.net
[mkl: added stable@k.o on Cc, converted Link: -> Closes:]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: track a single source interface for ANYDEV timeout/throttle ops

An ANYDEV rx op (ifindex == 0) with an active RX timeout and/or
throttle timer has no defined semantics when matching frames arrive
from several interfaces: bcm_rx_handler() can run concurrently for
the same op on different CPUs, racing hrtimer_cancel()/
bcm_rx_starttimer() against bcm_rx_timeout_handler() and causing
spurious RX_TIMEOUT notifications and last_frames corruption. The
same concurrency lets throttled multiplex frames from different
interfaces clobber the single rx_ifindex/rx_stamp fields shared by
the op.

Add op->if_detected to track the first interface that delivers a
matching frame while a timeout/throttle timer is configured, and
reject frames from any other interface for that op. The claim is
decided in bcm_rx_handler() before hrtimer_cancel() touches
op->timer, so a rejected frame can never disturb the claimed
interface's watchdog. RTR-mode ops are excluded via RX_RTR_FRAME,
independent of kt_ival1/kt_ival2, since those may briefly hold a
stale value from an earlier non-RTR configuration.

The claim is released in bcm_notify() on NETDEV_UNREGISTER and in
bcm_rx_setup() when SETTIMER reconfigures the timer values.

A (re-)claim is only possible on CAN devices in NETREG_REGISTERED
dev->reg_state to cover the release in bcm_notify() where reg_state
becomes NETREG_UNREGISTERING until synchronize_net().
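
The claim logic described above can be illustrated with a minimal
userspace sketch. The struct and helper names are hypothetical stand-ins,
not the kernel's definitions, and the sketch ignores locking and the
NETREG_REGISTERED check; it only shows the first-interface-wins rule and
its release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant bcm_op state. */
struct op_model {
	int if_detected;	/* 0 = no interface claimed yet */
	bool timer_configured;	/* RX timeout and/or throttle timer set */
	bool rtr_mode;		/* models the RX_RTR_FRAME exclusion */
};

/* Return true if a frame from rx_ifindex may be processed for this op. */
static bool op_accept_frame(struct op_model *op, int rx_ifindex)
{
	/* RTR-mode ops and ops without timers are not restricted. */
	if (op->rtr_mode || !op->timer_configured)
		return true;

	/* The first matching frame claims the interface. */
	if (op->if_detected == 0)
		op->if_detected = rx_ifindex;

	/* Frames from any other interface are rejected. */
	return op->if_detected == rx_ifindex;
}

/* Models the release on NETDEV_UNREGISTER or a SETTIMER reconfiguration. */
static void op_release_claim(struct op_model *op)
{
	op->if_detected = 0;
}
```

In this model the accept decision is made before any timer would be
touched, which mirrors the ordering the commit message relies on.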

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260709105031.1A39C1F000E9@smtp.kernel.org/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-11-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix data race on rx_stamp/rx_ifindex in bcm_rx_handler()

For an rx op subscribed on all interfaces (ifindex == 0), the same op
is registered once in the shared per-netns wildcard filter list, so
bcm_rx_handler() can run concurrently on different CPUs for frames
arriving on different net devices.

op->rx_stamp and op->rx_ifindex were written before bcm_rx_update_lock was
taken, allowing concurrent writers to race each other - including a torn
store of the 64-bit rx_stamp on 32-bit platforms.

Beyond a torn store, bcm_send_to_user() must report the timestamp/ifindex
of the very same frame whose content it is delivering. So the assignment
is placed in the same unbroken bcm_rx_update_lock section as the content
comparison.

As a side effect, RTR-request frames (which never reach
bcm_send_to_user()) no longer update rx_stamp/rx_ifindex, since only
the notification path needs them.

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260707145135.5BC831F00A3A@smtp.kernel.org/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-10-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix stale rx/tx ops after device removal

RX: an RX_SETUP update(!) for an existing op skipped can_rx_register()
unconditionally, even when a concurrent NETDEV_UNREGISTER had already
torn down its registration (op->rx_reg_dev == NULL). As a result, frame
delivery was silently not re-enabled for the updated filter. bcm_rx_setup()
now re-registers in that case, while rx_ops with ifindex = 0 (all CAN
devices), which never carry a tracked rx_reg_dev, are left registered as-is.

TX: bcm_notify() only handled bo->rx_ops on NETDEV_UNREGISTER, leaving
tx_ops with an active cyclic transmission re-arming its hrtimer
indefinitely to execute bcm_tx_timeout_handler(). Cancelling the hrtimer
prevents the runaway timer and any injection into a later reused ifindex,
since nothing else calls bcm_can_tx() for the op until an explicit
TX_SETUP update re-arms it.

Unlike bcm_rx_unreg(), which clears the tracked rx_reg_dev for rx_ops,
the ifindex is intentionally left unchanged for tx_ops. bcm_tx_setup()
always rejects ifindex 0, so clearing it would strand the op: neither a
later TX_SETUP (bcm_find_op()) nor TX_DELETE (bcm_delete_tx_op()) could
ever find it again, since both require an exact ifindex match.

Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/linux-can/20260708094536.DDF821F00A3A@smtp.kernel.org/
Closes: https://lore.kernel.org/linux-can/20260708154039.347ED1F000E9@smtp.kernel.org/
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-9-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: add missing device refcount for CAN filter removal

sashiko-bot remarked a problem with a concurrent device unregistration
in isotp.c which is also present in the bcm.c code. A former fix for raw.c,
commit c275a176e4b6 ("can: raw: add missing refcount for memory leak fix"),
introduced a netdevice_tracker which solves the issue for bcm.c too.

bcm_release(), bcm_delete_rx_op() and bcm_notifier() relied on
dev_get_by_index(ifindex) to re-find the device for an rx_op before
unregistering its filter. If a concurrent NETDEV_UNREGISTER has already
unlisted the device from the ifindex table, that lookup fails and
can_rx_unregister() is silently skipped, leaving a stale CAN filter
pointing at the soon-to-be-freed bcm_op/socket.

Hold a netdev_hold()/netdev_put() tracked reference on op->rx_reg_dev
from the moment the rx filter is registered in bcm_rx_setup() until it
is unregistered in bcm_rx_unreg(), and use that reference directly in
bcm_release() and bcm_delete_rx_op() instead of re-looking the device
up by ifindex.

Reported-by: sashiko-bot@kernel.org
Closes: https://sashiko.dev/#/patchset/20260707094716.63578-1-socketcan@hartkopp.net
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-8-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: validate frame length in bcm_rx_setup() for RTR replies

bcm_tx_setup() validates cf->len against the CAN/CAN FD DLC limits
before installing frames for TX_SETUP, but bcm_rx_setup() never did
the same for the RTR-reply frame configured via RX_SETUP with
RX_RTR_FRAME.
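
A minimal sketch of the kind of length check meant here, assuming the
usual payload limits of 8 bytes for Classical CAN and 64 bytes for CAN FD.
The macro and function names are illustrative only and are not taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAN_MAX_DLEN	8	/* Classical CAN payload limit */
#define MODEL_CANFD_MAX_DLEN	64	/* CAN FD payload limit */

/* Return true if len fits the frame type; false would map to an error. */
static bool frame_len_valid(size_t len, bool is_canfd)
{
	size_t max = is_canfd ? MODEL_CANFD_MAX_DLEN : MODEL_CAN_MAX_DLEN;

	return len <= max;
}
```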

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-7-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: extend bcm_tx_lock usage for data and timer updates

Stage new CAN frame content for an existing tx op into a kmalloc()'d
buffer and validate it there, mirroring the approach already used in
bcm_rx_setup(). Only copy the validated data into op->frames while
holding op->bcm_tx_lock, so bcm_can_tx() and bcm_tx_timeout_handler()
can no longer observe a partially updated or unvalidated frame.

Add a missing error path for memcpy_from_msg() when copying CAN frame
data from userspace.

Also move the kt_ival1/kt_ival2/ival1/ival2 updates in bcm_tx_setup()
under op->bcm_tx_lock, and read kt_ival1/kt_ival2/count under the same
lock in bcm_tx_set_expiry() and bcm_tx_timeout_handler(), closing the
torn 64-bit ktime_t read on 32-bit platforms.
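
The stage-validate-publish pattern can be sketched in userspace as
follows. This is only a model under stated assumptions: malloc() stands in
for kmalloc(), a C11 atomic_flag spinlock stands in for op->bcm_tx_lock,
and the struct layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_FRAMES 4

struct frame_model {
	unsigned int id;
	unsigned char len;
	unsigned char data[8];
};

struct tx_op_model {
	atomic_flag lock;	/* stands in for op->bcm_tx_lock */
	struct frame_model frames[MODEL_MAX_FRAMES];
	unsigned int nframes;
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the flag was clear */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Returns 0 on success, -1 on allocation or validation failure. */
static int tx_update_frames(struct tx_op_model *op,
			    const struct frame_model *src, unsigned int n)
{
	struct frame_model *staged;
	unsigned int i;

	if (n == 0 || n > MODEL_MAX_FRAMES)
		return -1;

	/* Stage outside the lock; models the copy from userspace. */
	staged = malloc(n * sizeof(*staged));
	if (!staged)
		return -1;
	memcpy(staged, src, n * sizeof(*staged));

	/* Validate the staged copy; op->frames is untouched on failure. */
	for (i = 0; i < n; i++) {
		if (staged[i].len > 8) {
			free(staged);
			return -1;
		}
	}

	/* Publish the validated data in one locked section. */
	model_lock(&op->lock);
	memcpy(op->frames, staged, n * sizeof(*staged));
	op->nframes = n;
	model_unlock(&op->lock);

	free(staged);
	return 0;
}
```

A reader that takes the same lock sees either the old or the new frame
set, never a partially copied or unvalidated one.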

Fixes: c2aba69d0c36 ("can: bcm: add locking for bcm_op runtime updates")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-6-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: add missing rcu list annotations and operations

sashiko-bot remarked the missing use of list_add_rcu() in
bcm_[rx|tx]_setup(), which is needed so that bcm_proc_show() sees a
properly initialized bcm_op structure when it traverses the bcm_ops
under rcu_read_lock().

To cover all initial settings of the bcm_op's the list_add_rcu() calls
are moved to the end of the setup code.

While at it, also fix the mirroring removal side: bcm_release() called
bcm_remove_op() - which frees the op via call_rcu() - on ops that were
still linked in bo->tx_ops/bo->rx_ops, without list_del_rcu() first.
Unlink each op with list_del_rcu() before handing it to bcm_remove_op(),
matching the existing pattern in bcm_delete_tx_op()/bcm_delete_rx_op().

Reported-by: sashiko-reviews@lists.linux.dev
Closes: https://lore.kernel.org/linux-can/20260610094654.A1FFE1F00893@smtp.kernel.org/
Fixes: dac5e6249159 ("can: bcm: add missing rcu read protection for procfs content")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-5-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix CAN frame rx/tx statistics

KCSAN detected a data race within the bcm_rx_handler() when two CAN frames
have been simultaneously received and processed in a single rx op by two
different CPUs.

Use atomic operations with (signed) long data types to access the
statistics in the hot path to fix the KCSAN complaint.

Additionally simplify the update and check of statistics overflow by
using the atomic operations in separate bcm_update_[rx|tx]_stats()
functions. The rx variant runs under bcm_rx_update_lock to prevent
races when resetting the two rx counters; the tx variant runs under
bcm_tx_lock and only needs to guard its own counter's overflow.

As the rx path resets its values already at LONG_MAX / 100, there is
no conflict between the two locking domains (bcm_rx_update_lock vs.
bcm_tx_lock) even for ops that use both paths.
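
As a rough userspace illustration of the rx counter update, the following
sketch uses C11 atomic_long in place of the kernel's atomic long type. The
struct and function names are hypothetical, and the caller is assumed to
hold the lock that serializes the reset of both counters.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

struct rx_stats_model {
	atomic_long frames_abs;		/* all received frames */
	atomic_long frames_filtered;	/* frames dropped by the filter */
};

/* Caller is assumed to hold the op's rx update lock. */
static void model_update_rx_stats(struct rx_stats_model *s)
{
	/* atomic_fetch_add() returns the value before the increment. */
	long now = atomic_fetch_add(&s->frames_abs, 1) + 1;

	/* Reset both counters together once the threshold is exceeded. */
	if (now > LONG_MAX / 100) {
		atomic_store(&s->frames_filtered, 0);
		atomic_store(&s->frames_abs, 0);
	}
}
```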

The rx statistics update and the frames_filtered update in
bcm_rx_changed() were previously performed in two separate
bcm_rx_update_lock sections. For an rx op subscribed on all interfaces
(ifindex == 0), bcm_rx_handler() can run concurrently on different
CPUs, so a counter reset by one CPU between these two sections could
leave frames_filtered larger than frames_abs on another CPU, producing
a bogus (even negative) reduction percentage in procfs. Update the
statistics in the same critical section as bcm_rx_changed() to close
this gap, which also removes the now unneeded extra lock/unlock pair
around the traffic_flags calculation.

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-4-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: add locking when updating filter and timer values

KCSAN detected a simultaneous access to timer values that can be
overwritten in bcm_rx_setup() when updating timer and filter content
while bcm_rx_handler(), bcm_rx_timeout_handler() or bcm_rx_thr_handler()
run concurrently on incoming CAN traffic.

Protect the timer (ival1/ival2/kt_ival1/kt_ival2/kt_lastmsg) and filter
(nframes/flags/frames/last_frames) updates in bcm_rx_setup() with a new
per-op bcm_rx_update_lock, taken with the matching scope in the RX
handlers. memcpy_from_msg() is staged into a temporary buffer before the
lock is taken, since it can sleep and must not run under a spinlock.

hrtimer_cancel() is always called without bcm_rx_update_lock held, since
bcm_rx_timeout_handler()/bcm_rx_thr_handler() take the same lock and a
running callback would otherwise deadlock against the canceller.

Also close a related race: bcm_rx_setup() cleared the RTR flag in the
stored reply frame's can_id as a separate, unprotected step after the
frame content was already installed, so a concurrent bcm_rx_handler()
could transmit a stale reply with CAN_RTR_FLAG still set. Fold that
normalization into the initial frame preparation instead (on the staged
buffer for updates, directly on op->frames pre-registration for new
ops), so the installed frame is always atomically self-consistent.

bcm_rx_handler()'s RX_RTR_FRAME check now takes a lock-protected
snapshot of op->flags before deciding whether to call bcm_can_tx(),
but does not hold the lock across that call.

Also take a lock-protected snapshot of the currframe in bcm_can_tx()
to avoid partial overwrites by content updates in bcm_tx_setup().
Finally check if a TX_RESET_MULTI_IDX/SETTIMER might have reset
op->currframe between the two locked sections in bcm_can_tx().

Omit calling hrtimer_forward() with zero interval in bcm_rx_thr_handler().
kt_ival2 may have been concurrently cleared by bcm_rx_setup() before it
cancels this timer, so check kt_ival2 inside the bcm_rx_update_lock.

Fixes: c2aba69d0c36 ("can: bcm: add locking for bcm_op runtime updates")
Reported-by: syzbot+75e5e4ae00c3b4bb544e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-can/6975d5cf.a00a0220.33ccc7.0022.GAE@google.com/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-3-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: fix lockless bound/ifindex race and silent RX_SETUP failure

bcm_sendmsg() reads bo->ifindex and checks bo->bound before taking
lock_sock(), while bcm_notify(), bcm_connect() and bcm_release() all
mutate both fields under that same lock. Because the lockless reads
and the locked writes are unordered with respect to each other, a
racing bcm_notify() (device unregister) or bcm_connect() (concurrent
bind on another thread sharing the socket) can make bcm_sendmsg()
observe an inconsistent combination, e.g. a stale bound=1 together
with the now-cleared ifindex=0, silently turning a socket bound to a
specific CAN interface into one that also matches "any" interface.

Keep the lockless bo->bound check purely as a fast-path reject, and
move the ifindex read (and a bo->bound re-check) into the locked
section, where every writer already serializes. This removes the
possibility of observing the two fields torn against each other,
rather than trying to fix it with more READ_ONCE()/WRITE_ONCE() pairs
on two independently updated fields. Annotate the now-purely-lockless
bo->bound accesses consistently across all its write sites.
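
The resulting access pattern can be modelled in userspace roughly like
this. The sketch assumes a C11 atomic_flag as a stand-in for lock_sock(),
an atomic_bool for the annotated bo->bound accesses, and hypothetical
struct and function names; -1 models the "not bound" error return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sock_model {
	atomic_flag lock;	/* stands in for lock_sock()/release_sock() */
	atomic_bool bound;	/* also read locklessly on the fast path */
	int ifindex;		/* only accessed with the lock held */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Returns the ifindex to use, or -1 if the socket is not bound. */
static int model_sendmsg_ifindex(struct sock_model *sk)
{
	int ifindex;

	/* Lockless check: only a cheap early reject. */
	if (!atomic_load(&sk->bound))
		return -1;

	model_lock(&sk->lock);
	/* Re-check under the lock, then read ifindex consistently. */
	if (!atomic_load(&sk->bound)) {
		model_unlock(&sk->lock);
		return -1;
	}
	ifindex = sk->ifindex;
	model_unlock(&sk->lock);

	return ifindex;
}

/* Models a writer such as the device-unregister path. */
static void model_notify_unregister(struct sock_model *sk)
{
	model_lock(&sk->lock);
	atomic_store(&sk->bound, false);
	sk->ifindex = 0;
	model_unlock(&sk->lock);
}
```

Because both fields are only combined inside the locked section, a reader
in this model cannot pair bound=1 with an already cleared ifindex.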

Also fix bcm_rx_setup() silently returning success when the target
device disappears concurrently instead of reporting -ENODEV, so a
broken RX op is no longer left registered as if it had succeeded.

Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: Ginger <ginger.jzllee@gmail.com>
Closes: https://lore.kernel.org/linux-can/CAGp+u1aBK8QVjsvAxM2Ldzep4rEbsP9x_pV3At4g=h1kVEtyhA@mail.gmail.com/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-2-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bcm: defer rx_op deallocation to workqueue to fix thrtimer UAF

Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
synchronize_rcu()") replaced synchronize_rcu() in bcm_delete_rx_op()
with call_rcu() and introduced the RX_NO_AUTOTIMER flag.

However, this flag check was omitted for thrtimer in the packet rx
fast-path. During BCM RX operation teardown, a concurrent RCU reader
(bcm_rx_handler) can race and re-arm thrtimer via
bcm_rx_update_and_send() after call_rcu() has been scheduled.  Once
the RCU grace period elapses, bcm_op is freed.  The subsequently
firing thrtimer then dereferences the deallocated op, causing a UAF.

Adding flag checks to the rx fast-path (bcm_rx_update_and_send) does not
fully close the TOCTOU race and introduces latency for every CAN frame.
Conversely, calling hrtimer_cancel() directly inside the RCU callback
(softirq context) is fatal as hrtimer_cancel() can sleep, triggering
a "scheduling while atomic" panic.

Resolve this by deferring the timer cancellation and memory free to a
dedicated unbound workqueue (bcm_wq).  The RCU callback now queues a
work item to bcm_wq, which safely cancels both timers and deallocates
memory in sleepable process context.  A dedicated workqueue is used to
prevent system-wide WQ saturation and is cleanly flushed/destroyed
on module unload to avoid rmmod page faults.

Since the deferred work can now outlive the calling context by an
unbounded amount, also take a reference on op->sk when it is assigned
and drop it only once the deferred work has cancelled both timers, so a
socket can no longer be freed out from under a still-armed timer whose
callback (bcm_send_to_user()) dereferences op->sk.

Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
Tested-by: Feng Xue <feng.xue@outlook.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260714-bcm_fixes-v15-1-562f7e3e42da@hartkopp.net
Cc: stable@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak: Modification of references to email accounts being deleted

Following the sale of PEAK-System France by HMS-Networks, this update is
intended to change all my @hms-networks.com email addresses to my new
@peak-system.fr address.

Signed-off-by: Stéphane Grosjean <s.grosjean@peak-system.fr>
Link: https://patch.msgid.link/20260410124251.40506-1-stephane.grosjean@free.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: j1939: fix lockless local-destination check

j1939_priv.ents[].nusers is documented as protected by priv->lock, and
its updates already happen under that lock. j1939_can_recv() also reads
it under read_lock_bh(). However, j1939_session_skb_queue() and
j1939_tp_send() still read priv->ents[da].nusers without taking the
lock.

Those transport-side checks decide whether to set J1939_ECU_LOCAL_DST, so
they can race with j1939_local_ecu_get() and j1939_local_ecu_put() while
userspace is binding or releasing sockets concurrently with TP traffic.
This can misclassify TP/ETP sessions as local or remote and take the wrong
transport path.

Fix both transport paths by routing the destination-locality check through
a helper that reads ents[].nusers under read_lock_bh(&priv->lock).

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260419140614.GA4041240@chcpu16
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

KVM: SVM: Bump asid_generation on CPU online to avoid ASID collision after hotplug

If a vCPU stays scheduled out (or blocked) while the last pCPU it ran
on goes through a hotplug cycle (online->offline->online), and the vCPU
then resumes execution on the same pCPU, then it is possible for it to
run with an ASID that has now been assigned to a different vCPU,
resulting in stale TLB translations being used.

svm_enable_virtualization_cpu() resets asid_generation to 1 and sets
next_asid to max_asid + 1 on every CPU online event, including hotplug
cycles.  Because next_asid starts beyond the pool boundary, the first
call to new_asid() after an online event always wraps the pool,
incrementing asid_generation to 2 and assigning ASIDs starting from
min_asid.

Consider two vCPUs from different VMs, vCPU-A pinned to CPU-X holding
asid_generation=2 and ASID=N from before the hotplug event:

  1. CPU-X goes offline and back online: asid_generation resets to 1,
     next_asid = max_asid + 1.

  2. One or more vCPUs migrate to CPU-X and call new_asid(), wrapping
     the pool and consuming ASIDs starting from min_asid.  Eventually
     vCPU-B from a different VM is assigned asid_generation=2, ASID=N
     — the same ASID that vCPU-A held before the hotplug.

  3. vCPU-A enters pre_svm_run() on CPU-X: current_vmcb->cpu is
     unchanged so the migration branch is skipped.  Its saved
     asid_generation=2 matches sd->asid_generation=2, so the generation
     check silently passes and vCPU-A continues running with ASID=N —
     the same ASID just freshly assigned to vCPU-B.

Both vCPUs from different VMs now run on CPU-X with the same ASID,
causing them to share NPT TLB entries and producing stale translations.

The collision manifests as a KVM internal error (Suberror: 1, emulation
failure).  The NPT page fault reports a faulting GPA far outside the
VM's physical memory range — a sign of stale TLB translations being
used.  KVM falls back to instruction emulation, which fails on
FPU/XSave instructions (XRSTOR, STMXCSR) that the emulator does not
implement.

Fix this by incrementing asid_generation instead of resetting it to 1
in svm_enable_virtualization_cpu().  On module load, asid_generation
starts at 0 (memset) and the increment produces 1, identical to the
old behaviour.  On subsequent hotplug cycles the generation advances
beyond any value a vCPU previously observed on this CPU, so the
generation check in pre_svm_run() reliably forces new_asid() on every
vCPU after every hotplug cycle.
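
The scenario can be replayed with a small userspace model. The structs and
function names below are hypothetical simplifications: only the generation
check of pre_svm_run() is modelled, the migration branch and TLB flushing
are left out, and the pool bounds (1..8) are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cpu_model {
	uint64_t asid_generation;
	uint32_t next_asid, min_asid, max_asid;
};

struct vcpu_model {
	uint64_t asid_generation;
	uint32_t asid;
};

static void model_new_asid(struct cpu_model *sd, struct vcpu_model *v)
{
	/* Wrapping the pool starts a new generation. */
	if (sd->next_asid > sd->max_asid) {
		++sd->asid_generation;
		sd->next_asid = sd->min_asid;
	}
	v->asid_generation = sd->asid_generation;
	v->asid = sd->next_asid++;
}

/* Generation check only: a stale generation forces a new ASID. */
static void model_pre_run(struct cpu_model *sd, struct vcpu_model *v)
{
	if (v->asid_generation != sd->asid_generation)
		model_new_asid(sd, v);
}

/* fixed == false resets the generation, fixed == true increments it. */
static void model_cpu_online(struct cpu_model *sd, bool fixed)
{
	if (fixed)
		sd->asid_generation++;
	else
		sd->asid_generation = 1;
	sd->next_asid = sd->max_asid + 1;
}

/* Returns true if vCPU-A and vCPU-B end up with the same ASID. */
static bool model_hotplug_collides(bool fixed)
{
	struct cpu_model sd = { .asid_generation = 0, .min_asid = 1, .max_asid = 8 };
	struct vcpu_model a = { 0 }, b = { 0 };

	model_cpu_online(&sd, fixed);	/* first online after module load */
	model_pre_run(&sd, &a);		/* A gets its ASID before the hotplug */
	model_cpu_online(&sd, fixed);	/* offline/online cycle */
	model_pre_run(&sd, &b);		/* B arrives and gets a fresh ASID */
	model_pre_run(&sd, &a);		/* A resumes on the same CPU */

	return a.asid == b.asid;
}
```

With the reset variant, A keeps its old ASID because its saved generation
matches again; with the increment variant, A's generation is stale and it
is handed a different ASID.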

Fixes: 774c47f1d78e ("[PATCH] KVM: cpu hotplug support")
Reported-by: Chandrakanth Silveru <Chandrakanth.Silveru@amd.com>
Tested-by: Srikanth Aithal <Srikanth.Aithal@amd.com>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Message-ID: <20260715063506.672432-1-nikunj@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ALSA: hda/realtek: Add inverted LED quirk for HP ZBook 8 G2a

HP ZBook 8 G2a 14 and 16 (SSIDs 0x103c:0x8f94, 0x103c:0x8f95) use the
Realtek ALC245 codec with a TAS2781 amplifier via I2C. They share the same
hardware configuration as the existing 0x8f40/0x8f41/0x8f42/0x8f62 models
but have inverted speaker mute LED polarity, so the existing
ALC245_FIXUP_HP_TAS2781_I2C_MUTE_LED quirk drives the LED backwards: off
when muted and on when unmuted.

Add a dedicated quirk with inverted COEF values. These are speaker-only
models without an HP pin, so the LED is driven directly through the
vmaster_mute hook; there is no need to probe the HP pin at runtime since
the configuration is static and known per SSID.

Signed-off-by: Chris Chiu <chris.chiu@canonical.com>
Link: https://patch.msgid.link/20260716023825.387532-1-chris.chiu@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda: codecs: hdmi: disable keep-alive before audio format change

When a keep-alive (KAE) silent stream is active on an Intel HDMI/DP
codec, opening a real PCM stream reprograms the converter format and the
audio infoframe in snd_hda_hdmi_generic_pcm_prepare(). Part of that
reprogramming - the converter channel count and the channel mapping in
snd_hda_hdmi_setup_audio_infoframe() - is not safe to do while a
keep-alive stream is active. This is most visible when switching to a
multichannel PCM configuration, where the active channel count actually
changes. In that case the newly opened PCM stream plays no sound.

Add an optional hdmi_ops .prepare hook, called at the start of the
PCM prepare sequence (before the format and infoframe are touched), and
implement it for HSW+ to release keep-alive. Keep-alive is then
re-enabled as before once the new stream has been set up, in the
setup_stream op.

Fixes: 15175a4f2bbb ("ALSA: hda/hdmi: add keep-alive support for ADL-P and DG2")
Reported-by: Alexander Kaplan <alexander.kaplan@sms-medipool.de>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8412
Tested-by: Alexander Kaplan <alexander.kaplan@sms-medipool.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://patch.msgid.link/20260715180610.1371243-1-kai.vehmanen@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

liveupdate: fix GET_NAME ioctl argument validation

LIVEUPDATE_SESSION_GET_NAME was developed in the liveupdate/next branch
while the session type validation change was carried in liveupdate-fixes.
When the conflict between the two branches was resolved, the GET_NAME
operation descriptor picked up the structure and last member from
RETRIEVE_FD.

This makes both its known size and minimum size 16 bytes rather than 72.
A zero-initialized request still succeeds because luo_session_get_name()
writes the full name before luo_ucmd_respond() copies the full GET_NAME
response to userspace. However, copy_struct_from_user() treats the
output-only name field as unknown trailing data and rejects the request
with -E2BIG if any byte in that field is nonzero.

Use the GET_NAME structure and its name field in the descriptor.
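
The effect of the wrong descriptor size can be reproduced with a userspace
model of the copy semantics described above. model_copy_struct() is a
hypothetical stand-in, not the kernel helper; it only mirrors the rule that
trailing bytes beyond the known size must be zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a user-supplied struct of usize bytes into a kernel-side struct
 * of ksize bytes. Unknown trailing bytes must be zero, missing bytes
 * are zero-filled.
 */
static int model_copy_struct(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *p = src;
	size_t i;

	if (usize > ksize) {
		/* Reject nonzero data the kernel does not know about. */
		for (i = ksize; i < usize; i++)
			if (p[i])
				return -E2BIG;
	} else if (usize < ksize) {
		memset((unsigned char *)dst + size, 0, ksize - size);
	}

	memcpy(dst, src, size);
	return 0;
}
```

With a known size of 16, a 72-byte request passes only while bytes 16..71
are zero; with the correct known size of 72, the same request is accepted
regardless of what the output-only field contains.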

Link: https://lore.kernel.org/all/ahWlYXNjGUbkKoHy@sirena.org.uk/
Assisted-by: Codex:gpt-5.6-sol
Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Link: https://patch.msgid.link/20260716012607.22020-1-liu.yun@linux.dev
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

selftests/seccomp: Fix pointer type mismatch build error

We hit the following build error while running the seccomp selftests in
our testing.

CC seccomp_bpf
seccomp_bpf.c: In function ‘UPROBE_setup’:
seccomp_bpf.c:5175:74: error: pointer type mismatch in conditional expression [-Wincompatible-pointer-types]
 5175 |         offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |                                                                          ^
seccomp_bpf.c:5175:57: note: first expression has type ‘int (*)(void)’
 5175 |         offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |                                                         ^~~~~~~~~~~~~~~~
seccomp_bpf.c:5175:76: note: second expression has type ‘int (__attribute__((nocf_check)) *)(void)’
 5175 |         offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
      |                                                                            ^~~~~~~~~~~~~

get_uprobe_offset() takes a 'const void *' argument, so cast both
operands to 'void *'.
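
A reduced illustration of the cast, using hypothetical function names in
place of the selftest's probes and helper. It relies on the common
compiler extension (also assumed by POSIX) that a function pointer can be
converted to 'void *'.

```c
#include <assert.h>
#include <stddef.h>

static int probed_a(void) { return 1; }
static int probed_b(void) { return 2; }

/* Hypothetical stand-in for a helper that takes 'const void *'. */
static const void *model_pick(const void *addr)
{
	return addr;
}

static const void *model_select(int use_a)
{
	/* Casting both operands gives the conditional one common type. */
	return model_pick(use_a ? (void *)probed_a : (void *)probed_b);
}
```

Without the casts, the two operands may differ in type as soon as one of
the functions carries an attribute that is part of its pointer type.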

Fixes: 9ffc7a635c35 ("selftests/seccomp: validate uprobe syscall passes through seccomp")
Signed-off-by: Kuan-Ying Lee <kuan-ying.lee@canonical.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://patch.msgid.link/20260715053559.28535-1-kuan-ying.lee@canonical.com
Signed-off-by: Kees Cook <kees@kernel.org>

selftests/lkdtm: rename STACKLEAK_ERASING to KSTACK_ERASE

Commit 57fbad15c2ee ("stackleak: Rename STACKLEAK to KSTACK_ERASE")
renamed the LKDTM crash type and selftest configuration but missed the
entry in tests.txt.

As a result, the selftest generates STACKLEAK_ERASING.sh, which run.sh
skips because the LKDTM DIRECT trigger only exposes KSTACK_ERASE. Rename
the test entry so the generated runner uses the registered crash type.

Fixes: 57fbad15c2ee ("stackleak: Rename STACKLEAK to KSTACK_ERASE")
Signed-off-by: Haofeng Li <lihaofeng@kylinos.cn>
Link: https://patch.msgid.link/tencent_CD80B5F746B6AABD68AF3F1097AD02C96F05@qq.com
Signed-off-by: Kees Cook <kees@kernel.org>

Merge tag 'linux_kselftest-kunit-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fix from Shuah Khan:
"Fix warning suppressions with kunit built as module:

  CONFIG_KUNIT is a tristate symbol but the warning suppression code in
  lib/bug.c is only built if it's built-in due to it using a plain
  #ifdef, rendering warning suppressions broken for kunit built as a
  loadable module.

  kunit_is_suppressed_warning() already has a stub for when kunit is
  disabled so drop that guard entirely"

* tag 'linux_kselftest-kunit-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  bug: fix warning suppressions with kunit built as module

Merge tag 'linux_kselftest-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

- Fix ftrace reading enabled_func test in add_remove_fprobe_module test

- Fix tracing trigger-hist-poll.tc to use sched_process_exit

* tag 'linux_kselftest-fixes-7.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Have trigger-hist-poll.tc use sched_process_exit
  selftests/ftrace: Fix reading enabled_functions in add_remove_fprobe_module test

block: do not warn when doing greedy allocation in folio_alloc_greedy()

During one of my local btrfs fstests runs, folio_alloc() inside
folio_alloc_greedy() triggered an allocation failure report when trying
to allocate an order-4 folio.

The kernel is from the latest development branch, which uses the
IOMAP_DIO_BOUNCE flag for direct writes when the inode requires
checksums.

Unfortunately I didn't save the full log, only the function and the
order.

When the IOMAP_DIO_BOUNCE flag is utilized, we will hit the following call
chain:

 bio_iov_iter_bounce_write()
 |- folio_alloc_greedy()
    |- folio_alloc(gfp | __GFP_NORETRY, get_order(*size));

However __GFP_NORETRY will still emit an allocation failure report
when it fails.

Since folio_alloc_greedy() will retry with a smaller order anyway, there
is no point in emitting that allocation failure report.

Append the __GFP_NOWARN flag to folio_alloc() for the larger-order folio
attempts.
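
The idea can be modelled in userspace as follows. Everything here is a
hypothetical sketch: the flag values, the fake allocator and the loop
structure are assumptions for illustration and do not reproduce the real
folio_alloc_greedy() implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NORETRY	0x1u	/* models __GFP_NORETRY */
#define MODEL_NOWARN	0x2u	/* models __GFP_NOWARN */

static unsigned int model_warnings;	/* counts emitted failure reports */

/* Fake allocator: fails above max_ok_order and warns unless told not to. */
static bool model_alloc(unsigned int flags, unsigned int order,
			unsigned int max_ok_order)
{
	if (order <= max_ok_order)
		return true;
	if (!(flags & MODEL_NOWARN))
		model_warnings++;
	return false;
}

/* Try the largest order first and fall back; returns the order or -1. */
static int model_alloc_greedy(unsigned int order, unsigned int max_ok_order,
			      bool nowarn)
{
	unsigned int flags = MODEL_NORETRY | (nowarn ? MODEL_NOWARN : 0);

	/* Larger-order attempts are opportunistic. */
	for (; order > 0; order--)
		if (model_alloc(flags, order, max_ok_order))
			return (int)order;

	/* Final order-0 attempt keeps the default reporting. */
	return model_alloc(0, 0, max_ok_order) ? 0 : -1;
}
```

In this model both variants end up with the same order; the only
difference is whether the failed higher-order attempts are reported.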

Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Link: https://patch.msgid.link/d10571445ee505d95ba6eaad7558fc1f556d2921.1784020005.git.wqu@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

partitions: aix: bound the lvd scan to one sector

aix_partition() reads the logical-volume descriptor array as a single
sector and then scans it:

    if (numlvs && (d = read_part_sector(state, vgda_sector + 1, &sect))) {
        struct lvd *p = (struct lvd *)d;
        ...
        for (i = 0; foundlvs < numlvs && i < state->limit; i++) {
            lvip[i].pps_per_lv = be16_to_cpu(p[i].num_lps);

p points at a single 512-byte sector, which holds SECTOR_SIZE /
sizeof(struct lvd) = 16 entries, but the loop runs until foundlvs reaches
the on-disk numlvs or i reaches state->limit (DISK_MAX_PARTS, 256).
numlvs is an on-disk __be16 read straight from the volume group
descriptor and is not validated, so a crafted AIX image with numlvs
larger than 16 and lvd entries whose num_lps fields are zero (so foundlvs
never advances) drives the loop to read p[i] well past the end of the
read sector buffer.

Commit d97a86c170b4 ("partitions: aix.c: off by one bug") hardened the
matching write of lvip[lv_ix] in 2014 but left this read loop unbounded.

Bound the scan to the number of struct lvd entries that fit in the
sector that was actually read.
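
As a rough user-space illustration of the bound (not the code from this
patch), the scan can be capped at the number of entries that fit in one
sector. struct lvd_sketch, count_used_lvds() and the field layout are
hypothetical stand-ins; only the 32-byte entry size implied by
"SECTOR_SIZE / sizeof(struct lvd) = 16" is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical 32-byte stand-in for the on-disk struct lvd; only its size
 * and a num_lps field matter for this sketch. */
struct lvd_sketch {
	uint8_t  pad0[16];
	uint16_t num_lps;
	uint8_t  pad1[14];
};

_Static_assert(sizeof(struct lvd_sketch) == 32, "16 entries per sector");

/* Count entries with a non-zero num_lps without indexing past the sector.
 * numlvs is untrusted, limit plays the role of state->limit. */
static size_t count_used_lvds(const void *sector, size_t numlvs, size_t limit)
{
	const struct lvd_sketch *p = sector;
	size_t max_entries = SECTOR_SIZE / sizeof(*p);
	size_t found = 0;

	assert(sector != NULL);

	/* The extra "i < max_entries" term is the bound being illustrated. */
	for (size_t i = 0; found < numlvs && i < limit && i < max_entries; i++) {
		if (p[i].num_lps != 0)
			found++;
	}
	return found;
}
```

With every num_lps zero and a huge numlvs, the loop in this sketch stops
after 16 entries instead of running to the partition limit.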

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260714114806.3761553-1-michael.bommarito@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: fix leaks and online flag on radix_tree_insert failure

When radix_tree_insert() fails in blkg_create(), the error path has two
issues:

1. blkg->online is set to true unconditionally, even when the blkg was
   never fully inserted.  Move the assignment inside the success block.

2. The error path calls blkg_put() without first calling
   percpu_ref_kill().  Because the refcount is still in percpu mode,
   percpu_ref_put() only does this_cpu_sub() without checking for zero,
   so blkg_release() is never triggered.  This permanently leaks the
   blkg memory, its percpu iostat, policy data, the parent blkg
   reference, and the cgroup css reference — the latter preventing the
   cgroup from ever being destroyed.

Fix by replacing blkg_put() with percpu_ref_kill(), matching the pattern
used in blkg_destroy().

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Link: https://patch.msgid.link/20260715132407.1469777-1-cui.tao@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: remove manually added partitions on detach

Commit 267ec4d7223a ("loop: fix partition scan race between udev and
loop_reread_partitions()") stopped disk_force_media_change() from
setting GD_NEED_PART_SCAN because loop devices with LO_FLAGS_PARTSCAN
rescan partitions explicitly. However, partitions can also be added
manually with BLKPG while LO_FLAGS_PARTSCAN is clear.

When such a loop device is detached, __loop_clr_fd() skips
bdev_disk_changed(). Without GD_NEED_PART_SCAN, reopening the unbound
device no longer performs the previous lazy cleanup, leaving dead
partition devices behind. A subsequent LOOP_CONFIGURE can then fail its
partition scan with -EBUSY, as seen in blktests loop/009 after loop/008.

Call bdev_disk_changed() unconditionally during __loop_clr_fd(). The
disk capacity is already zero and the release path holds open_mutex, so
this drops all partitions without rescanning the detached backing file.

The new blktests loop/013 case covers this sequence by adding a partition
with BLKPG without LO_FLAGS_PARTSCAN, detaching the loop device, and
checking that the partition is gone when the device is reopened.

Fixes: 267ec4d7223a ("loop: fix partition scan race between udev and loop_reread_partitions()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202607150754.b660f5b9-lkp@intel.com
Signed-off-by: Daan De Meyer <daan@amutable.com>
Link: https://patch.msgid.link/20260715-b4-loop-partition-cleanup-v1-1-b9f59910cd1e@amutable.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix race in blk_time_get_ns() returning 0

blk_time_get_ns() populates the per-plug cached timestamp and then
returns it by re-reading the field:

if (!plug->cur_ktime) {
	plug->cur_ktime = ktime_get_ns();
	current->flags |= PF_BLOCK_TS;
}
return plug->cur_ktime;

This is problematic when the compiler emits the final
"return plug->cur_ktime" as a reload from memory, after PF_BLOCK_TS has
already been set.

Since the cached timestamp is now invalidated from finish_task_switch()
(fad156c2af22 "block: invalidate cached plug timestamp after task
switch"), a task preempted between setting PF_BLOCK_TS and that reload
has plug->cur_ktime zeroed by blk_plug_invalidate_ts() when it is
scheduled back in.  The reload then returns 0.

A 0 handed back here is stored as a start timestamp -- e.g.
blk_account_io_start() writes it to rq->start_time_ns -- and later
subtracted from "now".  blk_account_io_done() then adds (now - 0), i.e.
roughly the system uptime, to the per-group nsecs[] counters.  On an
otherwise idle, healthy device this appears as sudden ~uptime-sized jumps
in the diskstats time fields (write_ticks/discard_ticks/time_in_queue).

The solution is to be explicit in our reads and writes to this field,
which can be zeroed across a preemption.  We also add a barrier() to
ensure that any setting of PF_BLOCK_TS is ordered to happen after the
cur_ktime update.

This issue was discovered using AI-assisted kprobes looking for paths
that were leaking zeroed timestamps in a live system, based on the
observation that we were sometimes seeing uptime-sized jumps in kernel
exported counters. This was flagged by NodeDiskIOSaturation
prometheus alerts that started firing on all hosts post 7.1.3 kernel
upgrade, due to node-exporter now exporting a nonsensical
node_disk_io_time_weighted_seconds_total.

Fixes: fad156c2af22 ("block: invalidate cached plug timestamp after task switch")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Waychison <mike@waychison.com>
Assisted-by: Claude:claude-opus-4.8
Link: https://patch.msgid.link/20260715192950.2488921-1-mike@waychison.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'scmi-ffa-fixes-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes

Arm SCMI/FF-A fixes for v7.2

Fix two runtime issues in the SCMI framework. Use full 64-bit division
when rounding range-based clock rates, avoiding divisor truncation and
a possible divide-by-zero on 32-bit systems. Rate-limit notification
queue-full warnings emitted from interrupt context to prevent printk
floods and prolonged system stalls during notification bursts. Also
correct a grammar error in the ARM_SCMI_POWER_CONTROL Kconfig help
text.
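
To illustrate the truncation hazard in isolation (a sketch, not the SCMI
code): when a 64-bit step is passed to a divide helper that only takes a
32-bit divisor, a step that is a multiple of 2^32 becomes zero. The
helper names and the round-up direction below are assumptions made for
the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round rate up to the next multiple of step using a full 64-bit divide.
 * Assumes rate + step - 1 does not overflow 64 bits. */
static uint64_t round_up_rate64(uint64_t rate, uint64_t step)
{
	assert(step != 0);
	return (rate + step - 1) / step * step;
}

/* What a divide helper with a 32-bit divisor parameter would see. */
static uint32_t truncated_divisor(uint64_t step)
{
	return (uint32_t)step;
}
```

Here truncated_divisor(1ULL << 32) is 0, which is the divide-by-zero
case, while round_up_rate64() keeps the full divisor.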

Fix the FF-A driver RX/TX buffer sizing logic to respect the maximum
buffer size advertised by firmware, while retaining compatibility with
older implementations that may reject PAGE_SIZE-rounded buffers.
Also fix a NULL pointer dereference in ffa_partition_info_get() by
rejecting NULL UUID strings before passing them to uuid_parse().

* tag 'scmi-ffa-fixes-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_scmi: Rate-limit queue-full warnings in IRQ context
  firmware: arm_scmi: Use 64-bit division for clock rate rounding
  firmware: arm_scmi: Grammar s/may needed/may be needed/
  firmware: arm_ffa: Fix NULL dereference in ffa_partition_info_get()
  firmware: arm_ffa: Respect firmware advertised RX/TX buffer size limits

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

riscv: hwprobe: Avoid uninitialized read in hwprobe_get_cpus()

When cpusetsize < cpumask_size(), hwprobe_get_cpus() did not fully
initialize its copy of the cpu mask, which could cause non-deterministic
results from the riscv_hwprobe syscall on a system with more than 8 CPUs
when the supplied cpu mask is empty. Address this by fully initializing
the cpu mask.
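
The general shape of such a fix can be sketched in user space as follows.
This is not the hwprobe code: MASK_BYTES_SKETCH, load_cpu_mask() and
mask_is_empty() are hypothetical names, and memcpy() stands in for the
copy from user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MASK_BYTES_SKETCH 16    /* hypothetical kernel cpumask size */

/* Load a caller-supplied mask that may be shorter than the kernel mask. */
static void load_cpu_mask(unsigned char dst[MASK_BYTES_SKETCH],
			  const unsigned char *src, size_t srcsize)
{
	assert(dst != NULL && src != NULL);

	if (srcsize > MASK_BYTES_SKETCH)
		srcsize = MASK_BYTES_SKETCH;

	memset(dst, 0, MASK_BYTES_SKETCH);  /* initialize every byte first */
	memcpy(dst, src, srcsize);          /* then overlay the supplied part */
}

/* Returns 1 if no bit is set anywhere in the full-size mask. */
static int mask_is_empty(const unsigned char mask[MASK_BYTES_SKETCH])
{
	for (size_t i = 0; i < MASK_BYTES_SKETCH; i++)
		if (mask[i])
			return 0;
	return 1;
}
```

Without the memset(), a one-byte (8 CPU) empty mask would leave the
remaining bytes holding whatever was there before, so the emptiness
check could give different answers from run to run.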

Fixes: e178bf146e4b ("RISC-V: hwprobe: Introduce which-cpus flag")
Signed-off-by: Mark Harris <mark.hsj@gmail.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Michael Ellerman <mpe@kernel.org>
Link: https://patch.msgid.link/20260714003056.73707-1-mark.hsj@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>

fortify: Disable -Wstringop-overread in tests

clang recently added support for -Wstringop-overread [1], which is on by
default like -Wfortify-source. This breaks the usage of -Werror in the
fortify tests, resulting in the following false positive warnings in the
kernel build:

  warning: unsafe memcmp() usage lacked '__read_overflow2' warning in lib/test_fortify/read_overflow2-memcmp.c
  warning: unsafe memcmp() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memcmp.c
  warning: unsafe memchr() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memchr.c

Examining the fortify test logs shows a warning like the following in
each of the failed logs:

  In file included from lib/test_fortify/read_overflow2-memcmp.c:5:
  lib/test_fortify/test_fortify.h:34:2: error: 'memcmp' reading 17 bytes from a region of size 16 [-Werror,-Wstringop-overread]
     34 |         TEST;
        |         ^
  lib/test_fortify/read_overflow2-memcmp.c:3:2: note: expanded from macro 'TEST'
      3 |         memcmp(large, small, sizeof(small) + 1)
        |         ^
  1 error generated.

Disable -Wstringop-overread for the fortify tests, as it defeats the
purpose of testing the Linux specific implementation of fortify, like
-Wfortify-source.

Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2168
Link: https://github.com/llvm/llvm-project/commit/86f2e71cb8d165b59ad31a442b2391e23826133e
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260623-fix-test_fortify-for-clang-stringop-overread-v1-1-15ee8342a953@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

ASoC: bt-sco: fix duplicate DAPM widget names for wideband DAI

The bt-sco-pcm-wb DAI uses the same stream_name strings as bt-sco-pcm
("Playback" and "Capture"). This causes duplicate DAPM AIF widget
names within the same component, leading to debugfs warnings:

debugfs: 'Playback' already exists in 'dapm'
debugfs: 'Capture' already exists in 'dapm'

Give the wideband DAI distinct stream names ("WB Playback" and
"WB Capture") and add corresponding DAPM AIF widgets and routes for
them.

Fixes: 5947e1b4992e ("ASoC: bt-sco: extend rate and add a general compatible string")
Assisted-by: VeroCoder:claude-sonnet-4-5
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://patch.msgid.link/20260715100620.1387159-1-shengjiu.wang@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

s390/perf_cpum_cf: Add missing array_index_nospec() to __hw_perf_event_init()

The ev variable is userspace-controlled via event->attr.config and is
used as an array index after bounds checking, but without a speculation
barrier.

Add the missing array_index_nospec() call to prevent speculative
out-of-bounds access.
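
For readers unfamiliar with the idiom, the branch-free clamping idea can
be sketched in user space as below. This only mirrors the arithmetic of
a mask-based clamp; it offers no real speculation guarantee outside the
kernel, and index_mask_sketch() and clamp_index_sketch() are hypothetical
names. It also assumes an arithmetic right shift of negative values, as
GCC and Clang provide.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/* All ones when index < size, zero otherwise, computed without a branch.
 * Requires size <= LONG_MAX. */
static unsigned long index_mask_sketch(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG_SKETCH - 1);
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static unsigned long clamp_index_sketch(unsigned long index, unsigned long size)
{
	assert(size > 0 && size <= (unsigned long)LONG_MAX);
	return index & index_mask_sketch(index, size);
}
```

The clamp is applied after the ordinary bounds check, so that even a
mispredicted branch cannot use an out-of-range index.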

Cc: stable@vger.kernel.org
Fixes: 212188a596d1 ("[S390] perf: add support for s390x CPU counters")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

s390/checksum: Fix csum_partial() without vector facility

Currently csum_partial() calls csum_copy() with copy=false and dst=NULL.
On machines without the vector facility, csum_copy() falls back to
cksm(dst, ...), causing the checksum to be calculated from address zero
instead of the source buffer.

The VX implementation already checksums data loaded from src. Make the
fallback do the same by passing src to cksm().

Fixes: dcd3e1de9d17 ("s390/checksum: provide csum_partial_copy_nocheck()")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

ALSA: hda: cs35l41: validate and free ACPI mute object

cs35l41_get_acpi_mute_state() evaluates a _DSM method to get the ACPI
mute state and reads the first byte from the returned object.

However, the returned ACPI object is owned by the caller and is never
freed after use, so each successful query leaks the _DSM result object.

The code also assumes that the returned object is a buffer with at least
one byte. A malformed firmware response can return a different object
type or an empty buffer, and the direct ret->buffer.pointer dereference
can then access an invalid pointer.

Use the typed _DSM helper, validate that the returned buffer contains at
least one byte, and free the ACPI object after reading it.

Fixes: 447106e92a0c ("ALSA: hda: cs35l41: Support mute notifications for CS35L41 HDA")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Link: https://patch.msgid.link/20260708113625.752913-1-lgs201920130244@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

accel/ivpu: Reject firmware log with size smaller than header

fw_log_from_bo() validates the tracing buffer header_size and that the
log fits within the BO, but never checks that log->size is at least
log->header_size. fw_log_print_buffer() then computes:

	u32 data_size = log->size - log->header_size;

which underflows to a near-U32_MAX value when firmware reports a log whose
size is smaller than its header. That huge data_size defeats the
log_start/log_end bounds clamps added by commit dd1311bcf0e6 ("accel/ivpu:
Add bounds checks for firmware log indices"), so fw_log_print_lines() reads
far past the small real data region of the BO. A size of 0 also makes
fw_log_from_bo() advance the offset by 0, causing the callers to loop
forever on the same header.

Reject logs whose size is smaller than the header (which also rejects
size == 0).
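
The underflow and the check can be shown with a small stand-alone sketch.
struct log_header_sketch is a hypothetical two-field reduction of the
real tracing buffer header, and the helper names are made up for this
example; header_size is assumed to have been validated already, as the
text above describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced header; the real one carries more fields. */
struct log_header_sketch {
	uint32_t size;          /* whole log: header plus data */
	uint32_t header_size;
};

/* A log is only usable if it is at least as large as its own header.
 * With a non-zero header_size this also rejects size == 0. */
static bool log_size_is_valid(const struct log_header_sketch *log)
{
	return log->size >= log->header_size;
}

/* Safe only after log_size_is_valid(); otherwise the u32 subtraction wraps. */
static uint32_t log_data_size(const struct log_header_sketch *log)
{
	assert(log_size_is_valid(log));
	return log->size - log->header_size;
}
```

For size = 8 and header_size = 64 the unchecked subtraction would wrap
to 4294967240, which is why the check has to come first.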

Fixes: d4e4257afa6e ("accel/ivpu: Add firmware tracing support")
Cc: stable@vger.kernel.org
Signed-off-by: Jhonraushan <raushan.jhon@gmail.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260715074206.867712-1-raushan.jhon@gmail.com

KVM: riscv: Fix Spectre-v1 in vector register access

User-controlled register indices from the ONE_REG ioctl are used to
index into the vector register buffer (v0..v31). Sanitize the calculated
offset with array_index_nospec() to prevent speculative out-of-bounds
access.

Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260715030818.75657-1-min_halo@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>

drm/panthor: Check debugfs GEM lock initialization

drmm_mutex_init() can fail while registering the managed cleanup action.
When that happens, drmm_add_action_or_reset() destroys the mutex before
returning the error. Continuing initialization would therefore leave the
debugfs GEM object list with an unusable lock.

Propagate the error as is already done for the other managed mutexes in
panthor_device_init().

Fixes: a3707f53eb3f ("drm/panthor: show device-wide list of DRM GEM objects over DebugFS")
Signed-off-by: Linmao Li <lilinmao@kylinos.cn>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patch.msgid.link/20260713082912.321021-1-lilinmao@kylinos.cn
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>

drm/panthor: return error on truncated firmware

panthor_fw_load() detects truncated firmware images, but jumps to the
common cleanup path without setting ret. If no previous error was recorded,
the function can return 0 and treat the invalid firmware as successfully
loaded.

Set ret to -EINVAL before leaving the truncated-image path.
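
The bug class is easy to reproduce in a stripped-down goto-cleanup
function. The sketch below is not panthor_fw_load(); the function name
and its two size parameters are hypothetical and only show why ret must
be assigned before jumping to the shared exit label.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Returns 0 on success or a negative errno, using a shared cleanup label. */
static int load_image_sketch(size_t image_size, size_t declared_size)
{
	int ret = 0;

	if (declared_size > image_size) {
		/* Truncated image: without this assignment the goto below
		 * would fall through to "return ret" with ret still 0. */
		ret = -EINVAL;
		goto out;
	}

	/* ... parsing of the image would go here ... */

out:
	/* common cleanup would run here */
	return ret;
}
```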

Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260714163056.22329-1-osama.abdelkader@gmail.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>

gpiolib: tolerate gpio-hogs lacking a hogging state

Commit d1d564ec4992 ("gpio: move hogs into GPIO core") made
gpiochip_add_hog() return -EINVAL for hog nodes lacking any of the
'input', 'output-low' or 'output-high' properties. The error is
propagated by gpiochip_hog_lines() and fails registration of the
whole GPIO chip.

The previous OF-specific implementation tolerated such nodes:
of_parse_own_gpio() warned "no hogging state specified, bailing out"
and of_gpiochip_add_hog() stopped processing the node without failing
chip registration.

Some boards deliberately ship hog nodes without a hogging state in
their base devicetree and supply the state via overlay, e.g. the PCIe
slot key selection hogs on the BananaPi R4 Pro added in
commit e309fa232d12 ("arm64: dts: mediatek: mt7988a-bpi-r4pro: rework
pcie gpio-hog handling"), as the polarity set in the base devicetree
could not be overridden from an overlay.

Booting such a board without an overlay applied now fails to register
the gpiochip. On the BananaPi R4 Pro this means the MT7988A pinctrl
device fails to probe, all peripherals including the console UART
defer forever, and the board finally hangs when clk_disable_unused()
gates the clocks of the UART still in use by earlycon:

  gpiochip_add_data_with_key: GPIOs 512..595 (pinctrl_moore) failed to register, -22
  mt7988-pinctrl 1001f000.pinctrl: error -EINVAL: Failed to add gpio_chip
  ...
  clk: Disabling unused clocks
  (hangs)

Restore the previous behaviour by warning about hog nodes lacking a
hogging state and skipping them instead of failing the registration
of the whole GPIO chip.

Fixes: d1d564ec4992 ("gpio: move hogs into GPIO core")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/4c67cf0839ccf57db35a826df6d8fc779531509a.1783974733.git.daniel@makrotopia.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

gpio: sloppy-logic-analyzer: Fix memory leak in gpio_la_poll_probe()

The memory allocated for priv->blob.data is not freed in the error paths
that follow the fops_buf_size_set() call in gpio_la_poll_probe(), as
well as in the remove function. Fix that by using device managed action
to free the memory on remove.

Fixes: 7828b7bbbf20 ("gpio: add sloppy logic analyzer using polling")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://patch.msgid.link/20260715075311.527753-1-nihaal@cse.iitm.ac.in
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

iommu/amd: Fix nested domain leak

A couple of runs of different AI tools have generated something like the
following bug report:

    In nested_domain_free(), when refcount_dec_and_test() returns false
    (other nested domains still reference the same gdom_info), the function
    returns without calling kfree(ndom), leaking the nested_domain
    structure. This problem wasn't introduced by this patch, but exists in
    the code from commit 757d2b1fdf5b that the patch modifies. Each
    nested_domain (ndom) is allocated individually in
    amd_iommu_alloc_domain_nested() via kzalloc_obj(*ndom). The .free
    callback is the sole point responsible for freeing this domain. When
    the refcount is > 0, only the xa_unlock_irqrestore is performed and the
    function returns, leaving ndom permanently allocated. This leak occurs
    every time a nested domain sharing a gDomID is destroyed while other
    domains still use that gDomID.

There is a similar leak later in this function in the WARN_ON() test when
the mapping is already NULL. Switch to a RAII-based cleanup for ndom, since
it should always be freed in this function.

Fixes: 757d2b1fdf5b ("iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation")
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Reviewed-by: Ankit Soni <Ankit.Soni@amd.com>
Signed-off-by: Will Deacon <will@kernel.org>

iommu/amd: Fix IRQ unsafe locking in gdom allocation

Lockdep complains:

  [  259.410489] =====================================================
  [  259.417287] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  [  259.424667] 7.0.0-g51db1d8d2113 #54 Not tainted
  [  259.429718] -----------------------------------------------------
  [  259.436516] qemu-system-x86/10143 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  [  259.444670] ff3b2b1c60305170 (&xa->xa_lock#25){+.+.}-{3:3}, at: __domain_flush_pages+0x17c/0x4b0
  [  259.454485]
                 and this task is already holding:
  [  259.460991] ff3b2b1c98504cc0 (&domain->lock){-.-.}-{3:3}, at: amd_iommu_iotlb_sync+0x25/0x60
  [  259.470408] which would create a new lock dependency:
  [  259.476041]  (&domain->lock){-.-.}-{3:3} -> (&xa->xa_lock#25){+.+.}-{3:3}
  [  259.483615]
                 but this new dependency connects a HARDIRQ-irq-safe lock:
  [  259.492447]  (&domain->lock){-.-.}-{3:3}
  [  259.492449]
                 ... which became HARDIRQ-irq-safe at:
  [  259.503705]   lock_acquire+0xb6/0x2e0
  [  259.507790]   _raw_spin_lock_irqsave+0x3e/0x60
  [  259.512748]   amd_iommu_flush_iotlb_all+0x20/0x50
  [  259.517996]   iommu_dma_free_iova.isra.0+0x1b8/0x1e0
  [  259.523534]   __iommu_dma_unmap+0xc2/0x140
  [  259.528100]   iommu_dma_unmap_phys+0x55/0xc0
  [  259.532863]   dma_unmap_phys+0x274/0x2e0
  [  259.537238]   dma_unmap_page_attrs+0x17/0x30
  [  259.542000]   nvme_unmap_data+0x13e/0x280
  [  259.546473]   nvme_pci_complete_batch+0x45/0x70
  [  259.551524]   nvme_irq+0x83/0x90
  [  259.555123]   __handle_irq_event_percpu+0x92/0x360
  [  259.560466]   handle_irq_event+0x39/0x80
  [  259.564841]   handle_edge_irq+0xb2/0x1a0
  [  259.569214]   __common_interrupt+0x4e/0x130
  [  259.573882]   common_interrupt+0x88/0xa0
  [  259.578256]   asm_common_interrupt+0x27/0x40
  [  259.583019]   cpuidle_enter_state+0x119/0x5d0
  [  259.587877]   cpuidle_enter+0x2e/0x50
  [  259.591962]   do_idle+0x153/0x2c0
  [  259.595657]   cpu_startup_entry+0x29/0x30
  [  259.600128]   start_secondary+0x118/0x150
  [  259.604601]   common_startup_64+0x13e/0x141
  [  259.609266]
                 to a HARDIRQ-irq-unsafe lock:
  [  259.615384]  (&xa->xa_lock#25){+.+.}-{3:3}
  [  259.615386]
                 ... which became HARDIRQ-irq-unsafe at:
  [  259.627039] ...
  [  259.627039]   lock_acquire+0xb6/0x2e0
  [  259.633071]   _raw_spin_lock+0x2f/0x50
  [  259.637250]   amd_iommu_alloc_domain_nested+0x140/0x3c0
  [  259.643078]   iommufd_hwpt_alloc+0x272/0x800 [iommufd]
  [  259.648813]   iommufd_fops_ioctl+0x14e/0x200 [iommufd]
  [  259.654547]   __x64_sys_ioctl+0x9d/0xf0
  ...

Since amd_iommu_domain_flush_pages() necessarily holds domain->lock to do the
flush, switch the allocation side in gdom_info_load_or_alloc_locked() to
HARDIRQ-safe allocation. The IOMMU_DESTROY->free path has the same issue,
so switch that path to HARDIRQ-safe locking as well.

Fixes: 757d2b1fdf5b ("iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation")
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Reviewed-by: Ankit Soni <Ankit.Soni@amd.com>
Signed-off-by: Will Deacon <will@kernel.org>

KVM: nVMX: Put vmcs12 pages if nested VM-Enter fails due to invalid guest state

Put all vmcs12 pages if KVM synthesizes a nested VM-Exit due to invalid
guest state while emulating VMLAUNCH or VMRESUME. The invalid guest state
path doesn't use nested_vmx_vmexit() as that API is intended to be used if
and only if L2 is active, and the open coded equivalent neglects to put
the vmcs12 pages. Failure to put the vmcs12 pages leaks any pinned pages
(and/or mappings) if L1 retries VMLAUNCH/VMRESUME.

Note, the !from_vmenter scenario doesn't suffer the same problem, as
vmx_get_nested_state_pages() only gets/pins/maps the vmcs12 pages if L2 is
active, i.e. if a "full" VM-Exit is guaranteed before KVM will retry
getting vmcs12 pages.

Fixes: 96c66e87deee ("KVM/nVMX: Use kvm_vcpu_map when mapping the virtual APIC page")
Fixes: 3278e0492554 ("KVM/nVMX: Use kvm_vcpu_map when mapping the posted interrupt descriptor table")
Fixes: fe1911aa443e ("KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page")
Reported-by: Minh Nguyen <minhnguyen.080505@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-s390-master-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 7.2

- more gmap KVM memory management fixes
- PCI passthru fixes

Merge tag 'kvm-x86-fixes-7.2-rc4' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 7.2-rcN

- Fix a bug where KVM will trigger a UAF if updating IOMMU IRTEs fails when
   registering an IRQ-bypass producer.

- Ignore pending PV EOI instead of BUG()ing the host if the feature was
   disabled by the guest.

- Fix nVMX bugs where KVM would run L1 with an L1-controlled CR3 after a
   failed "late" consistency check when KVM is NOT using EPT.

- Disallow intra-host migration/mirroring of SNP VMs as KVM doesn't yet
   support moving/mirroring SNP state.

- Fix a TOCTOU bug in KVM's handling of the "trusted" CPUID for TDX guests.

- Fix a NULL pointer deref in trace_kvm_inj_exception() where a change to the
   core infrastructure missed KVM's unique (ab)use of __print_symbolic().

Merge tag 'kvmarm-fixes-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.2, take #2

- Move locking for kvm_io_bus_get_dev() into the caller, ensuring
  race-free checks that the returned object is of the correct type

- Fix initialisation of the page-table walk level when relaxing
  permissions

- Correctly update the XN attribute when relaxing permissions

- Fix the sign extension of loads from emulated MMIO regions

- Assorted collection of fixes for pKVM's FFA proxy, together with a
  couple of FFA driver adjustments

Merge tag 'kvmarm-fixes-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.2, take #1

- Fix an accounting buglet when reclaiming pages from a protected
  guest

- Fix a bunch of architectural compliance issues when injecting a
  synthesised exception, most of which were missing the PSTATE.IL bit
  indicating a 32bit-wide instruction

- Another set of fixes addressing issues with translation of VNCR_EL2,
  including corner cases where the guest points that register at a RO
  page...

- Don't warn when trapping accesses to ZCR_EL2 from an L2 guest, as
  that's not unexpected at all

- Address a bunch of races with LPI migration vs LPIs being disabled

- Fix a total howler of a bug combining FEAT_MOPS and NV, resulting in
  exception returning in the wrong place...

- Coerce Fuad Tabba into a reviewer role, and may his Inbox catch
  fire!

Merge branch 'bpf-reject-negative-const-offsets-for-buffer-pointers'

Sun Jian says:

====================
bpf: Reject negative const offsets for buffer pointers

Reject negative effective offsets for PTR_TO_TP_BUFFER and PTR_TO_BUF
accesses. Calculate the effective access start using signed arithmetic
to prevent unsigned access-end accounting from wrapping, and cover both
load-time rejection and the raw tracepoint writable attach-time path.
---

Changes in v5:

- Simplify __check_buffer_access() to reject a negative effective start
  after confirming that var_off is constant. Validate the combined
  offset instead of rejecting negative instruction offsets separately.
  Drop the duplicate BPF_MAX_VAR_OFF check because pointer arithmetic
  already bounds constant offsets, and remove the redundant size < 0
  check.
- Switch the raw tracepoint writable attach tests from nbd_send_request
  to bpf_testmod_test_writable_bare_tp, avoiding the NBD configuration
  dependency and its false-pass condition.
- Split the attach coverage into named subtests and require
  bpf_raw_tracepoint_open() to return -EINVAL.
- Add verifier coverage for a negative constant PTR_TO_BUF offset.

Changes in v4:

- Correct the Fixes tag to point to 022ac0750883, where pointer offsets
  were folded into reg->var_off.
- Drop the end > U32_MAX check, which is unreachable after bounding const
  var_off with BPF_MAX_VAR_OFF while keeping instruction offsets and
  access sizes bounded.

Changes in v3:

- Check constant var_off against +/-BPF_MAX_VAR_OFF before computing
  the effective access range, matching the existing verifier pointer
  offset convention.
- Keep explicit rejection of negative instruction offsets and keep
  bounded negative constant var_off valid when the effective offset is
  non-negative.

Changes in v2:

- Split the kernel fix and selftests into separate patches.
- Add an attach-time raw tracepoint writable test that exercises
  max_tp_access against nbd_send_request's writable size.
- Adjust selftest formatting to use the 100 character line width.

Tested:

- ./test_progs -v -t verifier_raw_tp_writable
- ./test_progs -v -t verifier_ptr_to_buf
- ./test_progs -v -t raw_tp_writable_reject_bad_access
- ./test_progs -v -t raw_tp_writable_test_run

v4: https://lore.kernel.org/bpf/20260708090151.151729-1-sun.jian.kdev@gmail.com/
v3: https://lore.kernel.org/bpf/20260708040715.116680-1-sun.jian.kdev@gmail.com/
v2: https://lore.kernel.org/bpf/20260707060804.93561-1-sun.jian.kdev@gmail.com/
v1: https://lore.kernel.org/bpf/20260703035137.109608-1-sun.jian.kdev@gmail.com/
====================

Link: https://patch.msgid.link/20260714093846.18159-1-sun.jian.kdev@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>

selftests/bpf: Cover negative buffer pointer offsets

Add verifier coverage for constant negative offsets on PTR_TO_TP_BUFFER
and PTR_TO_BUF pointers. Both programs adjust the buffer pointer by -8
and access it at offset zero, so the negative effective start must be
rejected at load time.

Switch the raw tracepoint writable attach checks from nbd_send_request
to bpf_testmod_test_writable_bare_tp, avoiding a dependency on the NBD
tracepoint. Keep the existing past-end case and add a case with a
negative var_off compensated by a positive instruction offset. The
effective start remains non-negative, so the program loads, but its
access end exceeds the writable context size and
bpf_raw_tracepoint_open() must return -EINVAL.

Cc: stable@vger.kernel.org # 5.2.0
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://patch.msgid.link/20260714093846.18159-3-sun.jian.kdev@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>

bpf: Reject negative const offsets for buffer pointers

The verifier rejects variable offsets for PTR_TO_TP_BUFFER and PTR_TO_BUF
accesses, but it currently accepts a constant negative offset produced by
pointer arithmetic.

Commit 022ac0750883 ("bpf: use reg->var_off instead of reg->off for
pointers") moved constant pointer offsets from reg->off to reg->var_off.
However, __check_buffer_access() continued to check only the instruction
offset. An access with reg->var_off equal to -8 and an instruction offset
of zero therefore passes verification.

For writable raw tracepoints, the access end is also calculated from the
unsigned reg->var_off.value. An eight-byte access starting at -8 wraps
the calculated end to zero, allowing the program to load and attach
without increasing max_tp_access.

After ensuring that reg->var_off is constant, calculate the effective
access start using signed arithmetic and reject it when it is negative.
Use the validated start to calculate the access end for both
PTR_TO_TP_BUFFER and PTR_TO_BUF.
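
The arithmetic can be illustrated outside the verifier with a small
sketch. This is not __check_buffer_access(): buffer_access_end() and
MAX_CONST_OFF_SKETCH are hypothetical, the bound stands in for the limit
that pointer arithmetic already enforces on constant offsets, and the
unsigned-to-signed conversion assumes two's complement behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bound on the magnitude of a constant register offset. */
#define MAX_CONST_OFF_SKETCH (1LL << 29)

/* var_off_value: constant register offset as stored (unsigned 64-bit).
 * insn_off:      signed instruction offset.
 * size:          access size in bytes.
 * Returns false for a negative effective start, else stores the end. */
static bool buffer_access_end(uint64_t var_off_value, int32_t insn_off,
			      uint32_t size, uint64_t *end)
{
	int64_t off = (int64_t)var_off_value;   /* reinterpret as signed */
	int64_t start;

	assert(off >= -MAX_CONST_OFF_SKETCH && off <= MAX_CONST_OFF_SKETCH);

	start = off + insn_off;                 /* signed: cannot wrap here */
	if (start < 0)
		return false;

	*end = (uint64_t)start + size;          /* start is known >= 0 */
	return true;
}
```

An eight-byte access with a register offset of -8 is rejected here,
whereas the unsigned sum (uint64_t)-8 + 8 would wrap to 0. A register
offset of -8 combined with an instruction offset of 16 is accepted with
an access end of 16.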

Fixes: 022ac0750883 ("bpf: use reg->var_off instead of reg->off for pointers")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: stable@vger.kernel.org # 5.2.0
Link: https://patch.msgid.link/20260714093846.18159-2-sun.jian.kdev@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>

Merge branch 'bpf-sockmap-fix-fionread-for-sockets-without-a-verdict-program'

Mattia Meleleo says:

====================
bpf, sockmap: Fix FIONREAD for sockets without a verdict program

Sockets added to a sockmap/sockhash with no stream/skb verdict program
attached answer FIONREAD with 0 even when unread data is pending in
sk_receive_queue. Fix tcp_bpf_ioctl() to account for the receive queue
in that case, and add a selftest.

Changes in v3:
- Remove unused sk_psock_msg_inq()
- Link to v2: https://patch.msgid.link/20260708-fionread-no-verdict-v2-0-29dd293621c7@coralogix.com

Changes in v2:
- Split the fix and the selftest into separate patches
- Use READ_ONCE() to read the verdict program pointers
- Link to v1: https://patch.msgid.link/20260707-fionread-no-verdict-v1-1-ce94a72357ec@coralogix.com

Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
---
====================

Link: https://patch.msgid.link/20260708-fionread-no-verdict-v3-0-b4ee31b3af53@coralogix.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Test FIONREAD on a sockmap socket without a verdict program

Add a test validating that FIONREAD on a TCP socket in a sockmap
without a verdict program reports data pending in sk_receive_queue.

Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20260708-fionread-no-verdict-v3-2-b4ee31b3af53@coralogix.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

bpf, sockmap: Account for receive queue in FIONREAD without a verdict program

tcp_bpf_ioctl() answers SIOCINQ from psock->msg_tot_len, which only
counts bytes in ingress_msg. Without a stream/skb verdict program
nothing is diverted there: data stays in sk_receive_queue, so FIONREAD
returns 0 even though read() returns data.

Add tcp_inq() to the reported value when the psock has no verdict
program. The two queues are disjoint, so bytes redirected into
ingress_msg from other sockets stay correctly accounted through
msg_tot_len.
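
The accounting rule above can be sketched as a tiny model. It is an
illustration only: fionread_bytes() and its parameters are hypothetical
names, and tcp_inq() is modelled by a plain integer argument.

```python
def fionread_bytes(msg_tot_len, receive_queue_bytes, has_verdict_prog):
    """Bytes reported by FIONREAD for a sockmap socket (simplified model)."""
    inq = msg_tot_len  # bytes queued in ingress_msg
    if not has_verdict_prog:
        # Nothing is diverted to ingress_msg, so unread data is still in
        # sk_receive_queue and has to be added to the reported value.
        inq += receive_queue_bytes
    return inq
```

Because the two queues are disjoint, adding them does not double count:
bytes redirected from other sockets only ever show up in msg_tot_len.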

Remove unused sk_psock_msg_inq().

Fixes: 929e30f93125 ("bpf, sockmap: Fix FIONREAD for sockmap")
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20260708-fionread-no-verdict-v3-1-b4ee31b3af53@coralogix.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

mmc: sdhci-esdhc-imx: fix resume error handling

Check pm_runtime_force_resume() return value in resume. If it fails
(clock enable failure), return immediately since accessing hardware
registers on an unclocked device would cause a kernel panic.

The early return intentionally skips enable_irq() and
sdhci_disable_irq_wakeups() because the IRQ handler reads
SDHCI_INT_STATUS, which would also fault without clocks. The PM runtime
usage counter leak only affects this already-broken device instance and
is an acceptable tradeoff to preserve system stability.

Remove the return value check for mmc_gpio_set_cd_wake(host->mmc, false)
since disable_irq_wake() called internally always returns 0.

Also return 0 explicitly on the success path instead of propagating
stale return values.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: make non-fatal errors non-blocking in suspend

Make pinctrl_pm_select_sleep_state() and mmc_gpio_set_cd_wake() failures
non-fatal in the suspend path. These failures only mean slightly higher
power consumption or missing CD wakeup capability, but should not block
system suspend.

Also change the function to always return 0 on the success path instead
of propagating non-fatal warning return values.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: use pm_runtime_resume_and_get() in suspend

Replace pm_runtime_get_sync() with pm_runtime_resume_and_get() to
simplify error handling. pm_runtime_resume_and_get() automatically
drops the usage counter on failure, avoiding the need for a separate
pm_runtime_put_noidle() call. If it fails, the device is unclocked and
accessing hardware registers would cause a kernel panic, so return the
error immediately.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: disable irq during suspend to fix unhandled interrupt

When using WIFI out-of-band wakeup, an "irq xxx: nobody cared" warning
occurs. This happens because the usdhc interrupt is not disabled during
system suspend when device_may_wakeup() returns false.

The sequence of events leading to this issue:
1. System enters suspend without disabling usdhc interrupt
   (because device_may_wakeup() returns false for usdhc device)
2. WIFI out-of-band wakeup triggers system resume via GPIO interrupt
3. WIFI sends a Card interrupt before usdhc has fully resumed
4. usdhc is still in runtime suspend state and cannot handle the
   interrupt properly
5. The unhandled interrupt triggers "nobody cared" warning

Fix this by unconditionally disabling the usdhc interrupt during suspend
and re-enabling it during resume, regardless of the wakeup capability.
This ensures no interrupts are processed during the suspend/resume
transition.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: restore pinctrl before restoring ios timing on resume

SDIO devices such as WiFi may keep power during suspend, so the MMC
core skips full card re-initialization on resume and directly restores
the host controller's ios timing to match the card. For DDR mode,
pm_runtime_force_resume() sets DDR_EN before the pin configuration is
restored from sleep state.

This is related to the SoC IP integration: switching pinctrl setting
(changing alt from GPIO to USDHC) impacts the internal loopback path.
If pinctrl configures the pad to GPIO function, once DDR_EN is set, the
DLL delay will be fixed based on the GPIO function loopback path. When
the pinctrl is later changed to USDHC function, the internal loopback
path changes, making the original fixed sample point no longer suitable
for the current loopback path. This causes persistent read CRC errors on
subsequent data transfers.

SD/eMMC running in DDR mode are unaffected as they are fully
re-initialized from legacy timing after resume.

Fix this by restoring the pinctrl state based on current timing mode
using esdhc_change_pinstate() before pm_runtime_force_resume(). This
ensures the correct pin configuration (e.g., 100/200MHz for UHS modes)
is applied before DDR_EN is set. Only restore for non-wakeup devices
since wakeup devices kept their active pin state during suspend.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: fix esdhc_change_pinstate() to allow default state restore

esdhc_change_pinstate() checks for pins_100mhz and pins_200mhz at the
top of the function and returns -EINVAL if either is not defined. This
prevents the default case from ever being reached, which means devices
with a sleep pinctrl state but without high-speed pin states (100mhz/
200mhz) can never restore their default pin configuration.

Move the IS_ERR checks for pins_100mhz and pins_200mhz into their
respective switch cases.
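
The control-flow change can be sketched with a user-space model. The
dictionary of pin states, the timing labels and the tuple return value
are assumptions made for illustration; a missing key stands in for an
IS_ERR() pointer.

```python
import errno


def change_pinstate(pins, timing):
    """Pick a pinctrl state for a timing mode (simplified model).

    Returns (0, state_name) on success or (-EINVAL, None) on failure.
    """
    if timing in ("100mhz", "200mhz"):
        # The availability check lives in the case that needs the state.
        if timing not in pins:
            return (-errno.EINVAL, None)
        return (0, timing)
    # Default case: reachable even when no high-speed states are defined.
    return (0, "default")


# A device with a sleep state but without 100mhz/200mhz pin states.
pins_without_high_speed = {"default": object(), "sleep": object()}
```

With the checks inside the cases, change_pinstate(pins_without_high_speed,
"legacy") selects the default state instead of failing up front.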

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: restore DLL override for DDR modes on resume

sdhci_esdhc_imx_hwinit() unconditionally clears ESDHC_DLL_CTRL by
writing zero. For SDIO devices that keep power during system suspend
and operate in DDR mode, the card remains in DDR timing while the host
DLL override configuration is lost.

Extract the DLL override setup from esdhc_set_uhs_signaling() into
a helper esdhc_set_dll_override(), and call it on the resume path
when the card kept power and is using a DDR timing mode.

Fixes: 676a83855614 ("mmc: host: sdhci-esdhc-imx: refactor the system PM logic")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

mmc: sdhci-esdhc-imx: remove unnecessary mmc_card_wake_sdio_irq check for tuning save/restore

The tuning save/restore during system PM is conditioned on
mmc_card_wake_sdio_irq(), but this check is unrelated to whether
tuning values need to be preserved. The actual requirement is that
the card keeps power during suspend and the controller is a uSDHC.

SDIO devices using out-of-band GPIO wakeup maintain power during
suspend but do not set the SDIO IRQ wake flag. In this case the
tuning delay values are not saved/restored.

Remove the unnecessary mmc_card_wake_sdio_irq() condition from both
the suspend save and resume restore paths.

Fixes: c63d25cdc59a ("mmc: sdhci-esdhc-imx: Save tuning value when card stays powered in suspend")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

Merge branch 'bpf-sockmap-fix-sockmap-leaking-udp-socks'

Michal Luczaj says:

====================
bpf, sockmap: Fix sockmap leaking UDP socks

Fix for UDP sockets getting leaked during sockmap lookup/release.
Accompanied by selftests updates.

Two of Sashiko's concerns are to be addressed separately:
https://lore.kernel.org/bpf/20260626205814.BAC3C1F000E9@smtp.kernel.org/

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Changes in v4:
- selftest: drop redundant `if (err)` [Sashiko]
- Link to v3: https://patch.msgid.link/20260702-sockmap-lookup-udp-leak-v3-0-ff8de8782468@rbox.co

Changes in v3:
- selftest: better error handling, ASSERT_*() macros [Sashiko]
- selftest: fix grammar, reorder patches [Kuniyuki]
- Link to v2: https://patch.msgid.link/20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co

Changes in v2:
- selftest: drop the original, adapt old tests
- fix: change approach to rejecting unbound UDP [Kuniyuki]
- Link to v1: https://patch.msgid.link/20260623-sockmap-lookup-udp-leak-v1-0-05804f9308e4@rbox.co

To: Alexei Starovoitov <ast@kernel.org>
To: Daniel Borkmann <daniel@iogearbox.net>
To: Andrii Nakryiko <andrii@kernel.org>
To: Eduard Zingerman <eddyz87@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
To: Song Liu <song@kernel.org>
To: Yonghong Song <yonghong.song@linux.dev>
To: Jiri Olsa <jolsa@kernel.org>
To: Emil Tsalapatis <emil@etsalapatis.com>
To: Shuah Khan <shuah@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>
To: Jakub Sitnicki <jakub@cloudflare.com>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Eric Dumazet <edumazet@google.com>
To: Kuniyuki Iwashima <kuniyu@google.com>
To: Paolo Abeni <pabeni@redhat.com>
To: Willem de Bruijn <willemb@google.com>
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
To: Simon Horman <horms@kernel.org>
To: Cong Wang <cong.wang@bytedance.com>
Cc: bpf@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
---
====================

Link: https://patch.msgid.link/20260707-sockmap-lookup-udp-leak-v4-0-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Fail unbound UDP on sockmap update

sockmap now rejects unbound UDP sockets. Adjust test_maps. While at it,
check socket()'s return value.

This effectively reverts commit c39aa2159974 ("bpf, selftests: Fix
test_maps now that sockmap supports UDP").

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-4-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Adapt sockmap update error handling

Update sockmap_listen to accommodate the recent change in sockmap that
rejects unbound UDP sockets.

TCP: Reject unbound and bound (unless established or listening).
UDP: Accept only bound sockets.

While at it, migrate to ASSERT_* and enforce reverse xmas tree.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-3-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

bpf, sockmap: Reject unhashed UDP sockets on sockmap update

UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.

Because sockmap accepts unbound UDP sockets, a BPF program can increment a
socket's refcount via lookup. If the socket is subsequently bound, the
transition from unbound to bound causes bpf_sk_release() to skip the
decrement of the refcount, causing a memory leak.

unreferenced object 0xffff88810bc2eb40 (size 1984):
  comm "test_progs", pid 2451, jiffies 4295320596
  hex dump (first 32 bytes):
    7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
    02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
  backtrace (crc bdee079d):
    kmem_cache_alloc_noprof+0x557/0x660
    sk_prot_alloc+0x69/0x240
    sk_alloc+0x30/0x460
    inet_create+0x2ce/0xf80
    __sock_create+0x25b/0x5c0
    __sys_socket+0x119/0x1d0
    __x64_sys_socket+0x72/0xd0
    do_syscall_64+0xa1/0x5f0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
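
The leak sequence described above can be replayed with a small model. It
is a sketch under stated assumptions: the Sock class and the helper names
are hypothetical, and only the refcount rule from the message (take or
drop a reference only while the socket counts as refcounted) is modelled.

```python
class Sock:
    def __init__(self):
        self.refcnt = 1        # the owner's reference
        self.rcu_free = False  # models SOCK_RCU_FREE

    def bind(self):
        self.rcu_free = True   # UDP sets the flag when (auto-)bound

    def is_refcounted(self):
        return not self.rcu_free


def lookup(sk):
    if sk.is_refcounted():
        sk.refcnt += 1
    return sk


def release(sk):
    if sk.is_refcounted():
        sk.refcnt -= 1


def leaked_refs(bind_before_lookup):
    """Return how many references remain beyond the owner's own."""
    sk = Sock()
    if bind_before_lookup:
        sk.bind()
    lookup(sk)
    if not bind_before_lookup:
        sk.bind()  # unbound -> bound between lookup and release
    release(sk)
    return sk.refcnt - 1
```

Only the unbound-then-bound ordering leaves a stray reference in this
model, which is why rejecting unbound sockets up front is sufficient.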

Instead of special-casing for refcounted sockets, reject unhashed UDP
sockets during sockmap updates, as there is no benefit to supporting those.
This effectively reverts the commit under Fixes, with two exceptions:

1. sock_map_sk_state_allowed() maintains a fall-through `return true`.
2. In the spirit of commit b8b8315e39ff ("bpf, sockmap: Remove unhash
   handler for BPF sockmap usage"), the proto::unhash BPF handler is not
   reintroduced.

Historical note: this issue is related to commit 67312adc96b5 ("bpf: reject
unhashed sockets in bpf_sk_assign").

Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-2-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Ensure UDP sockets are bound

Update sockmap_basic tests to bind sockets before they are used. This
accommodates the recent change in sockmap that rejects unbound UDP sockets.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20260707-sockmap-lookup-udp-leak-v4-1-f878346f27ab@rbox.co
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

RISC-V: KVM: Serialize virtual interrupt pending state updates

KVM RISC-V tracks guest local interrupt state with two bitmaps:

  - irqs_pending: interrupts that should be visible to the guest
  - irqs_pending_mask: interrupts whose pending state changed

The current code updates those bitmaps with independent atomic bitops
and assumes a multiple-producer, single-consumer protocol. That model
does not actually hold.

kvm_riscv_vcpu_sync_interrupts() is not a pure consumer. When the guest
changes guest-visible HVIP state, sync_interrupts() writes both
irqs_pending and irqs_pending_mask to reflect the new guest state back
into KVM state. As a result, irqs_pending and irqs_pending_mask form a
single logical state transition, but they are not updated atomically as
a pair.

This allows a race where a newly injected interrupt is lost. For
example:

  CPU0                              CPU1
  ----                              ----
  kvm_riscv_vcpu_set_interrupt(VS_SOFT)
    set_bit(VS_SOFT, irqs_pending)
                                    kvm_riscv_vcpu_sync_interrupts()
                                      sees guest-cleared HVIP.VSSIP
                                      sets irqs_pending_mask
                                      clear_bit(IRQ_VS_SOFT, irqs_pending)
    set_bit(VS_SOFT, irqs_pending_mask)
    kvm_vcpu_kick()

After that interleaving, a later flush can update HVIP without VSSIP
even though a new virtual interrupt was injected. In practice, the
guest can remain blocked in WFI with work pending.

The same pending/mask protocol is shared by VS soft interrupts, PMU
overflow delivery, and AIA high interrupt synchronization, so the race
is not limited to one interrupt source.

Fix this by serializing all updates to irqs_pending and irqs_pending_mask
with a per-vCPU raw spinlock. This keeps the pending bit and the dirty
mask as one state transition across:

  - set/unset interrupt
  - guest HVIP sync
  - interrupt flush to guest CSR state
  - vCPU reset
  - AIA CSR writes that clear dirty state

Use non-atomic bitmap operations while holding the lock. Hold the lock
across the AIA sync, flush, and pending checks as well, so both bitmap
words share the same serialization domain.

This intentionally replaces the existing lockless protocol instead of
trying to repair it with additional barriers. The problem is not memory
ordering on a single field; it is that two separate bitmaps encode one
shared state machine while both producers and sync paths can modify
them. A per-vCPU raw spinlock keeps the fix small, local, and suitable
for backporting.
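
The lost-interrupt interleaving can be replayed deterministically in a
single-threaded model. This is a hedged sketch, not the KVM code: the
class, the bit position of VS_SOFT and the exact shape of the sync and
flush steps are assumptions; the serialized variants simply run each
operation to completion, which is what the lock is meant to guarantee.

```python
VS_SOFT = 1 << 2  # bit position chosen for illustration only


class VcpuModel:
    """Two-bitmap pending/mask state plus the guest-visible HVIP value."""

    def __init__(self):
        self.irqs_pending = 0
        self.irqs_pending_mask = 0
        self.hvip = 0

    # The producer is split in two so an interleaving can be replayed.
    def set_pending(self, irq):
        self.irqs_pending |= irq

    def set_mask(self, irq):
        self.irqs_pending_mask |= irq

    def sync_guest_cleared(self, irq):
        """The guest cleared the bit in HVIP: reflect that into KVM state."""
        if not self.irqs_pending_mask & irq:
            self.irqs_pending_mask |= irq
            self.irqs_pending &= ~irq

    def flush(self):
        """Copy dirty pending bits into HVIP and clear the dirty mask."""
        mask, self.irqs_pending_mask = self.irqs_pending_mask, 0
        self.hvip = (self.hvip & ~mask) | (self.irqs_pending & mask)


def lockless_interleaving():
    v = VcpuModel()
    v.set_pending(VS_SOFT)         # CPU0: first half of set_interrupt
    v.sync_guest_cleared(VS_SOFT)  # CPU1: runs between the two halves
    v.set_mask(VS_SOFT)            # CPU0: second half
    v.flush()
    return bool(v.hvip & VS_SOFT)


def serialized(sync_first):
    v = VcpuModel()
    if sync_first:
        v.sync_guest_cleared(VS_SOFT)
    v.set_pending(VS_SOFT)         # both halves as one transition
    v.set_mask(VS_SOFT)
    if not sync_first:
        v.sync_guest_cleared(VS_SOFT)
    v.flush()
    return bool(v.hvip & VS_SOFT)
```

In the model, lockless_interleaving() ends with the interrupt missing
from HVIP, while both serialized orderings deliver it.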

Fixes: cce69aff689e ("RISC-V: KVM: Implement VCPU interrupts and requests handling")
Cc: stable@vger.kernel.org
Signed-off-by: Xie Bo <xb@ultrarisc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260715020359.1521354-2-xb@ultrarisc.com
Signed-off-by: Anup Patel <anup@brainfault.org>

ALSA: hda/realtek: Fix speakers on Alienware x16 R2

The Alienware x16 R2 has two pairs of speakers, but the BIOS
marks pin 0x17 as unused, so only the pin 0x14 pair plays and
audio is very quiet/dull.

Apply ALC289_FIXUP_DUAL_SPK like on other Dell machines to set
up pin 0x17 and route it to DAC1. Tested on my x16 R2 with
kernel 6.18.38, and now all speakers play at full volume.

Signed-off-by: Oliver Ohrt <oliver@theohrts.com>
Link: https://patch.msgid.link/20260715070409.42696-1-oliver@theohrts.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/realtek: Add quirk for HP EliteBook 830 G8 (8AB8) to enable mute LEDs

The sound and microphone mute LEDs do not function on this newer
revision of the board (8AB8), while they do on the older 880D models.
I have verified this on another laptop that was manufactured before
the one with the issue.

Add the ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED quirk from a G9 model,
which uses the same codec, to make it work. Tested with kernel version
7.1.3 on the aforementioned newer-revision notebook.

Signed-off-by: Marcel Kłos <marcel@marmak.net.pl>
Link: https://patch.msgid.link/4dab5622-9100-4730-8c99-b58da939549b@marmak.net.pl
Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/ttm: Account for NULL and handle pages in ttm_pool_backup

Pages in ttm_pool_backup can be NULL or backup handles
(ttm_backup_page_ptr_is_handle()), neither of which can be passed to
set_pages_array_wb() or freed. Add a dedicated WB pass before the
dma/purge loop that walks allocations using the same i += num_pages
stride, skipping NULL and handle entries, and calls set_pages_array_wb()
once per contiguous run of real pages. Apply the same NULL/handle guard
to the dma/purge loop.

Fixes the following oops:

Oops: general protection fault, kernel NULL pointer dereference 0x0: 0000 [#1] SMP NOPTI
RIP: 0010:__cpa_process_fault+0xf8/0x770
RSP: 0018:ffffc90000a87718 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffffc90000a87868 RCX: 0000000000000000
RDX: 0000000000001000 RSI: 0005088000000000 RDI: ffffffff827c5f34
RBP: 0005088000000000 R08: ffffc90000a877cb R09: ffffc90000a877d0
R10: 0000000000000000 R11: 000000000000001b R12: 000ffffffffff000
R13: ffffc90000a87868 R14: ffffc90000a87868 R15: ffff88815b882ae0
FS: 0000000000000000(0000) GS:ffff8884ec840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f930b844000 CR3: 000000000262e003 CR4: 0000000008f70ef0
PKRU: 55555554
Call Trace:
<TASK>
__change_page_attr_set_clr+0x989/0xe90
? __purge_vmap_area_lazy+0x6c/0x3a0
? _vm_unmap_aliases+0x250/0x2a0
set_pages_array_wb+0x7f/0x120
ttm_pool_backup+0x4c9/0x5b0 [ttm]
? dma_resv_wait_timeout+0x3b/0xf0
ttm_tt_backup+0x32/0x60 [ttm]
ttm_bo_shrink+0x66/0x110 [ttm]
xe_bo_shrink_purge+0x12b/0x1b0 [xe]
xe_bo_shrink+0xbb/0x270 [xe]
__xe_shrinker_walk+0xf7/0x160 [xe]
xe_shrinker_walk+0x9d/0xc0 [xe]
xe_shrinker_scan+0x11f/0x210 [xe]
do_shrink_slab+0x13b/0x270
shrink_slab+0xf1/0x400
shrink_node+0x352/0x8a0
balance_pgdat+0x32c/0x700
kswapd+0x205/0x2f0
? __pfx_autoremove_wake_function+0x10/0x10
? __pfx_kswapd+0x10/0x10
kthread+0xd1/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x1b1/0x200
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pages")
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260702214815.4009271-1-matthew.brost@intel.com

smb/client: flush dirty data before punching a hole

Punching a hole after a large buffered write may leave the range
reported as data. Reproduce it with:

  xfs_io -f \
    -c "pwrite -b 3m -S 0x61 0 3m" \
    -c "fpunch 1m 1m" \
    -c "seek -h 0" \
    -c "seek -d 1m" \
    /mnt/test/repro

Punching 1 MiB at offset 1 MiB should produce:

  0          1 MiB       2 MiB       3 MiB
  |  DATA    |   HOLE    |   DATA    | EOF

Instead, the entire file is reported as data. SEEK_HOLE(0) returns EOF,
and SEEK_DATA(1M) returns 1M.

This happens because a dirty folio spanning the punched range can be
written back after the punch and refill the hole.

Fix this by flushing and waiting for dirty data in the punched range
before invalidating the page cache and issuing FSCTL_SET_ZERO_DATA.

xfstests generic/539 passes against Samba/ksmbd with this change.

Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: Use EXPORT_SYMBOL_IF_KUNIT() to export symbols in SMB2

Replace EXPORT_SYMBOL_FOR_MODULES() with EXPORT_SYMBOL_IF_KUNIT()
to mark the symbols as visible only if CONFIG_KUNIT is enabled.

KUnit tests should import the EXPORTED_FOR_KUNIT_TESTING namespace to
use these marked symbols. This is the standard way for all KUnit
tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: Use EXPORT_SYMBOL_IF_KUNIT() to export symbols

Replace EXPORT_SYMBOL_FOR_MODULES() with EXPORT_SYMBOL_IF_KUNIT()
to mark the symbols as visible only if CONFIG_KUNIT is enabled.

KUnit tests should import the EXPORTED_FOR_KUNIT_TESTING namespace to
use these marked symbols. This is the standard way for all KUnit
tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

powerpc: Remove dead non-preemption code

Since commit 7dadeaa6e851 ("sched: Further restrict the preemption
modes"), powerpc always has CONFIG_PREEMPTION because only
CONFIG_PREEMPT and CONFIG_PREEMPT_LAZY are possible, even in
dynamic preemption mode (see sched_dynamic_mode).

As a consequence, need_irq_preemption() is always true and can be
removed.

And because commit bee25f97ad24 ("powerpc: Enable GENERIC_ENTRY
feature") includes linux/irq-entry-common.h, which already declares the
sk_dynamic_irqentry_exit_cond_resched static key, asm/preempt.h
becomes useless and can be removed.

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2bf10a0afffefb6aca44bf2f864cc17471a80e31.1781870889.git.chleroy@kernel.org

powerpc/dt_cpu_ftrs: Set CPU_FTR_P11_PVR for Power11 and later processors

When using device tree CPU features (dt-cpu-ftrs), the kernel bypasses
the traditional cputable-based CPU identification and instead derives
CPU features from the device tree's "ibm,powerpc-cpu-features" node
provided by firmware.

However, CPU_FTR_P11_PVR is a kernel-internal feature flag used to
identify Power11 and later processors, and is not represented in the
device tree's ISA feature set. While ISA v3.1 support (indicated by
CPU_FTR_ARCH_31) is present on both Power10 and Power11, the
CPU_FTR_P11_PVR flag is specifically needed by code that must
distinguish between Power10 and Power11 processors.

Without this flag set, code that checks for Power11 using
cpu_has_feature(CPU_FTR_P11_PVR) will incorrectly return false on
Power11+ systems using dt-cpu-ftrs, leading to incorrect behavior.

This issue manifests specifically in powernv environments (bare-metal
or QEMU TCG with powernv machine type), where skiboot/OPAL firmware
provides the "ibm,powerpc-cpu-features" node, causing the kernel to
use dt-cpu-ftrs. The issue does not affect pseries guests, where SLOF
firmware does not provide this node, causing the kernel to fall back
to the traditional cputable path (identify_cpu) which correctly sets
CPU_FTR_P11_PVR during PVR-based CPU identification.

In powernv TCG guests, the missing flag causes KVM code to trigger
warnings when attempting to create KVM guests, as cpu_features shows
0x000c00eb8f4fb187 (missing bit 53) instead of the correct
0x002c00eb8f4fb187 (with bit 53 set).
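
The two cpu_features values quoted above can be compared directly. The
variable names are illustrative; the values are the ones from this
message.

```python
without_flag = 0x000c00eb8f4fb187  # cpu_features as observed
with_flag = 0x002c00eb8f4fb187     # cpu_features as expected

diff = without_flag ^ with_flag
# Collect every bit position where the two values differ.
differing_bits = [bit for bit in range(64) if (diff >> bit) & 1]
```

The XOR is 0x0020000000000000, so differing_bits holds the single entry
53, matching the missing bit named above.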

Fix this by setting CPU_FTR_P11_PVR for all processors with
PVR >= PVR_POWER11 when ISA v3.1 support is detected in
cpufeatures_setup_start(). This approach ensures forward
compatibility with future processor generations.

Fixes: 96e266e3bcd6 ("KVM: PPC: Book3S HV: Add Power11 capability support for Nested PAPR guests")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260614173437.26352-1-amachhiw@linux.ibm.com

powerpc/pseries: fix memory leak on krealloc failure in papr_init

When krealloc() fails, free the original esi_buf before returning to
avoid a memory leak.

Fixes: 3c14b73454cf ("powerpc/pseries: Interface to represent PAPR firmware attributes")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260614142356.658212-2-thorsten.blum@linux.dev

powerpc/uaccess: correct check for CONFIG_PPC_E500 in mask_user_address()

mask_user_address() incorrectly checks for CONFIG_E500 instead of
CONFIG_PPC_E500, causing mask_user_address_isel() to not be used on
E500 hardware. Fix the check to use the correct name.

Fixes: 861574d51bbd ("powerpc/uaccess: Implement masked user access")
Cc: stable@vger.kernel.org # 7.0+
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260615233729.29386-1-enelsonmoore@gmail.com

powerpc/vtime: Initialize starttime at boot for native accounting

It was observed that /proc/stat had a very large value for one or more
CPUs. It became more visible after recent code simplifications around
cpustats.

System has 240 CPUs.

cat /proc/uptime;
194.18 46500.55
cat /proc/stat
cpu  5966 39 837032887 4650070 164 185 100 0 0 0
cpu0 108 0 837030890 19109 24 4 23 0 0 0

Since uptime is 194s, the system time of each CPU can't be more than
19400. The sum of system time across all CPUs can't be more than
19400 * 240 = 4656000. In fact, the huge value is close to mftb(). Note
that mftb doesn't reset on PowerVM when the LPAR restarts; it only
resets when the whole system resets. The same issue exists for kexec
too.

This happens because starttime is not set up at init time. Once it is
set, subsequent vtime_delta calls return the right delta.

Fix it by initializing the starttime during CPU initialization. This
fixes the large times seen.

cat /proc/uptime; cat /proc/stat
15.78 3694.63
cpu  6035 35 1347 369479 23 144 49 0 0 0
cpu0 19 0 38 1508 0 1 14 0 0 0

Now, system time is reported as expected.
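
The effect of an unset starttime can be shown with a small model. It is
a sketch only: the class, the method name and the timebase value are
hypothetical, and the delta is modelled as "now minus starttime" as
described above.

```python
class VtimeModel:
    """Per-CPU accounting state: starttime is a timebase snapshot."""

    def __init__(self, starttime=0):
        self.starttime = starttime
        self.system_ticks = 0

    def account_system(self, now):
        delta = now - self.starttime  # time since the previous snapshot
        self.starttime = now
        self.system_ticks += delta
        return delta


TB_AT_BOOT = 837_000_000  # hypothetical timebase that survived a restart

uninitialized = VtimeModel()                    # starttime left at 0
initialized = VtimeModel(starttime=TB_AT_BOOT)  # set during CPU init

first_delta_uninit = uninitialized.account_system(TB_AT_BOOT + 100)
first_delta_init = initialized.account_system(TB_AT_BOOT + 100)
```

Only the first delta is affected: the uninitialized instance charges the
whole timebase value once, and later deltas are correct in both cases.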

Fixes: cf9efce0ce31 ("powerpc: Account time using timebase rather than PURR")
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260605124329.377533-1-sshegde@linux.ibm.com

powerpc/85xx: Add fsl,ifc to common device ids

Add fsl,ifc to mpc85xx_common_ids so that of_platform_bus_probe
creates a platform device for the IFC node even without 'simple-bus'
in its compatible property. On P1010 and similar platforms the IFC
node is a direct child of the root, so it must be explicitly matched
to be populated.

Fixes: 0bf51cc9e9e5 ("powerpc: dts: mpc85xx: remove "simple-bus" compatible from ifc node")
Assisted-by: opencode:big-pickle
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260604043309.91280-1-rosenp@gmail.com

arch/riscv: vdso: remove CFI landing pad from rt_sigreturn

When CONFIG_RISCV_USER_CFI is enabled, the CFI version of the vDSO has
a CFI landing pad instruction at the start of __vdso_rt_sigreturn. This
breaks libgcc's unwinding code which matches on the first two
instructions. Other unwinders that rely on similar instruction matching
may also be affected.

Since __vdso_rt_sigreturn is reached as part of signal-return handling
rather than via an indirect call/jump from userspace, it does not need a
CFI landing pad. Remove it and restore the instruction sequence expected
by existing unwinding code.

This matches what was done on arm64 in commit 9a964285572b ("arm64:
vdso: Don't prefix sigreturn trampoline with a BTI C instruction") for a
similar issue.

Cc: stable@vger.kernel.org
Fixes: 37f57bd3faea ("arch/riscv: compile vdso with landing pad and shadow stack note")
Co-authored-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://patch.msgid.link/20260623204058.498120-1-aurelien@aurel32.net
[pjw@kernel.org: fixed comment style]
Signed-off-by: Paul Walmsley <pjw@kernel.org>

accel/amdxdna: reject command submission on devices without a submit op

amdxdna_cmd_submit() calls xdna->dev_info->ops->cmd_submit()
unconditionally, but only aie2_dev_ops defines that callback.
aie4_vf_ops (the AIE4 SR-IOV virtual function) does not, so a user
AMDXDNA_EXEC_CMD ioctl on an AIE4 device reaches a NULL function-pointer
call and oopses the kernel. AIE4 submits work through a mapped user queue
and doorbell, not this ioctl path.

Reject the submission early with -EOPNOTSUPP when the device provides no
cmd_submit op, so the shared EXEC ioctl is a clean no-op on such devices.
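
The early rejection described above can be sketched in userspace C. This
is only an illustration, not the driver code: struct dev_ops,
demo_submit(), submit_cmd() and the two ops tables are hypothetical
stand-ins for the amdxdna ops structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device ops table (not the amdxdna type). */
struct dev_ops {
    int (*cmd_submit)(int job_id);
};

/* Pretend backend: "queues" the job and echoes its id. */
static int demo_submit(int job_id)
{
    return job_id;
}

/* One backend implements the callback, the other leaves it NULL. */
const struct dev_ops ops_with_submit = { .cmd_submit = demo_submit };
const struct dev_ops ops_without_submit = { .cmd_submit = NULL };

/* Reject early instead of calling through a NULL function pointer. */
int submit_cmd(const struct dev_ops *ops, int job_id)
{
    if (!ops->cmd_submit)
        return -EOPNOTSUPP;

    return ops->cmd_submit(job_id);
}
```

The check sits in the shared entry point, so every caller of the common
path is covered without touching the individual backends.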

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Cc: stable@vger.kernel.org
Found by 0sec automated security-research tooling (https://0sec.ai).
Assisted-by: 0sec:claude-opus-4-8
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260713173030.87541-3-doruk@0sec.ai

accel/amdxdna: reject user command submission without a command BO

amdxdna_drm_submit_execbuf() passes the user-supplied command BO handle
straight into amdxdna_cmd_submit() with drv_cmd == NULL. When the handle
is AMDXDNA_INVALID_BO_HANDLE (0), the block that fetches job->cmd_bo is
skipped, leaving it NULL, and no check rejects it on the user path (the
!job->cmd_bo guard lives inside the != INVALID branch).

The job is then armed and pushed to the DRM scheduler.
aie2_sched_job_run() takes the drv_cmd == NULL path and calls
amdxdna_cmd_set_state(job->cmd_bo) -> amdxdna_gem_vmap(NULL) ->
to_gobj(NULL)->dev, a NULL pointer dereference in the drm_sched worker.
A process with access to the accel node on a system with a probed AMD NPU
can trigger a kernel oops with a single AMDXDNA_EXEC_CMD ioctl
(cmd_handles = 0).

Only internal driver commands (SYNC_DEBUG_BO / ATTACH_DEBUG_BO)
legitimately pass AMDXDNA_INVALID_BO_HANDLE, and they always set drv_cmd.
Reject the invalid handle for user submissions (drv_cmd == NULL) at the
submit choke point so every user path is covered.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Cc: stable@vger.kernel.org
Found by 0sec automated security-research tooling (https://0sec.ai).
Assisted-by: 0sec:claude-opus-4-8
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260713173030.87541-2-doruk@0sec.ai

i2c: mediatek: fix WRRD for SoCs without auto_restart option

MediaTek mt65xx family SoCs have no auto restart, but they still
support WRRD mode in hardware. Because auto_restart is set to 0,
WRRD mode is never enabled, leading to read errors.

Fix this by removing the auto_restart check from the WRRD enable path.

Fixes: b49218365280 ("i2c: mediatek: fix potential incorrect use of I2C_MASTER_WRRD")
Signed-off-by: Roman Vivchar <rva333@protonmail.com>
Cc: <stable@vger.kernel.org> # v6.18+
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260709-6572-6595-i2c-v2-1-b2fb8510d1d3@protonmail.com

selinux: fix incorrect execmem checks on overlayfs

The commit fixing the overlayfs mmap() and mprotect() access checks
failed to skip the execmem check in __file_map_prot_check() for the case
where the "mounter check" is being performed. This check should be
performed only against the credentials of the task that is calling
mmap()/mprotect(), since it doesn't pertain to the file itself, but
rather just gates the ability of the calling task to get an executable
memory mapping in general.

The purpose of the "mounter check" is to guard against using an
overlayfs mount to gain file access that would otherwise be denied to
the mounter. For execmem this is not relevant, as there is no further
file access granted based on it (notice that the file's context is not
used as the target in the check), so checking it also against the
mounter credentials would be incorrect.

Fix this by passing a boolean to [__]file_map_prot_check() and
selinux_mmap_file_common() that indicates whether we are doing the
"mounter check", and skipping the execmem check in that case. Since this
boolean also indicates whether we use current_cred() or the mounter cred
as the subject, also remove the "cred" argument from these functions and
determine it based on the boolean and the file struct.

Cc: stable@vger.kernel.org
Fixes: 82544d36b172 ("selinux: fix overlayfs mmap() and mprotect() access checks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

i2c: mlxbf: Fix use-after-free in mlxbf_i2c_init_resource()

If devm_platform_get_and_ioremap_resource() returns an error,
mlxbf_i2c_init_resource() frees tmp_res before reading tmp_res->io to
get the error code. This results in a use-after-free.

Save the error code before freeing tmp_res.
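
A minimal userspace sketch of the same ordering, assuming a plain
malloc()/free() pair in place of the driver's allocation helpers.
struct res, its io_err field and init_resource() are hypothetical names
used only for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource record; io_err stands in for an error-carrying field. */
struct res {
    int io_err;
};

/*
 * Returns 0 and stores the resource in *out on success,
 * otherwise a negative errno value. *out is untouched on failure.
 */
int init_resource(int simulated_err, struct res **out)
{
    struct res *tmp = malloc(sizeof(*tmp));
    int ret;

    if (!tmp)
        return -ENOMEM;

    tmp->io_err = simulated_err;
    if (tmp->io_err) {
        ret = tmp->io_err;  /* read the code while tmp is still valid */
        free(tmp);
        return ret;         /* never touch tmp after free() */
    }

    *out = tmp;
    return 0;
}
```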

Fixes: b5b5b32081cd ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260714150808.85045-1-xuanqiang.luo@linux.dev

firmware: stratix10-svc: fix teardown order in remove to prevent race

In stratix10_svc_drv_remove(), stratix10_svc_async_exit() was called
before client devices were unregistered. This created a race window
where child devices could still be issuing service requests through
the async channels after the async infrastructure had already been
torn down.

Unregister client devices before tearing down the async threads and
channels to ensure all in-flight service calls drain before the
underlying infrastructure is destroyed.

Fixes: bcb9f4f07061 ("firmware: stratix10-svc: Add support for async communication")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Ng Ho Yin <adrian.ho.yin.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>

firmware: stratix10-svc: handle NO_RESPONSE in async poll

Define INTEL_SIP_SMC_STATUS_NO_RESPONSE (0x3) and handle it in
stratix10_svc_async_poll() the same way as INTEL_SIP_SMC_STATUS_BUSY,
returning -EAGAIN so callers can retry instead of treating the poll as
a hard failure.

When the Secure Device Manager has not yet produced a response for an
asynchronous transaction, ATF is expected to return
INTEL_SIP_SMC_STATUS_NO_RESPONSE. Without this handling, the service
layer maps the status to -EINVAL and async clients cannot distinguish
"not ready yet" from a real error.
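
The status mapping can be sketched as follows. Only the 0x3 value comes
from this commit message; the enum names, the OK and BUSY values and
poll_status_to_errno() are assumptions made for the example.

```c
#include <errno.h>

/* Illustrative status codes; only NO_RESPONSE = 0x3 is taken from the text. */
enum poll_status {
    POLL_STATUS_OK = 0x0,           /* assumed value */
    POLL_STATUS_BUSY = 0x1,         /* assumed value */
    POLL_STATUS_NO_RESPONSE = 0x3,  /* value given in the commit message */
};

/* Map a firmware status to an errno-style result for the caller. */
int poll_status_to_errno(unsigned long status)
{
    switch (status) {
    case POLL_STATUS_OK:
        return 0;
    case POLL_STATUS_BUSY:
    case POLL_STATUS_NO_RESPONSE:
        return -EAGAIN;     /* not ready yet, caller may retry */
    default:
        return -EINVAL;     /* anything else is a hard failure */
    }
}
```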

Fixes: bcb9f4f07061 ("firmware: stratix10-svc: Add support for async communication")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Ng Ho Yin <adrian.ho.yin.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>

ASoC: amd: yc: Add DMI quirk for MSI Vector A16 HX A8WIG

The internal digital microphone on the MSI Vector A16 HX A8WIG is not
detected: the ACP platform devices are created, but snd_soc_acp6x_mach
never binds because the machine is missing from the DMI quirk table, so
no capture device shows up at all.

This is the same board as the already supported "Vector A16 HX A8WHG",
differing only in the trailing model code. Add the corresponding entry.

Signed-off-by: Antonio Ignacio Campos Ruiz <acamposruiz@gmail.com>
Link: https://patch.msgid.link/20260713165709.19489-1-acamposruiz@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: fsl: imx-card: Skip sysclk reset for active DAIs in shutdown

In a full-duplex setup, when one direction (playback or capture) is
closed while the other is still running, imx_aif_shutdown() was
unconditionally calling snd_soc_dai_set_sysclk() with rate=0 for all
cpu/codec DAIs, which would disable the clock still needed by the
active stream.

Add snd_soc_dai_active() checks before clearing sysclk so that only
truly inactive DAIs have their clocks reset.

Fixes: 2260bc6ea8bd ("ASoC: imx-card: Add WM8524 support")
Cc: stable@vger.kernel.org
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Link: https://patch.msgid.link/20260710031333.3491445-1-chancel.liu@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

io_uring/fs: check unused sqe fields for unlinkat

Check that the unused SQE fields addr3 and pad2 are zero for unlinkat.
They're not needed now, but could be used in the future.

Signed-off-by: Yi Xie <xieyi@kylinos.cn>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260714030306.64820-1-xieyi@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: free the replaced iovec after a successful grow

The provided-buffer validation fix deferred freeing a cached iovec
until validation completed. However, the deferred free uses arg->iovs.
After a grow, that points to the newly allocated array. Without a grow,
it points to the cached array that remains in use.

This leaves the caller with a dangling iovec in both cases and can
result in repeated frees. Only free org_iovs when arg->iovs actually
replaced it.
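
The ownership rule can be sketched in userspace C. This is not the
io_uring code: struct buf_arg and ensure_capacity() are hypothetical,
and malloc()/free() stand in for the kernel allocators.

```c
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical argument block holding a cached iovec array. */
struct buf_arg {
    struct iovec *iovs;
    int nr;
};

/*
 * Make sure arg->iovs can hold 'need' entries. Returns 0 on success,
 * -1 on allocation failure (arg is left unchanged in that case).
 */
int ensure_capacity(struct buf_arg *arg, int need)
{
    struct iovec *org_iovs = arg->iovs;

    if (need > arg->nr) {
        struct iovec *grown = calloc((size_t)need, sizeof(*grown));

        if (!grown)
            return -1;
        arg->iovs = grown;
        arg->nr = need;
    }

    /* ... validation of the buffers would run here ... */

    /* Free the old array only if a new one actually replaced it. */
    if (arg->iovs != org_iovs)
        free(org_iovs);

    return 0;
}
```

Without a grow, arg->iovs still equals org_iovs, so nothing is freed and
the cached array stays valid for the caller.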

Fixes: cd053d788c3f ("io_uring: fix dangling iovec after provided-buffer bundle grow failure")
Assisted-by: Codex:gpt-5.3-codex-spark
Signed-off-by: Jaeyeong Lee <iostreampy@proton.me>
Link: https://patch.msgid.link/20260712142612.188695595-iostreampy@proton.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>

wifi: iwlwifi: mvm: d3: validate D3 resume notification payloads

D3 resume notification handlers read firmware notification fields
before validating that the payload contains the complete fixed structure.
This causes a buffer over-read on malformed or truncated notifications.

Move payload length validation to occur before any field access in:
- iwl_mvm_parse_wowlan_info_notif: validate before reading num_mlo_link_keys
- iwl_mvm_wait_d3_notif D3_END handler: validate before reading flags

Assisted-by: GitHub Copilot <copilot@github.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.762193753434.I148991b8136cc5042fa08b5faf7b57d38aa2fb47@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: pcie: null RX pointers after free

When iwl_pcie_tx_init() fails after RX init, nic init unwinds via
iwl_pcie_rx_free().

The freed RX members stayed non-NULL on the live transport object,
so later teardown or retry could touch stale RX state.
Set rx_pool, global_table, rxq, and alloc_page to NULL after free
to make repeated cleanup and retry paths safe.

Assisted-by: GitHubCopilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.33e8978d8b36.Ibaedd4b0ce01405b940de7b90223b6d2c5136ffd@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: mvm: fix sched scan IE sizing

Scheduled scan built the probe request before iwl_mvm_scan_fits(),
so oversized IEs could be copied into the fixed preq buffer before
length validation. Move iwl_mvm_build_scan_probe() after the fits
check.

Also advertise max_sched_scan_ie_len using iwl_mvm_max_scan_ie_len()
so userspace limits account for driver-inserted DS/TPC bytes.

Assisted-by: GitHubCopilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://patch.msgid.link/20260714141909.53d2722c79e7.Iebb922efa6173c92f14cd8aa8b4e7f372c0a0fb7@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: mld: clear tzone on fail

iwl_mld_thermal_zone_register() stores the thermal zone pointer in
mld->tzone before calling thermal_zone_device_enable(). If enable
fails, the code unregisters the zone but leaves mld->tzone stale,
so iwl_mld_thermal_zone_unregister() can unregister it again.
Clear mld->tzone after unregister in the error path.

While at it, remove a pointless if in iwl_mld_thermal_zone_unregister()
after we've already checked that the tzone pointer is not NULL.

Assisted-by: GitHubCopilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.595dcb8cb7fe.I8125e4a2eeb0390798e3f4074c62c00443eda8e8@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: mvm: validate mac_link_id in session protect notif

Check the mac_id before accessing the vif_id_to_mac array.

Assisted-by: GitHubCopilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.547ea470e686.I931445ae6f37bf0e1ef6f112c811712fc48af9c9@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: mvm: validate sta_id in BA window status notif

BA_WINDOW_STATUS_NOTIFICATION_ID extracts a 5-bit sta_id from the
firmware notification and uses it to index fw_id_to_mac_id[] without
bounds checking. Validate sta_id before array access to prevent
out-of-bounds indexing.
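
A sketch of the bounds check, assuming the 5-bit id sits in the low bits
of a notification word and the table is smaller than 32 entries. The
table, its size and lookup_station() are hypothetical, not the iwlwifi
definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_STATION_COUNT 16   /* hypothetical table size, below 32 */

/* Hypothetical lookup table indexed by station id. */
static const char *demo_stations[DEMO_STATION_COUNT] = {
    [0] = "ap",
    [3] = "peer",
};

/* Returns the table entry, or NULL if the id is out of range or unused. */
const char *lookup_station(uint32_t notif_word)
{
    uint32_t sta_id = notif_word & 0x1f;    /* 5-bit field: 0..31 */

    /* A 5-bit value can exceed the table size, so check before indexing. */
    if (sta_id >= DEMO_STATION_COUNT)
        return NULL;

    return demo_stations[sta_id];
}
```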

Assisted-by: GitHubCopilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.2e97f337f3cb.Ic3f0f404082ccdea13809a3c0b70e0f5417e1037@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: mvm: validate sta_id in TLC notif

TLC_MNG_UPDATE_NOTIF uses firmware-provided sta_id to index
fw_id_to_link_sta[] and fw_id_to_mac_id[]. Validate sta_id
before array access to avoid out-of-bounds indexing.

Assisted-by: GitHubCopilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.1ce54794c1f8.I275fd4c1165bf42fb17516c550dd8813a2b8286e@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

wifi: iwlwifi: mvm: validate MCC header before n_channels

MCC response parsing read n_channels from v8/v4/v3 response variants
before ensuring the payload contained the fixed response header.

Add a minimum payload-length check for each response version before
reading n_channels, and keep the existing exact-size validation for the
channels array payload.
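
The two-step validation can be sketched like this, assuming a made-up
response layout with a 32-bit count followed by 32-bit channel entries
in host byte order. struct demo_resp_hdr and parse_resp() are
hypothetical and do not mirror the firmware structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header preceding a variable-length channel array. */
struct demo_resp_hdr {
    uint32_t n_channels;
};

/* Returns the channel count, or -1 if the payload is malformed. */
int parse_resp(const uint8_t *payload, size_t len)
{
    struct demo_resp_hdr hdr;
    size_t body;

    /* Step 1: the fixed header must be present before any field is read. */
    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, payload, sizeof(hdr));

    /* Step 2: the remainder must match the advertised channel count exactly. */
    body = len - sizeof(hdr);
    if (body % sizeof(uint32_t) != 0 ||
        body / sizeof(uint32_t) != hdr.n_channels)
        return -1;

    return (int)hdr.n_channels;
}
```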

Assisted-by: GitHub Copilot:gpt-5.3-codex
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20260714141909.cb2cef3d3e7e.Iee7b48614289da576de842157ad3730b7589a4b1@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>