Alice Ryhl [Thu, 7 May 2026 11:07:47 +0000 (11:07 +0000)]
rust_binder: use lock_vma_under_rcu() in shrinker
The shrinker callback currently uses the mmap read trylock operation to
attempt to access the vma, but it's generally better to only lock the
vma instead of the whole mmap when you can.
When lock_vma_under_rcu() fails, there is no reason to lock the mmap
lock instead because it's already a trylock operation that is allowed to
fail.
PCI: Add pci_suspend_retains_context() to check if device state is preserved during suspend
Currently, PCI endpoint drivers (e.g. nvme) use pm_suspend_via_firmware()
to check whether device state is preserved during system suspend. If
firmware will be invoked at the end of suspend, we don't know whether
devices will retain their internal state.
But device context might be lost due to platform issues as well. Having
those checks in endpoint drivers will not scale and will cause a lot of
code duplication.
Add pci_suspend_retains_context() as a sole point of truth that the
endpoint drivers can rely on to check whether they can expect the device
context to be retained or not.
If pci_suspend_retains_context() returns 'false', drivers need to prepare
for context loss by performing actions such as resetting the device, saving
the context, shutting it down etc. If it returns 'true', drivers do not
need to perform any special action and can leave the device in active
state.
Right now, this API only incorporates pm_suspend_via_firmware(), but will
be extended in future commits.
Merge tag 'usb-serial-7.1-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB serial fixes for 7.1-rc5
Here are a number of fixes for memory corruption and information leaks
due to missing endpoint and transfer sanity checks dating back to
simpler times when we trusted our hardware.
Included are also a fix for a recently added modem device id entry and
some new modem devices ids.
All but the last five commits have been in linux-next and with no
reported issues.
* tag 'usb-serial-7.1-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: cypress_m8: validate interrupt packet headers
USB: serial: safe_serial: fix memory corruption with small endpoint
USB: serial: omninet: fix memory corruption with small endpoint
USB: serial: mxuport: fix memory corruption with small endpoint
USB: serial: cypress_m8: fix memory corruption with small endpoint
USB: serial: option: add missing RSVD(5) flag for Rolling RW135R-GL
USB: serial: option: add MeiG SRM813Q
USB: serial: mct_u232: fix missing interrupt-in transfer sanity check
USB: serial: mct_u232: fix memory corruption with small endpoint
USB: serial: keyspan: fix missing indat transfer sanity check
USB: serial: digi_acceleport: fix memory corruption with small endpoints
USB: serial: belkin_sa: validate interrupt status length
Abel Vesa [Fri, 15 May 2026 11:21:52 +0000 (14:21 +0300)]
pinctrl: qcom: eliza: Merge QUP1_SE4 lanes in groups
QUP1_SE4 uses GPIO36 and GPIO37 for two selectable lane pairs. The
current driver exposes lanes 0, 1, 2 and 3 as independent functions.
However, since these are usually configured in pairs in devicetree,
it makes more sense to merge them into groups.
So merge the per-lane functions into qup1_se4_01 and qup1_se4_23, and list
both GPIO36 and GPIO37 in each function group.
Abel Vesa [Fri, 15 May 2026 11:21:51 +0000 (14:21 +0300)]
dt-bindings: pinctrl: qcom,eliza-tlmm: Merge QUP1_SE4 lane functions
QUP1_SE4 uses GPIO36 and GPIO37 for two selectable lane pairs. The
previous split added one function name per lane. Since these are usually
configured in pairs in devicetree, it makes more sense to have them
grouped.
So replace the per-lane qup1_se4_l[0-3] names with names for the two
selectable pairs, qup1_se4_01 and qup1_se4_23.
Fixes: 1bd5c56253c5 ("dt-bindings: pinctrl: qcom,eliza-tlmm: Split QUP1_SE4 lanes") Suggested-by: Bjorn Andersson <andersson@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Linus Walleij <linusw@kernel.org>
Frank Li [Wed, 8 Apr 2026 05:07:01 +0000 (01:07 -0400)]
pinctrl: Add OF dependency for PINCTRL_GENERIC_MUX
Add an explicit OF dependency for PINCTRL_GENERIC_MUX to ensure the
generic mux support is only enabled when device tree is available.
Also fix the stub implementation of pinctrl_generic_to_map() by correcting
its last argument to match the non-stub prototype.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604072013.aI84l57L-lkp@intel.com/ Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
Tejun Heo [Fri, 22 May 2026 17:22:16 +0000 (07:22 -1000)]
bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()
struct bpf_arena is opaque to callers outside arena.c. Add two helpers
for struct_ops subsystems that need to reach into an arena:
bpf_arena_map_kern_vm_start(struct bpf_map *map)
returns @map's kern_vm_start. A sched_ext follow-up needs this
to translate kern_va <-> uaddr.
bpf_prog_arena(struct bpf_prog *prog)
returns the bpf_map of the arena referenced by @prog (NULL if
@prog references no arena). The verifier enforces at most one
arena per program. Used by struct_ops callers that auto-discover
an arena from a member prog and need to take a map reference.
Tejun Heo [Fri, 22 May 2026 17:22:15 +0000 (07:22 -1000)]
bpf: Add bpf_struct_ops_for_each_prog()
Add a helper that walks the member progs of the struct_ops map
containing a given @kdata vmtable. struct_ops ->reg() callbacks (and
similar) sometimes need to inspect the loaded BPF programs, e.g. to
discover maps they reference via prog->aux->used_maps.
The implementation mirrors bpf_struct_ops_id(): container_of @kdata
to recover the bpf_struct_ops_map, then iterate st_map->links[i]->prog
for i in [0, funcs_cnt). Same access pattern, no new locking - by the
time ->reg() fires st_map is fully populated and stable.
A sched_ext follow-up walks the member progs of a cid-form scheduler's
struct_ops map, reads prog->aux->arena directly, and requires all member
progs to reference exactly one arena, without requiring the BPF program
to call a registration kfunc.
Tejun Heo [Fri, 22 May 2026 17:22:14 +0000 (07:22 -1000)]
bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers
The existing kernel-side export of bpf_arena_alloc_pages is _non_sleepable
only - it's used by the verifier to inline the kfunc when the call site is
non-sleepable. There is no sleepable equivalent for kernel callers. The
kfunc bpf_arena_alloc_pages itself is BPF-only.
sched_ext needs sleepable kernel-side allocs for its arena pool init/grow
paths. Add bpf_arena_alloc_pages_sleepable() mirroring the _non_sleepable
wrapper but passing sleepable=true to arena_alloc_pages().
bpf: Recover arena kernel faults with scratch page
BPF arena usage is becoming more prevalent, but kernel <-> BPF communication
over arena memory is awkward today. Data has to be staged through a trusted
kernel pointer with extra code and copying on the BPF side. While reads
through arena pointers can use a fault-safe helper, writes don't have a good
solution. The in-line alternative would need instruction emulation or asm
fixup labels.
Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any
handed-in arena pointer, without bounds checking. A per-arena scratch page
is installed by the arch fault path into empty arena kernel PTEs - x86 from
page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for
translation faults, both after the existing exception-table and KFENCE
handling. The faulting instruction retries and the access is also reported
through the program's BPF stream, preserving error reporting.
bpf_prog_find_from_stack() resolves the current BPF program (and its arena)
from the kernel stack - no new bpf_run_ctx state is added. Recovery covers
the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower
half-guard is excluded because well-behaved kfuncs only access forward from
arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past
a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst.
The install is lock-free via ptep_try_set(). On race-loss the winning
installer's PTE is already valid, so the access retry succeeds. The arena
clear path uses ptep_get_and_clear() so installer and clearer race through
atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped"
entries just cause one extra re-fault, cheaper than a global IPI on every
install.
Scratch exists only to keep the kernel from oopsing on an in-line arena
access. Its presence at a PTE means the BPF program has already
malfunctioned, and the violation is reported through the program's BPF
stream. The only requirement for behavior on a scratched PTE is that the
kernel doesn't crash. In particular, any user-side access through such a PTE
may segfault. The shared scratch page is freed once during map destruction.
BPF instruction faults continue to use the existing JIT exception-table
path. This patch changes only the kernel-text fault path. No UAPI flag is
added. The new behavior is the default.
v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David)
v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp)
Tejun Heo [Fri, 22 May 2026 17:22:12 +0000 (07:22 -1000)]
mm: Add ptep_try_set() for lockless empty-slot installs
Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
currently pte_none(). Returns true on success, false if the slot was already
populated or the arch has no implementation.
The intended caller is the upcoming bpf_arena kernel-side fault recovery
path. The install runs from a page fault that can be nested under locks
held by the faulting kernel caller (e.g. a BPF program holding
raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
would A-A deadlock. Lock-free cmpxchg is the only viable option, which
constrains this helper to special kernel page tables where concurrent
writers cooperate via atomic accessors.
The generic version in <linux/pgtable.h> returns false. x86 and arm64
override with try_cmpxchg-based implementations on the underlying pteval.
Other architectures get the false stub - the callers there already fall
through to oops.
v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei)
v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea)
Linus Walleij [Sat, 23 May 2026 08:46:52 +0000 (10:46 +0200)]
Merge tag 'renesas-pinctrl-for-v7.2-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into devel
pinctrl: renesas: Updates for v7.2
- Save/restore more registers during suspend/resume on the RZ/G2L and
RZ/V2H SoC families,
- Add support for the RZ/G3L (R9A08G046) SoC,
- Add support for pinconf-groups in debugfs on EMMA Mobile,
SH/R-Mobile, R-Car, RZ/G1, and RZ/G2 SoCs,
- Miscellaneous fixes and improvements.
Tina Zhang [Fri, 22 May 2026 04:00:14 +0000 (12:00 +0800)]
KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)
Hygon Family 18h CPUs are derived from AMD Family 17h (Zen1) silicon and
share the same erratum #1235: hardware may read a stale IsRunning=1 bit
during ICR write emulation and silently fail to generate an
AVIC_IPI_FAILURE_TARGET_NOT_RUNNING VM-Exit on the sending vCPU.
The absence of the VM-Exit causes KVM to miss the required wakeup of
blocking target vCPUs, leading to hung vCPUs and unbounded delays in
guest execution.
Extend the existing AMD Family 17h erratum #1235 workaround to also cover
Hygon Family 18h. With IPI virtualization disabled, KVM never sets
IsRunning=1 in the Physical ID table, so every non-self IPI generates a
VM-Exit and is correctly emulated.
Fixes: 8de4a1c8164e ("KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235") Cc: <stable@vger.kernel.org> Signed-off-by: Tina Zhang <zhang_wei@open-hieco.net>
Message-ID: <20260522040014.3380201-1-zhang_wei@open-hieco.net>
KVM: selftests: Verify that KVM returns the configured APIC cycle length
Add checks in the APIC bus clock test to verify that querying
KVM_CAP_X86_APIC_BUS_CYCLES_NS on the VM after changing the frequency
returns the VM's actual APIC cycle length, not KVM's default. For
giggles, verify that KVM still returns its default frequency for the
system-scoped check.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: x86: Return the VM's configured APIC bus frequency when queried
When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the
VM's configured APIC bus frequency, not KVM's default. Aside from the fact
that returning the default frequency is blatantly wrong if userspace has
changed the frequency, returning the configured frequency means userspace
can blindly trust the result, e.g. when filling PV CPUID information that
communicates the APIC bus frequency to the guest.
Fixes: 6fef518594bc ("KVM: x86: Add a capability to configure bus frequency for APIC timer") Reported-by: David Woodhouse <dwmw2@infradead.org> Closes: https://lore.kernel.org/all/ab84153e33fbe7c25667f595c56b310d4d5a93ef.camel@infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sat, 23 May 2026 08:04:35 +0000 (10:04 +0200)]
Merge tag 'kvm-riscv-fixes-7.1-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 7.1, take #1
- Fix invalid HVA warning in steal-time recording
- Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info()
and pmu_snapshot_set_shmem()
- Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
- Fix sign extension of value for MMIO loads
cypress_read_int_callback() parses the interrupt-in buffer according to
the selected Cypress packet format. Format 1 has a two-byte status/count
header and format 2 has a one-byte combined status/count header. The
usb-serial core sizes the interrupt-in buffer from the endpoint
descriptor's wMaxPacketSize, and successful interrupt transfers can
complete short when URB_SHORT_NOT_OK is not set.
Check that the completed packet contains the selected header before
reading it. Malformed short reports are ignored and the interrupt URB is
resubmitted through the existing retry path, preventing out-of-bounds
header-byte reads.
KASAN report as below:
KASAN slab-out-of-bounds in cypress_read_int_callback+0x240/0x7f0
Read of size 1
Call trace:
cypress_read_int_callback() (drivers/usb/serial/cypress_m8.c:1009)
__usb_hcd_giveback_urb()
dummy_timer()
Fixes: 3416eaa1f8f8 ("USB: cypress_m8: Packet format is separate from characteristic size") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen <rollkingzzc@gmail.com> Fixes: 3416eaa1f8f8 ("USB: cypress_m8: Packet format is separate from characteristic size") Cc: stable@vger.kernel.org # 2.6.26
[ johan: use constants in header length sanity checks ] Signed-off-by: Johan Hovold <johan@kernel.org>
Johan Hovold [Fri, 22 May 2026 14:22:18 +0000 (16:22 +0200)]
USB: serial: safe_serial: fix memory corruption with small endpoint
Make sure that the bulk-out buffer size is at least eight bytes to avoid
user-controlled slab corruption in "safe" mode should a malicious device
report a smaller size.
Johan Hovold [Fri, 22 May 2026 14:20:58 +0000 (16:20 +0200)]
USB: serial: omninet: fix memory corruption with small endpoint
Make sure that the bulk-out buffers are at least as large as the
hardcoded transfer size to avoid user-controlled slab corruption should
a malicious device report a smaller endpoint max packet size than
expected.
Johan Hovold [Fri, 22 May 2026 14:19:50 +0000 (16:19 +0200)]
USB: serial: mxuport: fix memory corruption with small endpoint
Make sure that the bulk-out endpoint max packet size is at least eight
bytes to avoid user-controlled slab corruption should a malicious device
report a smaller size.
Stafford Horne [Fri, 22 May 2026 16:28:31 +0000 (17:28 +0100)]
openrisc: Fix jump_label smp syncing
The original commit 8c30b0018f9d ("openrisc: Add jump label support")
copies from arm64 and does not properly consider how icache invalidation
on remote cores works in OpenRISC. On OpenRISC remote icaches need to
be invalidated otherwise static key's may remain state after updating.
Fix SMP cache syncing by:
1. Properly invalidate remote core icaches on SMP systems by using
icache_all_inv. The old code uses kick_all_cpus_sync() which runs a
no-op IPI function call on remote CPU's which does execute a lot of
code and flushes many cache lines in the process, but does not flush
all and it's not correct on OpenRISC.
2. For architectures that do not have WRITETHROUGH caches be sure
to flush the dcache after patching.
To test this I first reproduced the issue using a custom test module
[0]. The test confirmed that some icache lines maintained stale
static_key code sequences after calling static_branch_enable(). After
this patch there are no longer jump_label coherency issues.
Stafford Horne [Fri, 22 May 2026 15:56:03 +0000 (16:56 +0100)]
openrisc: Add full instruction cache invalidate functions
Add functions to invalidate all cache lines which we will use for
static_key patching.
On OpenRISC there is no instruction to invalidate an entire cache so we
loop and invalidate cache lines one by one. This is not extremely
expensive on OpenRISC as we usually have only a few hundred cache lines.
I considered using the invalidate cache page or range functions.
However, tracking which ranges need invalidation would have been more
expensive than flushing all pages.
Stafford Horne [Fri, 22 May 2026 15:49:51 +0000 (16:49 +0100)]
openrisc: Cache invalidation cleanup
When working on new cache invalidation functions I noticed these
cleanups in the cache initialization code. Remove unused and commented
instructions to avoid confusion.
Alexandru Hossu [Thu, 21 May 2026 15:11:21 +0000 (17:11 +0200)]
scsi: target: iscsi: Validate CHAP_R length before base64 decode
chap_server_compute_hash() allocates client_digest as
kzalloc(chap->digest_size) and then, for BASE64-encoded responses,
passes chap_r directly to chap_base64_decode() without checking whether
the input length could produce more than digest_size bytes of output.
chap_base64_decode() writes to the destination unconditionally as long
as there is input to consume. With MAX_RESPONSE_LENGTH set to 128 and
the "0b" prefix stripped by extract_param(), up to 127 base64 characters
can reach the decoder. 127 characters decode to 95 bytes. For SHA-256
(digest_size=32) this overflows client_digest by 63 bytes; for MD5
(digest_size=16) the overflow is 79 bytes.
The length check at line 344 fires after the write has already happened.
The HEX branch in the same switch statement already validates the length
up front. Apply the same approach to the BASE64 branch: strip trailing
base64 padding characters, then reject any input whose data length
exceeds DIV_ROUND_UP(digest_size * 4, 3) before calling the decoder.
Stripping trailing '=' before the comparison handles both padded and
unpadded encodings. chap_base64_decode() already returns early on '=',
so the full original string is still passed to the decoder unchanged.
The mutual CHAP path decodes CHAP_C into initiatorchg_binhex, which is
kzalloc(CHAP_CHALLENGE_STR_LEN). extract_param() caps initiatorchg at
CHAP_CHALLENGE_STR_LEN characters, so at most CHAP_CHALLENGE_STR_LEN-1
base64 characters reach the decoder. The maximum decoded size,
DIV_ROUND_UP((CHAP_CHALLENGE_STR_LEN-1) * 3, 4), is less than
CHAP_CHALLENGE_STR_LEN, so no overflow is possible there. A comment is
added at the call site to document this.
Fixes: 1e5733883421 ("scsi: target: iscsi: Support base64 in CHAP") Cc: stable@vger.kernel.org Signed-off-by: Alexandru Hossu <hossu.alexandru@gmail.com> Reviewed-by: David Disseldorp <ddiss@suse.de> Link: https://patch.msgid.link/20260521151121.808477-1-hossu.alexandru@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: target: iscsi: Bound iscsi_encode_text_output() appends to rsp_buf
iscsi_encode_text_output() concatenates "key=value\0" records into
login->rsp_buf, an 8192-byte kzalloc(MAX_KEY_VALUE_PAIRS) buffer
allocated in iscsit_alloc_login_setup_buffer(). The three sprintf() call
sites in this function (lines 1398, 1411, 1424 in v7.1-rc2) never check
the remaining buffer capacity:
The 8192-byte ceiling at iscsi_target_check_login_request() bounds the
*input* Login PDU payload, but a single PDU can carry up to 2048 minimal
four-byte "a=b\0" pairs, each unknown key expanding to a 16-byte
"a=NotUnderstood\0" output record via iscsi_add_notunderstood_response().
2048 * 16 = 32 KiB of output into an 8 KiB buffer, producing a ~24 KiB
heap overrun in the kmalloc-8k slab.
The fix introduces a static iscsi_encode_text_record() helper that uses
snprintf() with a per-call bounds check against the remaining buffer,
and threads a u32 textbuf_size parameter through
iscsi_encode_text_output(). Both call sites in
iscsi_target_handle_csg_zero() (PHASE_SECURITY) and
iscsi_target_handle_csg_one() (PHASE_OPERATIONAL) pass
MAX_KEY_VALUE_PAIRS. On overflow the encoder logs the condition, calls
iscsi_release_extra_responses() to drop queued records, and returns -1;
both caller sites now emit ISCSI_STATUS_CLS_INITIATOR_ERR /
ISCSI_LOGIN_STATUS_INIT_ERR via iscsit_tx_login_rsp() before returning,
so the initiator sees an explicit failed-login response rather than a
silent connection drop. (Prior to this patch only the PHASE_OPERATIONAL
caller did that; the PHASE_SECURITY caller is converted to the same
shape.)
Fixes: e48354ce078c ("iscsi-target: Add iSCSI fabric support for target v4.1") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Tested-by: John Garry <john.g.garry@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: target: iscsi: Fix CRC overread and double-free in iscsit_handle_text_cmd()
Two latent bugs in the Text-phase handler, both present since the
original LIO integration in commit e48354ce078c ("iscsi-target: Add
iSCSI fabric support for target v4.1"):
1) DataDigest CRC buffer overread (4 bytes past text_in).
text_in is kzalloc()'d at ALIGN(payload_length, 4). rx_size is then
incremented by ISCSI_CRC_LEN to make room for the received DataDigest
in the iovec, but the same (now-bumped) rx_size is passed as the
buffer length to iscsit_crc_buf():
if (conn->conn_ops->DataDigest) {
...
rx_size += ISCSI_CRC_LEN;
}
...
if (conn->conn_ops->DataDigest) {
data_crc = iscsit_crc_buf(text_in, rx_size, 0, NULL);
iscsit_crc_buf() walks rx_size bytes of text_in with crc32c(), so
when DataDigest is negotiated it reads 4 bytes past the end of the
text_in allocation. KASAN reproduces this directly on the unpatched
mainline tree as slab-out-of-bounds in crc32c() called from the Text
PDU path. The OOB bytes feed crc32c() and are then compared against
the initiator-supplied checksum, so the value does not flow back to
the attacker, but the kernel does read past the buffer on every Text
PDU with DataDigest=CRC32C.
Fix by passing the actual padded payload length
(ALIGN(payload_length, 4)) that was used for the kzalloc().
2) Stale cmd->text_in_ptr re-free (double-free) on ERL>0 bad DataDigest
drop.
On DataDigest mismatch with ErrorRecoveryLevel > 0 the handler
silently drops the PDU and lets the initiator plug the CmdSN gap:
kfree(text_in);
return 0;
cmd->text_in_ptr still points at the freed buffer. The next Text
Request on the same ITT re-enters iscsit_setup_text_cmd(), which
unconditionally does
kfree(cmd->text_in_ptr);
cmd->text_in_ptr = NULL;
freeing the same pointer a second time. Session teardown via
iscsit_release_cmd() has the same shape and hits the same double-free
if the connection is dropped before a second Text Request arrives.
On an unmodified mainline tree the bug-1 CRC overread fires first on
the initial valid Text Request and perturbs the subsequent state, so
#4 was isolated by building a kernel with only the bug-1 hunk of this
patch applied plus temporary printk() observability around the three
relevant kfree() sites. The observability prints are not part of
this patch. On that build, a three-PDU Text Request sequence after
login produces two back-to-back splats:
BUG: KASAN: double-free in iscsit_setup_text_cmd+0x??
BUG: KASAN: double-free in iscsit_release_cmd+0x??
showing the same pointer freed in the ERL>0 drop path and again in
iscsit_setup_text_cmd() (next Text Request on the same ITT) and once
more in iscsit_release_cmd() (session teardown). On distro kernels
with CONFIG_SLAB_FREELIST_HARDENED=y (default) the double-free
becomes a remote kernel BUG(); on non-hardened kernels it corrupts
the slab freelist.
Fix by clearing cmd->text_in_ptr after the kfree() in the ERL>0 drop
path. With both hunks applied #4 is directly observable on the stock
tree without observability printks; fixing bug-1 alone would mask #4
less, not more, so the hunks are submitted together.
Both fixes are one-liners. The Text PDU state machine is unchanged and
the wire protocol is unaffected.
Fixes: e48354ce078c ("iscsi-target: Add iSCSI fabric support for target v4.1") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Tested-by: John Garry <john.g.garry@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
vsock/virtio: fix skb overhead overflow on 32-bit builds
On 32-bit architectures, both skb_queue_len() and SKB_TRUESIZE(0) evaluate
to 32-bit values. The multiplication can overflow before being assigned to
the u64 skb_overhead variable, making the skb overhead check ineffective.
Cast skb_queue_len() to u64 so the multiplication is always performed in
64-bit arithmetic.
This issue was reported by Sashiko while reviewing another patch.
Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue") Closes: https://sashiko.dev/#/patchset/20260518090656.134588-1-sgarzare%40redhat.com Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260521124732.125771-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
scsi: fcoe: Reject FIP descriptors with zero fip_dlen in CVL walker
drivers/scsi/fcoe/fcoe_ctlr.c::fcoe_ctlr_recv_clr_vlink() advanced the
descriptor cursor by an attacker-supplied fip_dlen without ever
requiring dlen >= sizeof(struct fip_desc) in the default branch. The
named descriptor cases (FIP_DT_MAC, FIP_DT_NAME, FIP_DT_VN_ID) checked
their per-type minimum lengths, but a FIP_DT_NON_CRITICAL descriptor
(fip_dtype >= 128, which the standard requires receivers to silently
ignore) skipped that check entirely.
An unauthenticated L2 peer on the FCoE control VLAN could hang
fcoe_ctlr_recv_work on an fcoe, qedf, or bnx2fc initiator indefinitely
by emitting one FIP CVL frame whose single descriptor had fip_dtype ==
FIP_DT_NON_CRITICAL and fip_dlen == 0: the cursor advanced zero bytes
per iteration and the loop condition rlen >= sizeof(*desc) stayed true
forever, blocking every subsequent FIP frame on that controller.
Tighten the outer dlen guard to also reject dlen < sizeof(struct
fip_desc), so a malformed descriptor whose length cannot even cover the
descriptor header is rejected before the switch. This is the same
lower-bound the named cases already apply and is the minimum scope that
closes the loop.
Fixes: 97c8389d54b9 ("[SCSI] fcoe, libfcoe: Add support for FIP. FCoE discovery and keep-alive.") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260518144307.2820961-1-michael.bommarito@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: scsi_transport_fc: Widen FPIN pname walker counter to u32
An adjacent Fibre Channel fabric actor that can deliver an FPIN ELS
frame to an lpfc or qla2xxx Linux initiator can trigger a non-return in
the generic FC transport. This is not a local userspace or IP network
path; the attacker must be able to inject fabric traffic, for example as
a compromised switch or fabric controller, or as a same-zone N_Port on a
fabric that permits source spoofing.
The Link-Integrity and Peer-Congestion FPIN walkers used a u8 loop
counter against the 32-bit on-wire pname_count field, and did not bound
pname_count by the descriptor body already validated by the TLV walker.
A pname_count of 256 therefore wraps the counter and keeps the loop
condition true indefinitely.
Factor the shared pname_list[] walk into one helper, widen the counter
to u32, and clamp pname_count against the entries that fit in the
descriptor body before iterating.
Fixes: 3dcfe0de5a97 ("scsi: fc: Parse FPIN packets and update statistics") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260520133015.1018937-1-michael.bommarito@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Fri, 15 May 2026 20:52:21 +0000 (13:52 -0700)]
scsi: core: Convert INQUIRY information
Currently the vendor, model, and revision members of struct scsi_device
are pointers to fixed-length strings that are not NUL-terminated.
Fixed-precision format specifiers (e.g., "%.8s") are required whenever
they are printed and strncmp() must be used to compare these fields.
This is error-prone.
Convert these fields to fixed-size character arrays within struct
scsi_device. Remove an !sdev->model check because sdev->model is now
guaranteed not to be NULL.
This patch fixes a bug in the qla2xxx driver. It makes the following
code safe:
Bart Van Assche [Fri, 15 May 2026 20:52:19 +0000 (13:52 -0700)]
scsi: core: target: Add INQUIRY-related constants to scsi_common.h
Move three constants from target/target_core_base.h into
scsi/scsi_common.h. Add three new constants in the scsi_common.h header
file. This patch prepares for using these constants in the SCSI core.
This series continues the rework of the KSZ driver initiated by a previous
series (see [1]), following the discussion we had here [2].
The KSZ driver got way too convoluted over time because it uses a common
framework to handle more than 20 switches split in 5 families (see below
table)
The previous series ([1]) replaced the unique dsa_swicth_ops struct used
by all the KSZ families with one dsa_switch_ops struct for each family.
These dsa_switch_ops structs still rely on common functions that redirect
the calls to ksz_dev_ops operations which are custom to each switch
family. Many of hese ksz_dev_ops callbacks have a direct equivalent in the
struct dsa_switch_ops. This series directly connects the implementations of
these ksz_dev_ops operations to the relevant dsa_switch_ops attribute
to get rid of one unnecessary level of indirection.
On top of this on-going rework I added PTP and periodic output support for
the KSZ8463 (which was my first goal). There are more than 60 patches for
all this so this series will be followed by several others and if you
want to see the full picture we can check my github ([3]).
I haven't finished yet to group all the patches into meaningful series
but here is more or less what I plan to do next:
- A series will split again some operations to get rid of the
if (is_kszXYZ) branches.
- Maybe another series will be needed to completely move out of
ksz_common.c everything that isn't truly common to all the switches
- A series will add PTP support for the KSZ8463
- A final series will add periodic output support for the KSZ8463
FYI, I only have a KSZ8463 so, unfortunately, I can't test other switches.
net: dsa: microchip: bypass dev_ops for phy_read()/phy_write()
phy_read() and phy_write() are handled through common functions that
redirect the treatment to ksz_dev_ops callbacks. This layer of
indirection isn't needed since we now have a dsa_switch_ops for each
kind of switch
Remove one indirection layer for KSZ switches, by connecting the
ksz_dev_ops::phy_r() and ksz_dev_ops::phy_w() operations directly to
dsa_switch_ops.
Remove the now unused phy_r()/phy_w() callbacks from ksz_dev_ops.
net: dsa: microchip: call DSA's phy_{read/write} to do mdio {read/write}
ksz_sw_mdio_read() and ksz_sw_mdio_write() respectively call
ksz_dev_ops::phy_r() and ksz_dev_ops::phy_w() just like
dsa_switch_ops::phy_read() and dsa_switch_ops::phy_write() do.
Call dsa_switch_ops::phy_read() from ksz_sw_mdio_read() and
dsa_switch_ops::phy_write() from ksz_sw_mdio_write() so we'll be able
to get rid of the useless indirections provided by ksz_dev_ops in
upcoming patch.
net: dsa: microchip: bypass dev_ops for port_setup()
port_setup() is handled through a common function that redirects
the treatment to ksz_dev_ops callbacks. This layer of indirection
isn't needed since we now have a dsa_switch_ops for each switch family
Remove one indirection layer for KSZ switches, by connecting the
ksz_dev_ops :: port_setup() operations directly to dsa_switch_ops.
Make ksz9477_set_default_prio_queue_mapping() non-static since it's used
by ksz_common for tc operations and by ksz9477.c for this port_setup().
Remove the now unused port_setup() callback from ksz_dev_ops.
Vladimir Oltean [Thu, 21 May 2026 06:12:39 +0000 (08:12 +0200)]
net: dsa: microchip: bypass dev_ops->setup() and teardown() for ksz8
The KSZ switch families are sufficiently different that a common
ds->ops->setup() - ksz_setup() with micro-managed dev_ops->reset(),
dev_ops->pcs_create(), dev_ops->config_cpu_port(),
dev_ops->enable_stp_addr(), dev_ops->setup() seems to be too convoluted.
I am proposing to make each KSZ switch family part ways for
dsa_switch_ops :: setup() and teardown(), to allow them greater
flexibility. This here is the implementation for ksz8, which is
nothing other than a copy of ksz_setup() with the dev_ops function
pointers replaced with direct function calls.
Vladimir Oltean [Thu, 21 May 2026 06:12:38 +0000 (08:12 +0200)]
net: dsa: microchip: bypass dev_ops->setup() and teardown() for ksz9477
The KSZ switch families are sufficiently different that a common
ds->ops->setup() - ksz_setup() with micro-managed dev_ops->reset(),
dev_ops->pcs_create(), dev_ops->config_cpu_port(),
dev_ops->enable_stp_addr(), dev_ops->setup() seems to be too convoluted.
I am proposing to make each KSZ switch family part ways for
dsa_switch_ops :: setup() and teardown(), to allow them greater
flexibility. This here is the implementation for ksz9477, which is
nothing other than a copy of ksz_setup() with the dev_ops function
pointers replaced with direct function calls.
Vladimir Oltean [Thu, 21 May 2026 06:12:37 +0000 (08:12 +0200)]
net: dsa: microchip: bypass dev_ops->setup() and teardown() for lan937x
The KSZ switch families are sufficiently different that a common
ds->ops->setup() - ksz_setup() with micro-managed dev_ops->reset(),
dev_ops->pcs_create(), dev_ops->config_cpu_port(),
dev_ops->enable_stp_addr(), dev_ops->setup() seems to be too convoluted.
I am proposing to make each KSZ switch family part ways for
dsa_switch_ops :: setup() and teardown(), to allow them greater
flexibility. This here is the implementation for lan937x, which is
nothing other than a copy of ksz_setup() with the dev_ops function
pointers replaced with direct function calls.
Vladimir Oltean [Thu, 21 May 2026 06:12:36 +0000 (08:12 +0200)]
net: dsa: microchip: don't reset on shutdown or driver removal
The ksz_switch driver is one of the few which reset the switch when
unbinding the driver or shutting down - in the same category with
ar9331_sw_remove(), bcm_sf2_sw_remove(), and ks8995_remove(),
vsc73xx_remove() and lan9303_remove().
I don't think there exists any requirement to do this, and in fact it
does create complications for WoL, as the code already shows.
My issue with this logic is that it is the only thing keeping
dev_ops->reset() necessary, which I would like to remove after
individual KSZ switch families get their own setup() and teardown()
methods that don't go through dev_ops.
Don't reset the switch when unbinding the driver or shutting down.
Remove the exit callbacks from the ksz_dev_ops.
Bart Van Assche [Tue, 19 May 2026 21:21:28 +0000 (14:21 -0700)]
scsi: ufs: core: Complain if UIC argument 2 is invalid
According to the UFSHCI standard, the lowest byte of UIC argument 2 is
an output value. Additionally, ufshcd_uic_cmd_compl() is based on the
assumption that the lowest byte of UIC argument 2 is zero. Hence,
complain if the result byte is set when a UIC command is submitted.
Bart Van Assche [Tue, 19 May 2026 21:21:27 +0000 (14:21 -0700)]
scsi: ufs: core: Inline two functions related to UIC commands
The implementation of the two functions ufshcd_get_uic_cmd_result() and
ufshcd_get_dme_attr_val() is very short. Additionally, both functions
only have one caller. Inline both functions to make the code shorter.
Arnd Bergmann [Tue, 19 May 2026 20:21:24 +0000 (22:21 +0200)]
scsi: megaraid_mbox: Reduce stack usage in megaraid_cmm_register()
The megaraid_cmm_register() function has a local copy of mraid_mmadp_t
on the stack that gets copied into the actual structure used at
runtime. When -fsanitize=thread is enabled, this causes the per-function
stack frame to grow beyond the warning limit:
megaraid_mbox.c: In function 'megaraid_cmm_register':
megaraid_mbox.c:3472:1: error: the frame size of 1312 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Refactor this by moving the allocation into the caller to save the extra
on-stack copy of the structure.
The static variable sdebug_any_injecting_opt is no longer read. Commit 3a90a63d02b8 ("scsi: scsi_debug: every_nth triggered error injection")
removed all code that reads this variable. Hence, also remove this
variable itself. Remove SDEBUG_OPT_ALL_INJECTING because there is no
code left that uses this constant if sdebug_any_injecting_opt is
removed. This has been detected by building the scsi_debug driver with
the git HEAD version of Clang and with W=1.
Ewan D. Milne [Tue, 19 May 2026 20:53:56 +0000 (16:53 -0400)]
scsi: scsi_debug: Add missing newline in scsi_debug_device_reset()
A "\n" at the end of the sdev_printk() string appears to have been
inadvertently removed. Add it back for correct log message formatting.
Fixes: a743b120227a ("scsi: scsi_debug: Stop printing extra function name in debug logs") Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260519205356.1040855-1-emilne@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Milan P. Gandhi [Thu, 14 May 2026 07:57:54 +0000 (13:27 +0530)]
scsi: megaraid_sas: Fix NULL pointer dereference on firmware duplicate completion
Add NULL check for scmd_local in the MPI2_FUNCTION_SCSI_IO_REQUEST case
to handle firmware duplicate/stale completions.
When firmware sends a duplicate completion for a command that was
already processed and returned to the pool, the driver accesses NULL
scmd pointer causing a crash.
Timeline of the bug:
1. Command completes normally, megasas_return_cmd_fusion() called
2. This sets cmd->scmd = NULL and clears io_request with memset(..., 0,
...)
3. Firmware sends duplicate/stale completion for same SMID (firmware
bug)
4. Driver processes reply descriptor again
5. Cleared io_request has Function = 0 (MPI2_FUNCTION_SCSI_IO_REQUEST)
6. Switch statement matches SCSI_IO_REQUEST case by accident
7. Accesses megasas_priv(NULL scmd)->status -> crash at offset 0x228
The offset 0x228 = sizeof(struct scsi_cmnd) 0x220 + offsetof(status)
0x8.
This issue was observed on PERC H330 Mini running firmware 25.5.9.0001
after 3+ days of heavy I/O load.
Crash signature:
BUG: unable to handle kernel NULL pointer dereference at 0x228
RIP: complete_cmd_fusion+0x428
Function: megasas_priv(cmd_fusion->scmd)->status
Add defensive check to skip processing when scmd_local is NULL. This
handles duplicate completions from firmware and prevents accessing freed
command structures. The check protects all scmd_local uses in both the
SCSI_IO path and the fallthrough LDIO path.
added support for the BLIST_NO_RSOC flag and specified that flag for the
Promise VTrak E610f. This current patch simply adds the E310f to that
same list.
One curiosity is the additional BLIST_SPARSELUN flag. This was also in
the 2014 patch for the E610f, and was already in place for *all* Promise
devices since 2007 due to commit e0b2e597d5dd ("[SCSI] stex: fix id
mapping issue") which added the line:
{"Promise", "", NULL, BLIST_SPARSELUN}
The 2007 commit message talks of issues with SuperTrak EX (stex) but the
added line did not limit itself to that particular device family. The
current patch for E310F, like the 2014 patch for E610f, adds
BLIST_NO_RSOC while preserving BLIST_SPARSELUN from 2007.
Can Guo [Fri, 1 May 2026 13:16:40 +0000 (06:16 -0700)]
scsi: ufs: core: Add a quirk for extended TX EQTR Adapt L0L1L2L3 length
Add a quirk to support TX Equalization Training (EQTR) using Adapt
L0L1L2L3 length which is larger than what is allowed by M-PHY spec ver
6.0.
Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Ziqi Chen <ziqi.chen@oss.qualcomm.com> Link: https://patch.msgid.link/20260501131641.826258-2-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
David Jeffery [Fri, 15 May 2026 18:09:41 +0000 (14:09 -0400)]
scsi: core: Run queues for all non-SDEV_DEL devices from scsi_run_host_queues
While a SCSI host is in a recovery state, scsi_mq_requeue_cmd() will not
set the requeue list for a requeued command to be kicked in the future.
The expectation is a call to scsi_run_host_queues() will kick all SCSI
devices once the recovery state is cleared.
However, scsi_run_host_queues() uses shost_for_each_device() which uses
scsi_device_get() and so will ignore devices in a partially removed
state like SDEV_CANCEL. But these devices may also have requeued
requests, leaving their requests stuck from not being kicked and causing
the removal process of the device to hang.
scsi_run_host_queues() needs to run against more devices than the macro
shost_for_each_device() allows. Instead of using the too limiting
scsi_device_get() state checks, only ignore devices in SDEV_DEL state or
when unable to acquire a reference. Attempt to run the queues for all
other devices when scsi_run_host_queues() is called.
Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary") Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260515180941.9698-1-djeffery@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Breno Leitao [Thu, 21 May 2026 14:11:45 +0000 (07:11 -0700)]
net/iucv: fix locking in .getsockopt
Mirror iucv_sock_setsockopt() and wrap the whole switch in
lock_sock()/release_sock(). The pre-existing SO_MSGLIMIT-only lock
becomes redundant and is removed.
Any AF_IUCV HIPER user can potentially crash the kernel by racing
recvmsg() with getsockopt(SO_MSGSIZE): the SO_MSGSIZE arm dereferences
iucv->hs_dev->mtu after iucv_sock_close() (called from the racing
recvmsg()) has set hs_dev to NULL, producing a NULL pointer dereference
oops.
Suggested-by: Stanislav Fomichev <sdf.kernel@gmail.com> Fixes: 51363b8751a6 ("af_iucv: allow retrieval of maximum message size") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Tested-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20260521-af_iucv_fix2-v1-1-f16b1c510aa9@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ian Rogers [Tue, 19 May 2026 01:41:06 +0000 (18:41 -0700)]
perf tests: Add test for stat delay option with duration_time
Add a new test case `test_stat_delay` to `stat.sh` to verify that
`duration_time` correctly excludes the delay period when using the
delay option (-D).
The test runs `perf stat -D 1000 -e duration_time sleep 2` and
verifies that `duration_time` is ~1s (excluding the 1s delay), not
~2s.
Assisted-by: Antigravity:gemini-3-flash Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Francesco Nigro <nigro.fra@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexandra Winter [Thu, 21 May 2026 14:56:39 +0000 (16:56 +0200)]
net/smc: Do not re-initialize smc hashtables
INIT_HLIST_HEAD(&smc_v*_hashinfo.ht) are called after smc_nl_init(),
proto_register() and sock_register(). This can lead to smc_v*_hashinfo.ht
being reset even though hash entries already exist and are being used,
possibly resulting in a corrupted list.
Remove unnecessary and dangerous re-initialisation of smc_v*_hashinfo.ht in
smc_init(); it is implicitly initialised to zero anyhow. Add
HLIST_HEAD_INIT to the definitions for clarity.
Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets") Suggested-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20260521145639.10317-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ian Rogers [Tue, 19 May 2026 01:41:05 +0000 (18:41 -0700)]
perf tool_pmu: Make tool PMU events respect enable/disable
Tool PMU events (duration_time, user_time, system_time) currently
count from when the event is opened to when it is read. This causes
issues with features like the delay option (-D) or control fd, where
events are opened but should not start counting immediately.
Make these events behave more like regular counters by implementing
proper enable and disable support. Add accumulated_time to struct
evsel to track time while enabled, and implement enable/disable CPU
callbacks to start/stop counting.
Also generalize userspace PMU mixed group handling. Userspace synthetic
PMUs (type > PERF_PMU_TYPE_PE_END) do not have kernel implementations and
cannot be grouped in the kernel (opened with group_fd = -1), and are
skipped by kernel enable/disable calls. Iterate over group members in
userspace and manually enable/disable any members if the leader or the
member is a non-perf-event open PMU, and synchronize their disabled flags.
Closes: https://lore.kernel.org/linux-perf-users/20260517093650.2540920-1-nigro.fra@gmail.com/ Fixes: b71f46a6a7086175 ("perf stat: Remove hard coded shadow metrics") Reported-by: Francesco Nigro <nigro.fra@gmail.com> Assisted-by: Antigravity:gemini-3-flash Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Abid Ali [Thu, 21 May 2026 16:32:46 +0000 (16:32 +0000)]
net: stmmac: mmc: Remove duplicate mmc_rx crc
MMC_XGMAC_RX_CRC_ERR is clear-on-read, and just a single read would
update the mmc_rx_crc_error counter.
The duplicate read appears to have been unintentionally introduced in
the intial MMC counter implementation [1]. The databook does not mention
MMC_XGMAC_RX_CRC_ERR needing the additional read.
Ravi Bangoria [Fri, 8 May 2026 06:00:04 +0000 (06:00 +0000)]
perf doc: Document new IBS capabilities in man page
Include examples of:
o Privilege filter with Fetch and Op PMUs, including swfilt approach on
Zen5 and older platforms and hardware assisted filter on Zen6 and newer
platforms
o Streaming store filter with Op PMU
o Fetch latency filter with Fetch PMU
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Fri, 8 May 2026 06:00:03 +0000 (06:00 +0000)]
perf amd ibs: Decode Streaming-store flag in IBS OP raw dump
IBS OP on Zen6 and future platform can tag IBS samples that originate from
streaming-store instruction. When the PMU advertises this feature,
interpret IBS_OP_DATA2[8] bit as the streaming store indicator and show
it in the raw dump output.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Fri, 8 May 2026 06:00:02 +0000 (06:00 +0000)]
perf amd ibs: Decode Remote-Socket flag in IBS OP raw dump
IBS OP on Zen6 and future platform can mark a data source as coming from
a remote socket. When the PMU advertises this feature, interpret
IBS_OP_DATA2[9] bit as the Remote-Socket indicator and show it in the
raw dump output.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Fri, 8 May 2026 06:00:01 +0000 (06:00 +0000)]
perf amd ibs: Make Fetch status bits dependent on PhyAddrValid for newer platforms
On Zen6 and future platforms, IBS_FETCH_CTL status fields are valid only
if IBS_FETCH_CTL[IbsPhyAddrValid] is set. Same for IBS_FETCH_CTL_EXT.
Add these checks while decoding IBS MSRs.
Unfortunately, there is no CPUID bit to indicate the change. Fallback
to Family/Model check.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Fri, 8 May 2026 06:00:00 +0000 (06:00 +0000)]
perf amd ibs: Suppress bogus TlbRefillLat and DCPhysAd on Zen4+
On Zen4 (and future) CPUs, IBS_OP_DATA3[TlbRefillLat] is valid only if
IBS_OP_DATA3[DcPhyAddrValid] is set. Similarly, IBS_DC_PHYSADDR is valid
if IBS_OP_DATA3[DcLinAddrValid] is _also_ set. Add these checks while
decoding IBS MSRs.
When IBS is triggered by an unprivileged user, the kernel now zeroes
PhysAddr before storing raw IBS register values in the perf sample. The
perf tool, however, still outputs these zero physical addresses, which
serves no purpose. So avoid printing zero physical addresses.
Instead of explicit family/model checks use the !zen4_ibs_extensions as
a proxy flag to cover Zen 3 and earlier revisions.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Fri, 8 May 2026 05:59:59 +0000 (05:59 +0000)]
perf test ibs: Skip privilege test on Zen6 and newer platforms
IBS on pre-Zen6 platforms lacked a hardware privilege filter, so the
kernel enabled swfilt=1. Zen6 and newer platforms provides privilege
filtering via the RIP[63] bit, making swfilt redundant. Skip the perf
unit test that assumes IBS has no hardware-assisted privilege filter
on Zen6 and newer platforms.
swfilt is ignored by kernel on platforms that support RIP[63] bit filter
i.e. all amd-ibs-swfilt.sh tests will test hardware assisted privilege
filter.
Without the patch on Zen6:
# sudo ./perf test -vv 77
77: AMD IBS software filtering:
--- start ---
test child forked, pid 30813
check availability of IBS swfilt
run perf record with modifier and swfilt
[FAIL] IBS PMU should not accept exclude_kernel
---- end(-1) ----
77: AMD IBS software filtering : FAILED!
With the patch:
# ./perf test -vv 77
77: AMD IBS software filtering:
--- start ---
test child forked, pid 30903
check availability of IBS swfilt
run perf record with modifier and swfilt
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.000 MB /dev/null ]
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.000 MB /dev/null ]
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.000 MB /dev/null ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB /dev/null ]
check number of samples with swfilt
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.051 MB - ]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.063 MB - ]
---- end(0) ----
77: AMD IBS software filtering : Ok
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Fri, 8 May 2026 05:59:58 +0000 (05:59 +0000)]
perf tool ibs: Sync AMD IBS header file
IBS_OP_DATA2 register will have two more fields: strm_st and rmt_socket
in Zen6 and future AMD platforms. Kernel header file is already updated.
Add those fields in tools copy as well.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Yang [Thu, 21 May 2026 01:03:07 +0000 (09:03 +0800)]
net: dsa: yt921x: Add port TBF support
React to TC_SETUP_QDISC_TBF and configure the egress shaper as
appropriate with the maximum rate and burst size requested by the user.
Per queue shaper is possible, though not touched in this commit.
David Yang [Thu, 21 May 2026 01:03:06 +0000 (09:03 +0800)]
net/sched: tbf: add extack to offload params
Drivers might have error messages to propagate to user space. Propagate
the netlink extack so that they can inform user space in a verbal way of
their limitations.
====================
ARCnet: remove outdated drivers and information and unused code; small cleanups and documentation improvements
This patch series mainly removes the ISA and PCMCIA ARCnet drivers and
documentation for them and hardware they supported. While ARCnet is still
used in industrial environments, and cards are still manufactured,
it is unlikely anyone is still using it with ISA and PCMCIA cards.
Removing these drivers reduces future maintenance burden.
While updating the ARCnet documentation to remove references to the removed
drivers, I noticed that it contained thousands of lines of outdated and
irrelevant information (much of it so outdated that it would not even work
on modern kernels). I took the opportunity to remove this information
and improve the writing style slightly.
I noticed that the BUS_ALIGN macro was always defined to 1, which meant
that the custom arcnet_in/out/read/write* I/O macros were unnecessary.
I expanded and removed them to make the code more straightforwards.
I also corrected some typos and comments.
====================
When compiling the com20020-pci driver with W=1, I received the
following warning:
drivers/net/arcnet/com20020-pci.c:224:71: warning: ‘%d’ directive
output may be truncated writing between 1 and 11 bytes into a region of
size between 10 and 11 [-Wformat-truncation=]
224 | snprintf(dev->name, sizeof(dev->name), "arc%d-%d", dev->dev_id, i);
In reality, this does not represent a problem, because i is bounded by
the .devcount field in struct com20020_pci_card_info, which is
statically defined for each card and very small. Quiet the invalid
warning by changing the type of i and the .devcount field to be
narrower.
The ARCnet documentation contains a lot of outdated and irrelevant
information (such as changes in decades-old driver versions and
messages from a former maintainer) and has some writing style issues.
Remove this unnecessary information and improve the writing style. Also
remove links to pages that no longer exist.
Now that the BUS_ALIGN variable has been removed, the
arcnet_in/out/read/write* macros behave identically to the functions
they wrap. Expand them and remove their definitions to make the code
easier to maintain.
net: arcnet: remove code depending on nonexistent config option
The CONFIG_SA1100_CT6001 option has never existed in the kernel. Remove
code in arcdevice.h referring to it. This allows the
arcnet_(in|out)(s|)b macros to be simplified by removing the BUS_ALIGN
macro.
net: arcnet: remove ISA and PCMCIA support; modernize documentation
While ARCnet is still used in industrial environments, and cards are
still manufactured, it is unlikely anyone is still using it with ISA
and PCMCIA cards. Reduce future maintenance burden by removing all ISA
and PCMCIA ARCnet drivers and documentation related to them. Update
instructions for loading modules and passing parameters to work on
modern kernels and with the com20020_pci driver. Also take the
opportunity to document the rest of the module parameters, correct a
file path in Documentation/networking/arcnet.rst, and change a
reference to /etc/rc.inet1, which no longer exists, to refer to
ifconfig.
net: arcnet: com20020: remove misleading references to multicast
ARCnet does not support multicast, only unicast and broadcast. In spite
of this, the com20020 driver contains several references to multicast
in a comment and a function name, including a FIXME that it should be
implemented. Adjust the comment to make the lack of multicast support
clear and rename com20020_set_mc_list to com20020_set_rx_mode.
====================
netlink: fixes for cross-namespace nsid reporting
While working on some new features for OVS and OVN we discovered that
self-referential NSIDs get unintentionally allocated in the system as
well as unexpectedly reported for local events on all-nsid listeners.
More details in the patches. They change user-visible behavior, but
the current behavior is arguably a bug, as it makes it hard to use
all-nsid sockets without a decent amount of extra unrelated work of
tracking when new NSIDs are allocated for your local namespace.
Tests are added to check the expected behavior and YNL is extended
to support all-nsid sockets in the tests.
====================