Add a device tree overlay for the Toradex Capacitive Touch Display 7"
DSI on the Verdin DSI_1 interface. The display features an internal
Texas Instruments SN65DSI83 DSI-to-LVDS bridge driving a Riverdi
RVT70HSLNWCA0 7" WSVGA IPS TFT LCD panel. The touch input is provided
by an Ilitek ILI2132 capacitive touch controller.
Add a device tree overlay for the Toradex Capacitive Touch Display 10.1"
on the Verdin DSI_1 interface. The display features an internal
Texas Instruments SN65DSI83 DSI-to-LVDS bridge driving a Riverdi
RVT101HVLNWC00 10.1" WXGA (1280x800) IPS TFT LCD panel. The touch input
is provided by an Ilitek ILI2132 capacitive touch controller.
The overlay is also combined with the Verdin AM62 Dahlia carrier board
device trees to provide ready-to-use DTBs in both WiFi and non-Wifi SoM
variants.
Add a device tree overlay for the Toradex Capacitive Touch Display 10.1"
LVDS connected via Verdin AM62 OLDI on carrier boards exposing LVDS
interface (e.g., Mallow). The panel is a LogicTechno LT170410-2WHC 10.1"
WXGA IPS LCD and the touch input is provided by an Atmel MaxTouch
capacitive touch controller.
Vitor Soares [Fri, 22 May 2026 16:11:05 +0000 (17:11 +0100)]
arm64: dts: ti: k3-am62-verdin: Add Toradex DSI to LVDS adapter with 10.1" display
Add a device tree overlay for the Toradex DSI to LVDS Adapter with the
Toradex Capacitive Touch Display 10.1" LVDS. The adapter connects to the
Verdin DSI_1 interface. It is based on the Texas Instruments SN65DSI84
DSI-to-LVDS bridge and drives a LogicTechno LT170410-2WHC 10.1" WXGA LVDS
panel. Touch input is provided by an Atmel MaxTouch capacitive touch
controller.
Arnd Bergmann [Fri, 29 May 2026 09:41:20 +0000 (11:41 +0200)]
pinctrl: tegra238: remove unused entries
The -Wunused-const-variable check points out a number of added
entries that are currently not referenced:
drivers/pinctrl/tegra/pinctrl-tegra238.c:1169:27: error: 'soc_gpio86_phh3_pins' defined but not used [-Werror=unused-const-variable=]
1169 | static const unsigned int soc_gpio86_phh3_pins[] = {
| ^~~~~~~~~~~~~~~~~~~~
drivers/pinctrl/tegra/pinctrl-tegra238.c:1165:27: error: 'uart5_cts_phh2_pins' defined but not used [-Werror=unused-const-variable=]
1165 | static const unsigned int uart5_cts_phh2_pins[] = {
| ^~~~~~~~~~~~~~~~~~~
drivers/pinctrl/tegra/pinctrl-tegra238.c:1161:27: error: 'uart5_rts_phh1_pins' defined but not used [-Werror=unused-const-variable=]
1161 | static const unsigned int uart5_rts_phh1_pins[] = {
| ^~~~~~~~~~~~~~~~~~~
Remove them for now, they can just be added back if they get
used in the future.
1) xfrm: route MIGRATE notifications to caller's netns
Thread the caller's netns through km_migrate() so that
MIGRATE notifications go to the issuing netns, fixing both the
init_net listener leak and MOBIKE notifications inside
non-init netns. From Maoyi Xie.
2) xfrm: ipcomp: Free destination pages on acomp errors
Move the out_free_req label up so that allocated destination
pages are released on decompression errors, not only on success.
From Herbert Xu.
3) xfrm: Check for underflow in xfrm_state_mtu
Reject configurations that cause xfrm_state_mtu() to underflow,
preventing a negative TFCPAD value from becoming a memset size
that triggers an out-of-bounds write of several terabytes.
From David Ahern.
4) xfrm: ah: use skb_to_full_sk in async output callbacks
Convert the possibly-incomplete skb->sk to a full socket pointer
in async AH callbacks so that a request_sock or timewait_sock
never reaches xfrm_output_resume() downstream consumers.
From Michael Bommarito.
5) Add and revert: esp: fix page frag reference leak on skb_to_sgvec failure
The patch does not fix te issue completely.
6) xfrm: esp: restore combined single-frag length gate
Check the aligned post-trailer combined length against a page limit
in the fast path, preventing skb_page_frag_refill() from falling
back to a page too small for the destination scatterlist.
From Jingguo Tan.
7) xfrm: iptfs: reset runtime state when cloning SAs
Reinitialise the clone's mode_data runtime objects before
publishing it, preventing queued skbs from being freed with
list state copied from the original SA when migration fails.
From Shaomin Chen.
8) xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit
Flush policy tables and drain the workqueue in a .pre_exit handler
so that cleanup_net() pays one RCU grace period per batch instead
of one per namespace, fixing stalls at high CLONE_NEWNET rates.
From Usama Arif.
9) xfrm: input: hold netns during deferred transport reinjection
Take a netns reference when queueing deferred transport reinjection
work and drop it after the callback completes, keeping the skb->cb
net pointer valid until the deferred work runs.
From Zhengchuan Liang.
* tag 'ipsec-2026-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
Revert "esp: fix page frag reference leak on skb_to_sgvec failure"
xfrm: input: hold netns during deferred transport reinjection
xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit
xfrm: iptfs: reset runtime state when cloning SAs
xfrm: esp: restore combined single-frag length gate
esp: fix page frag reference leak on skb_to_sgvec failure
xfrm: ah: use skb_to_full_sk in async output callbacks
xfrm: Check for underflow in xfrm_state_mtu
xfrm: ipcomp: Free destination pages on acomp errors
xfrm: route MIGRATE notifications to caller's netns
====================
Pavel Begunkov [Thu, 28 May 2026 18:43:53 +0000 (19:43 +0100)]
net: skbuff: fix pskb_carve leaking zcopy pages
When SKBFL_MANAGED_FRAG_REFS is set, frag pages are not refcounted but
their lifetime is controlled by the attached ubuf_info. To make a copy
of the skb_shared_info, we either should clear the flag and reference
the frags, or keep the flag and have frags unreferenced.
pskb_carve_inside_header() and pskb_carve_inside_nonlinear() don't
follow the rule and thus can leak page references. Let's clear
SKBFL_MANAGED_FRAG_REFS from the original skb to fix it. It's the
simplest way to address it, but there are more performant ways to do
that if it ever becomes a problem.
Jiayuan Chen [Wed, 27 May 2026 05:31:31 +0000 (13:31 +0800)]
ipv6: fix possible infinite loop in fib6_select_path()
Found while auditing the same pattern Sashiko reported in
rt6_fill_node() [1]. Apply the same fix as
commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()").
Writers holding tb6_lock can list_del_rcu(&first->fib6_siblings)
without waiting for RCU readers; first->fib6_siblings.next then
still points into the old ring and this softirq-side walker never
reaches &first->fib6_siblings as its terminator. fib6_purge_rt()
always WRITE_ONCE()s first->fib6_nsiblings to 0 before
list_del_rcu(), so an inside-loop check is a reliable detach signal.
Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiayuan Chen [Wed, 27 May 2026 05:31:30 +0000 (13:31 +0800)]
ipv6: fix possible infinite loop in rt6_fill_node()
Sashiko reported this issue [1]. Apply the same fix as
commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()").
Writers holding tb6_lock can list_del_rcu(&rt->fib6_siblings)
without waiting for RCU readers; rt->fib6_siblings.next then still
points into the old ring and this softirq-side walker never reaches
&rt->fib6_siblings, causing a CPU stall. fib6_del_route() always
WRITE_ONCE()s rt->fib6_nsiblings to 0 before list_del_rcu(), so an
inside-loop check is a reliable detach signal.
Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yuqi Xu [Wed, 27 May 2026 03:48:15 +0000 (11:48 +0800)]
bpf: sockmap: fix tail fragment offset in bpf_msg_push_data
When bpf_msg_push_data() inserts data in the middle of a scatterlist
entry, it splits the original entry into a left fragment and a right
fragment.
The right fragment offset is page-local, but the code advances it with
`start`, which is the message-global insertion point. For inserts into a
non-first SG entry, this over-advances the offset and leaves the split
layout inconsistent.
Advance the right fragment offset by the fragment-local delta,
`start - offset`, which matches the length removed from the front of the
original entry.
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/8b129d10566aa3eb43f61a8f9757bcf51707d324.1779636774.git.xuyq21@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jingguo Tan [Wed, 27 May 2026 02:33:01 +0000 (10:33 +0800)]
vsock/virtio: bind uarg before filling zerocopy skb
virtio_transport_send_pkt_info() allocates or reuses the zerocopy uarg
before entering the send loop, but virtio_transport_alloc_skb() still
fills the skb before it inherits that uarg. When fixed-buffer vectored
zerocopy hits MAX_SKB_FRAGS, io_sg_from_iter() may partially attach
managed frags and return -EMSGSIZE. The rollback path call kfree_skb()
to free an skb that carries SKBFL_MANAGED_FRAG_REFS but no uarg, so
skb_release_data() falls through to ordinary frag unref.
Pass the uarg into virtio_transport_alloc_skb() and bind it immediately
before virtio_transport_fill_skb(). This keeps control or no-payload skbs
untouched while ensuring success and rollback share one lifetime rule.
Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Signed-off-by: Lin Ma <malin89@huawei.com> Signed-off-by: Rongzhen Cui <cuirongzhen@huawei.com> Signed-off-by: Jingguo Tan <tanjingguo@huawei.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260527023301.1075581-1-malin89@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sven Eckelmann [Tue, 12 May 2026 20:03:53 +0000 (22:03 +0200)]
batman-adv: use atomic_xchg() for gw.reselect check
batadv_gw_election() only needs to test whether gw.reselect was set and
clear it afterwards. Replace the batadv_atomic_dec_not_zero()
[atomic_add_unless(..., -1, 0)] call with atomic_xchg(..., 0) to simplify
the logic and make the intent more explicit.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Sun, 3 May 2026 20:46:14 +0000 (22:46 +0200)]
batman-adv: add missing includes
Some of the recent fixes required features from new header files. There is
currently no build problem because transitive includes take care of it. But
the batman-adv source code tries to avoid the dependency to
transitive/implicite includes because it has no control over them and they
might get removed at some point.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Wed, 6 May 2026 06:44:32 +0000 (08:44 +0200)]
MAINTAINERS: Don't send batman-adv patches to netdev
Do not send batman-adv patches directly to the networking maintainers or
the netdev mailing list for initial review. Keeping these patch iterations
off netdev reduces review load, especially for patchsets that require
multiple revisions before reaching consensus.
After the review was finished on the BATMAN ADVANCED mailing list, the
patches will be queued up and then submitted to netdev as PR, including the
full patchset in the same thread (for the last review).
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Tue, 5 May 2026 08:23:06 +0000 (10:23 +0200)]
MAINTAINERS: Rename batman-adv T(ree)
Replace the batman-adv tree name "linux-merge" with "batadv" to match the
patch prefix convention used in subject lines (e.g. "[PATCH batadv 1/10]").
The previous name "linux-merge" was ambiguous and was not suitable as a
easy-to-recognize prefix. Routing of patches to net-next vs. net remains at
maintainer discretion.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Sven Eckelmann [Fri, 15 May 2026 06:41:19 +0000 (08:41 +0200)]
batman-adv: drop batman-adv specific version
Bumping the version number on the first pull request after each merge
window was deemed inappropriate for an in-tree component. The version
number carries little meaningful information in the context of the Linux
kernel release model, where stable and distribution might all carry
slightly different patches (without any change to the batman-adv version).
Instead, expose a UTS_RELEASE-based string to consumers of the netlink and
ethtool interfaces. To avoid recompilation for each (re)generate of
generated/utsrelease.h, init_utsname()->release is used in code which can
dynamically retrieve the version string. The MODULE_VERSION is moved to a
separate file because it doesn't support dynamic retrieval of the version
string (but constant "at compile time" string) and it is required for the
/sys/module/batman_adv/version. The latter is unfortunately still required
by userspace tools.
KVM: SEV: Use READ_ONCE() when reading entries/indices from PSC buffer
Use READ_ONCE() when reading entries/indices from the guest-accessible
Page State Change buffer to defend against TOCTOU bugs.
Don't bother with READ_ONCE()/WRITE_ONCE() for cases where KVM is writing
(and not consuming the result!), as the guest isn't supposed to touch the
buffer while it's being processed. I.e. using READ_ONCE() is all about
protecting against misbehaving guests.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Check PSC request indices against the actual size of the buffer
When processing Page State Change (PSC) requests, validate the PSC buffer
against the effective size of the scratch area, which could be less than
the maximum size if the guest provided a pointer that isn't exactly at the
start of the GHCB shared buffer.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Don't explicitly pass PSC buffer to snp_begin_psc()
Stop explicitly passing the PSC buffer to snp_begin_psc(): it *must*
be the scratch area. This will allow fixing a variety of bugs without
further complicating the code.
No functional change intended.
Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: WARN if KVM attempts to setup scratch area with min_len==0
Now that all paths in KVM properly validate the length needed for the
scratch area, and are guaranteed to pass in a non-zero length, WARN if KVM
attempts to configured the scratch area with min_len==0 to guard against
future bugs.
Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Compute the correct max length of the in-GHCB scratch area
When setting the length of the GHCB scratch area, and the area is in the
GHCB shared buffer, set the effective length of the scratch area to the max
possible size given the start of the guest-provided pointer, and the end of
the shared buffer.
The code was "fine" when first introduced, as KVM doesn't consult the
length of the buffer when emulating MMIO, because the passed in @len always
specifies the *max* size required. But for PSC requests, the incoming @len
is just the minimum length (to process the header), and KVM needs to know
the full size of the scratch area to avoid buffer overflows (spoiler alert).
Opportunistically rename @len => @min_len to better reflect its role.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: SEV: Use the size of the PSC header as the minimum size for PSC requests
When handling a Page State Change (PSC) #VMGEXIT use the size of the PSC
header as the minimum size for the scratch area. Per the GHCB spec, PSC
requests do NOT provide the length, i.e. using control->exit_info_2 for the
length is completely made up behavior. The existing code "works", e.g.
even though Linux-as-a-guest always passes '0', because KVM doesn't do
anything with the length when the request is in the GHCB's shared buffer.
Use the header as the min length. Once the header is retrieved, KVM can
use the specified indices to compute the full size of the request.
Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly ignore Port I/O requests of length '0' (or count '0'), so that
setting up the software scratch area (and other code) doesn't have to
worry about underflowing the length, and to allow for WARNing on trying
to configure the scratch area with len==0.
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly ignore MMIO requests of length '0', so that setting up the
software scratch area (and other code) doesn't have to worry about
underflowing the length, and to allow for special casing '0' in the
future.
Fixes: 8f423a80d299 ("KVM: SVM: Support MMIO for an SEV-ES guest") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260501202250.2115252-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Michael Roth [Fri, 1 May 2026 20:22:26 +0000 (13:22 -0700)]
KVM: SEV: Require in-GHCB scratch area if GHCB v2+ is in use
As per the GHCB spec, when using GHCB v2+ require the software scratch area
to reside in the GHCB's shared buffer. Note, things like Page State Change
(PSC) requests _rely_ on this behavior, as the guest can't provide a length
when making the request, i.e. the size of the guest payload is bounded by
the size of the shared buffer.
Failure to force usage of the GHCB, and a slew of other flaws, lets a
malicious SNP guest corrupt host kernel heap memory, and leak host heap
layout information.
setup_vmgexit_scratch() allocates a buffer via kvzalloc(exit_info_2),
where exit_info_2 is guest-controlled. With exit_info_2=24, this yields
a 24-byte allocation in kmalloc-cg-32 (32-byte slab objects). The buffer
holds an 8-byte psc_hdr followed by 8-byte psc_entry structs, so only
entries[0] and entries[1] are in-bounds.
snp_begin_psc() validates end_entry against VMGEXIT_PSC_MAX_COUNT (253)
but NOT against the actual buffer size:
idx_end = hdr->end_entry;
if (idx_end >= VMGEXIT_PSC_MAX_COUNT) { // checks 253, not buffer
snp_complete_psc(svm, ...);
return 1;
}
for (idx = idx_start; idx <= idx_end; idx++) {
entry_start = entries[idx]; // OOB when idx >= 2
The guest sets end_entry=10+, causing the host to iterate entries[2+]
which are OOB into adjacent slab objects. For each OOB entry:
- The host reads 8 bytes (OOB READ / info leak oracle)
- If the data passes PSC validation, __snp_complete_one_psc() writes
cur_page = 1 or 512 into the entry (OOB WRITE, sev.c:3806)
- If validation fails, the error response reveals whether adjacent
memory is zero vs non-zero (information disclosure to guest)
The guest controls allocation size (exit_info_2), entry range
(cur_entry/end_entry), and can fire unlimited VMGEXITs to repeatedly
hit different slab positions.
By exploiting the variety of bugs, a malicious SEV-SNP guest can:
- OOB read adjacent kmalloc-cg-32 objects (heap layout disclosure)
- OOB write cur_page bits into adjacent objects (heap corruption)
- Trigger use-after-free conditions across VMGEXITs
E.g. with KASAN enabled, a single insmod of the PoC guest module
produces 73 KASAN reports:
BUG: KASAN: slab-out-of-bounds in snp_begin_psc+0x126/0x890
Read of size 8 at addr ffff888219ffb5e0 by task qemu-system-x86/2199
BUG: KASAN: slab-out-of-bounds in snp_begin_psc+0x468/0x890
Write of size 8 at addr ffff888351566648 by task qemu-system-x86/2199
The buggy address belongs to the object at ffff888XXXXXXXXX
which belongs to the cache kmalloc-cg-32 of size 32
The buggy address is located N bytes to the right of
allocated 32-byte region [ffff888XXXXXXXXX, ffff888XXXXXXXXX)
Guopeng Zhang [Thu, 28 May 2026 09:37:42 +0000 (17:37 +0800)]
cgroup/cpuset: Free sched domains on rebuild guard failure
generate_sched_domains() returns sched-domain masks and optional
attributes that are normally handed to partition_sched_domains(), which
takes ownership of them.
rebuild_sched_domains_locked() has a WARN guard after
generate_sched_domains() and before partition_sched_domains() to avoid
passing offline CPUs into the scheduler domain rebuild path. If that
guard fires, the function currently returns directly without freeing
the generated doms and attr.
Free the generated sched-domain masks and attributes before returning
from the guard failure path.
Marco Crivellari [Fri, 29 May 2026 13:06:39 +0000 (15:06 +0200)]
workqueue: Add warnings and fallback if system_{unbound}_wq is used
Currently many users transitioned already to the new introduced workqueue
(system_percpu_wq, system_dfl_wq), but there are new users who still use the
older system_wq and system_unbound_wq.
This change try to push this transition forward, by warning whether the old
workqueues are used.
Linus Torvalds [Fri, 29 May 2026 17:36:57 +0000 (10:36 -0700)]
Merge tag 'io_uring-7.1-20260529' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for a regression introduced in this cycle, where
we should ensure the node is visible before the entry is added to
the tctx list"
* tag 'io_uring-7.1-20260529' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/tctx: set ->io_uring before publishing the tctx node
Paolo Bonzini [Fri, 29 May 2026 17:28:16 +0000 (19:28 +0200)]
Merge tag 'kvm-x86-fixes-7.1-rc6' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 7.1-rcN
- Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE
is defined, as older libc versions may not provide it.
- Include execinfo.h if and only if KVM selftests are building against glibc,
and provide a test_dump_stack() for non-glibc builds.
- Fudge around an RCU splat in the emegerncy reboot code that is technically
a legitimate flaw, but in practice is a non-issue and fixing the flaw, e.g.
by adding locking, would incur meaningful risk, i.e. do more harm than good.
- Rate-limit global clock updates once again (but without delayed work), as
KVM was subtly relying on the old rate-limiting for NPT correction to guard
against "update storms" when running without a master clock on systems with
overcommitted CPUs.
- Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of
marking it dirty when emulating INVPCID.
- Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB
doesn't contain AVIC-tagged entries for the APIC base GPA.
Linus Torvalds [Fri, 29 May 2026 17:04:09 +0000 (10:04 -0700)]
Merge tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
- cxl/test: update mock dev array before calling platform_device_add()
* tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Update mock dev array before calling platform_device_add()
Marek Vasut [Sat, 11 Apr 2026 13:02:35 +0000 (15:02 +0200)]
arm64: dts: st: Fix SAI addresses on stm32mp251
The second field of SAI register addresses should be within 0x3f0 bytes
from the start of the SAI register addresses, the second field describes
the ID registers which are at that addrses. Currently, the second field
does not match RM, fix it.
Fixes: bf26d75a95f1 ("arm64: dts: st: add sai support on stm32mp251") Signed-off-by: Marek Vasut <marex@nabladev.com> Reviewed-by: Olivier Moysan <olivier.moysan@foss.st.com> Link: https://lore.kernel.org/r/20260411130300.19603-1-marex@nabladev.com Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
* tag 'iommu-fixes-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
MAINTAINERS: Add my employer to my entries
MAINTAINERS: Add Vasant Hegde to reviewers of AMD IOMMU
iommu, debugobjects: avoid gcc-16.1 section mismatch warnings
iommu/vt-d: Simplify calculate_psi_aligned_address()
Linus Torvalds [Fri, 29 May 2026 15:55:41 +0000 (08:55 -0700)]
Merge tag 'sound-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of recent small fixes and quirks.
We still see a bit more changes than wished, but most of them are
device-specific ones that are pretty safe to apply, while a core fix
is a typical UAF fix for PCM core that was recently caught by fuzzer;
so overall nothing looks really worrisome.
Core:
- Fix a UAF in PCM OSS proc interface
HD-audio:
- Fix memory leaks in CS35L56 driver
- Various device-specific quirks for Realtek and CS420x codecs
USB-audio:
- Quirk for TAE1160 USB Audio
- Fix for Scarlett2 Gen4 direct monitor gain
ASoC:
- Fixes for QCom q6asm-dai, Intel bytcht_es8316, and simple-mux codec
FireWire:
- Fix for Motu DSP event queue protection"
* tag 'sound-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: codecs: simple-mux: Fix enum control bounds check
ALSA: usb-audio: Add iface reset and delay quirk for TAE1160 USB Audio
ALSA: hda/cs420x: Add CS4208 fixup for iMac16,1
ALSA: hda/realtek: add quirk for HP Dragonfly Folio G3 2-in-1
ALSA: hda/realtek: Fix speaker output on ASUS ROG Strix G615LP
ASoC: qcom: q6asm-dai: use pointer type with kzalloc_obj()
ASoC: qcom: q6asm-dai: remove unnecessary braces
ASoC: qcom: q6asm-dai: fix error handling in prepare and set_params
ASoC: qcom: q6asm-dai: close stream only when running
ASoC: qcom: q6asm-dai: do not set stream state in event and trigger callbacks
ASoC: Intel: bytcht_es8316: Fix MCLK leak on init errors
ALSA: hda/realtek: Limit mic boost on Positivo DN140
ALSA: scarlett2: Fix 2i2 Gen 4 direct monitor gain on firmware 2417
ALSA: pcm: oss: Fix setup list UAF on proc write error
ALSA: hda: cs35l56: Fix system name string leaks
ALSA: hda/realtek: Add HDA_CODEC_QUIRK for Lenovo Yoga Slim 7 14AGP11
ALSA: hda/realtek: Fix incorrect comment for ALC299_FIXUP_PREDATOR_SPK
ALSA: firewire-motu: Protect register DSP event queue positions
Qiuxu Zhuo [Thu, 21 May 2026 12:38:12 +0000 (20:38 +0800)]
EDAC/igen6: Add Intel Nova Lake-H SoC support
Nova Lake-H SoCs share similar memory controller registers and IBECC
(In-Band ECC) registers with Panther Lake-H SoCs but use a new memory
subsystem register for IBECC presence detection.
Add Nova Lake-H SoC compute die IDs and create a new configuration
structure for Nova Lake-H SoCs to enable EDAC support.
Qiuxu Zhuo [Thu, 21 May 2026 12:38:11 +0000 (20:38 +0800)]
EDAC/igen6: Make registers for detecting IBECC configurable
Some Intel CPUs with IBECC (In-Band ECC) capability use different registers
to indicate IBECC presence. Make IBECC detection registers CPU-model
specific and configure them properly for scalable IBECC detection.
Qiuxu Zhuo [Thu, 21 May 2026 07:31:12 +0000 (15:31 +0800)]
EDAC/imh: Add RRL support for Intel Diamond Rapids server
Compared to previous generations, Diamond Rapids RRL (Retry Read error Log)
operates at DDR sub-channel granularity and adds an extra register per set.
It also increases the CORRERRCNT register width from 4 to 8 bytes while
reducing the number of registers from 8 to 4.
Add the Diamond Rapids RRL register configuration table and enable support.
Qiuxu Zhuo [Thu, 21 May 2026 07:31:11 +0000 (15:31 +0800)]
EDAC/{skx_common,i10nm}: Prepare RRL for sub-channel granularity
To prepare for enabling Diamond Rapids server RRL (Retry Read error Log),
which operates at sub-channel granularity by converting struct
res_config::reg_rrl_ddr from a single pointer to an array (reg_rrl_ddr[2])
and updating all users in i10nm_edac and skx_common accordingly.
Initialize only reg_rrl_ddr[0] for existing platforms and prepare for
supporting two RRL set groups per DDR channel (one per sub-channel)
when present.
Qiuxu Zhuo [Thu, 21 May 2026 07:31:10 +0000 (15:31 +0800)]
EDAC/skx_common: Add SubChannel support to ADXL decode
Diamond Rapids server RRL (Retry Read error Log) operates at sub-channel
granularity. Add SubChannel support to ADXL decoding in preparation for
enabling this feature.
Also introduce adxl_component_required() to validate mandatory ADXL
components to improve code readability.
Qiuxu Zhuo [Thu, 21 May 2026 07:31:08 +0000 (15:31 +0800)]
EDAC/{skx_common,i10nm}: Introduce rrl_ctrl_mode
RRL (Retry Read error Log) ownership is currently inferred from
retry_rd_err_log magic values, making control semantics implicit
and harder to understand.
Introduce rrl_ctrl_mode to explicitly describe whether RRL is
controlled by none, BIOS, or Linux, and replace direct checks with
named control states to improve readability and maintainability.
skx_set_decode() currently handles both address decoding and Retry
Read error Log (RRL) reporting, coupling two independent functions
in a single API. This complicates setup/teardown and forces callers
to update unrelated state.
Introduce skx_set_show_rrl() and keep skx_set_decode() focused on
decode setup, allowing decode and RRL handling to be managed
independently.
Also rename the callback type and variable to skx_show_rrl_f and
show_rrl for clearer RRL terminology and consistency.
Qiuxu Zhuo [Thu, 21 May 2026 07:31:05 +0000 (15:31 +0800)]
EDAC/{skx_common,i10nm,imh}: Move MC register access helpers to skx_common
Both i10nm_basic.c and imh_basic.c use identical helpers for accessing
memory controller MMIO-based registers. Move these helpers to skx_common.c
to eliminate code duplication. This change also prepares for an upcoming
patch that will move RRL(retry_rd_err_log) code from i10nm_basic.c to
skx_common.c, which requires these helpers to be available in skx_common.c.
Additionally, prefix these function names with 'skx_' to maintain naming
consistency within the file.
EDAC/igen6: Fix memory topology parsing for Panther Lake-H SoCs
Panther Lake-H SoC memory controller registers for memory topology have
been updated, but the current igen6_edac driver still uses old generation
ones to incorrectly parse memory topology.
Fix the issue by adding memory topology parsing function pointers to the
'struct res_config' and creating a new configuration structure for Panther
Lake-H SoCs to enable igen6_edac to parse memory correctly.
EDAC/igen6: Fix call trace due to missing release()
When unloading the igen6_edac driver, there is a call trace:
Device '(null)' does not have a release() function, it is broken and must be fixed.
See Documentation/core-api/kobject.rst.
WARNING: drivers/base/core.c:2567 at device_release+0x84/0x90, CPU#5: rmmod/127209
...
RIP: 0010:device_release+0x84/0x90
Call Trace:
<TASK>
kobject_put+0x8c/0x220
put_device+0x17/0x30
igen6_unregister_mcis+0xa2/0xe0 [igen6_edac]
igen6_remove+0x82/0xb0 [igen6_edac]
...
Fix the call trace by providing empty release() functions for the
memory controller devices.
Thorsten Blum [Fri, 8 May 2026 14:38:46 +0000 (16:38 +0200)]
EDAC/sb_edac: fix grammar in sb_decode_ddr3 warning
Fix the warning in sb_decode_ddr3() by adding the missing verb "is" and
using "supported" instead of "support" to match the LockStep warning in
sb_decode_ddr4().
EDAC/i5400: disable error reporting at teardown and refactor helper
If error reporting is enabled during initialization but initialization
fails immediately after, or during normal driver teardown, error reporting
is left enabled in the mask register even after exit.
Replace i5400_enable_error_reporting() with i5400_set_error_reporting()
to combine enabling/disabling. Disable reporting at initialization
failure and driver exit, before call to i5400_put_devices() for cleanup.
This ensures clean hardware handling by disabling any unused error
reporting bits before exiting.
EDAC/i5100: disable error reporting at teardown and create helper
Error reporting is enabled during init but not reverted when init fails.
It is also not disabled at normal driver teardown.
Create i5100_set_error_reporting() to enable/disable reporting. Move
enable reporting write to after initialization success. Disable reporting
at driver teardown.
EDAC/i5000: disable error reporting at teardown and refactor helper
If error reporting is enabled during initialization but initialization
fails immediately after, or during normal driver teardown, error reporting
is left enabled in the mask register even after exit.
Replace i5000_enable_error_reporting() with i5000_set_error_reporting()
to combine enabling/disabling. Disable reporting at initialization
failure and driver exit, before call to i5000_put_devices() for cleanup.
This ensures clean hardware handling by disabling any unused error
reporting bits before exiting.
EDAC/i7300: disable error reporting if init fails and refactor helper
If error reporting is enabled during initialization but initialization
fails immediately after, or during normal driver exit, error reporting
is left enabled in the mask register even after exit.
Replace i7300_enable_error_reporting() with i7300_set_error_reporting()
to combine enabling/disabling. Disable reporting at initialization
failure and driver exit, before call to i7300_put_devices() for cleanup.
Add enabled reporting flag to i7300_pvt.
This ensures clean hardware handling by disabling any unused error
reporting bits before exiting.
Rik van Riel [Wed, 27 May 2026 15:13:01 +0000 (11:13 -0400)]
perf/ftrace: Fix WARNING in __unregister_ftrace_function
perf_ftrace_function_unregister() unconditionally calls
unregister_ftrace_function() without checking whether the ftrace_ops
was ever successfully registered. This triggers a WARN_ON in
__unregister_ftrace_function() when the ops doesn't have
FTRACE_OPS_FL_ENABLED set.
This can happen during perf_event_alloc() error cleanup when
perf_trace_destroy() is called via __free_event() on an event whose
ftrace_ops registration failed or was already torn down by
perf_try_init_event()'s err_destroy path.
Paul Moore [Fri, 29 May 2026 15:24:37 +0000 (11:24 -0400)]
selinux: revert use of __getname() in selinux_genfs_get_sid()
Revert commit 54067bacb49c ("selinux: hooks: use __getname() to allocate
path buffer") as it improperly assumed that PATH_MAX == PAGE_SIZE
everywhere. Moving away from __get_free_page() is still a good thing and
will be revisited in the future.
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Karl Mehltretter [Mon, 25 May 2026 17:04:28 +0000 (19:04 +0200)]
tracing: Disable KCOV instrumentation for trace_irqsoff.o
When KCOV runs its boot selftest with whole-kernel instrumentation
enabled, it sets current->kcov_mode to KCOV_MODE_TRACE_PC without
installing a coverage area. Any instrumented code accepted as task-context
coverage in that window dereferences current->kcov_area and crashes.
On ARMv5 Versatile PB with CONFIG_KCOV_SELFTEST=y,
CONFIG_KCOV_INSTRUMENT_ALL=y and CONFIG_IRQSOFF_TRACER=y, boot hits a
NULL pointer fault during the selftest:
kcov: running self test
Internal error: Oops: 5 [#1] ARM
PC is at __sanitizer_cov_trace_pc+0x4c/0x90
Kernel panic - not syncing: Fatal exception
A diagnostic run showed the unwanted coverage comes from the IRQs-off
tracer callbacks reached from ARM IRQ entry before hardirq context is
visible to KCOV:
__sanitizer_cov_trace_pc from tracer_hardirqs_off+0x18/0x1cc
tracer_hardirqs_off from trace_hardirqs_off+0x34/0x54
trace_hardirqs_off from __irq_svc+0x58/0xb0
__irq_svc from kcov_init+0x7c/0xdc
and similarly through tracer_hardirqs_on().
trace_preemptirq.o is already excluded because this tracing path can run
from early interrupt code and produce coverage unrelated to syscall
inputs. Exclude trace_irqsoff.o as well, instead of requiring users to
turn off CONFIG_KCOV_INSTRUMENT_ALL=y, which is the default whole-kernel
KCOV mode.
With the exclusion in place, the same ARMv5 Versatile PB QEMU test boots
through the KCOV selftest and reaches userspace.
Tested on ARMv5 Versatile PB QEMU with CONFIG_KCOV_SELFTEST=y,
CONFIG_KCOV_INSTRUMENT_ALL=y and CONFIG_IRQSOFF_TRACER=y.
Rosen Penev [Fri, 22 May 2026 21:44:07 +0000 (14:44 -0700)]
tracing: Turn hist_elt_data field_var_str into a flexible array
The field_var_str array was allocated separately via kcalloc() with its
length already known at elt_data allocation time. Convert it to a
flexible array member and fold the two allocations into a single
kzalloc_flex(), reordering hist_trigger_elt_data_alloc() so n_str is
computed and bounds-checked before the struct allocation.
hist_elt_data is only reached through tracing_map_elt::private_data
(a void *), never embedded, so adding a FAM imposes no tail-position
constraint on any enclosing struct.
Tanmay Shah [Wed, 27 May 2026 05:16:11 +0000 (22:16 -0700)]
remoteproc: xlnx: Enable auto boot feature
The remoteproc framework has capability to start (or attach to) the remote
processor automatically if auto boot flag is set by the driver during
probe. If the 'firmware-name' property is available for the remoteproc
node, then that firmware will be loaded and started during auto boot. If
the remote core is started by the bootloader then during auto-boot
remoteproc framework will try to attach to the remote processor.
The current architecture allocates and adds the remoteproc instance before
all the hardware such as sram, mbox, TCM is initialized. This design has
to be changed for auto boot to work. So, rename zynqmp_r5_rproc_add()
function to zynqmp_r5_rproc_alloc() and move adding the remoteproc
instance at the end of cluster initialization. This makes sure that all
the required hardware is initialized before starting the remote
processor.
The firmware-name property indicates which firmware to load on RPU
during the Linux boot time. It is possible to stop the RPU after boot
and load different firmware and start RPU.
James Clark [Mon, 11 May 2026 09:19:35 +0000 (10:19 +0100)]
perf test: Make leafloop workload immune to compiler options
Since the leafloop test program was moved into the main Perf binary as a
workload, it inherited the same compiler options as Perf. In this case
the -fstack-protector option broke the assumption that simple leaf
frames don't have a stack frame on Arm. This causes
test_arm_callgraph_fp.sh to pass even if the stack isn't augmented with
the link register, making the test useless.
Fix it by rewriting the leaf function in assembly seeing as it's so
simple. Adding -fno-stack-protector would also work, but wouldn't be
robust against other future compiler option additions.
The local variables and 'a' variable were never needed so remove them to
simplify.
Reviewed-by: Ian Rogers <irogers@google.com> Assisted-by: GitHub-Copilot:GPT-5.5 Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test: Add truncated perf.data robustness test
Add a shell test that verifies perf report handles truncated perf.data
files gracefully — exiting with an error code rather than crashing with
SIGSEGV or SIGABRT.
The test records a simple workload, then truncates the resulting
perf.data at four offsets that exercise different parsing stages:
8 bytes — file header magic only
64 bytes — partial file header (attr section incomplete)
256 bytes — into the first events (partial event headers)
75% size — mid-stream truncation (partial event data)
For each truncation, perf report is run and the exit code is checked:
- Exit code 0 (success) fails the test — a truncated file should
never parse without error.
- Crash signals are detected portably via kill -l, which maps the
signal number to a name on the running system. This handles
architectures where signal numbers differ (e.g. SIGBUS is 7 on
x86/ARM but 10 on MIPS/SPARC). Core-dump and fatal signals
(KILL, ILL, ABRT, BUS, FPE, SEGV, TRAP, SYS) fail the test.
- Higher exit codes (200+) are perf's own negative-errno returns
(e.g. -EINVAL = 234) and are expected.
This exercises the bounds checking, minimum-size validation, and error
propagation added by the preceding patches in this series.
Testing it:
root@number:~# perf test truncat
84: Test that perf report handles truncated perf.data gracefully (no crash, no segfault — clean error exit).: Ok
root@number:~# perf test -vv truncat
84: Test that perf report handles truncated perf.data gracefully (no crash, no segfault — clean error exit).:
--- start ---
test child forked, pid 62890
---- end(0) ----
84: Test that perf report handles truncated perf.data gracefully (no crash, no segfault — clean error exit).: Ok
root@number:~#
Changes in v2:
- Add SIGKILL to the list of fatal signals so OOM kills from
resource exhaustion bugs are detected (Reported-by: sashiko-bot@kernel.org)
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m
[ Fixed the SPDX on the line where 'perf test' expects the test description, reviewed by Ian Rogers ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Snapshot event->header.size in process_user_event()
On native-endian files, events are read from MAP_SHARED memory.
Multiple reads of event->header.size can return different values
if the file is concurrently modified, allowing an attacker to
bypass bounds checks performed on an earlier read.
Snapshot header.size into a local variable at function entry using
READ_ONCE() to prevent compiler rematerialization, and use it for
all size-dependent arithmetic within the function. This ensures
every bounds calculation uses the same value that was validated
by the reader.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf kwork: Bounds check work->cpu before indexing cpus_runtime[]
work->cpu comes from sample->cpu which is (u32)-1 when
PERF_SAMPLE_CPU is absent. Stored as int, this becomes -1
which passes the signed BUG_ON(work->cpu >= MAX_NR_CPUS) but
causes an out-of-bounds access on cpus_runtime[-1].
Replace the BUG_ON in top_calc_total_runtime() with an unsigned
bounds check that skips entries with invalid CPU values, counting
them for a summary warning.
Guard the same index in profile_event_match() (bitmap OOB),
top_calc_idle_time(), top_calc_irq_runtime(), top_calc_cpu_usage(),
and top_calc_load_runtime(). Also guard against division by zero
in top_calc_cpu_usage() when no runtime was accumulated.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong@bytedance.com> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Bound nr_cpus_avail and validate sample CPU
Several downstream consumers (timechart, kwork, sched) use fixed-size
arrays indexed by CPU. A crafted perf.data can supply arbitrary CPU
values that index past these arrays, causing out-of-bounds access.
Validate sample.cpu against min(nr_cpus_avail, MAX_NR_CPUS) in
perf_session__deliver_event() before any tool callback runs. The
cap at MAX_NR_CPUS protects fixed-size downstream arrays; the true
nr_cpus_avail is preserved in env for header parsing (e.g.
process_cpu_topology) which needs the real count.
Fall back to MAX_NR_CPUS when HEADER_NRCPUS is missing (truncated
files, pipe mode, pre-2017 perf).
Only validate when PERF_SAMPLE_CPU is set in sample_type — when
absent, evsel__parse_sample() leaves sample.cpu as (u32)-1, a
sentinel that downstream tools (script, inject) check to identify
events without CPU info. Clamping it to 0 would break those checks.
Inline evlist__parse_sample() into perf_session__deliver_event()
so the evsel lookup needed for sample_type checking reuses the same
evsel that parsed the sample, avoiding a second evlist__event2evsel()
call on every event.
For pipe-mode streams where HEADER_NRCPUS may arrive late or not at
all, the MAX_NR_CPUS fallback ensures the bounds check is still
effective against the fixed-size downstream arrays.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Check for decompression buffer size overflow
On 32-bit systems, sizeof(struct decomp) + decomp_len can wrap
size_t when comp_mmap_len is large. The preceding patch validates
comp_mmap_len alignment but does not cap the upper bound, so two
additions can still overflow:
1. decomp_len += decomp_last_rem: on 32-bit, adding a u64 to
size_t silently truncates, producing a corrupted decomp_len
that would bypass the subsequent overflow check and result
in an undersized buffer allocation.
2. sizeof(struct decomp) + decomp_len: the final addition could
overflow on systems with small size_t.
Add explicit overflow checks before each addition as
defense-in-depth.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add several hardening checks to the compressed event decompression
pipeline:
1. Guard against decomp_last_rem underflow: check that
decomp_last->head does not exceed decomp_last->size before
subtracting. A u64 underflow here would produce a huge
decomp_len, causing an oversized mmap allocation.
2. Validate comp_mmap_len from the HEADER_COMPRESSED feature
section: reject values that are not 4K-aligned or smaller than
4096. The downstream decompression path checks allocation
sizes against SIZE_MAX, which handles 32-bit safety.
3. Validate COMPRESSED event header size: reject events where
header.size is too small to contain the fixed struct fields,
preventing underflow in the payload size calculation.
4. Validate COMPRESSED2 event data_size: check that data_size
does not exceed the available payload (header.size minus the
fixed struct fields) for the newer compressed format.
5. Reject compressed events when the HEADER_COMPRESSED feature
is missing from the file header, which means no decompression
context was initialized.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Add byte-swap handler for PERF_RECORD_COMPRESSED2
PERF_RECORD_COMPRESSED2 events carry a data_size field that must be
byte-swapped when reading cross-endian perf.data files. Without a
swap handler, reading COMPRESSED2 events on a different-endian machine
would misinterpret data_size as a garbage value, causing the
decompression path to read the wrong number of bytes.
The compressed payload itself is a raw byte stream and needs no
swapping.
Fixes: 208c0e16834472bb ("perf record: Add 8-byte aligned event type PERF_RECORD_COMPRESSED2") Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf header: Validate bitmap size before allocating in do_read_bitmap()
do_read_bitmap() reads a u64 bit count from the file and passes it
to bitmap_zalloc() without checking it against the remaining section
size. A crafted perf.data could trigger a large allocation that would
only fail later when the per-element reads exceed section bounds.
Additionally, bitmap_zalloc() takes an int parameter, so a crafted
size with bits set above bit 31 (e.g. 0x100000040) would pass the
section bounds check but truncate when passed to bitmap_zalloc(),
allocating a much smaller buffer than the subsequent read loop
expects.
Reject size values that exceed INT_MAX, and check that the data
needed (BITS_TO_U64(size) u64 values) fits in the remaining section
before allocating. Switch from bitmap_zalloc() to calloc() of u64
units so the allocation size matches the u64 read/write granularity
and avoids unsigned long vs u64 mismatch on 32-bit architectures.
Fix do_write_bitmap() to use memcpy to read u64-sized chunks from
the unsigned long bitmap, preventing out-of-bounds reads on 32-bit
systems where sizeof(unsigned long) is 4 but the bitmap is stored
in u64 units.
Fix process_mem_topology() minimum section size: the check used
nr * 2 * sizeof(u64) per node, but do_read_bitmap() reads an
additional u64 for the bitmap size, so the minimum is 3 * sizeof(u64).
Fix memory leak in process_mem_topology() error paths: replace
free(nodes) with memory_node__delete_nodes() to free per-node
bitmaps allocated by do_read_bitmap().
Currently used by process_mem_topology() for HEADER_MEM_TOPOLOGY.
Fixes: a881fc56038a ("perf header: Sanity check HEADER_MEM_TOPOLOGY") Closes: https://lore.kernel.org/linux-perf-users/20260414224622.2AE69C19425@smtp.kernel.org/ Closes: https://lore.kernel.org/linux-perf-users/20260410223242.DD76FC19421@smtp.kernel.org/ Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf header: Sanity check HEADER_EVENT_DESC attr.size before swap
read_event_desc() reads nre (event count), sz (attr size), and nr
(IDs per event) from the file and uses them to control allocations
and loops without validating them against the section size.
A crafted perf.data could trigger large allocations or many loop
iterations before __do_read() eventually rejects the reads.
Add bounds checks in read_event_desc():
- Reject sz smaller than PERF_ATTR_SIZE_VER0.
- Require at least one event (nre > 0).
- Check that nre events fit in the remaining section, using the
minimum per-event footprint of sz + sizeof(u32).
- Pre-swap attr->size to native byte order, then reject values
below PERF_ATTR_SIZE_VER0 or above sz before calling
perf_event__attr_swap() to prevent heap out-of-bounds access.
- Handle ABI0 (attr.size == 0): substitute PERF_ATTR_SIZE_VER0,
and on native-endian files write the value back so
free_event_desc() does not treat the zero as its end-of-array
sentinel (it iterates while attr.size != 0). The swap path
skips the write-back — perf_event__attr_swap() has its own
ABI0 fallback that sets VER0 after swapping.
- Check that nr IDs fit in the remaining section before allocating.
Fixes: b30b61729246 ("perf tools: Fix a problem when opening old perf.data with different byte order") Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Harden feature section parsing against crafted perf.data files:
1. perf_header__process_sections() reads the feature section table
and passes each section's offset and size directly to the
processing callbacks without validating them against the actual
file size. A crafted section size would make all downstream
bounds checks against ff->size ineffective since they compare
against the untrusted, inflated bound. Add an fstat() check
with S_ISREG() guard and verify that each section's offset +
size does not extend past EOF.
2. __do_read_buf() validates reads against ff->size (section size),
but __do_read_fd() had no such check, so a malformed perf.data
with an understated section size could cause reads past the end
of the current section into the next section's data. Add the
bounds check in __do_read(), the common caller of both helpers,
so it is enforced uniformly for both the fd and buf paths.
Track the section-relative offset in __do_read_fd() so the
check works for the fd path. Reject negative sizes which on
32-bit can occur when a u32 >= 0x80000000 is passed as ssize_t.
3. do_read_string() relied on file data being null-padded. Add
explicit null-termination (buf[len-1] = '\0') after reading
and validate length (>= 1, fits within section) before
allocating, so callers like process_cpu_topology() never
receive an unterminated string.
4. Initialize feat_fd.offset to 0 (section-relative) instead of
section->offset (file-absolute) so the bounds tracking is
consistent with __do_read()'s section-relative comparison.
Adjust process_build_id() to use lseek() for its file-absolute
offset needs since it cannot rely on ff->offset for that.
5. Propagate ff->size to perf_file_section__fprintf_info() so its
reads are also bounded.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf header: Validate f_attr.ids section before use in perf_session__read_header()
perf_session__read_header() reads f_attr.ids.size from the perf.data
file and divides it by sizeof(u64) to compute nr_ids, which is
declared as int. No validation is performed on the value before it
is used to allocate arrays and drive a read loop.
On 32-bit architectures, a crafted f_attr.ids.size of 0x100000000
(4 GB) produces nr_ids = 0x20000000, but the allocation size
1 * 0x20000000 * 8 overflows size_t to 0, so zalloc(0) returns a
valid pointer. The subsequent loop writes 0x20000000 IDs into that
zero-length buffer, corrupting the heap.
On 64-bit, the u64-to-int truncation silently drops high bits,
processing fewer IDs than the file claims. While not exploitable,
this is a data integrity issue.
Add validation before using f_attr.ids:
- Cap nr_attrs (attrs.size / attr_size) to MAX_NR_ATTRS (1 << 16)
with overflow-safe u64 comparison before assigning to int
- Reject ids.size not aligned to sizeof(u64)
- Cap ids.size / sizeof(u64) to MAX_IDS_PER_ATTR (1 << 24) to
prevent int truncation and size_t overflow on 32-bit
- Reject ids sections that extend past the end of the file,
guarded by S_ISREG() so non-regular files (block devices,
pipes) are not falsely rejected
Also fix perf_header__getbuffer64() to set errno = EIO when
readn() returns 0 (EOF). Without this, the out_errno path in
perf_session__read_header() returns -errno which is 0 (success)
on truncated files, causing downstream NULL dereferences.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_session__read_header() discards the return value from
perf_header__process_sections(), so any error from a feature
section processor (process_nrcpus, process_compressed, etc.)
is silently ignored and the session opens as if nothing went
wrong.
This defeats the validation added by subsequent commits in this
series: a crafted perf.data that fails a feature section check
would still be processed with partially-initialized state.
Check the return value and fail the session if any feature
section processor returns an error.
For truncated files (data.size == 0, i.e. recording was
interrupted before the header was finalized), skip feature
section processing entirely and clear the feature bitmap so
tools use their "feature not present" fallbacks instead of
accessing uninitialized env fields.
Change the feature processor stubs for optional libraries
(libtraceevent, libbpf) from returning -1 to returning 0,
so that perf.data files containing these features can still be
opened on builds without the optional library — the feature is
simply skipped rather than causing a fatal error.
Also propagate evlist__prepare_tracepoint_events() failure as
-ENOMEM, since the function can fail due to strdup() allocation
failure inside evsel__prepare_tracepoint_event().
Fixes: 1c0b04d12ae9 ("perf tools: Add perf_session__read_header function") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Bounds check perf_event_attr fields against attr.size before printing
perf_event_attr__fprintf() accessed all struct fields unconditionally,
but attrs from older perf.data files or BPF-captured syscall payloads
may have a smaller size than the current struct. Fields beyond the
recorded size contain uninitialized or zero-filled data.
Add size-guarded macros (PRINT_ATTRn, PRINT_ATTRn_bf) that compare
each field's offset against attr->size before accessing it.
Guard the bitfield block (disabled, inherit, ... defer_output) with
attr_size >= 48. These bitfields share a single __u64 at offset 40,
which is within PERF_ATTR_SIZE_VER0 for validated perf.data attrs,
but BPF-captured attrs from perf trace can have a smaller size when
the tracee passes a minimal struct to sys_perf_event_open.
Also fix the BPF trace path: when perf trace intercepts
sys_perf_event_open via BPF, the program copies PERF_ATTR_SIZE_VER0
bytes when the tracee passes size=0, but leaves the size field as 0.
Set attr->size to PERF_ATTR_SIZE_VER0 in the augmented syscall
handler so the bounds checks match the actual copied size.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf header: Validate null-termination in PERF_RECORD_EVENT_UPDATE string fields
strdup(ev->unit) and strdup(ev->name) read until '\0' with no
guarantee the string is null-terminated within event->header.size.
The dump_trace fprintf path has the same problem with %s.
Validate before either path runs — same class of bug fixed for
MMAP/MMAP2/COMM/CGROUP by perf_event__check_nul().
Also harden the event_update swap handler to:
- Validate SCALE event size before swapping the double at
offset 24, which exceeds the 24-byte min_size.
- Validate CPUS event size before accessing the cpu_map
type/nr/long_size fields, which also start at the min_size
boundary.
- Swap CPUS variant fields (type, nr, long_size) so the
processing path sees native byte order.
Add validation in perf_event__process_event_update() for all
event update variants (UNIT, NAME, SCALE, CPUS) before
dump_trace or processing.
Validate CPUS nr against payload size for both PERF_CPU_MAP__CPUS
and PERF_CPU_MAP__MASK types on the fprintf (dump_trace) path:
- CPUS: check nr does not exceed available cpu entries
- MASK: check nr does not exceed available mask entries for
both mask32 (long_size == 4) and mask64 (long_size == 8)
layouts, with underflow guards on the offsetof subtraction
Fix a missing break before the default case in the CPUS
switch path.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Add byte-swap and bounds check for PERF_RECORD_BPF_METADATA events
PERF_RECORD_BPF_METADATA has no entry in perf_event__swap_ops[], so its
nr_entries field is never byte-swapped when reading a cross-endian
perf.data file. Downstream processing in
perf_event__fprintf_bpf_metadata() loops over nr_entries, so a
foreign-endian value causes out-of-bounds reads.
Add a swap handler that byte-swaps nr_entries after validating that
header.size is large enough. The entries[] array contains only char
arrays (key/value strings), so no per-entry swap is needed — but ensure
NUL-termination on the writable cross-endian path.
Validate header.size, nr_entries, and string NUL-termination in the
common event delivery path so that native-endian files with malicious
values are also rejected. Snapshot nr_entries via READ_ONCE() before
validation — the event is on a MAP_SHARED mmap that could theoretically
change between the bounds check and the loop.
Changes in v2:
- Snapshot event->header.size via READ_ONCE() into a local variable
to prevent a double-fetch underflow in the max_entries calculation
(Reported-by: sashiko-bot@kernel.org)
- Write back clamped nr_entries to the event on the swap path,
consistent with NAMESPACES and STAT_CONFIG handlers — without
writeback the native path sees the inflated nr and skips the
event entirely (Reported-by: sashiko-bot@kernel.org)
Fixes: ab38e84ba9a8 ("perf record: collect BPF metadata from existing BPF programs") Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Blake Jones <blakejones@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix four issues in PERF_RECORD_AUXTRACE_ERROR handling:
1. auxtrace_error_name() takes a signed int parameter, but e->type
is __u32. A crafted value like 0xFFFFFFFF converts to -1, passes
the bounds check, and causes a negative array index. Fix by
changing the parameter to unsigned int.
2. The msg field is printed via %s without a length bound. The
min_size table only guarantees fields up to msg (offset 48), so
a truncated event has zero msg bytes within the event boundary.
Compute the available msg length from header.size, cap at
sizeof(e->msg), and use %.*s.
3. fmt >= 2 adds machine_pid and vcpu fields after msg[64]. Older
files may have fmt >= 2 but an event size that doesn't include
these fields. Add a size check in the swap handler to downgrade
fmt before the conditional field access, and a matching size
guard in the fprintf path for native-endian events (which are
mmap'd read-only and can't be modified in place).
4. python_process_auxtrace_error() had the same issues: msg was
passed to tuple_set_string() unbounded, and machine_pid/vcpu
were accessed unconditionally without checking fmt or event
size. Apply the same bounds checks.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf cpumap: Reject RANGE_CPUS with start_cpu > end_cpu
cpu_map__from_range() computes nr_cpus as end_cpu - start_cpu + 1.
When a crafted perf.data has start_cpu > end_cpu, this wraps to a
huge value, causing perf_cpu_map__empty_new() to attempt a massive
allocation.
Return NULL when the range is inverted.
Also clamp any_cpu to boolean (0 or 1) since it is added to the
allocation count — a crafted value > 1 would inflate the map size.
Harden cpu_map__from_mask() to reject unsupported long_size values
(anything other than 4 or 8), preventing misinterpretation of the
mask data layout.
Snapshot mmap'd fields via READ_ONCE() into locals to prevent
TOCTOU re-reads — the data pointer references MAP_SHARED mmap'd
memory that could theoretically change between reads on a
FUSE-backed file:
- cpu_map__from_range(): snapshot start_cpu, end_cpu, any_cpu
- cpu_map__from_entries(): snapshot nr and each cpu[i] element
- cpu_map__from_mask(): snapshot long_size (before validation,
closing the check-then-read gap), mask_nr
- perf_record_cpu_map_data__read_one_mask(): add u16 long_size
parameter so callers pass the validated copy instead of
re-reading data->mask32_data.long_size from mmap'd memory
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf header: Byte-swap build ID event pid and bounds check section entries
perf_header__read_build_ids() swaps the event header fields for cross-endian
perf.data files but not bev.pid. This causes perf_session__findnew_machine()
to look up the wrong machine for guest VM build IDs, misattributing them.
Swap bev.pid alongside the header fields.
Also add a build_id_swap callback for stream-mode build ID events,
and validate NUL-termination of build_id.filename on the native-endian
delivery path (perf_session__process_user_event) — events with
unterminated filenames are skipped.
Harden perf_header__read_build_ids() against crafted perf.data files:
- Add overflow check on offset + size to prevent wrap past ULLONG_MAX.
- Reject bev.header.size == 0 which would loop forever.
- Reject bev.header.size > remaining section to prevent reading past
the section boundary.
- Guard memcmp(filename, "nel.kallsyms]", 13) with len >= 13 to avoid
reading uninitialized stack memory on short filenames.
- Force NUL-termination of filename before passing it to functions
like machine__findnew_dso() that use strlen/strcmp.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Validate nr fields against event size on both swap and common paths
Several event types use an nr field to control iteration over
variable-length arrays. The swap handlers byte-swap and loop using
these fields without bounds checks, and the native processing path
trusts them as well.
Add bounds checks on both paths for:
- PERF_RECORD_THREAD_MAP: validate nr against payload, return -1
on the swap path. On the native path, reject with -EINVAL.
- PERF_RECORD_NAMESPACES: clamp nr on the swap path (safe because
each entry is indexed by type; missing entries just won't be
resolved). Skip the event on the native path.
- PERF_RECORD_CPU_MAP: clamp nr for CPUS and MASK sub-types on
the swap path. Add bounds checks for mask64 which previously
had no nr validation. Skip the event on the native path.
- PERF_RECORD_STAT_CONFIG: clamp nr on the swap path (safe because
each config entry is self-describing via its tag). Skip the
event on the native path.
The swap path (cross-endian, writable MAP_PRIVATE mapping) can
safely clamp by writing back to the event. The native path
(read-only MAP_SHARED mapping) must skip instead of clamping
because writing to the mmap'd event would segfault.
Also fix stat_config swap range: change size += 1 to
size += sizeof(event->stat_config.nr) for clarity. The old +1
happened to work because mem_bswap_64 processes 8-byte chunks,
but the intent is to include the 8-byte nr field in the swap
range.
Changes in v2:
- Document that PERF_RECORD_NAMESPACES max_nr includes trailing
sample_id space when sample_id_all is present — harmless on the
swap path because both per-element bswap_64 and swap_sample_id_all()
perform the same u64 byte swap (Reported-by: sashiko-bot@kernel.org)
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Validate HEADER_ATTR attr.size before swapping
Harden PERF_RECORD_HEADER_ATTR handling against crafted perf.data:
- Validate attr.size: must be >= PERF_ATTR_SIZE_VER0, a multiple
of sizeof(u64), and fit within the event payload.
- Copy only min(attr.size, sizeof(struct perf_event_attr)) bytes
into a local attr, zeroing the rest so legacy files don't leak
adjacent event data into new fields.
- Keep the original attr.size so perf_event__synthesize_attr()
uses it for both allocation and ID-array placement.
Fix perf_event__synthesize_attr() to use attr->size (not the
compiled sizeof) for event allocation and layout, so perf inject
correctly re-synthesizes attrs from files recorded by a different
perf version. Without this, the ID array destination pointer
(computed via perf_record_header_attr_id()) would be inconsistent
with the allocation when attr->size differs from sizeof.
Also fix the parse-no-sample-id-all test to set attr.size, which
is now validated, and improve error handling in read_attr() for
short reads and invalid attr sizes.
Handle ABI0 pipe/inject events where attr.size is 0: use a local
attr_size variable set to PERF_ATTR_SIZE_VER0 for both the bounded
copy and ID array position, instead of writing back to the event.
Native-endian files may be MAP_SHARED (read-only mmap), so writing
to the event buffer would SIGSEGV. The swap path handles ABI0 in
perf_event__attr_swap() which writes to the MAP_PRIVATE copy.
header.size alignment is now validated centrally in
perf_session__process_event() (see "Add minimum event size and
alignment validation").
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Use bounded copy for PERF_RECORD_TIME_CONV
session->time_conv = event->time_conv copies sizeof(struct
perf_record_time_conv) bytes unconditionally, but older kernels
emit shorter TIME_CONV events without the time_cycles, time_mask,
cap_user_time_zero, and cap_user_time_short fields.
For a 32-byte event (the original format), this reads 24 bytes
past the event boundary into adjacent mmap'd data. The garbage
values end up in session->time_conv and can cause incorrect TSC
conversion if cap_user_time_zero happens to be non-zero.
Replace the struct assignment with a bounded memcpy capped at
event->header.size, zeroing the remainder so extended fields
default to off when absent.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Add validated swap infrastructure with null-termination checks
Change swap callbacks from void to int return so handlers can
propagate errors. All 28 existing handlers are converted to
return 0 on success, -1 on error. Three new handlers (KSYMBOL,
BPF_EVENT, HEADER_FEATURE) are added returning int from the
start, with sample_id_all handling for the kernel event types.
event_swap() propagates the return to its callers (process_event
and peek_event), which skip events that fail to swap.
Add perf_event__check_nul() for null-termination enforcement
on the common event delivery path for MMAP, MMAP2, COMM,
CGROUP, and KSYMBOL events. Events with
unterminated strings are skipped — native-endian files are
mapped read-only, so writing a NUL byte in place would segfault.
Swap handler hardening:
- Use strnlen bounded by event size (instead of strlen) in
COMM/MMAP/MMAP2/CGROUP swap handlers, returning -1 on
unterminated strings.
- Bounds check text_poke old_len+new_len before computing the
sample_id offset, returning -1 on overflow. Use offsetof()
for the native-path check in machines__deliver_event() since
sizeof() includes struct padding past the flexible array.
- Fix PERF_RECORD_SWITCH sample_id_all: non-CPU_WIDE SWITCH
events have sample_id immediately after the 8-byte header,
not at sizeof(struct perf_record_switch) which is the
CPU_WIDE variant size.
- Fix perf_event__time_conv_swap(): decouple time_cycles and
time_mask into independent per-field event_contains() checks,
so each field is only swapped when the event is large enough
to contain it. The original code guarded both fields under
a single time_cycles check, which would swap time_mask on a
short event that contains time_cycles but not time_mask.
- Handle ABI0 (attr.size == 0) in perf_event__attr_swap()
by substituting PERF_ATTR_SIZE_VER0, so bswap_safe()
correctly swaps VER0 fields instead of skipping everything.
- peek_events: on swap failure, advance past the malformed
entry instead of aborting the loop.
Note: the nr-field bounds checks for namespaces, thread_map,
cpu_map, and stat_config arrays are added by a subsequent
patch ("perf session: Validate nr fields against event size
on both swap and common paths"). The HEADER_ATTR attr.size
validation is added by ("perf session: Validate HEADER_ATTR
attr.size before swapping").
By establishing the int-returning swap infrastructure first,
all subsequent hardening patches can use direct error returns
from day one — no poison values, no workarounds for void return.
Changes in v2:
- peek_events: abort instead of skip for AUXTRACE events on
validation failure — skipping only header.size would land
inside the raw trace payload, causing subsequent iterations
to misparse data as events (Reported-by: sashiko-bot@kernel.org)
Fixes: 9aa0bfa370b2 ("perf tools: Handle PERF_RECORD_KSYMBOL") Fixes: 45178a928a4b ("perf tools: Handle PERF_RECORD_BPF_EVENT") Fixes: e9def1b2e74e ("perf tools: Add feature header record to pipe-mode") Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Fix swap_sample_id_all() crash on crafted events
swap_sample_id_all() calls BUG_ON(size % sizeof(u64)) which kills
perf on any event where the sample_id_all tail is not 8-byte aligned.
A crafted perf.data can trigger this trivially.
Replace BUG_ON with a bounds check: skip the swap if the data pointer
is past the end of the event, and only swap when there are bytes
remaining.
Note: the strlen calls in string-field swap handlers (comm,
mmap, mmap2, cgroup) are replaced with bounded strnlen by the
next patch in this series ("perf session: Add validated swap
infrastructure with null-termination checks").
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Fix PERF_RECORD_READ swap and dump for variable-length events
The kernel dynamically sizes PERF_RECORD_READ based on
attr.read_format: only the fields enabled by PERF_FORMAT_TOTAL_TIME_ENABLED,
PERF_FORMAT_TOTAL_TIME_RUNNING, PERF_FORMAT_ID, and PERF_FORMAT_LOST
are emitted, packed with no gaps.
perf_event__read_swap() unconditionally byte-swapped time_enabled,
time_running, and id at their fixed struct offsets, causing
out-of-bounds access on smaller events and swapping the wrong
bytes when not all format fields are present. It also swapped
sample_id_all at a fixed offset past the full struct, which is
wrong for shorter events.
Replace the individual field swaps with a single mem_bswap_64()
over the entire tail from value onward. Since every field after
pid/tid is u64 regardless of which combination is present, this
correctly handles any read_format combination and any trailing
sample_id_all fields.
Similarly, dump_read() accessed optional fields via fixed struct
offsets, displaying values from wrong positions when not all
format bits are set. Walk the packed u64 array sequentially
instead, with bounds checks against event->header.size.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf zstd: Fix multi-iteration decompression and error handling
zstd_decompress_stream() has two bugs in its multi-iteration loop:
1. After each ZSTD_decompressStream() call, the code advances
output.dst by output.pos but doesn't reset output.pos to 0.
ZSTD interprets output.pos relative to output.dst, so the
next iteration writes at (dst + pos) + pos = dst + 2*pos,
skipping a gap and potentially writing out of bounds.
2. On ZSTD_decompressStream() error, the loop executes break
and returns output.pos (which is > 0 if some bytes were
decompressed before the error). The caller checks
!decomp_size and skips the error, silently accepting
truncated or corrupted data.
Fix both by removing the output buffer adjustment — ZSTD
correctly accumulates output.pos across calls without it.
Return 0 on decompression error so the caller detects it.
Add a no-progress guard to prevent infinite loops if the
output buffer fills before all input is consumed.
Note: the compressed event data_size is validated against
header.size by a subsequent patch in this series
("perf tools: Harden compressed event processing").
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf zstd: Fix compression error path in zstd_compress_stream_to_records()
The error fallback does memcpy(dst, src, src_size) intending to store
uncompressed data when compression fails, but this has three bugs:
1. dst has been advanced past the record header (and potentially
past earlier compressed records), so the copy writes to the
wrong offset in the output buffer.
2. src still points to the start of the input, not to the
remaining uncompressed data at src + input.pos. On a second
or later iteration, previously compressed data would be
duplicated.
3. No check that dst_size >= src_size — if the remaining output
space is smaller, this is an out-of-bounds write.
Replace with return -1 after resetting the ZSTD compression
context via ZSTD_initCStream(). The -1 propagates through
zstd_compress() -> record__pushfn() -> perf_mmap__push() to the
recording loop, which breaks out and terminates recording.
Add an out_child_no_flush label in __cmd_record() so the
mmap-read failure path skips the final record__mmap_read_all()
flush — retrying the same read that just failed would just fail
again, and the flush is only useful when the mmap data is intact
but the control path (auxtrace, switch_output) had an error.
Consolidate all error paths through a single 'reset' label to
ensure the compression context is always reset on failure —
including the output-buffer-full path, where a bare return
without resetting would leave stale stream state that corrupts
output if the caller retries.
Also guard against process_header() writing the event header
before the buffer-full check: add a sizeof(perf_event_header)
pre-check so the callback never writes past the output buffer.
Guard against ZSTD making no progress: if output.pos is zero
after ZSTD_compressStream(), calling process_header(record, 0)
would re-trigger header initialization, double-subtracting the
header size from dst_size and underflowing the unsigned counter.
Also fix two pre-existing issues in the same function:
- Add a dst_size guard before subtracting the record header
size: if the output buffer is nearly full, the unsigned
dst_size -= size underflows to a huge value, causing
ZSTD_compressStream to write past the buffer boundary.
- Check the ZSTD_initCStream() return value and log an error
if the context reset itself fails.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Fix event_contains() macro to verify full field extent
event_contains() checked whether a field's start offset was within
the event (header.size > offsetof), but not whether the full field
fit. A crafted event with header.size = offsetof(field) + 1 would
pass the check, but an 8-byte access (bswap_64, direct read) would
overrun the event boundary by up to 7 bytes.
Fix the macro to verify the complete field:
header.size >= offsetof(field) + sizeof(field)
Also update all callers that check event_contains(time_cycles) but
access later fields (time_mask, cap_user_time_zero,
cap_user_time_short) to check for cap_user_time_short — the last
field accessed — so the entire extended block is verified:
tsc.c, arm-spe.c, cs-etm.c, jitdump.c.
Note: session.c's perf_event__time_conv_swap() also guards on
time_cycles but accesses time_mask — a pre-existing issue not
introduced by this macro change. It is fixed by a later patch
in this series ("perf session: Add validated swap
infrastructure with null-termination checks"), which decouples
time_cycles and time_mask into independent per-field
event_contains() checks. The struct assignment overread
(session->time_conv = event->time_conv copies sizeof on a
potentially shorter event) is separately fixed by "perf
session: Use bounded copy for PERF_RECORD_TIME_CONV".
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Bounds-check one_mmap event pointer in peek_event
perf_session__peek_event() computes an event pointer directly from
file_offset when one_mmap is active, without verifying that file_offset
and the subsequent event->header.size fall within the mapped region.
A corrupted perf.data file could cause out-of-bounds memory reads.
Add one_mmap_size to the session struct and validate both the header
and full event fit within the mmap before dereferencing.
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf session: Add minimum event size and alignment validation
Add a per-type minimum size table (perf_event__min_size[]) and
enforce it before swap and processing, so that both cross-endian
and native-endian paths are protected from accessing fields past
the event boundary.
The table uses offsetof() for types with trailing variable-length
fields (filenames, strings, msg arrays) and sizeof() for
fixed-size types. Zero entries mean no minimum beyond the 8-byte
header already enforced by the reader.
Undersized events are skipped with a warning in process_event
and rejected in peek_event — both checked before the swap
handler runs, preventing OOB access on crafted event fields.
Also reject events whose header.size is not 8-byte aligned. The
kernel aligns all event sizes to sizeof(u64) — see
perf_event_comm_event() (ALIGN), perf_event_mmap_event(),
perf_event_cgroup(), perf_event_ksymbol() (IS_ALIGNED loops),
and perf_event_text_poke() (ALIGN) in kernel/events/core.c.
An unaligned size means the file is corrupted or crafted; reject
early so downstream code that divides by sizeof(u64) to compute
array element counts gets exact results.
Three legacy user events are exempted from the alignment check:
TRACING_DATA (66) had a 12-byte struct before commit b39c915a4f36
("libperf event: Ensure tracing data is multiple of 8 sized")
added padding, COMPRESSED (81) carries raw ZSTD output (already
superseded by COMPRESSED2 with PERF_ALIGN), and HEADER_FEATURE
(80) uses do_write_string() with a 4-byte length prefix.
Also guard event_swap() against crafted event types >=
PERF_RECORD_HEADER_MAX to prevent OOB reads on the
perf_event__swap_ops[] array.
Changes in v2:
- Fix double-skip for unsupported event types: return 0 instead
of event->header.size in perf_session__process_event() for
HEADER_MAX, since reader__read_event() already advances by
event->header.size (Reported-by: sashiko-bot@kernel.org)
- Exempt TRACING_DATA, COMPRESSED, and HEADER_FEATURE from the
alignment check — these legacy user events predate the 8-byte
alignment rule (Reported-by: sashiko-bot@kernel.org)
- peek_event: return 0 (skip) for unknown event types instead of
-1 (error), consistent with process_event which already skips
unsupported types gracefully (Reported-by: sashiko-bot@kernel.org)
Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude:claude-opus-4.6-1m Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ulf Hansson [Fri, 29 May 2026 14:42:41 +0000 (16:42 +0200)]
mmc: Merge branch fixes into next
Merge the mmc fixes for v7.1-rc[n] into the next branch, to allow them to
get tested together with the mmc changes that are targeted for the next
release.
Bard Liao [Fri, 29 May 2026 01:42:59 +0000 (09:42 +0800)]
ASoC: sdw_utils: return -EPROBE_DEFER if components are not registered yet
commit 42d99857d6f0 ("ASoC: core: Move all users to deferrable card binding")
converted the -EPROBE_DEFER return value of snd_soc_bind_card() to 0
which results in the machine driver probe return 0 and will not be
called again when any component is not yet registered.
We get the right component name from the registered components
and use it in the dai links. It will lead to bind fail if the default
component name is used. Return -EPROBE_DEFER to allow the machine driver
probe again after the components are registered.
Suggested-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20260529014259.2528048-1-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
Crystal Wood [Mon, 11 May 2026 22:30:35 +0000 (17:30 -0500)]
tracing/osnoise: Array printk init and cleanup
None of the calls to trace_array_printk_buf() will do anything
if we don't initialize the buffer on instance creation (unless
some other tracer called it), so do that.
Add an osnoise_print() function to facilitate adding debug prints
(without tainting).
Use trace_array_printk() instead of trace_array_printk_buf(), as we're
only writing to the main buffer (of a non-main instance) anyway -- and
trace_array_printk_buf() skips the check to make sure we're not printing
to the global instance.