Rosen Penev [Sun, 17 May 2026 04:27:16 +0000 (21:27 -0700)]
wifi: ath9k: Clear DMA descriptors without memset
Clear ath9k DMA descriptors with explicit status word stores instead of
memset(). The descriptor rings are coherent DMA memory, which may be
mapped uncached on 32-bit powerpc. The optimized memset() path can use
dcbz there and trigger an alignment warning.
Use WRITE_ONCE() for the descriptor status words so the compiler keeps
the clears as ordinary stores instead of folding them back into bulk
memset(). This covers AR9003 TX status descriptors as well as the RX
status area cleared when setting up RX descriptors.
Rosen Penev [Wed, 6 May 2026 23:48:48 +0000 (16:48 -0700)]
wifi: ath9k_htc: use module_usb_driver
This follows the pattern with other USB Wifi drivers. There is nothing
special being done in the _init and _exit functions here. Simplifies and
saves some lines of code.
wifi: wcn36xx: fix OOB read from short trigger BA firmware response
The firmware response length is only checked against sizeof(*rsp) (20
bytes), but when candidate_cnt >= 1, a 22-byte candidate struct is read
at buf + 20 without verifying the response contains it. This causes an
out-of-bounds read of stale heap data, corrupting the BA session state.
Add validation that the response includes the candidate data.
wifi: wcn36xx: fix OOB read from firmware count in PRINT_REG_INFO indication
The firmware-controlled rsp->count field is used as the loop bound for
indexing into the flexible rsp->regs[] array without validation against
the message length. A count exceeding the actual data causes out-of-
bounds reads from the heap-allocated message buffer.
Add a check that count fits within the received message.
Fixes: 43efa3c0f241 ("wcn36xx: Implement print_reg indication") Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20260421135018.352774-3-tristmd@gmail.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
wifi: wcn36xx: fix heap overflow from oversized firmware HAL response
The firmware response dispatcher copies all synchronous HAL responses
into the 4096-byte hal_buf without validating the response length. A
response exceeding WCN36XX_HAL_BUF_SIZE causes a heap buffer overflow
with firmware-controlled content.
Add a bounds check on the response length.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware") Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20260421135018.352774-2-tristmd@gmail.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Linus Torvalds [Sat, 6 Jun 2026 14:28:59 +0000 (07:28 -0700)]
Merge tag 'vfs-7.1-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix error handling in ovl_cache_get()
- Tighten access checks for exited tasks in pidfd_getfd()
- Fix selftests leak in __wait_for_test()
- Limit FUSE_NOTIFY_RETRIEVE to uptodate folios
- Reject fuse_notify() pagecache ops on directories
- Clear JOBCTL_PENDING_MASK for caller in zap_other_threads()
- Fix failure to unlock in nfsd4_create_file()
- Fix pointer arithmetic in qnx6 directory iteration
- Fix UAF due to unlocked ->mnt_ns read in may_decode_fh()
- Avoid potential null folio->mapping deref during iomap error
reporting
* tag 'vfs-7.1-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: avoid potential null folio->mapping deref during error reporting
fhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh()
fs/qnx6: fix pointer arithmetic in directory iteration
VFS: fix possible failure to unlock in nfsd4_create_file()
signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()
fuse: reject fuse_notify() pagecache ops on directories
fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios
selftests: harness: fix pidfd leak in __wait_for_test
pidfd: refuse access to tasks that have started exiting harder
ovl: keep err zero after successful ovl_cache_get()
When `I2cAdapter::get` executes, it first calls
`bindings::i2c_get_adapter()` which increments the device and module
reference counts. It then takes a reference to the raw pointer and
converts it to an `ARef` via `.into()`.
The implementation of `From<&T> for ARef<T>` where `T: AlwaysRefCounted`
unconditionally calls `T::inc_ref()`. This leads to a second increment
to the reference counts.
Since the returned `ARef` will only release a single reference when
dropped via `dec_ref()`, this leaks one device and module reference count
on every call.
Jori Koolstra [Thu, 4 Jun 2026 22:24:05 +0000 (00:24 +0200)]
vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
A recent build failure[1] exposed the diffculty of working with the
current octal and hex definitions of O_ flags when trying to find a gap
for a new flag. This difficulty is compounded by the fact that O_ flags
may have architectural specific values.
Replace the hex/octal #defines, which are hard to parse when looking for
free bits, with explicit bit shifts like (1 << 11). Also, add comments
that identify which architectures redefine some of the seemingly free
("cursed") bits in uapi/asm-generic/fcntl.h. These should not be used to
define new O_ flags (for now, at least).
The translastion was done with Claude Opus 4.8, and verified with a
(non-AI) gawk script. The accounting of which architectures claim
which bit-gaps in uapi/asm-generic/fcntl.h is also done by hand.
Add a sleepable BPF kfunc that resolves the real inode backing a dentry
via d_real_inode(). On overlay/union filesystems the inode attached to
the dentry is the overlay inode which does not carry the underlying
device information. d_real_inode() resolves through the overlay and
returns the inode from the lower, real filesystem.
This is used in the RestrictFilesytemAccess bpf program that has been
merged into systemd a little while ago.
Daniel Borkmann [Tue, 2 Jun 2026 07:40:12 +0000 (09:40 +0200)]
bpf: Add simple xattr support to bpffs
Add support for extended attributes on bpffs inodes so that user space
and BPF LSM programs can attach metadata, for example, a content hash
or a security label - to a pinned object or directory. BPF LSM or user
space tooling can then uniformly look at this (e.g. security.bpf.*) in
similar way to other fs'es. The store is in-memory and non-persistent:
it lives only for the lifetime of the mount, like everything else in
bpffs. The modelling is similar to tmpfs.
bpffs serves the trusted.* and security.* namespaces; user.* is left
unsupported. As bpffs is FS_USERNS_MOUNT, security.* is reachable by
the unprivileged mounter in a user namespace, and thus we are using
the simple_xattr_set_limited infra there (trusted.* needs global
CAP_SYS_ADMIN).
bpf_fill_super() is open-coded instead of using simple_fill_super(),
because the root inode must now be allocated through bpf_fs_alloc_inode()
i.e. carry the bpf_fs_inode wrapper and come from the right cache -
which requires s_op (and s_xattr) to be installed before the first
inode is created. While at it, also harden s_iflags with SB_I_NOEXEC
and SB_I_NODEV.
bpf_fs_listxattr() is only reachable through the filesystem via
i_op->listxattr, so the BPF token inode is left untouched. Name-based
fsetxattr()/fgetxattr() on a token fd still work since the get/set
handlers are installed at the superblock.
For security.* namespace, we use simple_xattr_set_limited() but
there was no simple_xattr_add_limited() API yet which was needed
in bpf_fs_initxattrs() to avoid underflows in the accounting. The
symlink target is freed in bpf_free_inode() rather than in
bpf_destroy_inode() so that it is released only after an RCU grace
period, as an RCU path walk following the symlink may still
dereference inode->i_link in security_inode_follow_link(). Lastly,
the bpf_symlink() allocated the symlink target is switched to
GFP_KERNEL_ACCOUNT, so the string is charged to the caller's memcg.
kernfs: link kn to its parent before the LSM init hook
After commit 12e9e3cd03b5 ("simpe_xattr: use per-sb cache"),
kernfs_xattr_set() and kernfs_xattr_get() compute the cache via
kernfs_root(kn) before any other check. kernfs_root(kn) walks
kn->__parent first and falls back to kn->dir.root, both of which are
NULL on a freshly kmem_cache_zalloc()'d kn. kn->__parent was being set
in kernfs_new_node() after __kernfs_new_node() returned, and kn->dir.root
is set even later by kernfs_create_dir_ns() / kernfs_create_empty_dir().
The LSM kernfs_init_security hook is invoked from inside
__kernfs_new_node(), before either field has been initialized.
selinux_kernfs_init_security() ends with kernfs_xattr_set(kn,
XATTR_NAME_SELINUX, ...). kernfs_root(kn) then returns NULL, and
&((struct kernfs_root *)NULL)->xa_cache evaluates to
offsetof(struct kernfs_root, xa_cache) which faults:
Reproduces deterministically at PID 1 (systemd) on an SELinux-enabled
distro. The first cgroup mkdir under /sys/fs/cgroup with a labelled
parent panics the kernel.
The LSM hook's contract is that the kn_dir argument is the parent of
the new kn, so kn->__parent should already point at kn_dir when the
hook runs. Move kernfs_get(parent) and rcu_assign_pointer of
kn->__parent from kernfs_new_node() into __kernfs_new_node() right
before the security hook, and unwind the parent reference on the
err_out4 path. kernfs_root(kn) then takes its parent branch during
the hook and returns parent->dir.root, which is the correct root.
This also closes the same-shape latent bug in kernfs_xattr_get() (which
today is hidden only by kernfs_iattrs_noalloc() returning NULL on a
fresh kn).
Fixes: 12e9e3cd03b5 ("simpe_xattr: use per-sb cache") Reported-by: Calum Mackay <calum.mackay@oracle.com> Closes: https://lore.kernel.org/all/5386153f-9112-4971-98fc-de90d7aae2c6@oracle.com/ Link: https://patch.msgid.link/20260526-ablief-demut-wehen-aef8446ef5c9@brauner Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Miklos Szeredi [Fri, 5 Jun 2026 13:53:19 +0000 (15:53 +0200)]
simpe_xattr: use per-sb cache
Move the hash table to the super block to remove excessive overhead in case
of small number of xattrs per inode.
Add linked list to the inode, used for listxattr and eviction. Listxattr
uses rcu protection to iterate the list of xattrs.
Before being made per-sb, lazy allocation was protected by inode lock. Now
inode lock no longer provides sufficient exclusion, so use cmpxchg() to
ensure atomicity.
Though I haven't found a description of this pattern, after some research
it seems that cmpxchg_release() and READ_ONCE() should provide the
necessary memory barriers.
Use simple_xattr_free_rcu() in simple_xattrs_free(). This is needed because
the hash table is now shared between inodes and lookup on a different inode
might be running the compare function on the just freed element within the
RCU grace period.
Following stats are based on slabinfo diff, after creating 100k empty
files, then adding a "user.test=foo" xattr to each:
The overhead of a single xattr is reduced to nearly v7.0 levels. The per
xattr overhead is slightly larger due to the addition of three pointers to
struct simple_xattr.
Miklos Szeredi [Fri, 5 Jun 2026 13:53:18 +0000 (15:53 +0200)]
simple_xattr: change interface to pass struct simple_xattrs **
Change the simple_xattr API to accept pointer-to-pointer (struct
simple_xattrs **) instead of pointer. This allows the functions to handle
lazy allocation internally without requiring callers to use
simple_xattrs_lazy_alloc().
The simple_xattr_set(), simple_xattr_set_limited() and simple_xattr_add()
functions now handle allocation when xattrs is NULL. simple_xattrs_free()
now also frees the xattrs structure itself and sets the pointer to NULL.
This simplifies callers and removes the need for most callers to explicitly
manage xattrs allocation and lifetime.
In shmem_initxattrs(), the total required space for all initial xattrs
(ispace) is pre-calculated and deducted from sbinfo->free_ispace.
Since this patch modifies the function to add new xattrs directly to the
inode's &info->xattrs list rather than using a local temporary variable, a
failure means that the partially populated info->xattrs list remains
attached to the inode.
When the VFS caller handles the -ENOMEM error, it drops the newly created
inode via iput(), shmem_free_inode() adds freed to sbinfo->free_ispace a
second time, permanently inflating the tmpfs free space quota.
Fix by substracting already added xattrs from ispace.
Miklos Szeredi [Fri, 5 Jun 2026 13:53:16 +0000 (15:53 +0200)]
kernfs: fix xattr race condition with multiple superblocks
Multiple superblocks with different namespaces can share the same
kernfs_node when kernfs_test_super() finds a matching root but
different namespace. This means multiple inodes from different
superblocks can reference the same kernfs_node->iattr->xattrs
structure.
The VFS layer only holds per-inode locks during xattr operations,
which is insufficient to serialize concurrent xattr modifications on
the shared kernfs_node. This can lead to race conditions in
simple_xattr_set() where the lookup->replace/remove sequence is not
atomic with respect to operations from other superblocks.
Fix this by protecting xattr operations with the existing hashed
kernfs_locks->open_file_mutex[] array, which is already used to
protect per-node open file data. The hashed mutex array provides
scalable per-node serialization (scaled by CPU count, up to 1024 locks
on 32+ CPU systems) with zero memory overhead.
Changes:
- Rename open_file_mutex[] to node_mutex[] to reflect dual purpose
- Add kernfs_node_lock_ptr() and kernfs_node_lock() helpers
- Protect simple_xattr_set() calls in kernfs_xattr_set() and
kernfs_vfs_user_xattr_set() with the hashed mutex
- Update file.c to use new helpers via compatibility wrappers
- Update documentation to explain the extended lock usage
Waiman Long [Fri, 5 Jun 2026 17:30:38 +0000 (13:30 -0400)]
debugobjects: Don't call fill_pool() in early boot hardirq context
When booting a debug PREEMPT_RT kernel on an ARM64 system, a "inconsistent
{HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage" lockdep warning message was
reported to the console.
During early boot, interrupts are enabled before the scheduler is
enabled. In this window (before SYSTEM_SCHEDULING is set) interrupts can
fire and in the hard interrupt context handler attempt to fill the pool
This can lead to a deadlock when the interrupt occurred when the interrupt
hits a region which holds a lock that is required to be taken in the
allocation path.
Add a new can_fill_pool() helper and reorder the exception rule and forbid
this scenario by excluding allocations from hard interrupt context.
Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING") Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260605173038.495075-1-longman@redhat.com
Miguel Ojeda [Sat, 6 Jun 2026 12:01:29 +0000 (14:01 +0200)]
Merge tag 'pin-init-v7.2' of https://github.com/Rust-for-Linux/linux into rust-next
Pull pin-init updates from Gary Guo:
"User visible changes:
- Do not generate 'non_snake_case' warnings for identifiers that are
syntactically just users of a field name. This would allow all
'#[allow(non_snake_case)]' in nova-core to be removed, which I will
send to the nova tree next cycle.
- Filter non-cfg attributes out properly in derived structs. This
improves pin-init compatibility with other derive macros.
- Insert projection types' where clause properly.
Other changes:
- Bump MSRV to 1.82, plus associated cleanups.
- Overhaul how init slots are projected. The new approach is easier
to justify with safety comments.
- Mark more functions as inline, which should help mitigate the
super-long symbol name issue due to lack of inlining.
- Various small code quality cleanups."
* tag 'pin-init-v7.2' of https://github.com/Rust-for-Linux/linux: (27 commits)
rust: pin_init: internal: use `loop {}` to produce never value
rust: pin-init: remove `E` from `InitClosure`
rust: pin-init: move `InitClosure` out from `__internal`
rust: pin-init: docs: fix typos in MaybeZeroable documentation
rust: pin-init: internal: suppress `non_snake_case` lint in `[pin_]init!`
rust: pin-init: internal: suppress `non_snake_case` lint in `#[pin_data]`
rust: pin-init: internal: pin_data: filter non-`#[cfg]` attr in generated code
rust: pin-init: internal: project using full slot
rust: pin-init: internal: project slots instead of references
rust: pin-init: internal: make `make_closure` inherent methods
rust: pin-init: internal: use marker on drop guard type for pinned fields
rust: pin-init: internal: init: handle code blocks early
rust: pin-init: internal: add `PhantomInvariant` and `PhantomInvariantLifetime`
rust: pin-init: internal: pin_data: add struct to record field info
rust: pin-init: internal: pin_data: use closure for `handle_field`
rust: pin-init: examples: fix `useless_borrows_in_formatting` clippy warning
rust: pin-init: internal: remove `collect_tuple` polyfill after MSRV bump
rust: pin-init: internal: turn `PhantomPinned` error into warnings
rust: pin-init: cleanup workaround for old Rust compiler
rust: pin-init: fix badge URL in README
...
Rong Xu [Thu, 4 Jun 2026 19:56:08 +0000 (12:56 -0700)]
kconfig: Remove the architecture specific config for Propeller
The CONFIG_PROPELLER_CLANG option currently depends on
ARCH_SUPPORTS_PROPELLER_CLANG, but this dependency seems unnecessary.
Remove ARCH_SUPPORTS_PROPELLER_CLANG and allow users to control
Propeller builds solely through CONFIG_PROPELLER_CLANG. This simplifies
the kconfig and avoids potential confusion.
Move the .llvm_bb_addr_map sections grouping to
include/asm-generic/vmlinux.lds.h.
The Propeller documentation has been updated to reflect the most
recent tool location and now includes instructions for arm64.
Contributor Acknowledgments:
* SPE instructions: Daniel Hoekwater <hoekwater@google.com>
Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Will Deacon <will@kernel.org> Suggested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Yabin Cui <yabinc@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20260604195612.3757860-3-xur@google.com Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Rong Xu [Thu, 4 Jun 2026 19:56:07 +0000 (12:56 -0700)]
kconfig: Remove the architecture specific config for AutoFDO
The CONFIG_AUTOFDO_CLANG option currently depends on
ARCH_SUPPORTS_AUTOFDO_CLANG, but this dependency seems unnecessary.
Remove ARCH_SUPPORTS_AUTOFDO_CLANG and allow users to control AutoFDO
builds solely through CONFIG_AUTOFDO_CLANG. This simplifies the kconfig
and avoids potential confusion.
Expand the AutoFDO documentation to include instructions for arm64.
Contributor acknowledgments:
* SPE instructions: Daniel Hoekwater <hoekwater@google.com>
* ETM instructions: Yabin Cui <yabinc@google.com>
Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Will Deacon <will@kernel.org> Tested-by: Yabin Cui <yabinc@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20260604195612.3757860-2-xur@google.com Signed-off-by: Nathan Chancellor <nathan@kernel.org>
James Lee [Thu, 4 Jun 2026 06:03:00 +0000 (14:03 +0800)]
modpost: Add __llvm_covfun and __llvm_covmap to section_white_list
Modpost emits hundreds of warnings when using Clang to build for ARCH=um
and CONFIG_GCOV=y. e.g.:
vmlinux (__llvm_covfun): unexpected non-allocatable section.
Did you forget to use "ax"/"aw" in a .S file?
Note that for example <linux/init.h> contains
section definitions for use in .S files.
For example, when we use LLVM for a kunit user mode build with coverage:
python3 tools/testing/kunit/kunit.py build --make_options LLVM=1 \
--kunitconfig=tools/testing/kunit/configs/default.config \
--kunitconfig=tools/testing/kunit/configs/coverage_uml.config
The behaviour occurs when building the kernel for ARCH=um with code
coverage enabled. The warnings come from modpost's check_sec_ref
function, which ensures no sections reference others that will be
discarded. covfun and covmap sections must reference __init and __exit
sections to collect coverage data, triggering the modpost warning.
To suppress these warnings, these section names have been added to
modpost's whitelist. This is unlikely to suppress legitimate warnings as
Clang will only insert these sections when building with coverage, and
can be assumed to manage these references safely.
Daniel Borkmann [Fri, 5 Jun 2026 21:35:18 +0000 (23:35 +0200)]
selftests/bpf: Inspect the signature verdict exposed to BPF LSM
Add a minimal BPF LSM program on lsm/bpf_prog_load that, for loads on
the monitored thread, reads back prog->aux->sig.{verdict,keyring_type,
keyring_serial}, and a signed_loader subtest that drives the same
gen_loader loader through the hook twice: i) /unsigned/ where the LSM
must observe UNSIGNED, no keyring and serial 0; ii) /signed/ where the
very same insns signed against the session keyring must be observed as
VERIFIED with a user keyring, and the recorded keyring_serial must be
equal to the resolved session keyring serial. Loading (not running) the
loader is sufficient since the verdict is attached at load time.
KP Singh [Fri, 5 Jun 2026 21:35:17 +0000 (23:35 +0200)]
bpf: Expose signature verdict via bpf_prog_aux
BPF_PROG_LOAD verifies the loader signature but does not record the
outcome on the BPF program. [BPF] LSMs and audit can read attr->signature
and attr->keyring_id to infer "was this signed, and if so, against which
keyring".
Add prog->aux->sig (verdict + keyring_{type,serial}), populated by
bpf_prog_load before the LSM hook. keyring_type classifies the keyring
the load referenced (builtin, secondary, platform or user), while
keyring_serial records the serial of the keyring the signature was
actually validated against. System keyrings carry a pseudo key pointer
with no user-visible serial and are reported as 0, as are unsigned loads.
Failed verifications reject the load before the hook runs, so it observes
only either UNSIGNED or VERIFIED.
Signed-off-by: KP Singh <kpsingh@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260605213518.544262-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
selftests/bpf: libarena: Add initial data structures
Add two new data structures to libarena. These data structures initially
resided in the sched-ext repo (https://github.com/sched-ext/scx) and
have been adapted to the internal libarena build system. The data
structures are:
- Red black tree: Fundamental tree data structure that can also serve
as a base for more domain-specific data structures.
- Lev-Chase deque: Queue data structure that allows efficient work
stealing, useful in scheduling scenarios.
The data structures are accompanied by selftests that are automatically
discovered by the existing libarena test_progs selftest and incorporated
in the CI.
CHANGELOG
=========
v3 -> v4 (https://lore.kernel.org/bpf/20260604235016.20856-1-emil@etsalapatis.com/)
- Turn off load_acquire/store_relesase - dependent selftests for s390 (CI)
- Various style/non-functional nits (AI)
- Add workaround to handle LLVM 21 and GCC 15 assignment-to-memset promotions
that are causing verification failures for arena programs (CI)
- Incorporate Sashiko feedback for cleanup edge cases (Sashiko)
- Simplify some of the ordering semantics in spmc
- Rename tests from st_ to test_ (Alexei)
- Removed the freelist caches from the rbtrees, previously used to defer freeing (Alexei)
- Moved the type and function definitions to use the __arena identifier
- Removed the typecasts during function return and directly return __arena
pointers (Alexei)
- Renamed queues to spmc queues to abstract away the algorithm (Alexei)
- Adjusted the memory barriers in the spmc queue
- Added multithreaded testing harness for libarena programs (Alexei)
- Added parallel selftest for queues (Alexei)
- Split least upper bound and exact find operations back into separate
functions to prevent RB_DUPLICATE-related bug (AI)
====================
Emil Tsalapatis [Fri, 5 Jun 2026 22:20:20 +0000 (18:20 -0400)]
selftests/bpf: libarena: parallel test harness and spmc parallel selftest
Add a parallel test for the SPMC Lev-Chase workstealing queue. The queue
is built to be wait-free even when there are multiple consumers, and
the parallel selftest provides a signal on whether the queue behaves
correctly when stress tested.
To support the test, this patch includes a test harness for parallel
selftests. The spmc selftest acts as an example of the naming and other
conventions expected by the harness.
Emil Tsalapatis [Fri, 5 Jun 2026 22:20:19 +0000 (18:20 -0400)]
selftests/bpf: libarena: Add spmc queue data structure
Expand libarena with a single producer multiple consumer deque data
structure. This is a single producer, multiple consumer lockless structure
that permits efficient work stealing. The structure is a Lev-Chase queue,
so it is lock-free and wait-free.
The data structure exposes three main calls. two of them are available to
the thread owning the queue and one available to all threads in the program:
spmc_owner_push(): Push an item to the top of the queue.
spmc_owner_pop(): Pop an item from the top of the queue.
spmc_steal(): Steal a thread from the bottom of the queue from
any thread.
Note that the queue is not really FIFO for all consumers, since
non-owners of the queue can only work steal from the bottom.
Emil Tsalapatis [Fri, 5 Jun 2026 22:20:18 +0000 (18:20 -0400)]
selftests/bpf: libarena: Add rbtree data structure
Add a native red-black tree data structure to libarena.
The data structure supports multiple APIs (key-value based,
node based) with which users can query and modify it. The
tree uses the libarena memory allocator to manage its data.
Nazim Amirul [Thu, 4 Jun 2026 08:30:37 +0000 (01:30 -0700)]
net: stmmac: xgmac: report L3/L4 filter match count in ethtool stats
Read the L3FM and L4FM bits from the RX descriptor status word (RDES2)
and increment the corresponding ethtool statistics counters. This allows
users to observe L3/L4 filter hit rates via ethtool -S.
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260604083037.24407-1-muhammad.nazim.amirul.nazle.asmade@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Haoxiang Li [Wed, 3 Jun 2026 06:17:16 +0000 (14:17 +0800)]
net: microchip: sparx5: clean up PSFP resources on flower setup failure
sparx5_tc_flower_psfp_setup() allocates PSFP stream gate, flow meter and
stream filter resources before adding VCAP actions. If a later step
fails, the resources allocated earlier in the function are not unwound.
Add error paths to release the stream filter, flow meter and stream gate
when setup fails after they have been acquired.
Also make sparx5_psfp_fm_add() return the acquired flow-meter id before
the existing-flow-meter early return. When an existing flow meter is
reused, sparx5_psfp_fm_get() increments its pool reference count, but the
caller previously kept psfp_fmid as 0. If a later setup step failed, the
error path could try to delete flow-meter id 0 instead of the reused flow
meter, leaving the incremented reference behind.
Chenguang Zhao [Wed, 3 Jun 2026 01:13:53 +0000 (09:13 +0800)]
netlabel: validate unlabeled address and mask attribute lengths
netlbl_unlabel_addrinfo_get() used the address attribute length to
determine whether the attribute data could be read as an IPv4 or IPv6
address, but did not independently validate the corresponding mask
attribute length. A crafted Generic Netlink request could therefore
provide a valid IPv4/IPv6 address attribute with a shorter mask
attribute, which would later be read as a full struct in_addr or
struct in6_addr.
NLA_BINARY policy lengths are maximum lengths by default, so use
NLA_POLICY_EXACT_LEN() for the unlabeled IPv4/IPv6 address and mask
attributes. This rejects short attributes during policy validation and
also exposes the exact length requirements through policy introspection.
Fixes: 8cc44579d1bd ("NetLabel: Introduce static network labels for unlabeled connections") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: airoha: Support multiple net_devices connected to the same GDM port
EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
manages the traffic in a TDM manner. As a result multiple net_devices can
connect to the same GDM{3,4} port and there is a theoretical "1:n"
relation between GDM ports and net_devices.
This series introduces support for multiple net_devices connected to the
same Frame Engine (FE) GDM port (GDM3 or GDM4) via an external hw
arbiter. Please note GDM1 or GDM2 does not support the connection with
the external arbiter.
====================
net: airoha: Support multiple LAN/WAN interfaces for hw MAC address configuration
The EN7581 and AN7583 SoCs provide registers to configure hardware LAN/WAN
MAC addresses. These registers are used during FE hw acceleration to
determine whether received traffic is destined to this host (L3 traffic)
or should be switched to another device (L2 traffic).
The SoC hardware design assumes all interfaces configured as LAN (or WAN)
share the MAC address MSBs, which are programmed into the
REG_FE_{LAN,WAN}_MAC_H register. The LSBs of 'local' mac addresses can be
expressed as a range via the REG_FE_MAC_LMIN and REG_FE_MAC_LMAX
registers. In order to properly accelerate the traffic, FE module requires
the user to configure the REG_FE_{LAN,WAN}_MAC_H register respecting this
limitation. Please note a misconfiguration in REG_FE_{LAN,WAN}_MAC_H
will still allow the user to log into the device for debugging.
Previously, only a single interface was considered when programming these
registers. Extend the logic to derive the correct minimum and maximum
values for REG_FE_MAC_LMIN/REG_FE_MAC_LMAX when two or more interfaces are
configured as LAN or WAN. Since this functionality was not available
before this series, no regression is introduced.
Introduce WAN flag to specify if a given device is used to transmit/receive
WAN or LAN traffic. Current codebase supports specifying LAN/WAN device
configuration in ndo_init() callback during device bootstrap.
In order to consider setups where LAN configuration is used even for
GDM3/GDM4 devices, check airoha_is_lan_gdm_dev() to select pse_port in
airoha_ppe_foe_entry_prepare().
Please note after this patch, it will be possible to specify multiple LAN
devices but just a single WAN one. Please note this change is not visible
to the user since airoha_eth driver currently supports just the internal
phy available via the MT7530 DSA switch and there are no WAN interfaces
officially supported since PCS/external phy is not merged mainline yet
(it will be posted with following patches).
Theoretically, in the current codebase, two independent net_devices can
be connected to the same GDM port so we need to check the GDM port is not
used by any other running net_device before setting the forward
configuration to FE_PSE_PORT_DROP.
Moreover, always set in GDM_LONG_LEN_MASK field of REG_GDM_LEN_CFG
register the maximum MTU of all running net_devices connected to the same
GDM port.
net: airoha: Support multiple net_devices for a single FE GDM port
EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
manages the traffic in a TDM manner. As a result multiple net_devices can
connect to the same GDM{3,4} port and there is a theoretical "1:n"
relation between GDM ports and net_devices.
Introduce support for multiple net_devices connected to the same Frame
Engine (FE) GDM port (GDM3 or GDM4) via an external hw arbiter.
Please note GDM1 or GDM2 does not support the connection with the external
arbiter.
Add get_dev_from_sport callback since EN7581 and AN7583 have different
logics for the net_device type connected to GDM3 or GDM4.
net: airoha: Remove private net_device pointer in airoha_gdm_dev struct
Remove redundant net_device pointer inside airoha_gdm_dev struct and
rely on netdev_from_priv routine instead. Please note this patch does
not introduce any logical change, just code refactoring.
dt-bindings: net: airoha: Add GDM port ethernet child node
EN7581 and AN7583 SoCs support connecting multiple external SerDes to GDM3
or GDM4 ports via a hw arbiter that manages the traffic in a TDM manner.
As a result multiple net_devices can connect to the same GDM{3,4} port
and there is a theoretical "1:n" relation between GDM ports and
net_devices.
Introduce the ethernet node child of a specific GDM port in order to model
a given net_device that is connected via the external arbiter to the
GDM{3,4} port. This new ethernet node is defined by the "airoha,eth-port"
compatible string. Please note GDM1 and GDM2 does not support the
connection with the external arbiter and they are represented by an
ethernet node defined by the "airoha,eth-mac" compatible string.
====================
net: mdio: realtek-rtl9300: Refactor initialization and port lookup
The Realtek Otto switch platform consists of four different series
- RTL838x aka maple : 28 port 1G Switches
- RTL839x aka cypress : 52 port 1G Switches
- RTL930x aka longan : 28 port 1G/2.5G/10G Switches
- RTL931x aka mango : 56 port 1G/2.5G/10G Switches
A lot has been done to enhance the ethernet MDIO driver for simple
integration of more SoCs. Now it is time to solve inconveniences
that were discovered during daily operation. That includes
- Consistent "of" API usage
- Tightening error handling and improving overall robustness.
- Adding support for PHY packages.
- Fixing setup order issues. These currently hinder the driver from
properly enabling the hardware on devices where U-Boot skips the
setup and leaves the controller registers untouched.
====================
Realtek Otto switches usually make use of multiport PHYs (e.g. 8 port
1G RTL8218D or 4 port 2.5G RTL8224). The device tree can describe this
fact via an "ethernet-phy-package" node that resides between the bus
and the PHY node.
When looking up the device tree bus node via the chain port->phy->parent
the driver totally ignores the existence of a PHY package. Enhance the
lookup to take care of this feature.
After the former refactoring the existing otto_emdio_9300_mdiobus_init()
contains only the c22/c45 bus mode setup. Like the topology setup this
must run before bus registration. Otherwise the bus does not "speak" the
right protocol for PHY setup.
This setup is device-specific and other SoCs will need to set up other
register bits in the controller in the future. Therefore
- Relocate c22/c45 device tree readout to the very beginning of the probing
- Add a new device-specific setup_controller() into the info structure.
- Relocate otto_emdio_priv to satisfy the new info structure dependency.
- Rename otto_emdio_9300_mdiobus_init accordingly and add it to the
RTL9300 info structure. At the same time, adapt register naming
for the function to make it clear that it only applies to this SoC.
- Call setup_controller() prior to bus registration.
net: mdio: realtek-rtl9300: relocate c22/c45 device tree readout
otto_emdio_map_ports() is the central place to lookup the topology and the
properties of the Realtek ethernet MDIO controller from the device tree.
Deviating from this the c22/c45 detection via "ethernet-phy-ieee802.3-c45"
is running separately in otto_emdio_probe_one(). It loops over the same
nodes, just at a later point in time.
There is no benefit to divide this setup and to have a time window where
the data structure is only filled partially. Additionally it uses the
"fwnode" API. Consolidate the setup and convert it to the "of" API.
Remark. This is a subtle change for dangling PHY nodes (not referenced
by ethernet-ports). Before this commit all PHY nodes were evaluated for
c45 setup, now only the referenced ones.
Until now the driver sets up the port to bus/address topology of the
controller after all buses are set up via otto_emdio_probe_one(). This
does not work for devices where U-Boot skips this setup. It is not
only needed for the hardware internal background PHY polling engine
but it is essential for access to the PHYs during probing.
Depending on the SoC type there exist two different register arrays
- Bus mapping registers (RTL930x, RTL931x) define to which bus the port
is attached. E.g. [1]
- Address mapping registers (RTL838x, RTL930x, RTL931x) define to which
address of the bus the port is attached. E.g. [2]
Relocate the topology setup and make it generic. For this
- Define device-specific bus_base/addr_base attributes that give the
register base address where the mapping lives. In case one or both are
not given the SoC does not support this specific type of mapping.
- Create a helper otto_emdio_setup_topology() that writes the detected
topology to the registers.
- Call this helper prior to otto_emdio_probe_one().
- Remove unneeded code from otto_emdio_9300_mdiobus_init().
- Due to the added prefixes, increase define indentation
Subtle change: The old coding used regmap_bulk_write and silently wrote
bus=0/address=0 to mapping registers for ports that are out of scope.
The new coding leaves those untouched.
The bus probing of the MDIO driver uses a two stage approach.
1. The device tree "ethernet-ports" node is scanned to build a mapping
between ports and PHYs.
2. The children of the device tree "controller" are scanned to create
the individual MDIO buses.
The first step already checks the consistency of the PHY and bus nodes
that are linked via the ports. But it might miss a dangling bus child
node that is not linked. Step two simply iterates over all bus child
nodes and might read malformed data from nodes not checked in step one.
Harden this and return a meaningful error message.
Due to its design the MDIO driver needs to set up a port to bus/address
mapping during probing. The "ethernet-ports" subnodes are scanned and
from the "phy-handle" property the MDIO nodes are looked up. In case of
a malformed device tree the driver might produce out-of-bounds accesses.
The PHY address is not checked against the maximum supported address.
Add a sanity check and drop the unneeded MAX_SMI_ADDR define.
This function has multiple issues:
- It uses __free low level cleanups
- It mixes "fwnode" and "of" functions
Convert this to a uniform "of" usage and manual reference counting
cleanup. With that also fix two subtle lookup bugs in the original
code.
mdio_dn = phy_dn->parent;
if (mdio_dn->parent != dev->of_node)
continue;
This skips an API access and therefore misses reference counting.
Additionally in the case of a very buggy device tree, phy_dn might
be a root node. Looking up its grandparent leads to a NULL pointer
access.
Vikas Gupta [Thu, 4 Jun 2026 16:37:09 +0000 (22:07 +0530)]
bnge: fix context mem iteration
The firmware advertises context memory (backing store) types
through a linked list, with BNGE_CTX_INV serving as the
end-of-list sentinel.
However, the driver incorrectly assumes that the list is strictly
ordered and prematurely terminates traversal when it encounters
an unrecognized type (>=BNGE_CTX_V2_MAX). As a result, any valid
context types that appear later in the chain are silently skipped,
leading to incomplete memory configuration and eventual driver load
failure.
Fix this by traversing the entire list until the BNGE_CTX_INV sentinel
is reached, while safely ignoring only those context types that fall
outside the supported range.
Fixes: 29c5b358f385 ("bng_en: Add backing store support") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Dharmender Garg <dharmender.garg@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bit 16 of the MAC HW Feature1 register reports the DCB (Data Centre
Bridging) feature. Read it so that dma_cap.dcben and the debugfs
report it accurately. Right now it is always reported as being disabled.
Add dma_rmb() barrier after req_id completion check in
ena_com_phc_get_timestamp(). On weakly-ordered architectures,
payload fields may be read before req_id is observed as updated.
Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver") Closes: https://sashiko.dev/#/patchset/20260430032507.11586-1-akiyano%40amazon.com Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ZhaoJinming [Thu, 4 Jun 2026 07:03:52 +0000 (15:03 +0800)]
net: airoha: Add NULL check for of_reserved_mem_lookup() in airoha_qdma_init_hfwd_queues()
of_reserved_mem_lookup() may return NULL if the reserved memory region
referenced by the "memory-region" phandle is not found in the reserved
memory table (e.g. due to a misconfigured DTS or a removed
memory-region node). The current code dereferences the returned
pointer without checking for NULL, leading to a kernel NULL pointer
dereference at the following lines:
dma_addr = rmem->base; // line 1156
num_desc = div_u64(rmem->size, buf_size); // line 1160
Add a NULL check after of_reserved_mem_lookup() and return -ENODEV if
the lookup fails, which is consistent with the existing error handling
for of_parse_phandle() failure in the same code block.
Fixes: 3a1ce9e3d01b ("net: airoha: Add the capability to allocate hwfd buffers via reserved-memory") Cc: stable@vger.kernel.org Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maoyi Xie [Thu, 4 Jun 2026 05:49:49 +0000 (13:49 +0800)]
hsr: broadcast netlink notifications in the device's net namespace
The HSR generic netlink family sets .netnsok = true. HSR devices can
live in network namespaces other than init_net.
Two async notifiers broadcast events with genlmsg_multicast(). They
are hsr_nl_ringerror() and hsr_nl_nodedown(). That helper delivers
only on the default genl socket in init_net. So the events always land
in init_net. The network namespace of the device does not matter.
This has two effects. A listener in the device's own namespace never
sees its own ring error and node down events. A privileged listener in
init_net receives events from HSR devices in other namespaces. The
payload carries the peer node MAC (HSR_A_NODE_ADDR) and the slave port
ifindex (HSR_A_IFINDEX).
Switch both callers to genlmsg_multicast_netns(). Other families with
.netnsok = true already do this. Examples are gtp, ovpn, team,
batman-adv, netdev-genl, ethtool and handshake.
hsr_nl_ringerror() already has the slave port. It uses
dev_net(port->dev). hsr_nl_nodedown() takes the namespace from the
master port via hsr_port_get_hsr().
====================
net: devmem: allow bind-rx from non-init user namespaces
NETDEV_CMD_BIND_RX is GENL_ADMIN_PERM, which checks CAP_NET_ADMIN
against init_user_ns. With netkit and netns support for devmem, it is
now useful to let workloads holding CAP_NET_ADMIN only in their own
user_ns issue bind-rx for a netns owned by that user_ns.
The first patch switches the flag to GENL_UNS_ADMIN_PERM so the check
uses the target netns's owning user_ns. Init remains permitted.
The second patch just adds test cases. They are identical to
nk_devmem.py tests, but using a non-init userns.
====================
Bobby Eshleman [Wed, 3 Jun 2026 01:37:32 +0000 (18:37 -0700)]
selftests: drv-net: add userns devmem RX test
Add userns_devmem.py, which mirrors nk_devmem.py but places the netkit
guest in a netns whose owning user_ns is non-init. ncdevmem is ran there
via nsenter so the bind-rx call is issued with creds that hold
CAP_NET_ADMIN only in the child user_ns.
Without the preceding GENL_UNS_ADMIN_PERM patch the test fails at
bind-rx with EPERM, but with the patch the transfer completes and tests
pass.
Bobby Eshleman [Wed, 3 Jun 2026 01:37:31 +0000 (18:37 -0700)]
net: devmem: allow bind-rx from non-init user namespaces
NETDEV_CMD_BIND_RX is currently GENL_ADMIN_PERM, which checks
CAP_NET_ADMIN against init userns. With recent container/netkit/ns
support for devmem, other userns/netns use cases come online and require
bind-rx to allow CAP_NET_ADMIN in non-init user ns as well.
Switch the flag to GENL_UNS_ADMIN_PERM to allow bind-rx for
CAP_NET_ADMIN in the netns's owning userns as well.
Linus Torvalds [Sat, 6 Jun 2026 01:02:23 +0000 (18:02 -0700)]
Merge tag 'drm-fixes-2026-06-06' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, not contributing to things settling down
unfortunately. Lots of driver fixes for various bounds checks, leaks
and UAF type things, i915/xe probably the most sane, amdgpu has a mix
of fixes all over, then ethosu has lots of small fixes.
The problem of fixing thing in private has really hit us with the
change handle ioctl, and "Sima was right" and we should have disabled
the ioctl, since it was only introduced a couple of kernels ago and
failed to upstream it's tests in time.
The patch here fixes the problems Sima identified, but disables the
ioctl as well, with a list of known problems in it and a request for
proper tests to be written and upstreamed. It's a niche user ioctl
designed for CRIU with AMD ROCm, so I think it's fine to just disable
it.
Maybe this week will settle down.
core:
- disable the gem change handle ioctl for security reasons (plan to
fix it on list later with proper test coverage)
dumb-buffer:
- remove strict limits on buffer geometry
amdkfd:
- UAF race fix
- Fix a potential NULL pointer dereference
- GC 11 buffer overflow fix for SDMA
xe:
- Revert removing support for unpublished NVL-S GuC
- Suspend fixes related to multi-queue
i915:
- Fix color blob reference handling in intel_plane_state
- Revert "drm/i915/backlight: Remove try_vesa_interface"
ethosu:
- reject unsupported NPU_OP_RESIZE
- fix index of IFM region
- fix weight index
- fix overflows in DMA-size calculations
- reject DMA commands with uninitialized length
- fix OOB write in ethosu_gem_cmdstream_copy_and_validate
imx:
- fix kernel-doc warnings
ivpu:
- add overflow checks in firmware handling and get_info_ioctl
v3d:
- wait for pending L2T flush before cleaning caches
- fix leak of vaddr
- skip CSD when it has zeroed workgroups
- fix ref counting in performance monitoring"
* tag 'drm-fixes-2026-06-06' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
drm/gem: Try to fix change_handle ioctl, attempt 4
Revert "drm/i915/backlight: Remove try_vesa_interface"
accel/ethosu: fix OOB write in ethosu_gem_cmdstream_copy_and_validate()
accel/ethosu: reject DMA commands with uninitialized length
accel/ethosu: fix arithmetic issues in dma_length()
accel/ethosu: fix wrong weight index in NPU_SET_SCALE1_LENGTH on U85
accel/ethosu: reject NPU_OP_RESIZE commands from userspace
accel/ethosu: fix IFM region index out-of-bounds in command stream parser
drm/v3d: Fix global performance monitor reference counting
drm/xe/multi_queue: skip submit when primary queue is suspended
drm/xe: Clear pending_disable before signaling suspend fence
Revert "drm/xe: Skip exec queue schedule toggle if queue is idle during suspend"
drm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_range
drm/amdgpu: Fix incorrect VRAM GART mappings on non-4K page size systems
drm/amdgpu/userq: move wptr_obj cleanup in mqd_destroy
drm/amdgpu: improve the userq seq BO free bit lookup
drm/amdgpu/userq: remove the vital queue unmap logging
drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11
drm/amdkfd: fix NULL dereference in get_queue_ids()
drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)
...
Eric Dumazet [Thu, 4 Jun 2026 14:13:39 +0000 (14:13 +0000)]
bridge: provide lockless access to p->designated_port
Add READ_ONCE()/WRITE_ONCE() annotations around p->designated_port
This is needed at least for sysfs show_designated_port(), BRCTL_GET_PORT_INFO
and upcoming RTNL avoidance in "ip link" dumps (cf br_port_fill_attrs()).
Eric Dumazet [Thu, 4 Jun 2026 14:13:38 +0000 (14:13 +0000)]
bridge: provide lockless access to p->designated_cost
Add READ_ONCE()/WRITE_ONCE() annotations around p->designated_cost
This is needed at least for sysfs show_designated_cost(), BRCTL_GET_PORT_INFO
and upcoming RTNL avoidance in "ip link" dumps (cf br_port_fill_attrs()).
Qingfang Deng [Wed, 3 Jun 2026 06:17:44 +0000 (14:17 +0800)]
selftests: net: do not detect PPPoX loopback
By default, pppd attempts to detect loopbacks on the underlying
interface using a pseudo-randomly generated magic number and checks if
the same value is received. The seed for the PRNG is a hash of hostname
XOR current time XOR pid, which is likely to collide on NIPA, causing
false positives. Disable magic number generation.
Reported-by: Matthieu Baerts <matttbe@kernel.org> Fixes: 7af2a94f4dcf ("selftests: net: add tests for PPPoL2TP") Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev> Link: https://patch.msgid.link/20260603061746.23452-1-qingfang.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guangshuo Li [Thu, 4 Jun 2026 04:31:15 +0000 (12:31 +0800)]
net: cpsw_new: unregister devlink on port registration failure
cpsw_probe() registers devlink before registering the CPSW ports.
If cpsw_register_ports() fails, the error path only unregisters the
notifiers and then releases the lower level resources. It does not undo
the successful cpsw_register_devlink() call, leaving the devlink instance
and its parameters registered after probe has failed.
Add a devlink cleanup label for the path where devlink registration has
already succeeded, and use it when port registration fails.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260604043115.1409134-1-lgs201920130244@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alok Tiwari [Tue, 2 Jun 2026 22:55:11 +0000 (15:55 -0700)]
idpf: fix mailbox capability for set device clock time
The current code incorrectly uses VIRTCHNL2_CAP_PTP_SET_DEVICE_CLK_TIME
for both direct and mailbox capabilities, causing mailbox-only support
to be ignored and potentially reporting IDPF_PTP_NONE.
Fixes: d5dba8f7206da ("idpf: add PTP clock configuration") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260602225513.393338-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Oros [Tue, 2 Jun 2026 22:55:10 +0000 (15:55 -0700)]
ice: fix missing priority callbacks for U.FL DPLL pins
The U.FL2 input pin advertises DPLL_PIN_CAPABILITIES_PRIORITY_CAN_CHANGE
in its capability mask, but ice_dpll_pin_ufl_ops does not provide
.prio_get and .prio_set callbacks. As a result the DPLL subsystem
cannot report or accept priority for U.FL pins: pin-get omits the prio
field on U.FL2 and pin-set with prio is rejected as invalid, even
though the capability is present. This prevents user space from using
priority to select or disable U.FL2 as a DPLL input source.
Reproducer with iproute2 (dpll command):
# dpll pin show board-label U.FL2
pin id 16:
module-name ice
board-label U.FL2
type ext
capabilities priority-can-change|state-can-change
parent-device:
id 0 direction input state selectable phase-offset 0
/* note: no "prio" between "direction" and "state",
even though priority-can-change is advertised */
# dpll pin set id 16 parent-device 0 prio 5
RTNETLINK answers: Operation not supported
After the fix the prio field is reported by pin show and pin set with
prio is accepted on U.FL2.
Add the missing .prio_get and .prio_set callbacks to
ice_dpll_pin_ufl_ops, reusing ice_dpll_sw_input_prio_{get,set}. The
same ops struct is shared by U.FL1 and U.FL2: U.FL2 (input) delegates
to the backing hardware input pin, while U.FL1 (output) does not
advertise DPLL_PIN_CAPABILITIES_PRIORITY_CAN_CHANGE so the dpll core
capability gate never invokes prio_set for it, and prio_get reports
the OUTPUT sentinel (ICE_DPLL_PIN_PRIO_OUTPUT) on the output side
exactly like the SMA path does today.
Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260602225513.393338-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Young [Fri, 5 Jun 2026 15:14:16 +0000 (16:14 +0100)]
selftests/bpf: Fix test_lirc test
Since commit 68a99f6a0ebf ("media: lirc: report ir receiver overflow"),
the rc-loopback driver does not accept edges over 50ms, as these are
never seen in real life ir protocols. Fix this.
====================
Add validation for bpf_set_retval helper
From: Xu Kuohai <xukuohai@huawei.com>
The bpf_set_retval() helper is used by cgroup BPF programs to set the
return value of the kernel hook. The argument type for this helper is
ARG_ANYTHING. This allows setting a positive value, which no cgroup
hook expects and can cause issues, such as the kernel panic reported
in [1].
This series adds validation for the argument of the bpf_set_retval()
helper.
For BPF_LSM_CGROUP, the same validation as BPF_LSM_MAC is enforced,
i.e. validate the argument against the LSM hook specific range, which
is returned by bpf_lsm_get_retval_range().
For all other cgroup program types, restrict the argument to
[-MAX_ERRNO, 0], which matches the kernel convention of 0 for success
and negative errno for error.
BPF_CGROUP_GETSOCKOPT is an exception from this restriction, since valid
getsockopt implementations may return positive values (e.g. optlen), as
allowed by commit c4dcfdd406aa ("bpf: Move getsockopt retval to struct
bpf_cg_run_ctx").
v5:
- Use resolve_prog_type(env->prog) instead of env->prog->type for prog type checks
- Target bpf-next tree
v4: https://lore.kernel.org/bpf/20260604130458.617765-1-xukuohai@huaweicloud.com
- Remove the return value limit for BPF_CGROUP_GETSOCKOPT type
- Refine the range of return value of bpf_get_retval helper
v3: https://lore.kernel.org/bpf/20260530101239.590395-1-xukuohai@huaweicloud.com/
- Mark R1 as precise to prevent validation bypass via branch pruning (sashiko)
v2: https://lore.kernel.org/bpf/20260530055557.549474-1-xukuohai@huaweicloud.com/
- Extend validation from LSM cgroup BPF type to all cgroup BPF types (sashiko)
Xu Kuohai [Fri, 5 Jun 2026 14:02:42 +0000 (14:02 +0000)]
bpf: Add validation for bpf_set_retval argument
The bpf_set_retval() helper is used by cgroup BPF programs to set the
return value of the target hook. The argument type for this helper is
ARG_ANYTHING. This allows setting a positive value, which no cgroup
hook expects and can cause issues, such as:
- BPF_LSM_CGROUP: a positive value from bpf_lsm_socket_create bypasses
the err < 0 check in __sock_create(), leaving the socket object
unallocated. The positive return value is then propagated to the
syscall entry __sys_socket(), which also bypasses the IS_ERR() guard
and ultimately causes a NULL pointer dereference.
- BPF_CGROUP_DEVICE: a positive value can be returned through cgroup
device bpf prog -> devcgroup_check_permission() -> bdev_permission()
-> bdev_file_open_by_dev(), where ERR_PTR(positive) produces a pointer
that IS_ERR() does not catch, leading to a wild pointer dereference.
- BPF_CGROUP_SOCK: a positive value can be returned through cgroup sock
bpf prog -> __cgroup_bpf_run_filter_sk() -> inet_create() ->
__sock_create(), where inet_create() frees the newly allocated sk
via sk_common_release() and sets sock->sk = NULL on the non-zero
return, but __sock_create() only checks err < 0 for cleanup, so a
positive retval bypasses cleanup and returns a socket with NULL sk
to userspace, triggering a NULL pointer dereference on subsequent
socket operations.
- BPF_CGROUP_SYSCTL: a positive value can be returned through the cgroup
bpf prog -> __cgroup_bpf_run_filter_sysctl() -> proc_sys_call_handler(),
where a non-zero return bypasses the normal sysctl proc_handler and is
returned directly to userspace as return value of read() or write()
syscall.
So add validation for the argument of the bpf_set_retval() helper.
For BPF_LSM_CGROUP, enforce the LSM hook specific range returned by
bpf_lsm_get_retval_range().
For all other cgroup program types, restrict the argument to
[-MAX_ERRNO, 0], which matches the kernel convention of 0 for success
and negative errno for error.
BPF_CGROUP_GETSOCKOPT is an exception, since valid getsockopt
implementations may return positive values, as allowed by commit c4dcfdd406aa ("bpf: Move getsockopt retval to struct bpf_cg_run_ctx").
Also refine the return value range of bpf_get_retval() so that
values returned by bpf_get_retval() can be passed directly to
bpf_set_retval() without extra manual bounds checking.
Xu Kuohai [Fri, 5 Jun 2026 14:02:41 +0000 (14:02 +0000)]
selftests/bpf: Restrict bpf_set_retval argument in sk_bypass_prot_mem
Test sk_bypass_prot_mem passes an unchecked value as argument to helper
bpf_set_retval(). The argument can be outside the valid range enforced
by the strict retval validation added in the next patch.
Restrict the argument to -EFAULT when it is outside the valid range, so
the test will not be rejected by the verifier when retval validation
is enforced.
Simona Vetter [Thu, 4 Jun 2026 19:44:37 +0000 (21:44 +0200)]
drm/gem: Try to fix change_handle ioctl, attempt 4
[airlied: just added some comments on how to reenable]
On-list because the cat is out of the bag and we're clearly not good
enough to figure this out in private. The story thus far:
5e28b7b94408 ("drm: Set old handle to NULL before prime swap in
change_handle") tried to fix a race condition between the gem_close and
gem_change_handle ioctls, but got a few things wrong:
- There's a confusion with the local variable handle, which is actually
the new handle, and so the two-stage trick was actually applied to the
wrong idr slot. 7164d78559b0 ("drm/gem: fix race between
change_handle and handle_delete") tried to fix that by adding yet
another code block, but forgot to add the error handling. Which meant
we now have two paths, both kinda wrong.
- dc366607c41c ("drm: Replace old pointer to new idr") tried to apply
another fix, but inconsistently, again because of the handle confusion
- this would be the right fix (kinda, somewhat, it's a mess) if we'd
do the two-stage approach for the new handle. Except that wasn't the
intent of the original fix.
We also didn't have an igt merged for the original ioctl, which is a big
no-go. This was attempted to address off-list in the original bugfix,
and amd QA people claimed the bug was fixed now. Very clearly that's not
the case. Here's my attempt to sort this out:
- Rename the local variable to new_handle, the old aliasing with
args->handle is just too dangerously confusing.
- Merge the gem obj lookup with the two-stage idr_replace so that we
avoid getting ourselves confused there.
- This means we don't have a surplus temporary reference anymore, only
an inherited from the idr. A concurrent gem_close on the new_handle
could steal that. Fix that with the same two-stage approach
create_tail uses. This is a bit overkill as documented in the comment,
but I also don't trust my ability to understand this all correctly, so
go with the established pattern we have from other ioctls instead for
maximum paranoia.
- Adjust error paths. I've tried to make the error and success paths
common, because they are identical except for which handle is removed
and on which we call idr_replace to (re)install the object again. But
that made things messier to read, so I've left it at the more verbose
version, which unfortunately hides the symmetry in the entire code
flow a bit.
- While at it, also replace the 7 space indent with 1 tab.
And finally, because I flat out don't trust my abilities here at all
anymore:
- Disable the ioctl until we have the igt situation and everything else
sorted out on-list and with full consensus.
v2:
Sashiko noticed that I didn't handle the error path for idr_replace
correctly, it must be checked with IS_ERR_OR_NULL like in
gem_handle_delete. So yeah, definitely should just the existing paths
1:1 because this is endless amounts of tricky.
Also add the Fixes: line for the original ioctl, I forgot that too.
Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Fixes: dc366607c41c ("drm: Replace old pointer to new idr") Cc: syzbot+d7c9eed171647e421013@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Edward Adam Davis <eadavis@qq.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Cc: David Francis <David.Francis@amd.com> Cc: Puttimet Thammasaeng <pwn8official@gmail.com> Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") Cc: Zhenghang Xiao <kipreyyy@gmail.com> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260604194437.1725314-1-simona.vetter@ffwll.ch
====================
bpf: fix sysctl new-value handling in __cgroup_bpf_run_filter_sysctl
This series fixes three bugs in the sysctl write-buffer replacement path
of __cgroup_bpf_run_filter_sysctl(). It resolves a kvzalloc()/kfree()
mismatch, adds a missing NUL terminator to the replacement string, and
updates a stale return value check to safely restore the replacement
functionality.
Patch Summary:
- patch 1 NUL-terminates the replaced sysctl value
- patch 2 uses kvfree() for the replaced sysctl write buffer
- patch 3 restores sysctl new-value replacement
Changelog:
v2 -> v3:
- reordered patches 1 and 2
- added the missing Reviewed-by/Acked-by tags to patches 2 and 3
- fixed the incorrect Fixes tag in patch 3
- simplified the dynamic test logs in patch 1 and 2, and updated
titles
====================
Dawei Feng [Wed, 3 Jun 2026 10:53:17 +0000 (18:53 +0800)]
bpf: Restore sysctl new-value from 1 to 0
Commit 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value
helpers") changed the success return value to 0, but failed to update the
corresponding check in __cgroup_bpf_run_filter_sysctl(). Since
bpf_prog_run_array_cg() now returns 0 on success, the legacy ret == 1
condition is never satisfied. As a result, the modified value is ignored,
and bpf_sysctl_set_new_value() fails to replace the write buffer.
Fix this by checking for a return value of 0 instead, so cgroup/sysctl
programs can correctly replace the pending sysctl buffer.
This bug was discovered during a manual code review. Tested via a
cgroup/sysctl BPF reproducer overriding writes to a target sysctl.
Pre-fix, bpf_sysctl_set_new_value("foo") was silently ignored: the write
returned 8192 and the value remained "600". Post-fix, the BPF replacement
buffer properly propagates: the write returns 3 and the value updates to
"foo".
Dawei Feng [Wed, 3 Jun 2026 10:53:16 +0000 (18:53 +0800)]
bpf: use kvfree() for replaced sysctl write buffer
proc_sys_call_handler() allocates its temporary sysctl buffer with
kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
kvzalloc() may fall back to vmalloc() for large allocations, freeing
that buffer with kfree() is wrong and can corrupt memory.
Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
allocations.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc5.
Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
test tree also included the accompanying fix for the stale ret == 1
check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
failslab injections to the proc_sys_call_handler() range, uses
stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
/proc/sys/kernel/domainname from a task in the target cgroup. Under
that setup, fail-nth=1 triggered the fault:
Dawei Feng [Wed, 3 Jun 2026 10:53:15 +0000 (18:53 +0800)]
bpf: NUL-terminate replaced sysctl value
When writing to sysctls, proc_sys_call_handler() guarantees that the
buffer passed to proc handlers is NUL-terminated. If
bpf_sysctl_set_new_value() replaces the pending sysctl value, it can
hand a replacement buffer directly to proc handlers. However, the
helper currently copies only buf_len bytes into that buffer without
appending a NUL terminator, leaving downstream parsers vulnerable to
out-of-bounds access.
Fix this by appending a '\0' after the replaced value to restore the
expected sysctl semantics. Since the helper already rejects buf_len
greater than PAGE_SIZE - 1, there is always room for the extra byte.
Reproduced in a QEMU x86_64 guest booted with KASAN while exercising
the sysctl replacement path with a cgroup/sysctl BPF program. The
reproducer targets `/proc/sys/net/core/flow_limit_cpu_bitmap`, fills
the original user write buffer with non-zero bytes, and overrides the
sysctl value so the replacement buffer lacks a terminating NUL. Under
that setup, the pre-fix kernel reported:
BUG: KASAN: slab-out-of-bounds in strnchrnul+0x72/0x90
Read of size 1 at addr ffff88800de57000 by task repro_patch3/66
CPU: 0 UID: 0 PID: 66 Comm: repro_patch3 Not tainted 7.1.0-rc3-00269-g8370ca1f87cc #6 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
print_report+0xcb/0x5e0
? __virt_addr_valid+0x21d/0x3f0
? strnchrnul+0x72/0x90
? strnchrnul+0x72/0x90
kasan_report+0xca/0x100
? strnchrnul+0x72/0x90
strnchrnul+0x72/0x90
bitmap_parse+0x37/0x2e0
flow_limit_cpu_sysctl+0xc6/0x840
? __pfx_flow_limit_cpu_sysctl+0x10/0x10
? __kvmalloc_node_noprof+0x5ba/0x870
proc_sys_call_handler+0x31d/0x480
? __pfx_proc_sys_call_handler+0x10/0x10
? selinux_file_permission+0x39f/0x500
? lock_is_held_type+0x9e/0x120
vfs_write+0x98e/0x1000
...
</TASK>
The buggy address is located 0 bytes to the right of
allocated 4096-byte region [ffff88800de56000, ffff88800de57000)
With this fix applied, rerunning the same sysctl-targeted path yields
no corresponding KASAN reports.
Dave Airlie [Fri, 5 Jun 2026 22:37:21 +0000 (08:37 +1000)]
Merge tag 'drm-misc-fixes-2026-06-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
dumb-buffer:
- remove strict limits on buffer geometry
ethosu:
- reject unsupported NPU_OP_RESIZE
- fix index of IFM region
- fix weight index
- fix overflows in DMA-size calculations
- reject DMA commands with uninitialized length
- fix OOB write in ethosu_gem_cmdstream_copy_and_validate
imx:
- fix kernel-doc warnings
ivpu:
- add overflow checks in firmware handling and get_info_ioctl
v3d:
- wait for pending L2T flush before cleaning caches
- fix leak of vaddr
- skip CSD when it has zeroed workgroups
- fix ref counting in performance monitoring
====================
bpf: Update transport_header when encapsulating UDP tunnel in lwt
Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver, e.g. ice, reuses the stale skb->transport_header to
offload checksum computation to NIC hardware, VxLAN packets encapsulated
by bpf_lwt_push_encap() helper may be dropped due to incorrect checksum.
Update skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP, so checksum offload works correctly.
Changes:
v3 -> v4:
* Address comments from Emil:
* Make the logic of skb_set_transport_header() clearer in patch #1.
* Fold the code of fexit_lwt_push_ip_encap() into test_lwt_ip_encap.c in
patch #2.
* Resolve assorted issues of test in patch #2.
* v3: https://lore.kernel.org/bpf/20260601150203.20352-1-leon.hwang@linux.dev/
v2 -> v3:
* Drop patch #1 and #2 of v2 that aim to resolve potential issues
reported by sashiko (per Alexei).
* Check target IP version and UDP tunnel in test (per sashiko).
* v2: https://lore.kernel.org/bpf/20260529151351.69911-1-leon.hwang@linux.dev/
v1 -> v2:
* Address sashiko's reviews:
* Fix TOCTOU issue in lwt to avoid changing hdr after checks.
* Add check iph->ihl < 5 in lwt to avoid infinite-loop in MIPS driver.
* Update comment style in selftests with BPF comment style.
* v1: https://lore.kernel.org/bpf/20260525142650.2569-1-leon.hwang@linux.dev/
====================
The unexpected offsets are: outer encap headers
(IPv4: iphdr+udp+vxlan+eth = 50 bytes, IPv6: ipv6hdr+udp+vxlan+eth = 70 bytes)
plus the inner IP header (20 or 40 bytes), because without the fix
transport_header still points at the inner transport layer instead of the
outer UDP header.
Leon Hwang [Tue, 2 Jun 2026 15:09:30 +0000 (23:09 +0800)]
bpf: Update transport_header when encapsulating UDP tunnel in lwt
Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver, e.g. ice, reuses the stale skb->transport_header to
offload checksum computation to NIC hardware, VxLAN packets encapsulated
by bpf_lwt_push_encap() helper may be dropped due to incorrect checksum.
Update skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP, so checksum offload works correctly.
Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap") Cc: Leon Hwang <leon.huangfu@shopee.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260602150931.49629-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
RISC-V JIT support for bpf_get_current_task/_btf
These two patches add support for the bpf_get_current_task and
bpf_get_current_task_btf kfuncs in RISC-V JIT and add a selftest.
The first patch adds support for cpu and feature detection on
the JIT disassembly helper function as RISC-V JITed code was not
being disassembled using `LLVMCreateDisasm` as it was missing the
"+c" CPU feature and JITed code contained RISC-V Compressed (C)
Extension. This patch generalizes that to detect CPU features and
enables testing on more RISC-V JIT work ahead.
The second patch, which actually adds this support has been benchmarked
on QEMU RISC-V and shows significant improvements.
It was benchmarked using a simple loop inside a bpf program that ran
bpf_get_current_task() and execution time was measured. It used
bpf_prog_test_run_opts() to repeatedly trigger the BPF program. The loop
ran in 1 second intervals and it kept firing bpf_prog_test_run_opts() as
fast as possible until a second had elapsed and then reported statistics.
====================
Varun R Mallya [Tue, 2 Jun 2026 20:58:47 +0000 (02:28 +0530)]
bpf, riscv: inline bpf_get_current_task() and bpf_get_current_task_btf()
On RISC-V, the current task pointer is stored in the thread pointer
register (tp). Emit a single `mv a5, tp` instead of a full helper
call for BPF_FUNC_get_current_task and BPF_FUNC_get_current_task_btf.
Register bpf_jit_inlines_helper_call() entries for both helpers so the
verifier treats them as inlined, and add the expected `mv a5, tp`
annotation to the riscv64 selftests.
The following show changes before and after this patch.
Varun R Mallya [Tue, 2 Jun 2026 20:58:46 +0000 (02:28 +0530)]
selftests/bpf: use host CPU features in JIT disassembler
Pass the host CPU name and feature string to
LLVMCreateDisasmCPUFeatures() instead of using LLVMCreateDisasm(), so
the disassembler correctly decodes CPU-specific instructions and
extensions such as RISC-V compressed and vector instructions.
====================
bpf: Check tail zero of bpf_map_info and bpf_prog_info
Check the tail bytes of bpf_map_info and bpf_prog_info due to padding
when getting map info and prog info via BPF_OBJ_GET_INFO_BY_FD, which
was discussed in the thread
"bpf: Check tail zero of bpf_common_attr using offsetofend" [1].
Leon Hwang [Fri, 5 Jun 2026 15:52:49 +0000 (23:52 +0800)]
selftests/bpf: Add tests to verify checking padding bytes for bpf_[map,prog]_info
Add two tests to verify that the tail padding 4 bytes of struct
bpf_map_info and bpf_prog_info are checked in syscall.c using
bpf_check_uarg_tail_zero().
/* size: 232, cachelines: 4, members: 38 */
/* sum members: 224 */
/* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 31 bits */
/* padding: 4 */
/* forced alignments: 9 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
If a future kernel extension adds a new 4-byte field, older userspace
programs allocating this structure on the stack might inadvertently pass
uninitialized stack garbage into the new field, permanently breaking
backward compatibility. -- sashiko [1]
Fix it by changing sizeof(info) to
offsetofend(struct bpf_prog_info, attach_btf_id).
And, add "__u32 :32" to the tail of struct bpf_prog_info.
If a future kernel extension adds a new 4-byte field, older userspace
programs allocating this structure on the stack might inadvertently pass
uninitialized stack garbage into the new field, permanently breaking
backward compatibility. -- sashiko [1]
Fix it by changing sizeof(info) to
offsetofend(struct bpf_map_info, hash_size).
And, add "__u32 :32" to the tail of struct bpf_map_info.
Document the Samsung SOFEF01-M Display-Driver-IC and 1080x2520@60Hz
command-mode DSI panels found in many Sony phones:
- Sony Xperia 5 (kumano bahamut): amb609tc01
- Sony Xperia 10 II (seine pdx201): ams597ut01
- Sony Xperia 10 III (lena pdx213): ams597ut04
- Sony Xperia 10 IV (murray pdx225): ams597ut05
- Sony Xperia 10 V (zambezi pdx235): ams605dk01
- Sony Xperia 10 VI (columbia pdx246): ams605dk01
====================
selftests/bpf: Tolerate partial builds across kernel configs
Currently the BPF selftests can only be built by using the minimum kernel
configuration defined in tools/testing/selftests/bpf/config*. This poses a
problem in distribution kernels that may have some of the flags disabled or
set as module. For example, we have been running the tests regularly in
openSUSE Tumbleweed [1] [2] but to work around this fact we created a
special package [3] that build the tests against an auxiliary vmlinux with
the BPF Kconfig. We keep a list of known issues that may happen due to,
amongst other things, configuration mismatches [4] [5].
The maintenance of this package is far from ideal, especially for
enterprise kernels. The goal of this series is to enable the common usecase
of running the following in any system:
As an example, the following script targeting a minimal config can be used
for testing:
```sh
make defconfig
scripts/config --file .config \
--enable DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT \
--enable DEBUG_INFO_BTF \
--enable BPF_SYSCALL \
--enable BPF_JIT
make olddefconfig
make -j$(nproc)
make -j$(nproc) -C tools/testing/selftests install \
SKIP_TARGETS= \
TARGETS=bpf \
BPF_STRICT_BUILD=0
```
This produces a test_progs binary with 592 subtests, against the total of
727. Many of them will still fail or be skipped at runtime due to lack of
symbols, but at least there will be a clear way of building the tests.
[1]: https://openqa.opensuse.org/tests/5811715
[2]: https://openqa.opensuse.org/tests/5811730
[3]: https://src.opensuse.org/rmarliere/kselftests
[4]: https://github.com/openSUSE/kernel-qe/blob/main/kselftests_known_issues.yaml
[5]: https://openqa.opensuse.org/tests/5811730/logfile?filename=run_kselftests-config_mismatches.txt
---
Changes in v12:
- Rebase from 11 to 8 commits: split the test_kmods KDIR patch into a
pure KDIR fix (no PERMISSIVE) and a separate tolerance patch; squash
the BPF_STRICT_BUILD toggle with its first PERMISSIVE user; squash
the four test_progs partial-build patches into one; move the install
fix to immediately follow the first PERMISSIVE user
- Link to v11: https://patch.msgid.link/20260430-selftests-bpf_misconfig-v11-0-e11f7a8c4fdc@suse.com
Changes in v11:
- Gate the BTFIDS pretty-print on $(filter 1,$(V)) so V=0 and V=2 still
print the marker; only V=1 suppresses it (patch 6)
- Use asm volatile ("") in the uprobe_multi_func_{1,2,3} weak stubs to
match the strong definitions in prog_tests/uprobe_multi_test.c
(patch 10)
- Link to v10: https://patch.msgid.link/20260430-selftests-bpf_misconfig-v10-0-cd302a31af16@suse.com
Changes in v10:
- Drop $(wildcard $^) from the bench link recipe so cc fails cleanly on
the first missing input and the || fallback emits one SKIP-LINK line
instead of a wall of undefined-reference errors; also makes make -n
accurate (patch 9)
- Include <alloca.h> in testing_helpers.c for alloca() (patch 10)
- Link to v9: https://patch.msgid.link/20260429-selftests-bpf_misconfig-v9-0-c311f06b4791@suse.com
Changes in v9:
- Also pass KBUILD_OUTPUT=$(KMOD_O_VALID) when invoking kbuild so an
inherited command-line KBUILD_OUTPUT cannot disagree with O= (patch 2)
- Note BPFOBJ remains a normal prereq intentionally (patch 5)
- Sub-shell isolate cd/CC and use $@ for $(RM); gate the BTFIDS
pretty-print on $(V) so verbose mode does not double-print (patch 6)
- Restrict permissive compile-failure tolerance and partial-link to
test_progs%; runners with strong cross-object references (test_maps)
keep strict semantics (patches 6 and 8)
- Link to v8: https://patch.msgid.link/20260428-selftests-bpf_misconfig-v8-0-bf02cf97dbcb@suse.com
Changes in v8:
- In permissive mode, keep source changes to already-built tests
triggering relinks, while using recipe-time $(wildcard ...) to pick
up fresh .test.o files without duplicate linker inputs (patch 8)
- Resolve relative O=/KBUILD_OUTPUT before recursing into test_kmods,
and only treat it as a kernel build dir when it contains
Module.symvers (patch 2)
- Note in the commit message that PERMISSIVE-mode bisecting here needs
the next two patches (patch 6)
- Clarify in the commit message that order-only skeleton prereqs do not
break the new-skel-via-local-header case (patch 5)
- Exclude not-built tests from the success count so summary and JSON
report them only as skipped (patch 7)
- Tighten commit messages for accuracy and clarity
- Link to v7: https://patch.msgid.link/20260416-selftests-bpf_misconfig-v7-0-a078e18012e4@suse.com
Changes in v7:
- Use $(abspath) for KMOD_O so relative O= paths resolve correctly
when make -C changes directory (patch 2)
- Guard make clean against missing KDIR unconditionally; there is
nothing to clean when kernel headers are absent (patch 2)
- Drop explicit $(TRUNNER_TEST_OBJS) from linker filter in strict mode;
$^ already contains them as normal prerequisites (patch 8)
- Link to v6: https://patch.msgid.link/20260416-selftests-bpf_misconfig-v6-0-7efeab504af1@suse.com
Changes in v6:
- Add --ignore-missing-args to -extras rsync so out-of-tree permissive
builds do not abort when .ko files are absent (patch 2)
- Use $(abspath) for KMOD_O so relative O= paths resolve correctly
when make -C changes directory (patch 2)
- Guard make clean against missing KDIR unconditionally so cleaning
does not abort when kernel headers are absent (patch 2)
- Remove stale skeleton headers in early-exit paths when .bpf.o is
missing on incremental builds (patch 3)
- Fix strict-mode skeleton rules: use && before temp file cleanup so
bpftool failures are not masked by rm -f exit code (patch 3)
- Track test filter selection separately from not_built so -t/-n flags
are respected for unbuilt tests (patch 7)
- Make TRUNNER_TEST_OBJS order-only only in permissive mode, preserving
incremental relinking in strict builds (patch 8)
- Drop explicit $(TRUNNER_TEST_OBJS) from linker filter in strict mode;
they are already in $^ as normal prerequisites (patch 8)
- Reorder: move skip-unbuilt-tests patch before partial-linking patch
for bisectability
- Link to v5: https://patch.msgid.link/20260415-selftests-bpf_misconfig-v5-0-03d0a52a898a@suse.com
Changes in v5:
- Add BPF_STRICT_BUILD toggle as patch 1 so every subsequent patch
gates tolerance behind PERMISSIVE from the start, making the series
bisectable with strict-by-default at every point
- Fix O= commit message; make parent cp conditional (patch 2)
- Tolerate linked skeleton failures (patch 3)
- Skip feature detection for emit_tests (patch 4)
- Clarify bench is all-or-nothing in commit message (patch 8)
- Move stack_mprotect() to testing_helpers.c, drop weak stubs (patch 9)
- Report not-built tests as "SKIP (not built)" in output (patch 10)
- Drop overly broad 2>/dev/null || true from install rsync; rely solely
on --ignore-missing-args which already handles absent files (patch 11)
- Link to v4: https://patch.msgid.link/20260406-selftests-bpf_misconfig-v4-0-9914f50efdf7@suse.com
Changes in v4:
- Drop the test_kmods kselftest module flow patch: lib.mk gen_mods_dir
invokes $(MAKE) -C $(TEST_GEN_MODS_DIR) without forwarding
RESOLVE_BTFIDS, breaking ASAN and GCC BPF CI builds (Makefile.modfinal
cannot find resolve_btfids in the kbuild output tree)
- Link to v3:
https://patch.msgid.link/20260406-selftests-bpf_misconfig-v3-0-587a1114263c@suse.com
Changes in v3:
- Split test_kmods patch into two: fix KDIR handling (O= passthrough,
EXTRA_CFLAGS/EXTRA_LDFLAGS clearing) and wire into lib.mk via
TEST_GEN_MODS_DIR
- Pass O= through to the kernel module build so artifacts land in the
output tree, not the source tree
- Clear EXTRA_CFLAGS and EXTRA_LDFLAGS when invoking the kernel build to
prevent host flags (e.g. -static) leaking into module compilation
- Replace the bespoke test_kmods pattern rule with lib.mk module
infrastructure (TEST_GEN_MODS_DIR); lib.mk now drives build and clean
lifecycle
- Make the .ko copy step resilient: emit SKIP instead of failing when a
module is absent
- Expand the uprobe weak stub comment in bpf_cookie.c to explain why
noinline is required
- Link to v2:
https://patch.msgid.link/20260403-selftests-bpf_misconfig-v2-0-f06700380a9d@suse.com
Changes in v2:
- Skip test_kmods build/clean when KDIR directory does not exist
- Use `Module.symvers` instead of `.config` for in-tree detection
- Fix skeleton order-only prereqs commit message
- Guard BTFIDS step when .test.o is absent
- Add `__weak stack_mprotect()` stubs in `bpf_cookie.c` and `iters.c`
- Link to v1:
https://patch.msgid.link/20260401-selftests-bpf_misconfig-v1-0-3ae42c0af76f@suse.com
Assisted-by: {codex,claude} Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Ricardo B. Marliere <rbm@suse.com>
---
Ricardo B. Marlière (11):
selftests/bpf: Add BPF_STRICT_BUILD toggle
selftests/bpf: Fix test_kmods KDIR to honor O= and distro kernels
selftests/bpf: Tolerate BPF and skeleton generation failures
selftests/bpf: Avoid rebuilds when running emit_tests
selftests/bpf: Make skeleton headers order-only prerequisites of .test.d
selftests/bpf: Tolerate test file compilation failures
selftests/bpf: Skip tests whose objects were not built
selftests/bpf: Allow test_progs to link with a partial object set
selftests/bpf: Tolerate benchmark build failures
selftests/bpf: Provide weak definitions for cross-test functions
selftests/bpf: Tolerate missing files during install