Vladimir Oltean [Tue, 5 May 2026 10:05:07 +0000 (13:05 +0300)]
drm/rockchip: dw_hdmi: avoid direct dereference of phy->dev.of_node
The dw_hdmi-rockchip driver validates pixel clock rates against the
HDMI PHY's internal clock provider on certain SoCs like RK3328.
This is currently achieved by dereferencing hdmi->phy->dev.of_node
to obtain the provider node, which violates the Generic PHY API's
encapsulation (the goal is for struct phy to be an opaque pointer
with a hidden definition, to be interacted with only using API
functions or NULL pointer checks, for the case where optional variants
of phy_get() did not find a PHY).
Refactor dw_hdmi_rockchip_bind() to perform a manual phandle lookup
on the "hdmi" PHY index within the controller's DT node. This provides
a parallel path to the clock provider's OF node without relying on the
internal structure of the struct phy handle.
Hyunwoo Kim [Fri, 15 May 2026 22:28:53 +0000 (07:28 +0900)]
net: skbuff: propagate shared-frag marker through frag-transfer helpers
Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail
to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()->flags when
moving frags from source to destination. __pskb_copy_fclone() defers
the rest of the shinfo metadata to skb_copy_header() after copying
frag descriptors, but that helper only carries over gso_{size,segs,
type} and never touches skb_shinfo()->flags; skb_shift() moves frag
descriptors directly and leaves flags untouched. As a result, the
destination skb keeps a reference to the same externally-owned or
page-cache-backed pages while reporting skb_has_shared_frag() as
false.
The mismatch is harmful in any in-place writer that uses
skb_has_shared_frag() to decide whether shared pages must be detoured
through skb_cow_data(). ESP input is one such writer (esp4.c,
esp6.c), and a single nft 'dup to <local>' rule -- or any other
nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d
skb in esp_input() with the marker stripped, letting an unprivileged
user write into the page cache of a root-owned read-only file via
authencesn-ESN stray writes.
Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors
were actually moved from the source. skb_copy() and skb_copy_expand()
share skb_copy_header() too but linearize all paged data into freshly
allocated head storage and emerge with nr_frags == 0, so
skb_has_shared_frag() returns false on its own; they need no change.
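A minimal user-space sketch of the rule described above. It assumes heavily simplified, hypothetical stand-ins (`struct sketch_shinfo`, `SKETCH_SHARED_FRAG`) rather than the kernel's `skb_shared_info` and `SKBFL_SHARED_FRAG`. When at least one frag descriptor moves, the source's shared-frag marker is folded into the destination.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the kernel definitions. */
#define SKETCH_SHARED_FRAG 0x1u
#define SKETCH_MAX_FRAGS   17

struct sketch_frag {
	void *page;
	unsigned int off, len;
};

struct sketch_shinfo {
	unsigned int flags;
	unsigned int nr_frags;
	struct sketch_frag frags[SKETCH_MAX_FRAGS];
};

/* Mirrors the idea of skb_has_shared_frag(): paged data plus the marker. */
static bool sketch_has_shared_frag(const struct sketch_shinfo *si)
{
	return si->nr_frags > 0 && (si->flags & SKETCH_SHARED_FRAG);
}

/* Move as many frag descriptors as fit from @src to @dst (front first).
 * Returns the number of descriptors moved. */
static unsigned int sketch_move_frags(struct sketch_shinfo *dst,
				      struct sketch_shinfo *src)
{
	unsigned int moved = 0;

	while (moved < src->nr_frags && dst->nr_frags < SKETCH_MAX_FRAGS)
		dst->frags[dst->nr_frags++] = src->frags[moved++];

	/* Compact whatever is left in @src. */
	memmove(src->frags, src->frags + moved,
		(src->nr_frags - moved) * sizeof(src->frags[0]));
	src->nr_frags -= moved;

	/* The fix: if descriptors really moved, carry the marker with them. */
	if (moved && (src->flags & SKETCH_SHARED_FRAG))
		dst->flags |= SKETCH_SHARED_FRAG;

	return moved;
}
```

Without the final `if`, `dst` would reference the same pages while `sketch_has_shared_frag(dst)` reported false, which is the mismatch the commit describes.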
The same omission exists in skb_gro_receive() and skb_gro_receive_list().
The former moves the incoming skb's frag descriptors into the
accumulator's last sub-skb via two paths (a direct frag-move loop and
the head_frag + memcpy path); the latter chains the incoming skb whole
onto p's frag_list. Downstream skb_segment() reads only
skb_shinfo(p)->flags, and skb_segment_list() reuses each sub-skb's
shinfo as the nskb -- both p and lp must carry the marker.
The same omission also exists in tcp_clone_payload(), which builds an
MTU probe skb by moving frag descriptors from skbs on sk_write_queue
into a freshly allocated nskb. The helper falls into the same family
and warrants the same fix for consistency; no TCP TX-side in-place
writer is currently known to reach a user page through this gap, but
a future consumer depending on the marker would regress silently.
The same omission exists in skb_segment(): the per-iteration flag
merge takes only head_skb's flag, and the inner switch that rebinds
frag_skb to list_skb on head_skb-frags exhaustion does not fold the
new frag_skb's flag into nskb. Fold frag_skb's flag at both sites
so segments drawing frags from frag_list members carry the marker.
Fixes: cef401de7be8 ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034e6 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Suggested-by: Lin Ma <malin89@huawei.com>
Suggested-by: Jingguo Tan <tanjingguo@huawei.com>
Suggested-by: Aaron Esau <aaron1esau@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Tested-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Link: https://patch.msgid.link/ageeJfJHwgzmKXbh@v4bel
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge patch series "vfs: add O_EMPTYPATH to openat(2)/openat2(2)"
Jori Koolstra <jkoolstra@xs4all.nl> says:
To get an operable version of an O_PATH file descriptor, it is possible
to use openat(fd, ".", O_DIRECTORY) for directories, but other files
currently require going through open("/proc/<pid>/fd/<nr>"), which
depends on a functioning procfs.
This patch adds the O_EMPTYPATH flag to openat(2)/openat2(2). If passed,
LOOKUP_EMPTY is set at path resolution time.
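A small sketch contrasting the two ways of turning an O_PATH descriptor into an operable one. The procfs fallback is the existing approach described above. The `openat(fd, "", flags | O_EMPTYPATH)` call is an assumption about how the new flag is used (an empty path resolved relative to `fd`), and it is only compiled if the C library headers already define `O_EMPTYPATH`. `reopen_path_fd()` is a hypothetical helper name. `_GNU_SOURCE` must come before any system header for `O_PATH` to be visible.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Re-open an O_PATH descriptor with real access @flags (e.g. O_RDONLY). */
static int reopen_path_fd(int path_fd, int flags)
{
	char proc_path[64];
	int fd;

#ifdef O_EMPTYPATH
	/* Assumed usage of the new flag: empty path relative to @path_fd. */
	fd = openat(path_fd, "", flags | O_EMPTYPATH);
	if (fd >= 0)
		return fd;
	/* Kernel without support: fall back to procfs below. */
#endif
	/* Existing approach: needs a mounted procfs. */
	snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", path_fd);
	fd = open(proc_path, flags);
	return fd;
}
```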
* patches from https://patch.msgid.link/20260424114611.1678641-1-jkoolstra@xs4all.nl:
selftest: add tests for O_EMPTYPATH
vfs: add O_EMPTYPATH to openat(2)/openat2(2)
Add tests for the new O_EMPTYPATH flag of openat(2)/openat2(2).
Also, the current openat2 tests include a helper header file that
defines the necessary structs and constants to use openat2(2), such as
struct open_how. This may result in conflicting definitions when the
system header openat2.h is present as well.
So add openat2.h generated by 'make headers' to the uapi header
files in ./tools/include and remove the helper file definitions of
the current openat2 selftests.
Merge patch series "selftests: openat2: migrate to kselftest harness"
Aleksa Sarai <aleksa@amutable.com> says:
These tests were written in the early days of selftests' TAP support;
the more modern kselftest harness is much easier to follow and maintain.
The actual contents of the tests are unchanged by this change.
* patches from https://patch.msgid.link/20260401-openat2-selftests-kunit-v2-0-ad153a07da0c@amutable.com:
selftests: openat2: migrate to kselftest harness
selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE
selftests: openat2: move helpers to header
selftests: move openat2 tests to selftests/filesystems/
To get an operable version of an O_PATH file descriptor, it is possible
to use openat(fd, ".", O_DIRECTORY) for directories, but other files
currently require going through open("/proc/<pid>/fd/<nr>"), which
depends on a functioning procfs.
This patch adds the O_EMPTYPATH flag to openat(2)/openat2(2). If passed,
LOOKUP_EMPTY is set at path resolution time.
Note: This implies that you can no longer rely on keeping procfs
unmounted (e.g. inside a container without procfs and with
CAP_SYS_ADMIN dropped) to prevent O_PATH fds from being re-opened
read-write.
These tests were written in the early days of selftests' TAP support;
the more modern kselftest harness is much easier to follow and maintain.
The actual contents of the tests are unchanged by this change. Most of
the diff involves switching from the E_* syscall wrappers we previously
used to ASSERT_EQ(fn(...), 0) in tests and helper functions.
The first pass of the migration was done using Claude, followed by a
manual rework and review.
This is a bit ugly, but in the next patch we will move to using
kselftest_harness.h -- which doesn't play well with being included in
multiple compilation units due to duplicate function definitions.
Not including kselftest_harness.h would let us avoid this patch, but the
helpers will need to include kselftest_harness.h in order to switch to
TH_LOG.
regmap-i2c: fix sparse warning in regmap_smbus_word_write_reg16
i2c_smbus_write_word_data() expects a plain u16, but cpu_to_le16()
returns __le16 (a sparse-restricted endian type), causing:
drivers/base/regmap/regmap-i2c.c:340: sparse: incorrect type in
argument 3 (different base types)
expected unsigned short [usertype] value
got restricted __le16 [usertype]
SMBus already defines byte ordering internally, so cpu_to_le16() is
wrong here. Replace it with a plain (u16) cast.
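A sketch of why the value should stay CPU-native. In an SMBus "write word" transaction the low data byte goes on the wire first, and the bus layer serialises that from the plain `u16` itself. Pre-swapping with `cpu_to_le16()` would, on a big-endian CPU, send the bytes in the wrong order; on little-endian CPUs it is a no-op, so only sparse notices. Both helpers below are hypothetical models, not kernel functions.

```c
#include <stdint.h>

/* Hypothetical model of SMBus word serialisation: low byte, then high. */
static void sketch_smbus_word_to_wire(uint16_t value, uint8_t wire[2])
{
	wire[0] = value & 0xff;	/* Data Byte Low */
	wire[1] = value >> 8;	/* Data Byte High */
}

/* Models what cpu_to_le16() does on a big-endian CPU: a byte swap. */
static uint16_t sketch_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}
```

Passing `sketch_swab16(0x1234)` puts `0x12` on the wire first, the reverse of what the device expects.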
Fixes: bad4bd28abf4 ("regmap-i2c: add SMBus byte/word reg16 bus for adapters lacking I2C_FUNC_I2C")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605161621.mY5zFh4D-lkp@intel.com/
Signed-off-by: Nishanth Sampath Kumar <nissampa@cisco.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Ian Abbott [Wed, 22 Apr 2026 16:21:19 +0000 (17:21 +0100)]
comedi: comedi_test: fix check for valid scan_begin_src in waveform_ai_cmdtest()
Commit 783ddaebd397 ("staging: comedi: comedi_test: support
scan_begin_src == TRIG_FOLLOW") neglected to add a test that
`scan_begin_src` has only one bit set. The allowed values are
`TRIG_FOLLOW` and `TRIG_TIMER`, but the code incorrectly also allows
`TRIG_FOLLOW | TRIG_TIMER`. Add a call to
`comedi_check_trigger_is_unique()` to check that only one trigger source
bit is set.
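The "only one bit set" test reduces to a classic bit trick: `x & (x - 1)` clears the lowest set bit, so anything left means two or more bits were set. This is a user-space sketch of that check. `TRIG_TIMER` = `0x10` is stated in the text below; the `TRIG_FOLLOW` value `0x4` is assumed from the comedi uAPI header, and the `SKETCH_` names are hypothetical.

```c
#include <errno.h>

#define SKETCH_TRIG_FOLLOW 0x00000004u	/* assumed value */
#define SKETCH_TRIG_TIMER  0x00000010u	/* 0x10, as stated in the log */

/* Returns 0 if at most one trigger-source bit is set, else -EINVAL. */
static int sketch_trigger_is_unique(unsigned int src)
{
	return (src & (src - 1)) ? -EINVAL : 0;
}
```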
Ian Abbott [Wed, 22 Apr 2026 14:46:37 +0000 (15:46 +0100)]
comedi: comedi_test: Fix limiting of convert_arg in waveform_ai_cmdtest()
The function checks and possibly modifies the description of an
asynchronous command to be run on the analog input subdevice of a comedi
device attached to the "comedi_test" driver, returning 0 if no
modifications were required, or a positive value that indicates which
step of the checking process it failed on. Step 4 fixes up various
argument values for various trigger sources.
There are two bugs in the fixing up of the `convert_arg` value to keep
the `scan_begin_arg` value within the range of `unsigned int` when
`scan_begin_src` and `convert_src` both have the value `TRIG_TIMER`,
which indicates that the corresponding `_arg` values hold a time period
in nanoseconds. The code also uses `scan_end_arg`, which holds the number
of "conversions" within each "scan". The goal is to end up with the
scan period being less than or equal to the convert period multiplied by
the number of conversions per scan. It intends to do that by clamping
the `convert_arg` value to a maximum value of `UINT_MAX / scan_end_arg`
rounded down to a multiple of 1000 (`NSEC_PER_USEC`).
(The rounding from nanoseconds to microseconds is because the driver is
modelling a device that uses a 1 MHz clock for timing. This is partly
because that is a more typical timing base for real hardware devices
driven by comedi, and partly because the driver used to use `struct
timeval` internally.)
The first bug is that the code checks if `scan_begin_arg == TRIG_TIMER`
when it should be checking if `scan_begin_src == TRIG_TIMER`. The
bugged check will always fail because if `scan_begin_src == TRIG_TIMER`,
then `scan_begin_arg` will be at least 1000 (`NSEC_PER_USEC`), otherwise
`scan_begin_src == TRIG_FOLLOW` and `scan_begin_arg` will be 0. (N.B.
`TRIG_TIMER` is defined as `0x10`.) The second bug is that it rounds
the maximum value down to a multiple of 1000000000 (`NSEC_PER_SEC`)
instead of 1000 (`NSEC_PER_USEC`); however, this bug is not reached due
to the first bug. This patch fixes both bugs.
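A worked sketch of the intended clamp, assuming both sources are `TRIG_TIMER`. It limits `convert_arg` to `UINT_MAX / scan_end_arg` rounded down to a multiple of `NSEC_PER_USEC`. The rounding unit is a parameter so the buggy `NSEC_PER_SEC` variant can be compared. Function names are hypothetical. With `scan_end_arg = 8` the limit is 536870911, which rounds down to 536870000 with the correct unit but to 0 with the wrong one.

```c
#include <limits.h>

#define SKETCH_NSEC_PER_USEC 1000u
#define SKETCH_NSEC_PER_SEC  1000000000u

/* Round @x down to a multiple of @m, like the kernel's rounddown(). */
static unsigned int sketch_rounddown(unsigned int x, unsigned int m)
{
	return x - (x % m);
}

/* Clamp @convert_arg so convert_arg * scan_end_arg fits in unsigned int,
 * keeping it a whole multiple of @unit nanoseconds. */
static unsigned int sketch_clamp_convert_arg(unsigned int convert_arg,
					     unsigned int scan_end_arg,
					     unsigned int unit)
{
	unsigned int limit;

	if (scan_end_arg == 0)
		return convert_arg;
	limit = sketch_rounddown(UINT_MAX / scan_end_arg, unit);
	return convert_arg > limit ? limit : convert_arg;
}
```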
Fixes: 783ddaebd397 ("staging: comedi: comedi_test: support scan_begin_src == TRIG_FOLLOW")
Fixes: 5afdcad2f818 ("staging: comedi: comedi_test: limit maximum convert_arg")
Cc: stable <stable@kernel.org>
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://patch.msgid.link/20260422144637.27692-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stepan Ionichev [Wed, 20 May 2026 11:05:04 +0000 (16:05 +0500)]
gpio: pca953x: propagate regulator_enable() error from resume
pca953x_resume() returns 0 when regulator_enable() fails, dropping
the real error code and masking the failure as a successful resume.
The caller then proceeds as if the chip is powered, while the
regulator is in fact disabled.
Wanquan Zhong [Wed, 20 May 2026 11:32:45 +0000 (19:32 +0800)]
USB: serial: option: add missing RSVD(5) flag for Rolling RW135R-GL
The RW135R-GL entry added in commit 01e8d0f74222 ("USB: serial: option:
add support for Rolling Wireless RW135R-GL") was missing the
.driver_info = RSVD(5) flag used by other Rolling Wireless MBIM laptop
modules (e.g. RW135-GL and RW350-GL).
Without this flag, the option driver incorrectly binds to the reserved
ADB interface (If#5) in multi-interface USB modes, causing AT/MBIM
communication failures after mode switching. This matches the handling
of other Rolling Wireless MBIM devices.
- VID:PID 33f8:1003, RW135R-GL for laptop debug M.2 cards (with MBIM
interface for Linux/Chrome OS)
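The reserved-interface mechanism can be pictured as a bitmask in `driver_info`, where bit n set means "do not bind interface n". This sketch models that idea. `SKETCH_RSVD()` and `sketch_should_bind()` are hypothetical and are not the macros from `drivers/usb/serial/option.c`.

```c
#include <stdbool.h>

/* Hypothetical: bit @ifnum set means the driver must skip that interface. */
#define SKETCH_RSVD(ifnum) (1UL << (ifnum))

static bool sketch_should_bind(unsigned long driver_info, unsigned int ifnum)
{
	return !(driver_info & SKETCH_RSVD(ifnum));
}
```

With the flag missing (`driver_info == 0`), interface 5 is bound, which is the ADB-interface problem described above.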
Use kmalloc_flex() when allocating a new 'struct external_name' in
__d_alloc() to replace offsetof() and the open-coded size arithmetic,
and to keep the size type-safe.
vfs: remove always taken if-branch in find_next_fd()
find_next_fd() finds the next free fd slot in the passed fdtable's
bitmap. It does so in two steps: first it checks whether the bitmap
has a free entry in the word containing start. If not, it looks at the
second-level bitmap, which records which words in the first-level bitmap
are full, and then searches the first-level bitmap starting at the first
non-full word.
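A minimal user-space sketch of this two-level search. It assumes plain `unsigned long` arrays in place of the fdtable's `open_fds` and `full_fds_bits`, and all names are hypothetical. Step 2 starts at the word after `bitbit`, which step 1 has already examined. Any slot found there therefore lies beyond `start`, so no "raise start to bitbit" adjustment is needed.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* First zero bit in @word at or after @bit, or BITS_PER_LONG if none. */
static size_t first_zero_from(unsigned long word, size_t bit)
{
	for (; bit < SKETCH_BITS_PER_LONG; bit++)
		if (!(word & (1UL << bit)))
			return bit;
	return SKETCH_BITS_PER_LONG;
}

/* open_fds: one bit per fd (set = in use), @nwords words long.
 * full_words: one bit per open_fds word (set = that word is full).
 * Returns the first free fd >= @start, or nwords * BITS_PER_LONG. */
static size_t sketch_find_next_fd(const unsigned long *open_fds,
				  const unsigned long *full_words,
				  size_t nwords, size_t start)
{
	size_t maxfd = nwords * SKETCH_BITS_PER_LONG;
	size_t bitbit = start / SKETCH_BITS_PER_LONG;
	size_t bit, w;

	if (start >= maxfd)
		return maxfd;

	/* Step 1 (fast path): the word containing @start. */
	bit = first_zero_from(open_fds[bitbit], start % SKETCH_BITS_PER_LONG);
	if (bit < SKETCH_BITS_PER_LONG)
		return bitbit * SKETCH_BITS_PER_LONG + bit;

	/* Step 2: skip full words using the second-level bitmap. */
	for (w = bitbit + 1; w < nwords; w++) {
		unsigned long full = full_words[w / SKETCH_BITS_PER_LONG];

		if (full & (1UL << (w % SKETCH_BITS_PER_LONG)))
			continue;
		bit = first_zero_from(open_fds[w], 0);
		if (bit < SKETCH_BITS_PER_LONG)
			return w * SKETCH_BITS_PER_LONG + bit;
	}
	return maxfd;
}
```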
In the current code the second level lookup is done by:
where bitbit = start / BITS_PER_LONG. However, in the fast path (first
step) we have already checked the word at bitbit, so we can skip that word
and start at bitbit + 1. This also means that we can get rid of the always-taken if-branch.
Wang Haoran [Mon, 13 Apr 2026 06:06:55 +0000 (14:06 +0800)]
iov_iter: use kmemdup_array for dup_iter to harden against overflow
While auditing the Linux 7.0-rc2 kernel, I identified a potential security
vulnerability in the iov_iter framework's memory allocation logic.
The dup_iter() function, which is exported via EXPORT_SYMBOL, currently
uses kmemdup() with a raw multiplication to allocate the duplicate iovec array:
The hazard here is that dup_iter() relies on a plain multiplication without
any overflow check. Since nr_segs is often derived from user-space
input, this line is vulnerable to integer overflow (on 32-bit systems or
via type narrowing), potentially leading to a small allocation followed by a
large out-of-bounds memory copy. Furthermore, it allows for unbounded memory
allocations, as the function lacks intrinsic knowledge of safe limits.
On the 7.0-rc2 branch, several high-impact callchains still rely on this
exported function:
drivers/usb/gadget/function/f_fs.c:
The ffs_epfile_read_iter() path demonstrates why relying on dup_iter() is
dangerous: it performs allocation based on user input before verifying driver
state. This confirms that dup_iter() must be hardened internally as it cannot
assume pre-validated input.
drivers/usb/gadget/legacy/inode.c:
The ep_read_iter() path illustrates how dup_iter()’s lack of boundary awareness
compounds resource risks. When combined with other allocations, it creates
a multiplier effect for kernel memory pressure.
This patch replaces kmemdup() with kmemdup_array(), which uses
check_mul_overflow() to calculate the allocation size safely,
hardening dup_iter() against malicious or malformed inputs from its callers.
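A user-space analogue of the hardening, for illustration only. `sketch_memdup_array()` is a hypothetical helper that refuses to allocate when `n * size` overflows `size_t`. It uses the GCC/Clang builtin `__builtin_mul_overflow()`, the primitive the kernel's `check_mul_overflow()` is built on.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate @n elements of @size bytes; NULL on overflow or OOM. */
static void *sketch_memdup_array(const void *src, size_t n, size_t size)
{
	size_t bytes;
	void *p;

	/* A raw "n * size" would silently wrap and under-allocate here. */
	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	p = malloc(bytes);
	if (p)
		memcpy(p, src, bytes);
	return p;
}
```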
Merge patch series "initramfs: test and improve cpio hex header validation"
David Disseldorp <ddiss@suse.de> says:
The series that introduced simple_strntoul() made it into the kernel
without proper review and hence reinvented an unneeded wheel.
Here is the refactoring to show that. It can go via the PRINTK or VFS
tree.
I have tested this on x86, but I believe the result will be the same
on big-endian CPUs (I deduced that from how strtox() works).
I also ran KUnit tests.
* patches from https://patch.msgid.link/20260331070519.5974-1-ddiss@suse.de:
kstrtox: Drop extern keyword in the simple_strtox() declarations
vsprintf: Revert "add simple_strntoul"
initramfs: Refactor to use hex2bin() instead of custom approach
initramfs: Sort headers alphabetically
initramfs_test: test header fields with 0x hex prefix
initramfs_test: add fill_cpio() inject_ox parameter
Andy Shevchenko [Tue, 31 Mar 2026 06:57:36 +0000 (17:57 +1100)]
kstrtox: Drop extern keyword in the simple_strtox() declarations
The exported simple_strtox() function declarations carry a legacy
'extern' keyword, an artefact that can be removed. So drop it.
While at it, tweak the declarations to provide parameter names.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260331070519.5974-7-ddiss@suse.de
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260331070519.5974-6-ddiss@suse.de
Acked-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Andy Shevchenko [Tue, 31 Mar 2026 06:57:34 +0000 (17:57 +1100)]
initramfs: Refactor to use hex2bin() instead of custom approach
There is a simple_strntoul() function used solely as a shortcut
for hex2bin() with proper endianness conversions. Replace it here
and drop the unneeded function in a follow-up change.
This implementation will abort if we fail to parse the cpio header,
instead of using potentially bogus header values.
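A sketch of strict field parsing in the spirit of this change. Eight hex characters are decoded into four bytes, failing on any non-hex character, and then assembled most-significant byte first. `sketch_hex2bin()` is a hypothetical stand-in modelled on the kernel `hex2bin()` contract (0 on success, negative on a bad character); it is not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode 2 * @count hex chars into @count bytes; -1 on any bad char. */
static int sketch_hex2bin(uint8_t *dst, const char *src, size_t count)
{
	while (count--) {
		int hi = sketch_hexval(src[0]), lo = sketch_hexval(src[1]);

		if (hi < 0 || lo < 0)
			return -1;
		*dst++ = (uint8_t)((hi << 4) | lo);
		src += 2;
	}
	return 0;
}

/* Parse one 8-char cpio header field; false means "abort the archive". */
static bool sketch_parse_cpio_field(const char *field, uint32_t *out)
{
	uint8_t b[4];

	if (sketch_hex2bin(b, field, sizeof(b)))
		return false;
	/* Fields are written most-significant digit first (big-endian). */
	*out = (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | b[3];
	return true;
}
```

A "0x"-prefixed field is rejected because `'x'` is not a hex digit.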
Co-developed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260331070519.5974-5-ddiss@suse.de
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Andy Shevchenko [Tue, 31 Mar 2026 06:57:33 +0000 (17:57 +1100)]
initramfs: Sort headers alphabetically
Sorting headers alphabetically helps locating duplicates, and makes it
easier to figure out where to insert new headers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260331070519.5974-4-ddiss@suse.de
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
David Disseldorp [Tue, 31 Mar 2026 06:57:32 +0000 (17:57 +1100)]
initramfs_test: test header fields with 0x hex prefix
cpio header fields are 8-byte hex strings, but one "interesting"
side-effect of our historic simple_str[n]toul() use is that a "0x"
(or "0X") prefixed header field will be successfully processed when
followed by a 6-byte hex remainder string.
"0x" prefix support is contrary to the initramfs specification at
Documentation/driver-api/early-userspace/buffer-format.rst which states:
The structure of the cpio_header is as follows (all fields contain
hexadecimal ASCII numbers fully padded with '0' on the left to the
full width of the field, for example, the integer 4780 is represented
by the ASCII string "000012ac"):
Test for this corner case by injecting "0x" prefixes into the uid, gid
and namesize cpio header fields. Confirm that init_stat() returns
matching uid and gid values.
This test can be modified in future to expect unpack_to_rootfs() failure
when header validation is changed to properly follow the specification.
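The corner case can be reproduced with standard `strtoul()`, which, like the historic simple_str[n]toul() helpers, accepts an optional "0x"/"0X" prefix when the base is 16. `sketch_lenient_field()` is a hypothetical helper that copies an 8-character field and parses it permissively, so "0x0003e8" and "000003e8" both yield 1000. Note that `strtoul()` also tolerates leading whitespace and a sign, which the specification does not allow either.

```c
#include <stdlib.h>
#include <string.h>

/* Permissive parse of one 8-char field; "0x" prefixes slip through. */
static unsigned long sketch_lenient_field(const char *field)
{
	char buf[9];

	memcpy(buf, field, 8);
	buf[8] = '\0';
	return strtoul(buf, NULL, 16);
}
```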
Add some missing struct kstat initializations to account for possible
init_stat() failures.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://patch.msgid.link/20260331070519.5974-3-ddiss@suse.de
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
fill_cpio() uses sprintf() to write out the in-memory cpio archive from
an array of struct initramfs_test_cpio. This change allows callers to
modify the cpio sprintf() format string so that future tests can
intentionally corrupt the header with "0x" and "0X" prefixed fields.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://patch.msgid.link/20260331070519.5974-2-ddiss@suse.de
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Tejun Heo [Tue, 19 May 2026 07:53:11 +0000 (21:53 -1000)]
sched_ext: Add cmask mask ops
Sub-sched cap code and other upcoming consumers need bulk cmask ops, both
mutating (and/or/copy/andnot) and predicate (subset/intersects/empty).
cmask_walk_op2() walks the intersection of two ranges word by word;
cmask_walk_op1() walks one range. Both are __always_inline and dispatched on
a compile-time-constant op enum, so each public entry collapses to a
specialized loop with the inner switch reduced to one arm.
Two-cmask ops only touch bits in the intersection of the two ranges; bits
outside are left unchanged. scx_cmask_or_racy() and scx_cmask_copy_racy()
mirror the locking forms but read @src word-by-word through data_race();
callers handle ordering with concurrent writers themselves.
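The dispatch pattern can be sketched as follows. One `always_inline` walker takes an op enum, and each public wrapper passes a compile-time constant, so after inlining the compiler can reduce the inner `switch` to a single arm. This is a simplified, hypothetical model. Both masks are plain `uint64_t` arrays over the same word range, whereas the real `scx_cmask` carries a base and `nr_cids` and walks only the intersection of the two ranges.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_cmask_op {
	SKETCH_OP_OR,
	SKETCH_OP_ANDNOT,
	SKETCH_OP_SUBSET,
	SKETCH_OP_INTERSECTS,
};

static inline __attribute__((always_inline)) bool
sketch_walk_op2(enum sketch_cmask_op op, uint64_t *dst, const uint64_t *src,
		unsigned int nr_words)
{
	for (unsigned int i = 0; i < nr_words; i++) {
		switch (op) {
		case SKETCH_OP_OR:
			dst[i] |= src[i];
			break;
		case SKETCH_OP_ANDNOT:
			dst[i] &= ~src[i];
			break;
		case SKETCH_OP_SUBSET:		/* is dst a subset of src? */
			if (dst[i] & ~src[i])
				return false;
			break;
		case SKETCH_OP_INTERSECTS:
			if (dst[i] & src[i])
				return true;
			break;
		}
	}
	/* Predicates only: no early exit means "subset" holds and
	 * "intersects" does not. Mutating ops ignore the result. */
	return op == SKETCH_OP_SUBSET;
}

static void sketch_cmask_or(uint64_t *dst, const uint64_t *src, unsigned int n)
{
	sketch_walk_op2(SKETCH_OP_OR, dst, src, n);
}

static void sketch_cmask_andnot(uint64_t *dst, const uint64_t *src, unsigned int n)
{
	sketch_walk_op2(SKETCH_OP_ANDNOT, dst, src, n);
}

/* Predicates never write through @a; the cast only fits the walker. */
static bool sketch_cmask_subset(const uint64_t *a, const uint64_t *b, unsigned int n)
{
	return sketch_walk_op2(SKETCH_OP_SUBSET, (uint64_t *)a, b, n);
}

static bool sketch_cmask_intersects(const uint64_t *a, const uint64_t *b, unsigned int n)
{
	return sketch_walk_op2(SKETCH_OP_INTERSECTS, (uint64_t *)a, b, n);
}
```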
v2: Add scx_cmask_empty().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Tejun Heo [Tue, 19 May 2026 07:53:11 +0000 (21:53 -1000)]
sched_ext: Track bits[] storage size in struct scx_cmask
scx_cmask carries @base and @nr_cids but not the bits[] allocation size, so
helpers reshaping the active range have no way to check it fits and later
kfuncs taking caller-provided storage can't validate it.
Add @alloc_words (u64 word count) annotated with __counted_by, and split the
bit-range API into three helpers:
- SCX_CMASK_DEFINE() / __SCX_CMASK_DEFINE() define an on-stack cmask, the
latter taking an explicit capacity for oversized storage.
SCX_CMASK_DEFINE_SHARD() is a thin wrapper that always reserves
SCX_CID_SHARD_MAX_CPUS bits of storage.
- scx_cmask_init() / __scx_cmask_init() initialize a cmask, with the same
tight-vs-explicit split.
- scx_cmask_reframe() reshapes the active range without resizing storage.
The BPF mirror (cmask_init / __cmask_init / cmask_reframe) gets the same
shape.
Add scx_cmask_clear() and scx_cmask_fill() to zero and set the
active-range bits respectively. scx_cpumask_to_cmask() uses
scx_cmask_clear(); scx_cmask_init() would otherwise re-write @alloc_words
on every call.
A later patch uses @alloc_words in scx_cmask_ref_shard() to refuse output
storage that can't hold the requested shard.
v2: Init per-CPU scx_set_cmask_scratch (was zero-init, emitted empty
cmasks). Add nr_cids/alloc_cids check in BPF __cmask_init().
(sashiko AI)
Widen SCX_CMASK_NR_WORDS()/CMASK_NR_WORDS() to compute in u64 so that
@nr_cids near U32_MAX no longer wraps to a small value and bypasses
the bounds check in cmask_reframe(). (Andrea)
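Why widening the word-count computation matters, as a worked sketch. The two helpers are hypothetical and not the real `SCX_CMASK_NR_WORDS()`/`CMASK_NR_WORDS()` macros. Computing `(n + 63) / 64` in 32 bits wraps for `n = 0xFFFFFFFF` (`n + 63` becomes 62, so the result is 0 words), while the 64-bit version yields the correct 67108864 words.

```c
#include <stdint.h>

/* u64 words needed for @n bits, computed in 32 bits: can wrap. */
static uint32_t sketch_nr_words_u32(uint32_t n)
{
	return (n + 63) / 64;
}

/* Same computation widened to 64 bits first: cannot wrap for u32 input. */
static uint64_t sketch_nr_words_u64(uint32_t n)
{
	return ((uint64_t)n + 63) / 64;
}
```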
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Tejun Heo [Tue, 19 May 2026 07:53:11 +0000 (21:53 -1000)]
sched_ext: Rename scx_cmask.nr_bits to nr_cids
struct scx_cmask is a base-windowed bitmap over cid space. Each bit
represents one cid, so the count of active bits is the count of cids. The
sibling struct scx_cid_shard already uses nr_cids. Rename as a prep so the
following patches that grow the cmask API can use the consistent name.
v2: Also rename src->nr_bits / dst->nr_bits in
cmask_copy_from_kernel(). (sashiko AI)
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Laurent Pinchart [Mon, 11 May 2026 23:56:31 +0000 (02:56 +0300)]
media: renesas: vsp1: Use spinlock guards
Replace manual spinlock locking and unlocking with guards. This
simplifies error paths and reduces the amount of code. To ease review,
limit the changes to locations where the guard's scope extends to the end
of the function. Scoped guards will be introduced separately.
Laurent Pinchart [Mon, 11 May 2026 23:56:29 +0000 (02:56 +0300)]
media: renesas: vsp1: Use mutex guards
Replace manual mutex locking and unlocking with guards. This simplifies
error paths and reduces the amount of code. To ease review, limit the
changes to locations where the guard's scope extends to the end of the
function. Scoped guards will be introduced separately.
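Guards build on the GCC/Clang `cleanup` variable attribute, which runs a handler whenever the variable goes out of scope, including on early returns. This user-space sketch shows the mechanism with a hypothetical counting "lock" so that no threading library is needed. `SKETCH_GUARD()` is illustrative and is not the kernel's `guard()` from `<linux/cleanup.h>`; it allows only one guard per scope.

```c
/* Hypothetical lock that just counts holders. */
struct sketch_lock {
	int held;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held++;
}

/* Cleanup handler: receives a pointer to the guarded variable. */
static void sketch_lock_release(struct sketch_lock **l)
{
	(*l)->held--;
}

/* Acquire now, release automatically when the enclosing scope ends. */
#define SKETCH_GUARD(l)							\
	struct sketch_lock *sketch_guard_				\
		__attribute__((cleanup(sketch_lock_release), unused)) =	\
		(sketch_lock_acquire(l), (l))

static struct sketch_lock sketch_state_lock;
static int sketch_counter;

static int sketch_update(int value)
{
	SKETCH_GUARD(&sketch_state_lock);

	if (value < 0)
		return -1;	/* early error return: still released */
	sketch_counter += value;
	return 0;
}
```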
Laurent Pinchart [Mon, 11 May 2026 23:56:26 +0000 (02:56 +0300)]
media: renesas: vsp1: Split vsp1_du_setup_lif()
The vsp1_du_setup_lif() function is used to configure and enable a
pipeline, as well as disable it, depending on the cfg argument being a
valid pointer or NULL. This creates a confusing API. Improve it by
splitting the function in two, a vsp1_du_enable() function to configure
a pipeline, and a vsp1_du_disable() function to disable it.
Keep vsp1_du_setup_lif() as an inline wrapper for existing callers in
the DRM subsystem, to simplify merging. The callers will be updated
separately and the old API will then be removed.
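A sketch of the split API and the transitional inline wrapper. The old entry point dispatches on whether `cfg` is NULL, while new callers use the explicit enable/disable functions. All types and names here are hypothetical simplifications; the real VSP1 functions also take a device and pipe index.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical configuration; not the real VSP1 type. */
struct sketch_du_cfg {
	unsigned int width;
	unsigned int height;
};

static int sketch_pipe_enabled;

static int sketch_du_enable(const struct sketch_du_cfg *cfg)
{
	if (!cfg->width || !cfg->height)
		return -EINVAL;
	sketch_pipe_enabled = 1;
	return 0;
}

static int sketch_du_disable(void)
{
	sketch_pipe_enabled = 0;
	return 0;
}

/* Transitional wrapper keeping the old "NULL means disable" contract. */
static inline int sketch_du_setup_lif(const struct sketch_du_cfg *cfg)
{
	return cfg ? sketch_du_enable(cfg) : sketch_du_disable();
}
```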
Marc Zyngier [Wed, 20 May 2026 10:02:00 +0000 (11:02 +0100)]
KVM: arm64: vgic-v2: Don't init the vgic on in-kernel interrupt injection
We now have the lazy init on three paths:
- on first run of a vcpu
- on first injection of an interrupt from userspace and irqfd
- on first injection of an interrupt from kernel space as
part of the device emulation (timers, PMU, vgic MI)
Given that we recompute the state of each in-kernel interrupt
every time we are about to enter the guest, we can drop the lazy
init from the kernel injection path.
This solves a bunch of issues related to vgic_lazy_init() being called
in non-preemptible context, such as vcpu reset.
Marc Zyngier [Wed, 20 May 2026 10:01:59 +0000 (11:01 +0100)]
KVM: arm64: vgic-v2: Force vgic init on injection outside the run loop
Make sure that any attempt to inject an interrupt from userspace
or an irqfd triggers the GICv2 lazy init.
This is not currently necessary as the init is also performed on
*any* interrupt injection. But as we're about to remove that,
let's introduce it here.
Marc Zyngier [Wed, 20 May 2026 10:01:57 +0000 (11:01 +0100)]
KVM: arm64: timer: Kill the per-timer irq level cache
The timer code makes use of a per-timer irq level cache, which
looks like a very minor optimisation to avoid taking a lock upon
updating the GIC view of the interrupt when it is unchanged from
the previous state.
This is getting in the way of more important correctness issues,
so get rid of the cache, which simplifies a couple of minor things.
Marc Zyngier [Wed, 20 May 2026 10:01:56 +0000 (11:01 +0100)]
KVM: arm64: Simplify userspace notification of interrupt state
The userspace notification of interrupts has a few problems:
- it is utterly pointless
- it is annoyingly split between detecting the need for notification
and the population of the interrupts in the run structure
We can't do anything about the former (yet), but the latter can be
addressed. If we detect that we must notify userspace, we know that
we are going to exit, as we populate the exit status. This means that
we can also populate the interrupt state at this stage and be done
with it.
Marc Zyngier [Wed, 20 May 2026 10:01:55 +0000 (11:01 +0100)]
KVM: arm64: timer: Repaint kvm_timer_{should,irq_can}_fire() to kvm_timer_{pending,enabled}()
kvm_timer_should_fire() seems to date back to a time where the author
of the timer code didn't seem to have made the word "pending" part of
their vocabulary.
Having since slightly improved on that front, let's rename this predicate
to kvm_timer_pending(), which clearly indicates whether the timer
interrupt is pending or not.
Similarly, kvm_timer_irq_can_fire() is renamed to kvm_timer_enabled().
Marc Zyngier [Mon, 11 May 2026 10:46:41 +0000 (11:46 +0100)]
KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
When switching between L1 and L2, we save the old state using
kvm_arch_vcpu_put(), mutate the state in memory, then load the new
state using kvm_arch_vcpu_load(). Any live FPSIMD/SVE state is saved
and unbound, such that it can be lazily restored on a subsequent trap.
The FPSIMD/SVE state is shared by exception levels, and only a handful
of related control registers need to be changed when transitioning
between L1 and L2. The save/restore of the common state is needless
overhead, especially as trapping becomes exponentially more expensive
with nesting.
Avoid this overhead by leaving the common FPSIMD/SVE state live on the
CPU, and only switching the state that is distinct for L1 and L2:
- the trap controls: the effective values are recomputed on each entry
into the guest to take the EL into account and merge the L0 and L1
configuration if in a nested context, or directly use the L0 configuration
in non-nested context (see __activate_traps()).
- the VL settings: the effective values are also recomputed on each
entry into the guest (see fpsimd_lazy_switch_to_guest()).
Since we appear to cover all bases, use the vcpu flags indicating the
handling of a nested ERET or exception delivery to avoid the whole FP
save/restore shenanigans. SME will have to be similarly dealt with when
it eventually gets supported.
For an EL1 L3 guest where L1 and L2 have this optimisation, this
results in at least a 10% wall clock reduction when running an I/O
heavy workload, generating a high rate of nested exceptions.
Marc Zyngier [Mon, 11 May 2026 10:46:11 +0000 (11:46 +0100)]
KVM: arm64: nv: Track L2 to L1 exception emulation
While we currently track that we are emulating a nested ERET from
L1 to L2, we don't track the reverse direction (an exception
going from L2 to L1).
Add a new vcpu state flag for this purpose, which will see some
use shortly.
Guoniu Zhou [Mon, 23 Mar 2026 08:33:31 +0000 (16:33 +0800)]
media: nxp: imx8-isi: Fix scale factor calculation for hardware rounding
The ISI hardware rounds the actual output size up to an integer, as
described in i.MX93 Reference Manual section 57.7.8 (Channel 0 Scale
Factor). The scale factor must be calculated to ensure the theoretical
output value rounds up to exactly the desired size.
The maximum downscaling factor supported by ISI is 16. Add a
minimum value constraint before applying the setting to hardware.
Otherwise, the process becomes unresponsive, even to Ctrl+C.
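A worked sketch of why the rounding direction of the scale factor matters. It assumes, for illustration only, a fixed-point input/output ratio with 12 fractional bits (1.0 == 4096) and a hardware output of `ceil(input / ratio)`; the real register format is in the reference manual. For 1000 -> 333, flooring the ratio (12300) makes the hardware produce 334, while rounding it up (12301) yields exactly 333 in this model. The requested output is first clamped so that the downscale never exceeds 16x. All names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_SF_ONE        4096u	/* assumed: 1.0 in 12-bit fixed point */
#define SKETCH_MAX_DOWNSCALE 16u

static uint32_t sketch_div_round_up(uint64_t n, uint64_t d)
{
	return (uint32_t)((n + d - 1) / d);
}

/* Output size the modelled hardware produces for scale factor @sf. */
static uint32_t sketch_hw_output(uint32_t in, uint32_t sf)
{
	return sketch_div_round_up((uint64_t)in * SKETCH_SF_ONE, sf);
}

/* Clamp *out to at least in/16, then round the ratio up so the
 * rounded-up hardware output lands on *out in the downscaling case. */
static uint32_t sketch_scale_factor(uint32_t in, uint32_t *out)
{
	uint32_t min_out = sketch_div_round_up(in, SKETCH_MAX_DOWNSCALE);

	if (*out < min_out)
		*out = min_out;
	return sketch_div_round_up((uint64_t)in * SKETCH_SF_ONE, *out);
}
```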
In the current buffer selection logic, when both discard and pending
buffers are available, the driver fills hardware slots with discard
buffers first which results in an unnecessary frame drop even though
a user buffer was queued and ready.
Change the buffer selection logic to use pending buffers first (up to
the number available), and only use discard buffers to fill remaining
slots when insufficient pending buffers are queued.
This improves behavior by:
- Reducing discarded frames at stream start when user buffers are ready
- Decreasing latency in delivering captured frames to user-space
- Ensuring user buffers are utilized as soon as they are queued
- Improving overall buffer utilization efficiency
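The new selection policy fits in a few lines. Pending user buffers fill hardware slots first, and discard buffers only cover the slots that remain. This is a hypothetical model of the counts only and ignores the driver's actual queues.

```c
/* How many slots get user (pending) vs. driver-owned discard buffers. */
struct sketch_slot_plan {
	unsigned int pending;
	unsigned int discard;
};

static struct sketch_slot_plan sketch_plan_slots(unsigned int nr_slots,
						 unsigned int nr_pending,
						 unsigned int nr_discard)
{
	struct sketch_slot_plan p;

	/* User buffers first, up to the number of slots. */
	p.pending = nr_pending < nr_slots ? nr_pending : nr_slots;
	/* Discard buffers only fill what is left. */
	p.discard = nr_slots - p.pending;
	if (p.discard > nr_discard)
		p.discard = nr_discard;
	return p;
}
```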
After v4l2_subdev_init_finalize() succeeds in mxc_isi_pipe_init(), if
platform_get_irq() or devm_request_irq() fails, the error path jumps to
a label that only calls media_entity_cleanup() and mutex_destroy(),
missing the v4l2_subdev_cleanup() call needed to free the subdev active
state allocated by v4l2_subdev_init_finalize().
Add an error_subdev label that calls v4l2_subdev_cleanup() before
falling through to the existing error cleanup.
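The fix follows the usual kernel goto-unwind pattern. Each label undoes one setup step and falls through to the labels for earlier steps, so a failure after `v4l2_subdev_init_finalize()` must enter at the new label. This is a user-space sketch with hypothetical stand-ins that only record which cleanups ran.

```c
#include <errno.h>

static int sketch_subdev_cleaned, sketch_entity_cleaned;

/* Hypothetical stand-ins for the real init/cleanup calls. */
static int sketch_init_finalize(void) { return 0; }
static void sketch_subdev_cleanup(void) { sketch_subdev_cleaned = 1; }
static void sketch_entity_cleanup(void) { sketch_entity_cleaned = 1; }

static int sketch_pipe_init(int irq_ok)
{
	int ret;

	ret = sketch_init_finalize();
	if (ret)
		goto error;		/* nothing subdev-related to undo yet */

	if (!irq_ok) {			/* e.g. IRQ lookup/request failed */
		ret = -ENXIO;
		goto error_subdev;	/* must free the subdev state too */
	}
	return 0;

error_subdev:
	sketch_subdev_cleanup();
error:
	sketch_entity_cleanup();
	return ret;
}
```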
Xiaolei Wang [Thu, 7 May 2026 04:13:16 +0000 (12:13 +0800)]
media: nxp: imx8-isi: Add missing v4l2_subdev_cleanup() in crossbar and pipe
Both mxc_isi_crossbar_init() and mxc_isi_pipe_init() call
v4l2_subdev_init_finalize() which allocates the subdev active state,
but neither mxc_isi_crossbar_cleanup() nor mxc_isi_pipe_cleanup()
calls v4l2_subdev_cleanup() to free it.
This causes a memory leak on every rmmod, reported by kmemleak:
Allocated by task 249:
__kmalloc_noprof+0x27c/0x690
mxc_isi_crossbar_init+0x22c/0x560 [imx8_isi]
Freed by task 724:
kfree+0x1e4/0x5b0
mxc_isi_crossbar_cleanup+0x34/0x80 [imx8_isi]
mxc_isi_remove+0x11c/0x2a0 [imx8_isi]
The problem is that mxc_isi_remove() calls mxc_isi_crossbar_cleanup()
before mxc_isi_v4l2_cleanup(). The crossbar cleanup frees the media
entity pads, but the subsequent v4l2 cleanup still tries to remove
media links that reference those pads.
Fix this by calling mxc_isi_v4l2_cleanup() before
mxc_isi_crossbar_cleanup() to ensure all media entities are properly
unregistered while the pads are still valid.
Fixes: cf21f328fcaf ("media: nxp: Add i.MX8 ISI driver")
Cc: stable@vger.kernel.org
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patch.msgid.link/20260507041318.491594-2-xiaolei.wang@windriver.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Barnabás Pőcze [Mon, 11 May 2026 15:09:57 +0000 (17:09 +0200)]
media: rkisp1: Add support for CAC
The CAC block implements chromatic aberration correction. Expose it to
userspace using the extensible parameters format. This was tested on the
i.MX8MP platform, but according to the available documentation the block
is also present in the RK3399 variant (V10), and thus presumably in later
versions as well, so no feature flag is introduced.
The name of the enum that holds the mapping of parameter buffer versions
has a typo; correct it. While this is a uAPI header, the impact should be
minimal, as the enum only serves as a container for the single supported
version number.
media: mc-entity: Drop ifdef for media_entity_cleanup definition
The media_entity_cleanup() function is defined in media-entity.h as a
static inline no-op when CONFIG_MEDIA_CONTROLLER is enabled, and as a
no-op macro otherwise. This complexity is unneeded. Use a static inline
function in all cases.
Eric Dumazet [Tue, 19 May 2026 08:46:11 +0000 (08:46 +0000)]
tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
The blamed commit moved the TIME_WAIT-derived ISN from the skb control
block to a per-CPU variable, assuming the value would always be consumed
by tcp_conn_request() for the same packet that wrote it. That assumption
is violated by multiple drop paths between the producer
(__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer
(tcp_conn_request()):
- min_ttl / min_hopcount check
- xfrm policy check
- tcp_inbound_hash() MD5/AO mismatch
- tcp_filter() eBPF/SO_ATTACH_FILTER drop
- th->syn && th->fin discard in tcp_rcv_state_process() TCP_LISTEN
- psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv()
- tcp_checksum_complete() in tcp_v{4,6}_do_rcv()
- tcp_v{4,6}_cookie_check() returning NULL
When a packet is dropped on any of these paths, tcp_tw_isn is left set.
The next SYN processed on the same CPU then consumes the non-zero value in
tcp_conn_request(), receiving a potentially predictable ISN.
This patch moves tcp_tw_isn back to skb->cb[], getting rid of the per-CPU
variable.
Note that tcp_v{4,6}_fill_cb() do not set it.
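The difference between a per-CPU slot and a per-packet field can be shown with a small userspace model. This is a sketch under stated assumptions, not the TCP stack. `struct pkt`, `rcv_percpu()`, `rcv_cb()` and the `fresh_isn` argument (standing in for a freshly generated secure ISN) are hypothetical. Any of the listed drop paths is collapsed into a single `dropped` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified packet. */
struct pkt {
	uint32_t tw_isn;    /* ISN derived from a TIME_WAIT socket, 0 if none */
	bool dropped;       /* dropped between producer and consumer */
	uint32_t cb_tw_isn; /* models a field in skb->cb[] */
};

/* Old scheme: one "per-CPU" slot shared by every packet on this CPU. */
static uint32_t percpu_tw_isn;

/* Returns the ISN picked by the connection request, or 0 if dropped. */
static uint32_t rcv_percpu(struct pkt *p, uint32_t fresh_isn)
{
	uint32_t isn;

	if (p->tw_isn)
		percpu_tw_isn = p->tw_isn; /* producer (tcp_v{4,6}_rcv()) */
	if (p->dropped)
		return 0;                  /* drop path: the slot stays set */

	/* consumer (tcp_conn_request()): use, then clear, the slot */
	isn = percpu_tw_isn ? percpu_tw_isn : fresh_isn;
	percpu_tw_isn = 0;
	return isn;
}

/* New scheme: the value travels with its own packet. */
static uint32_t rcv_cb(struct pkt *p, uint32_t fresh_isn)
{
	p->cb_tw_isn = p->tw_isn; /* producer writes this packet's cb */
	if (p->dropped)
		return 0;         /* the value is dropped with the packet */
	return p->cb_tw_isn ? p->cb_tw_isn : fresh_isn;
}
```

With the per-CPU slot, an unrelated SYN that follows a dropped TIME_WAIT SYN inherits the stale ISN. With the per-packet field, it gets a fresh one.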
Very little impact on overall code size/complexity:
The warning is not new and has existed since the initial bareudp commit 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling
different protocols like MPLS, IP, NSH etc.").
Let's use rtnl_dereference().
Note that bareudp_sock_release() is called from bareudp_stop()
under RTNL, so there is no real issue even without the helper.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605062359.e3gOfZCr-lkp@intel.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260518050726.318824-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bareudp: Remove synchronize_net() in bareudp_sock_release().
synchronize_net() in bareudp_sock_release() has existed since
day 1, commit 571912c69f0e ("net: UDP tunnel encapsulation module
for tunnelling different protocols like MPLS, IP, NSH etc.").
It was most likely copied from a similar tunneling device like
vxlan or geneve.
bareudp_sock_release() is called from dev->netdev_ops->ndo_stop(),
and synchronize_net() in unregister_netdevice_many_notify() ensures
that inflight bareudp fast paths finish before bareudp_dev is freed.
Let's remove the redundant synchronize_net() in bareudp_sock_release().
geneve: Remove synchronize_net() in geneve_unquiesce().
When changing the geneve config, geneve_changelink() sandwiches
the config memcpy() between geneve_quiesce() and geneve_unquiesce().
geneve_quiesce() temporarily clears geneve->sock[46] and their
sk_user_data, and then calls synchronize_net() to wait for inflight
fast paths to finish.
geneve_unquiesce() then restores the cleared pointers, but it also
superfluously calls synchronize_net().
The latter synchronize_net() provides no benefit; with or without it,
inflight fast paths can see either the NULL pointers or the original
pointers alongside the new configuration.
Let's remove the redundant synchronize_net() in geneve_unquiesce().
geneve: Remove synchronize_net() in geneve_sock_release().
vxlan previously had an issue where the fast path could access
stale pointers, which was fixed by commit c6fcc4fc5f8b ("vxlan:
avoid using stale vxlan socket.").
geneve later followed the same pattern, and commit fceb9c3e3825
("geneve: avoid using stale geneve socket.") copied synchronize_net()
from vxlan_sock_release() into geneve_sock_release().
However, that change occurred after commit ca065d0cf80f ("udp: no
longer use SLAB_DESTROY_BY_RCU"), and geneve had already been
using kfree_rcu() to free geneve_sock.
Therefore, the synchronize_net() was never actually needed there.
Let's remove the redundant synchronize_net() in geneve_sock_release().
vxlan: Remove synchronize_net() in vxlan_sock_release().
Initially, a dedicated workqueue was used to defer calling
udp_tunnel_sock_release(vxlan_sock->sock) and kfree(vxlan_sock).
Later, commit 0412bd931f5f ("vxlan: synchronously and race-free
destruction of vxlan sockets") removed the workqueue and instead
invoked these two functions immediately after synchronize_net().
This was intended to prevent UAF of the UDP socket in the fast path.
(Note that the "nondeterministic behaviour" mentioned in that
commit was not addressed, as another thread that does not wait for
an RCU grace period still sees the same behaviour.)
However, a week prior to that change, commit ca065d0cf80f ("udp:
no longer use SLAB_DESTROY_BY_RCU") had already moved UDP socket
freeing to after the RCU grace period. This made the synchronize_net()
in vxlan_sock_release() completely redundant.
Since vxlan_sock is now freed with kfree_rcu(), which is invoked after
udp_tunnel_sock_release(), vxlan_sock is guaranteed to be freed
either at the same time as or after the UDP socket, following the
RCU grace period.
Let's remove the redundant synchronize_net() in vxlan_sock_release().
That made vmci_transport_recv_listen() skip vsock_remove_pending(),
leaving the pending socket on the listener's pending_links with
sk_state = TCP_CLOSE while destroy: still dropped the explicit
reference taken before schedule_delayed_work().
One second later vsock_pending_work() observed is_pending=true and
performed full cleanup: vsock_remove_pending() then the two trailing
sock_put(sk) calls -- the first reached refcount 0 and __sk_freed
the socket, and the second wrote into the freed object:
BUG: KASAN: slab-use-after-free in refcount_warn_saturate
Write of size 4 at addr ffff88800b1cac80 by task kworker
Workqueue: events vsock_pending_work
Treat peer RST like any other unexpected packet type (err = -EINVAL).
All destroy: arms now return err < 0, so vmci_transport_recv_listen()
removes pending from pending_links synchronously and
vsock_pending_work() takes the is_pending=false / !rejected branch,
dropping only its own work reference. This also closes the
multi-packet race Sashiko reported on v2: pending is removed from
the list before any subsequent packet can find it.
The pre-existing sk_acceptq_removed() gap on the err < 0 path of
vmci_transport_recv_listen() that Sashiko also noted is not
introduced or changed by this patch.
Tested on lts-6.12.79 with KASAN: 52/100 unpatched -> 0/100 patched.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Cc: stable@vger.kernel.org Signed-off-by: Minh Nguyen <minhnguyen.080505@gmail.com> Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com> Link: https://patch.msgid.link/20260519102310.237181-1-minhnguyen.080505@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Tue, 19 May 2026 09:16:46 +0000 (11:16 +0200)]
MAINTAINERS: remove obsolete file entry in NETWORKING DRIVERS
Commit 5e138e0ec32b ("w5100: remove MMIO support") removes
include/linux/platform_data/wiznet.h, but does not remove the file entry
in NETWORKING DRIVERS that refers to that file.
Remove the obsolete file entry in NETWORKING DRIVERS.
Ivan Vecera [Tue, 19 May 2026 13:22:05 +0000 (15:22 +0200)]
dpll: zl3073x: fix memory leak on pin registration failure
If zl3073x_dpll_pin_register() fails, the allocated pin is not yet
added to zldpll->pins list. The error path calls
zl3073x_dpll_pins_unregister() which only iterates pins on the list,
so the current pin is leaked. Free the pin before jumping to the error
label.
Additionally move the pin->dpll_pin = NULL assignment in
zl3073x_dpll_pin_register() from err_register to the common
err_pin_get path. When dpll_pin_get() fails, pin->dpll_pin holds an
ERR_PTR value. Without this fix the subsequent zl3073x_dpll_pin_free()
would trigger a spurious WARN because it checks pin->dpll_pin for
non-NULL.
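A userspace model of both fixes follows. It is a sketch under stated assumptions. `struct pin`, the fixed-size `pins[]` array standing in for the `zldpll->pins` list, `FAKE_ERR_PTR` and the counters are all hypothetical. Only the ownership rules from the message are modelled: the failing pin is freed by the caller, and the ERR_PTR is cleared on the common error path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the driver uses struct zl3073x_dpll_pin, a
 * list_head and the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_PINS 4
#define FAKE_ERR_PTR ((void *)-12L) /* models ERR_PTR(-ENOMEM) */

struct pin {
	void *dpll_pin; /* must be NULL by the time the pin is freed */
};

static struct pin *pins[MAX_PINS]; /* models the zldpll->pins list */
static int npins;
static int live_allocs; /* pins allocated but not yet freed */
static int warnings;    /* spurious WARN hits */

static void pin_free(struct pin *p)
{
	if (p->dpll_pin)
		warnings++; /* models the WARN on a non-NULL dpll_pin */
	free(p);
	live_allocs--;
}

/* Models zl3073x_dpll_pin_register(); get_fails simulates dpll_pin_get()
 * returning an ERR_PTR value. */
static int pin_register(struct pin *p, int get_fails)
{
	p->dpll_pin = get_fails ? FAKE_ERR_PTR : (void *)p;
	if (p->dpll_pin == FAKE_ERR_PTR)
		goto err_pin_get;
	return 0;

err_pin_get:
	p->dpll_pin = NULL; /* fix: clear the ERR_PTR on the common path */
	return -1;
}

/* Only pins that made it onto the list are visible here. */
static void pins_unregister(void)
{
	while (npins) {
		struct pin *p = pins[--npins];

		p->dpll_pin = NULL; /* models dpll_pin_put() */
		pin_free(p);
	}
}

static int pins_register(int n, int fail_index)
{
	for (int i = 0; i < n && i < MAX_PINS; i++) {
		struct pin *p = calloc(1, sizeof(*p));

		if (!p)
			goto err;
		live_allocs++;
		if (pin_register(p, i == fail_index)) {
			pin_free(p); /* fix: p is not on the list yet */
			goto err;
		}
		pins[npins++] = p;
	}
	return 0;
err:
	pins_unregister();
	return -1;
}
```

Without the `pin_free(p)` before `goto err`, `live_allocs` would end at 1. Without clearing the ERR_PTR, `warnings` would be incremented.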
Fixes: 75a71ecc2412 ("dpll: zl3073x: Register DPLL devices and pins") Reviewed-by: Petr Oros <poros@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260519132205.161847-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: pse-pd: Use named initializers for arrays of i2c_device_data
While less compact, named initializers make it easier to see which
members of the structs are assigned which value without having to look
up the declaration of the struct. They are also more robust against
changes to the struct definition.
The mentioned robustness is relevant for a planned change to struct
i2c_device_id that replaces .driver_data by an anonymous union.
While touching all these arrays, unify usage of whitespace in the list
terminator.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
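The following self-contained C sketch illustrates the difference. The struct is a simplified, hypothetical copy of `struct i2c_device_id`. The union layout only illustrates the kind of planned change mentioned above; it is not the kernel's actual new definition. Real ID tables also end with an empty sentinel entry, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical mirror of struct i2c_device_id; the real one
 * lives in include/linux/mod_devicetable.h. */
#define I2C_NAME_SIZE 20
typedef unsigned long kernel_ulong_t;

struct i2c_device_id {
	char name[I2C_NAME_SIZE];
	kernel_ulong_t driver_data;
};

/* Positional style: the reader must know the member order. */
static const struct i2c_device_id ids_positional[] = {
	{ "chip-a", 1 },
	{ "chip-b", 2 },
};

/* Named (designated) style: every value names its member. */
static const struct i2c_device_id ids_named[] = {
	{ .name = "chip-a", .driver_data = 1 },
	{ .name = "chip-b", .driver_data = 2 },
};

/* Hypothetical future layout with driver_data in an anonymous union
 * (C11). The named initializers compile unchanged, whereas a positional
 * value is always applied to the union's first member. */
struct i2c_device_id_v2 {
	char name[I2C_NAME_SIZE];
	union {
		kernel_ulong_t driver_data;
		const void *data; /* hypothetical second member */
	};
};

static const struct i2c_device_id_v2 ids_named_v2[] = {
	{ .name = "chip-a", .driver_data = 1 },
	{ .name = "chip-b", .driver_data = 2 },
};

/* Compares two tables member by member. */
static int ids_equal(const struct i2c_device_id *a,
		     const struct i2c_device_id *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(a[i].name, b[i].name) ||
		    a[i].driver_data != b[i].driver_data)
			return 0;
	return 1;
}
```

Both styles produce the same table contents. Only the named form keeps working unchanged when `driver_data` moves into a union.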
dpll: zl3073x: Use named initializers for struct i2c_device_id
While less compact, named initializers make it easier to see which
members of the structs are assigned which value without having to look
up the declaration of the struct. They are also more robust against
changes to the struct definition.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
net: dsa: Use named initializers for struct i2c_device_id
While less compact, named initializers make it easier to see which
members of the structs are assigned which value without having to look
up the declaration of the struct. They are also more robust against
changes to the struct definition.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
While touching these arrays, unify usage of whitespace in the list
terminator.
mctp: i2c: Use named initializers for struct i2c_device_id
While less compact, named initializers make it easier to see which
members of the structs are assigned which value without having to look
up the declaration of the struct. They are also more robust against
changes to the struct definition.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
While touching this array, unify usage of whitespace in the list
terminator to what most other arrays are using.
mlxsw: minimal: Use named initializers for struct i2c_device_id
While less compact, named initializers make it easier to see which
members of the structs are assigned which value without having to look
up the declaration of the struct. They are also more robust against
changes to the struct definition.
This patch doesn't modify the compiled array, only its representation in
source form benefits. The former was confirmed with x86 and arm64
builds.
Eric Dumazet [Tue, 19 May 2026 19:32:48 +0000 (19:32 +0000)]
ipv4: use WARN_ON_ONCE() in ip_rt_bug()
It turns out ip_rt_bug() can be called more often than expected.
syzbot will still panic (because of panic_on_warn=1), but non-debug
kernels will no longer die while repeating stack traces on the console.
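The behavioural difference can be modelled in userspace. This is a minimal sketch that assumes a single call site. `warn_on_once_site()` and `ip_rt_bug_model()` are hypothetical helpers. The kernel's real WARN_ON_ONCE() also prints a backtrace and honours panic_on_warn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int traces_printed; /* stands in for backtraces on the console */

/* Rough model of one WARN_ON_ONCE(cond) call site: report only the first
 * time cond is true, but always return cond, as the kernel macro does. */
static bool warn_on_once_site(bool cond, bool *already_warned)
{
	if (cond && !*already_warned) {
		*already_warned = true;
		traces_printed++;
		fprintf(stderr, "WARNING: ip_rt_bug (model)\n");
	}
	return cond;
}

/* Hypothetical stand-in for ip_rt_bug(); the real function also logs the
 * addresses and frees the skb. */
static int ip_rt_bug_model(void)
{
	static bool warned; /* one flag per call site */

	warn_on_once_site(true, &warned);
	return 0;
}
```

Calling `ip_rt_bug_model()` repeatedly prints a single warning. With plain WARN_ON(), each call would print another trace.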
Fixes: c378a9c019cf ("ipv4: Give backtrace in ip_rt_bug().") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change __icmp_send() to not send ICMP to broadcast/multicast destinations.
Fixes: c378a9c019cf ("ipv4: Give backtrace in ip_rt_bug().") Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Carlier [Tue, 19 May 2026 20:35:30 +0000 (21:35 +0100)]
net: devmem: reject dma-buf bind with non-page-aligned size or SG length
net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
PAGE_SIZE multiples without checking:
- tx_vec is sized dmabuf->size / PAGE_SIZE, and
net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
- owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
covers the full byte len, so a non-page-multiple non-final sg
desyncs num_niovs from the gen_pool region for every later sg, on
both RX and TX.
dma-buf does not require page-aligned sizes, so the bind path has to
enforce what its own indexing assumes. Reject both with -EINVAL.
The size check is TX-only (only tx_vec is sized off dmabuf->size); the
SG-length check covers both directions.
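The arithmetic behind the out-of-bounds read and the new checks fits in a few lines. This is a sketch, not the devmem code. `tx_index_out_of_range()` and `check_bind()` are hypothetical helpers, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages */
#endif

/* Shows the hole: tx_vec has size / PAGE_SIZE entries, but the only bound
 * check is virt_addr < size. Returns 1 if the computed index would read
 * past the end of tx_vec. */
static int tx_index_out_of_range(size_t size, size_t virt_addr)
{
	size_t nr_entries = size / PAGE_SIZE;

	if (virt_addr >= size)
		return 0; /* rejected by the existing bound check */
	return virt_addr / PAGE_SIZE >= nr_entries;
}

/* Hypothetical helper mirroring the new checks: the dma-buf size (TX only)
 * and every SG entry length (both directions) must be page multiples. */
static int check_bind(size_t dmabuf_size, const size_t *sg_len, size_t nents,
		      int is_tx)
{
	if (is_tx && dmabuf_size % PAGE_SIZE)
		return -EINVAL;
	for (size_t i = 0; i < nents; i++)
		if (sg_len[i] % PAGE_SIZE)
			return -EINVAL;
	return 0;
}
```

For a size of 3 pages plus 100 bytes, `virt_addr = 3 * PAGE_SIZE` passes the bound check but indexes `tx_vec[3]` in a 3-entry array. That is exactly the case `check_bind()` rejects for TX.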
Fixes: bd61848900bf ("net: devmem: Implement TX path") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20260519203530.66310-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carlos López [Tue, 12 May 2026 10:00:42 +0000 (12:00 +0200)]
virt: sev-guest: Explicitly leak pages in unknown state
When set_memory_{encrypted,decrypted}() fail, the caller cannot know at which
point the function failed, meaning that the pages are left in an unknown state
from the caller's point of view.
Since the pages may be left in an unencrypted state, they are not suitable for
general use, and cannot be returned safely to the buddy allocator. Avoid the
issue by never freeing the pages, and then do the proper accounting by calling
snp_leak_pages().
Fixes: 3e385c0d6ce8 ("virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex") Signed-off-by: Carlos López <clopez@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@kernel.org
Jakub Kicinski [Thu, 21 May 2026 00:26:55 +0000 (17:26 -0700)]
Merge tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
- L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
- ISO: drop ISO_END frames received without prior ISO_START
- MGMT: validate Add Extended Advertising Data length
- bnep: Fix UAF read of dev->name
- btmtk: fix urb->setup_packet leak in error paths
- btintel_pcie: Fix incorrect MAC access programming
- hci_uart: fix UAFs and race conditions in close and init paths
* tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths
Bluetooth: MGMT: validate Add Extended Advertising Data length
Bluetooth: btmtk: fix urb->setup_packet leak in error paths
Bluetooth: ISO: drop ISO_END frames received without prior ISO_START
Bluetooth: btintel_pcie: Fix incorrect MAC access programming
Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
Bluetooth: bnep: Fix UAF read of dev->name
====================
====================
bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
sk_psock_verdict_data_ready() lacks the tls_sw_has_ctx_rx() guard that
sk_psock_strp_data_ready() gained in e91de6afa81c. When a socket is
inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is configured,
the missing guard causes tcp_read_skb() to drain sk_receive_queue without
advancing copied_seq, leaving a dangling frag_list pointer that
tls_decrypt_sg() walks — a use-after-free.
Patch 1 mirrors the fix from e91de6afa81c: add the tls_sw_has_ctx_rx()
check to sk_psock_verdict_data_ready() so that when a TLS RX context is
present the function defers to psock->saved_data_ready (sock_def_readable)
instead of calling tcp_read_skb().
Patch 2 adds a selftest that drives the vulnerable sequence end-to-end
and verifies recv() returns the correct decrypted data.
====================
Xingwang Xiang [Sun, 17 May 2026 14:56:27 +0000 (23:56 +0900)]
selftests/bpf: add regression test for ktls+sockmap verdict UAF
Test the scenario where a socket is inserted into a sockmap with a
BPF_SK_SKB_VERDICT program before TLS RX is configured. Previously
sk_psock_verdict_data_ready() would call tcp_read_skb() and drain the
receive queue without advancing copied_seq, causing tls_decrypt_sg()
to walk a dangling frag_list pointer (use-after-free).
The test drives the full vulnerable sequence and verifies that after
the fix recv() returns the correct decrypted data.
Xingwang Xiang [Sun, 17 May 2026 14:56:26 +0000 (23:56 +0900)]
bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and
defers to psock->saved_data_ready when a TLS RX context is present,
avoiding a conflict with the TLS strparser's ownership of the receive
queue (commit e91de6afa81c, "bpf: Fix running sk_skb program types
with ktls").
sk_psock_verdict_data_ready() has no equivalent guard. When a socket
is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is
configured, tls_sw_strparser_arm() saves sk_psock_verdict_data_ready
as rx_ctx->saved_data_ready. On data arrival:
tls_data_ready -> tls_strp_data_ready -> tls_rx_msg_ready
-> saved_data_ready() = sk_psock_verdict_data_ready()
-> tcp_read_skb() drains sk_receive_queue via __skb_unlink()
without calling tcp_eat_skb(), so copied_seq is not advanced.
tls_strp_msg_load() then finds tcp_inq() >= full_len (stale), calls
tcp_recv_skb() on the now-empty queue, hits WARN_ON_ONCE(!first), and
returns with rx_ctx->strp.anchor.frag_list pointing at a psock-owned
(potentially freed) skb. tls_decrypt_sg() subsequently walks that
frag_list: use-after-free.
Apply the same fix as sk_psock_strp_data_ready(): if a TLS RX context
is present, call psock->saved_data_ready (sock_def_readable) to wake
recv() waiters and return immediately, leaving the receive queue
untouched. TLS retains sole ownership of the queue and decrypts the
record normally through tls_sw_recvmsg().
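The guard reduces to a two-branch dispatch. This minimal model assumes the socket state can be collapsed into a flag and a queue length. `struct sock_model`, `read_skb_model()` and `verdict_data_ready()` are hypothetical and model only the control flow, not the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the control flow only; not the kernel structs. */
struct sock_model {
	bool tls_rx_ctx; /* models tls_sw_has_ctx_rx(sk) */
	int queue_len;   /* skbs on sk_receive_queue */
	int wakeups;     /* calls to psock->saved_data_ready */
};

/* Models sock_def_readable(): wake recv() waiters, touch nothing else. */
static void saved_data_ready(struct sock_model *sk)
{
	sk->wakeups++;
}

/* Models tcp_read_skb() as described: unlinks every queued skb (without
 * advancing copied_seq in the buggy path). */
static void read_skb_model(struct sock_model *sk)
{
	sk->queue_len = 0;
}

/* Models sk_psock_verdict_data_ready() without and with the new guard. */
static void verdict_data_ready(struct sock_model *sk, bool with_fix)
{
	if (with_fix && sk->tls_rx_ctx) {
		saved_data_ready(sk); /* TLS keeps sole ownership of the queue */
		return;
	}
	read_skb_model(sk);
}
```

Without the guard, a TLS socket's queue is drained behind the TLS strparser's back. With it, the queue is left intact and only a wakeup is issued. Non-TLS sockets behave as before.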
Dave Airlie [Thu, 21 May 2026 00:12:21 +0000 (10:12 +1000)]
Merge tag 'drm-msm-fixes-2026-05-17' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v7.1:
Core:
- Fixed bindings for SM8650, SM8750 and Eliza
- Don't use UTS_RELEASE directly
- Fix typo in clock-names property
DPU:
- Fixed CWB description on Kaanapali
- Fixed scanline strides for YUV UBWC formats
- Stopped DSI register dumping from accessing past the end of the region
DSI:
- Fix dumping unaligned regions
GPU:
- Fix GMEM_BASE for a6xx gen3
- Fix userspace reachable crash on a2xx-a4xx
- Fix sysprof_active for counter collection with IFPC enabled GPUs
- Fix shrinker lockdep
PPCNT group 0x10 (per-priority counters) carries an rx_discards field at
offset 0x78. These counters aggregate up into if_in_discards, but don't
show up anywhere else. Since there are many things that aggregate into
`if_in_discards`, having these counters helps distinguish what caused
those discards (in my case they were caused by headroom buffer overruns
due to inappropriately configured buffer sizes).
Of note, from empirical testing, these counters are per-"priority group"
(PG), not per-"switch priority". It's a bit confusing, because the rest
of these counters are per-"switch priority" and the header file calls
these "Per Priority Group Counters". However, that should be read as
"(Per Priority) Group Counters", not "Per (Priority Group) Counters".
I attempted to distinguish this in the counter naming by calling these
`rx_discards_pg_N` rather than `rx_discards_prio_N` (which is the
naming scheme of the other counters in this PPCNT group).
I will also note that the mlx5 driver (which already has this counter)
uses the scheme `rx_prioN_discards` (and the same for the other counters
in this group). However, I was unable to determine whether the mlx5
counters behave the same as the mlxsw counters with respect to PG
mapping. An attempt to remap to a different PG there did not change
which counter incremented, but the mlx5 configuration code is quite
different, so it's possible the remapping needs to be done differently.
====================
selftests: rds: Add ROCE support to rds selftests
Currently, the rds selftests only test the tcp transport. This means
most of rds_rdma.ko has no test coverage. This series refactors the
rds selftests to add an rdma option when running tests. When used,
the test creates a pair of ROCE interfaces to run the payloads through.
Most of this set is refactoring the existing test.py module. Since most
of this code is one long procedure, it is difficult to modularize it
without creating a lot of pylint complaints about lengthy functions
with too many variables or branches.
Patch 1 fixes an RDS-IB shutdown hang exposed by the new ROCE selftests
in patches 10/11. The next seven patches break down test.py into helper
functions. After we have modularized the send/recv packet logic, we
introduce the new ROCE equivalent network configurations, add the new
command line flags to build and run the test with rdma support.
====================
This patch adds support for testing rds rdma over ROCE. A new
-r flag is added to config.sh which enables the required kernel
configs for rdma. We also add a -T flag to run.sh, which takes
a transport option, tcp or rdma. The rdma option will check to
ensure the proper configs have been enabled. The flag is then
passed to test.py, which will run the test over the specified
transport(s)
This patch adds support for testing rds rdma over ROCE in test.py
A new -T flag is added, which takes a transport option, tcp or rdma.
A new setup_rdma() function is added that will configure rdma
interfaces and sockets for use in the test case.
selftests: rds: Register network teardown via atexit
This patch adds a teardown_tcp() helper that removes net0/net1.
The cmd calls here use fail=False so they can be called from
completed or partially-setup states on error. Also call
teardown_tcp() at the top of setup_tcp() so a previous
interrupted run does not leave net0/net1 lingering and break a
subsequent ip netns add. Register teardown_tcp() with atexit
before setup_tcp() is invoked.
Likewise, we can simplify stop_pcaps() handling by registering it
with atexit instead of calling it from the signal handler.
atexit handlers run on any exit path - normal completion, raised
exception, and sys.exit() from the timeout signal handler. This
guarantees that the cleanup handlers are called without further
wrapping the test body in try/finally blocks.
atexit LIFO ordering keeps stop_pcaps before teardown_tcp so
tcpdumps are killed cleanly before their namespaces go away.
This is a preparatory cleanup for the upcoming ROCE patch which
will also register a teardown_rdma() alongside teardown_tcp().
Sockets created by child processes in netns_socket may raise
exceptions that are currently not handled by the parent, for
example if a namespace doesn't exist or the rds module didn't load.
Because these exceptions occur within a child process, the child
exits, but the parent does not check its exit status.
Further, allowing the child processes to quietly raise exceptions
will cause problems later if the parent registers cleanup functions
with atexit. Since the child processes inherit the parent's handlers,
they may prematurely call the parent's cleanup routines without the
parent being aware.
Fix this by catching all exceptions raised by the child processes.
Child errors surface as a non-zero exit status, which is then
properly raised in the parent process.