Anton Leontev [Thu, 4 Jun 2026 16:59:38 +0000 (19:59 +0300)]
hv_netvsc: use kmap_local_page in netvsc_copy_to_send_buf
netvsc_copy_to_send_buf() copies page buffer entries into the VMBus
send buffer using phys_to_virt() on the entry PFN. Entries for the
RNDIS header and the skb linear data come from kmalloc'd memory and
are always in the kernel direct map, but entries for skb fragments
reference page cache or user pages, which on 32-bit x86 with
CONFIG_HIGHMEM=y can live above the LOWMEM boundary. For such a page
phys_to_virt() returns an address outside the direct map and the
subsequent memcpy() faults on the transmit softirq path, which is
fatal.
Map the pages with kmap_local_page() instead, handling two properties
of the page buffer entries:
- pb[i].pfn is a Hyper-V PFN at HV_HYP_PAGE_SIZE (4K) granularity,
not a native PFN. Reconstruct the physical address first and derive
the native page from it, so the mapping stays correct where
PAGE_SIZE > HV_HYP_PAGE_SIZE (e.g. arm64 with 64K pages).
- Since commit 41a6328b2c55 ("hv_netvsc: Preserve contiguous PFN
grouping in the page buffer array"), an entry describes a full
physically contiguous fragment and pb[i].len can exceed PAGE_SIZE,
while kmap_local_page() maps a single page. Copy page by page,
splitting at native page boundaries.
The copy path only handles packets smaller than the send section size
(6144 bytes by default); larger packets take the cp_partial path where
only the RNDIS header is copied. So entries here are bounded by the
section size and a copy is split at most once on 4K-page systems. On
!CONFIG_HIGHMEM configs kmap_local_page() folds to page_address() and
no mapping work is added.
Dawei Feng [Thu, 4 Jun 2026 14:37:56 +0000 (22:37 +0800)]
octeontx2-af: fix memory leak in rvu_setup_hw_resources()
If rvu_npc_exact_init() fails in rvu_setup_hw_resources(), the function
returns directly instead of jumping to the error handling path. This
causes a resource leak for the previously initialized CGX, NPC, fwdata,
and MSI-X states.
Fix this by replacing the direct return with goto cgx_err to ensure
proper cleanup.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still present in
v7.1-rc6.
An x86_64 allyesconfig build showed no new warnings. As we do not have
access to Marvell OcteonTX2 RVU AF hardware to test with, no runtime
testing was able to be performed.
Fixes: 3571fe07a090 ("octeontx2-af: Drop rules for NPC MCAM") Cc: stable@vger.kernel.org Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Link: https://patch.msgid.link/20260604143756.1524482-1-dawei.feng@seu.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dmitry Osipenko [Thu, 4 Jun 2026 12:27:43 +0000 (15:27 +0300)]
drm/virtio: Fix driver removal with disabled KMS
DRM atomic and modesetting aren't initialized if virtio-gpu driver built
with disabled KMS, leading to access of uninitialized data on driver
removal/unbinding and crashing kernel. Fix it by skipping shutting down
atomic core with unavailable KMS.
David Howells [Thu, 4 Jun 2026 11:46:00 +0000 (12:46 +0100)]
rxrpc: Fix the ACK parser to extract the SACK table for parsing
Fix modification of the received skbuff in rxrpc_input_soft_acks() and a
potential incorrect access of the buffer in a fragmented UDP packet (the
packet would probably have to be deliberately pre-generated as fragmented)
when AF_RXRPC tries to extract the contents of the SACK table by copying
out the contents of the SACK table into a buffer before attempting to parse
AF_RXRPC assumes that it can just call skb_condense() and then validly
access the SACK table from skb->data and that it will be a flat buffer -
but skb_condense() can silently fail to do anything under some
circumstances.
Note that whilst rxrpc_input_soft_acks() should be able to parse extended
ACKs, the rest of AF_RXRPC doesn't currently support that.
Further, there's then no need to call skb_condense() in rxrpc_input_ack(),
so don't.
Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs") Reported-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260513180907.2061972-1-michael.bommarito@gmail.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org Link: https://patch.msgid.link/105362.1780573560@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chih Kai Hsu [Thu, 4 Jun 2026 09:22:47 +0000 (17:22 +0800)]
r8152: handle the return value of usb_reset_device()
If usb_reset_device() returns a negative error code, stop the
process of probing.
Fixes: 10c3271712f5 ("r8152: disable the ECM mode") Signed-off-by: Chih Kai Hsu <hsu.chih.kai@realtek.com> Reviewed-by: Hayes Wang <hayeswang@realtek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260604092247.27158-450-nic_swsd@realtek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Liang Luo [Mon, 8 Jun 2026 07:55:00 +0000 (15:55 +0800)]
sched/deadline: Use task_on_rq_migrating() helper
Replace the open-coded "p->on_rq == TASK_ON_RQ_MIGRATING" comparisons
in enqueue_task_dl() and dequeue_task_dl() with the existing
task_on_rq_migrating() helper, consistent with the rest of the
scheduler code.
Liang Luo [Mon, 8 Jun 2026 07:18:42 +0000 (15:18 +0800)]
sched/core: Combine separate 'else' and 'if' statements
The kernel coding style recommends using 'else if' instead of
placing 'if' on a separate line after 'else'. This change makes
the code consistent with the rest of the kernel codebase.
Hongyan Xia [Fri, 5 Jun 2026 09:43:39 +0000 (09:43 +0000)]
sched/fair: Fix cpu_util runnable_avg arithmetic
If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
should then add or subtract task runnable_avg, but the arithmetic below
is still with task util_avg. This mixes runnable_avg with util_avg which
is incorrect.
Fix by always doing arithmetic with runnable_avg and only take
max(runnable_avg, util_avg) at the last step.
Fabricio Parra [Fri, 5 Jun 2026 05:23:31 +0000 (22:23 -0700)]
rust: sync: completion: Mark inline complete_all and wait_for_completion
When building the kernel using the llvm-22.1.0-rust-1.93.1-x86_64
toolchain provided by kernel.org with ARCH=x86_64, the following symbols
are generated:
$ nm vmlinux | grep ' _R'.*Completion | rustfilt ffffffff81827930 T <kernel::sync::completion::Completion>::complete_all ffffffff81827950 T <kernel::sync::completion::Completion>::wait_for_completion
These Rust methods are thin wrappers around the C completion helpers
`complete_all` and `wait_for_completion`. Mark them `#[inline]` to keep
the wrapper pattern consistent with other small Rust helper methods.
After applying this patch, the above command will produce no output.
Boqun Feng [Fri, 5 Jun 2026 05:23:29 +0000 (22:23 -0700)]
MAINTAINERS: Add RUST [SYNC] entry
We have two pull requests on Rust synchronization primitives with 10+
patches in a row for recent cycles, so it makes sense to start the
effort of handling this area as a group.
Luckily for me, Gary Guo and Alice Ryhl agreed to help as
co-maintainers, and we also have a talented group of reviewers:
Lyude Paul started the SpinLockIrq work [1] and did an amazing job at
improving the design and implementation.
Daniel Almeida resolved the Lock<T: !Unpin> issue [2] and he did a fair
amount of reviews in areas related to synchronization primitives
already.
Onur Özkan started the ww_mutex work [3] and did an amazing job at
consolidating various design requirements and decisions.
Of course, this only reflects my own knowledge, and I believe they did
way more outside what I'm aware of ;-)
Note that having this MAINTAINERS entry is meant to bring more people
to help on the synchronization primitives in Rust, which means for patch
submissions and design discussion, please still involve the
corresponding maintainers (e.g. LOCKING and ATOMIC),
scripts/get_maintainers.pl should have this covered.
Carlos Song [Wed, 20 May 2026 09:33:23 +0000 (17:33 +0800)]
i2c: imx-lpi2c: fix resource leaks switching to devm_dma_request_chan()
The LPI2C driver requests DMA channels using dma_request_chan(), but
never releases them in lpi2c_imx_remove(), resulting in DMA channel
leaks every time the driver is unloaded.
Additionally, when lpi2c_dma_init() successfully requests the TX DMA
channel but fails to request the RX DMA channel, the probe falls back
to PIO mode and completes successfully. Since probe succeeds, the devres
framework will not trigger any cleanup, leaving the TX DMA channel and
the memory allocated for the dma structure held for the lifetime of the
device even though DMA is never used.
Switch to devm_dma_request_chan() to let the device core manage DMA
channel lifetime automatically. Wrap all allocations within a devres
group so that devres_release_group() can release all partially acquired
resources when DMA init fails and probe continues in PIO mode.
Fixes: a09c8b3f9047 ("i2c: imx-lpi2c: add eDMA mode support for LPI2C") Signed-off-by: Carlos Song <carlos.song@nxp.com> Cc: <stable@vger.kernel.org> # v6.14+ Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260520093323.2882070-1-carlos.song@oss.nxp.com
Arnd Bergmann [Tue, 9 Jun 2026 07:16:25 +0000 (09:16 +0200)]
Merge tag 'v7.1-rockchip-arm32fixe' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes
A change in 7.1-rc improved the general handling for reset controllers
using SRCU, but at the same time broke really early users before work-
queues are available.
So adapt the SMP bringup to keep the core-resets around, instead
of aquiring/releasing them on ever SMP action.
* tag 'v7.1-rockchip-arm32fixe' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: rockchip: keep reset control around
drm/i915/edp: Check supported link rates DPCD read
intel_edp_set_sink_rates() reads DP_SUPPORTED_LINK_RATES into a local
stack array and then parses the array unconditionally. If the read
fails, the array contents are not valid and may result in bogus sink
link rates being used.
Use drm_dp_dpcd_read_data() and clear the sink rate array on failure,
so the existing parser falls back to the default sink rate handling.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 68f357cb7347 ("drm/i915/dp: generate and cache sink rate array for all DP, not just eDP 1.4") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260529145759.1640646-1-n.zhandarovich@fintech.ru Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit bd61c7756b34157e093028225a69383b4b1203cc) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
accel/ivpu: Fix signed integer truncation in IPC receive
Fix potential buffer overflow where firmware-supplied data_size is cast
to signed int before being used in min_t(). Large unsigned values
(>= 0x80000000) become negative, causing unsigned wraparound and
oversized memcpy operations that can overflow the stack buffer.
Change min_t(int, ...) to min() as both values are unsigned and can be
handled by min() without explicit cast.
Fixes: 3b434a3445ff ("accel/ivpu: Use threaded IRQ to handle JOB done messages") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20260601161643.229342-1-andrzej.kacprowski@linux.intel.com
Adrian Moreno [Thu, 4 Jun 2026 12:19:46 +0000 (14:19 +0200)]
net: openvswitch: fix possible kfree_skb of ERR_PTR
After the patch in the "Fixes" tag, the allocation of the "reply" skb
can happen either before or after locking the ovs_mutex.
However, error cleanups still follow the classical reversed order,
assuming "reply" is allocated before locking: it is freed after unlocking.
If "reply" allocation happens after locking the mutex and it fails,
"reply" is left with an ERR_PTR, and execution jumps to the correspondent
cleanup stage which will try to free an invalid pointer.
Fix this by setting the pointer to NULL after having saved its error
value.
Miguel Ojeda [Tue, 9 Jun 2026 02:18:30 +0000 (04:18 +0200)]
Merge patch series "`zerocopy` support"
Introduce support for `zerocopy` [1][2]:
Fast, safe, compile error. Pick two.
Zerocopy makes zero-cost memory manipulation effortless. We write
`unsafe` so you don't have to.
It essentially provides derivable traits (e.g. `FromBytes`) and macros
(e.g. `transmute!`) for safely converting between byte sequences and
other types. Having such support allows us to remove some `unsafe` code.
It is among the most downloaded Rust crates (top #50 recent, top #100
all-time downloads; according to crates.io), and it is also used by the
Rust compiler itself.
The series starts with a few preparation commits, then the `zerocopy`
and `zerocopy-derive` crates are added. Finally, an example patch using
it is on top, removing one `unsafe impl`.
I had to adapt the crates slightly (just +2/-3 lines), but both patches
could potentially be provided upstream eventually. Please see the
commits for details.
In total, it is about ~39k lines added, ~32k without counting `benches/`
which are just for documentation purposes.
See the cover letter for `syn` for some more details about depending on
third-party crates in commit 54e3eae85562 ("Merge patch series "`syn`
support"").
The codegen of an isolated example function similar to the patch on top
is essentially identical. It also turns out that (for that particular
case) `zerocopy`'s version, even under `debug-assertions` enabled, has
no remaining panics, unlike a few in the current code (because the
compiler can prove the remaining `ub_checks` statically).
So their "fast, safe" does indeed check out -- at least in that case.
P.S. This version of `zerocopy` has already the unstable `Ptr{,Inner}`
types -- to play with them, please use:
Miguel Ojeda [Mon, 8 Jun 2026 14:14:38 +0000 (16:14 +0200)]
gpu: nova-core: firmware: parse `FalconUCodeDescV2` via `zerocopy`
Now that we have `zerocopy` support, we can avoid some `unsafe` code.
For instance, for `FalconUCodeDescV2`, we can replace the `unsafe impl
FromBytes` by safely deriving `zerocopy`'s `FromBytes` and then calling
`read_from_prefix`.
Miguel Ojeda [Mon, 8 Jun 2026 14:14:35 +0000 (16:14 +0200)]
rust: zerocopy-derive: add `README.md`
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d257107 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.
Linux is built with `-Dnon_ascii_idents`. However, `zerocopy-derive`
uses a non-ASCII character (`ẕ`) internally, which in turn triggers
the lint when attempting to use derives like `FromBytes`:
error: identifier contains non-ASCII characters
--> rust/kernel/lib.rs:153:9
|
153 | a: u32,
| ^
|
= note: requested on the command line with `-D non-ascii-idents`
This was already noticed by another project using
`#![deny(non_ascii_idents)]` [1]. `zerocopy` added an
`#[allow(non_ascii_idents)]` [2], but it does not work since, at the
moment, the `non_ascii_idents` lint is a `crate_level_only` one, and thus
`allow`s only work at the crate root level.
Due to this, an issue about relaxing this restriction was created in
upstream Rust [3] some months ago.
Thus work around it here by using another prefix. The likelihood of a
collision is very small for us, since we control the callers, and this
will hopefully be fixed soon at either the `zerocopy` or the Rust level.
I filed an issue [4] about it with upstream `zerocopy` as requested
and we discussed this with upstream Rust and `zerocopy`: the Rust issue
got nominated and a PR [5] to relax the restriction was submitted by
Joshua. Upstream `zerocopy` prefers that approach, so if Rust merges it,
then it means we will be able to remove the workaround when we bump the
MSRV, thus likely late 2027, since we follow Debian Stable.
Originally, when the Rust upstream `alloc` standard library crate was
vendored, the SPDX License Identifiers were added to every file so that
the license on those was clear. The same happened with the vendoring of
`proc_macro2`, `quote` and `syn`. Please see:
This makes `scripts/spdxcheck.py` pass: use parentheses like commit 06e9bfc1e57d ("ionic: make spdxcheck.py happy") did since we have two
`OR` operators in the expression (three licenses).
Finally, as requested, I filed an issue [1] with upstream about it.
The next two patches modify these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.
The following script may be used to verify the contents:
for path in $(cd rust/zerocopy-derive/ && find . -type f); do
curl --silent --show-error --location \
https://github.com/google/zerocopy/raw/v0.8.50/zerocopy-derive/src/$path \
| diff --unified rust/zerocopy-derive/$path - && echo $path: OK
done
Miguel Ojeda [Mon, 8 Jun 2026 14:14:30 +0000 (16:14 +0200)]
rust: zerocopy: add `README.md`
Originally, when the Rust upstream `alloc` standard library crate was
vendored in commit 057b8d257107 ("rust: adapt `alloc` crate to the
kernel"), a `README.md` file was added to explain the provenance and
licensing of the source files.
Miguel Ojeda [Mon, 8 Jun 2026 14:14:29 +0000 (16:14 +0200)]
rust: zerocopy: remove float `Display` support
The kernel builds `core` with the `no_fp_fmt_parse` `--cfg`, which means
we do not have support for formatting floating point primitives. However,
`zerocopy` expects those implementations to exist:
error[E0277]: `f32` doesn't implement `core::fmt::Display`
--> rust/zerocopy/src/byteorder.rs:172:29
|
172 | $trait::fmt(&self.get(), f)
| ----------- ^^^^^^^^^^^ the trait `core::fmt::Display` is not implemented for `f32`
| |
| required by a bound introduced by this call
...
907 | / define_type!(
908 | | An,
909 | | "A 32-bit floating point number",
910 | | F32,
... |
922 | | []
923 | | );
| |_- in this macro invocation
|
= help: the following other types implement trait `core::fmt::Display`:
i128
i16
i32
i64
i8
isize
u128
u16
and 4 others
= note: this error originates in the macro `impl_fmt_trait` which comes from the expansion of the macro `define_type` (in Nightly builds, run with -Z macro-backtrace for more info)
Thus work around it by skipping those implementations in `zerocopy`.
Ideally, `zerocopy` would have the equivalent of `no_fp_fmt_parse`;
and, indeed, upstream just added it [1] after I filed an issue [2]
about it as requested. We can try it in a future update of our
vendored copy.
Miguel Ojeda [Mon, 8 Jun 2026 14:14:28 +0000 (16:14 +0200)]
rust: zerocopy: add SPDX License Identifiers
Originally, when the Rust upstream `alloc` standard library crate was
vendored, the SPDX License Identifiers were added to every file so that
the license on those was clear. The same happened with the vendoring of
`proc_macro2`, `quote` and `syn`. Please see:
This makes `scripts/spdxcheck.py` pass: use parentheses like commit 06e9bfc1e57d ("ionic: make spdxcheck.py happy") did since we have two
`OR` operators in the expression (three licenses).
SPDX identifiers are not added to the `benches` files because they are
included in rendered documentation. Nevertheless, the `README.md` to be
added by a later commit mentions the license.
Finally, as requested, I filed an issue [1] with upstream about it.
Miguel Ojeda [Mon, 8 Jun 2026 14:14:27 +0000 (16:14 +0200)]
rust: zerocopy: import crate
This is a subset of the Rust `zerocopy` crate, version v0.8.50 (released
2026-05-31), licensed under "BSD-2-Clause OR Apache-2.0 OR MIT", from:
https://github.com/google/zerocopy/tree/v0.8.50
The files are copied as-is, with no modifications whatsoever (not even
adding the SPDX identifiers).
The `benches` folder is added (i.e. not just `src` like in other cases)
since the files there are included in the rendered documentation,
as well as the `rustdoc` CSS style file that is needed to make those
visually more understandable.
The next two patches modify these files as needed for use within the
kernel. This patch split allows reviewers to double-check the import
and to clearly see the differences introduced.
The following script may be used to verify the contents:
for path in $(cd rust/zerocopy/ && find . -type f); do
curl --silent --show-error --location \
https://github.com/google/zerocopy/raw/v0.8.50/$path \
| diff --unified rust/zerocopy/$path - && echo $path: OK
done
Since we are adding one more proc macro crate (`zerocopy-derive`),
we are refactoring their handling.
Thus, instead of using `libmacros_extension` as the common variable to
hold the extension for all of them, use a dedicated variable with a more
generic name (including for its implementation).
Add an associated function to `UniqueArc` for getting a raw pointer. The
implementation defers to the `Arc` implementation.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260605-unique-arc-as-ptr-v2-1-425476d2abdb@kernel.org
[ Relaxed bound moving it to new `T: ?Sized` impl block. Reworded since
it is not a method anymore. Added intra-doc link. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
These methods should be inlined for optimization reasons. Failure to do
so can also produce symbol names larger than what `modpost` or `objtool`
can handle.
Joel Fernandes [Sat, 6 Jun 2026 12:43:05 +0000 (21:43 +0900)]
rust: bitfield: Add KUnit tests for bitfield
Add KUnit tests to make sure the macro is working correctly. The unit
tests are put behind the new `RUST_BITFIELD_KUNIT_TEST` Kconfig option.
Acked-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
[acourbot:
- Use a consistent test axis where each test focuses on a single thing.
- Rename members to generic name including range for readability.
- Add test exercising `try_with`.
- Add test checking that unallocated bits are left untouched.
] Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Yury Norov <ynorov@nvidia.com> Link: https://patch.msgid.link/20260606-bitfield-v5-2-b92188820914@nvidia.com
[ Prefixed test suite name with `rust_` as mentioned. Markdown-formatted
a few comments with Markdown. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Kyle Zeng [Fri, 5 Jun 2026 07:34:48 +0000 (00:34 -0700)]
ipv6: sit: reload inner IPv6 header after GSO offloads
ipip6_tunnel_xmit() caches the inner IPv6 header pointer at function
entry and continues using it after iptunnel_handle_offloads().
For GSO skbs, iptunnel_handle_offloads() calls skb_header_unclone().
When the skb header is cloned, skb_header_unclone() can call
pskb_expand_head(), which may move the skb head. The pskb_expand_head()
contract requires pointers into the skb header to be reloaded after the
call.
If the later skb_realloc_headroom() branch is not taken, SIT uses the
stale iph6 pointer to read the inner hop limit and DS field. That can
read from a freed skb head after the old head's remaining clone is
released.
Reload iph6 after the offload helper succeeds and before subsequent
reads from the inner IPv6 header. Keep the existing reload after
skb_realloc_headroom(), since that branch can also replace the skb.
Fixes: 14909664e4e1 ("sit: Setup and TX path for sit/UDP foo-over-udp encapsulation") Signed-off-by: Kyle Zeng <kylebot@openai.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+6eb9ca986d80f6f88cf9@syzkaller.appspotmail.com Link: https://patch.msgid.link/20260605073448.6524-1-kylebot@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fushuai Wang [Fri, 5 Jun 2026 10:21:12 +0000 (18:21 +0800)]
net/mlx5: Use effective affinity mask for IRQ selection
When a sf is created after a CPU has been taken offline, the IRQ pool may
contain IRQs with affinity masks that include the offline CPU. Since only
online CPUs should be considered for IRQ placement, cpumask_subset() check
would fail because the iter_mask contains offline CPUs that are not present
in req_mask, causing sf creation to fail.
This is an example:
1. When mlx5 driver loads, it initializes the IRQ pools.
For sf_ctrl_pool with ≤64 sf:
- xa_num_irqs = {N, N} (There is only one slot)
2. When the first SF is created:
- The ctrl IRQ is allocated with mask=cpu_online_mask={0-191}
2. We take CPU 20 offline
3. Existing ctl irq still have mask={0-191}
4. Create a new SF:
- req_mask={0-19,21-191}
- iter_mask={0-191}
- {0-191} is NOT a subset of {0-19,21-191}
- least_loaded_irq=NULL
5. Try to allocate a new irq via irq_pool_request_irq()
6. xa_alloc() fails because the pool is full(There is only one slot)
7. sf creation fails with error
Use irq_get_effective_affinity_mask() instead, which returns the IRQ's
actual effective affinity that already excludes offline CPUs.
Fixes: 061f5b23588a ("net/mlx5: SF, Use all available cpu for setting cpu affinity") Suggested-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260605102112.91772-1-fushuai.wang@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dragos Tatulea [Thu, 4 Jun 2026 13:54:46 +0000 (16:54 +0300)]
net/mlx5e: xsk: Fix DMA and xdp_frame leak on XDP_TX xmit failure
In the XSK branch of mlx5e_xmit_xdp_buff(), when sq->xmit_xdp_frame()
returns false (e.g. XDPSQ is full), the function returns without
unmapping the DMA address or freeing the xdp_frame allocated by
xdp_convert_zc_to_xdp_frame(). The xdpi_fifo push only happens on
success, so the completion path cannot recover these entries.
With CONFIG_DMA_API_DEBUG=y, the leak surfaces on driver unbind:
DMA-API: pci 0000:08:00.0: device driver has pending DMA
allocations while released from device [count=1116]
One of leaked entries details: [device address=0x000000010ffd7028]
[size=1534 bytes] [mapped with DMA_TO_DEVICE] [mapped as phy]
WARNING: kernel/dma/debug.c:881 at dma_debug_device_change+0x127/0x180
...
DMA-API: Mapped at:
debug_dma_map_phys+0x4b/0xd0
dma_map_phys+0xfd/0x2d0
mlx5e_xdp_handle+0x5ae/0xac0 [mlx5_core]
mlx5e_xsk_skb_from_cqe_mpwrq_linear+0xc4/0x170 [mlx5_core]
mlx5e_handle_rx_cqe_mpwrq+0xc1/0x290 [mlx5_core]
Add the missing unmap + xdp_return_frame, matching the cleanup already
done in mlx5e_xdp_xmit(). has_frags is rejected earlier in this branch,
so no per-frag unmap is needed.
Dragos Tatulea [Thu, 4 Jun 2026 13:58:49 +0000 (16:58 +0300)]
net/mlx5: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list
mlx5_query_nic_vport_mac_list() sizes its firmware command buffer using
the PF's log_max_current_uc/mc_list capabilities. When querying a VF
vport with a larger configured max (via devlink), the firmware response
can overflow this buffer:
BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core]
Read of size 4 at addr ff1100013ffc8a12 by task kworker/u96:2/385
Fix by querying the vport's own HCA caps to size the buffer correctly.
Refactor the function to allocate and return the MAC list internally,
removing the caller's dependency on knowing the correct max.
Fixes: e16aea2744ab ("net/mlx5: Introduce access functions to modify/query vport mac lists") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260604135849.458060-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mingyu Wang [Thu, 4 Jun 2026 06:48:01 +0000 (14:48 +0800)]
net: qrtr: fix refcount saturation and potential UAF in qrtr_port_remove
In qrtr_port_remove(), the socket reference count is decremented via
__sock_put() before the port is removed from the qrtr_ports XArray and
before the RCU grace period elapses.
This breaks the fundamental RCU update paradigm. It exposes a race
window where a concurrent RCU reader (such as qrtr_reset_ports() or
qrtr_port_lookup()) can obtain a pointer to the socket from the XArray,
and attempt to call sock_hold() on a socket whose reference count has
already dropped to zero.
This exact race condition was hit during syzkaller fuzzing, leading to
the following refcount saturation warning and a potential Use-After-Free:
Fix this by deferring the reference count decrement until after the
xa_erase() and the synchronize_rcu() complete.
(Note: The v1 of this patch incorrectly replaced __sock_put() with
sock_put(). As Simon Horman pointed out, the callers of qrtr_port_remove()
still hold a reference to the socket, so freeing the socket memory here
would lead to a subsequent UAF in the caller. Thus, the __sock_put() is
kept, but only repositioned to close the RCU race.)
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260604064801.1180388-1-w15303746062@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: phy: some cleanups following phy_port SFP
While posting the v11 of phy_port netlink, sashiko found some
pre-existing issues, and following the tentative fix, Nicolai found
some more :)
This is V3, with a re-ordering of the port/sfp cleanup, as well as a new
patch (patch 3) that also reorders the phy_remove() path.
====================
net: phy: don't try to setup PHY-driven SFP cages when using genphy
We don't have support for PHY-driver SFP cages with the genphy code.
On top of that, it was found by sashiko that running
sfp_bus_add_upstream() for genphy deadlocks, as for genphy the PHY
probing runs under RTNL, which isn't the case for non-genphy drivers.
This problem was reproduced, and does lead to a deadlock on RTNL.
Before the blamed commit, the phy_sfp_probe() call was made by
individual PHY drivers, so there was no way to get to the SFP probing
path when using genphy.
Let's therefore only run phy_sfp_probe when not using genphy.
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Fixes: bad869b5e41a ("net: phy: Only rely on phy_port for PHY-driven SFP") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260604092819.723505-5-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: phy: Clean the phy_ports after unregistering the downstream SFP bus
As reported by sashiko when looking a other patches, we need to ensure
that the downstream SFP bus gets unregistered prior to destroying the
phy_ports attached to a phy_device, as the SFP code may reference these
ports. Let's make sure we follow that ordering in phy_remove().
net: phy: clean the sfp upstream if phy probing fails
Sashiko reported that we don't call sfp_bus_del_upstream() in the probe
failure path, so let's add it, otherwise the sfp-bus is left with a
dangling 'upstream' field, that may be used later on during SFP events.
This issue existed before the generic phylib sfp support, back when
drivers were calling phy_sfp_probe themselves.
Jakub Kicinski [Sat, 6 Jun 2026 01:21:24 +0000 (18:21 -0700)]
netdev: fix double-free in netdev_nl_bind_rx_doit()
Sashiko flags that genlmsg_reply() always consumes the skb.
The error path calls nlmsg_free(rsp) so we can't jump directly
to it. Let's not unbind, just propagate the error to the user.
This is the typical way of handling genlmsg_reply() failures.
They shouldn't happen unless user does something silly like
calling the kernel with an already-full rcvbuf.
Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice") Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Santosh Kalluri [Thu, 4 Jun 2026 00:08:43 +0000 (17:08 -0700)]
net: phonet: free phonet_device after RCU grace period
phonet_device_destroy() removes a phonet_device from the per-net device
list with list_del_rcu(), but frees it immediately. RCU readers walking
the same list can still hold a pointer to the object after it has been
removed, leading to a slab-use-after-free.
Use kfree_rcu(), matching the lifetime rule already used by
phonet_address_del() for the same object type.
Fixes: eeb74a9d45f7 ("Phonet: convert devices list to RCU") Cc: stable@vger.kernel.org Signed-off-by: Santosh Kalluri <santosh.kalluri129@gmail.com> Acked-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Wed, 3 Jun 2026 22:12:17 +0000 (15:12 -0700)]
net: ibm: emac: Fix use-after-free during device removal
The driver was using devm_register_netdev() which causes unregister_netdev()
to be deferred until the devres cleanup phase, which runs after emac_remove()
returns. This creates a use-after-free window where:
1. emac_remove() is called, which tears down hardware (cancels work, detaches
modules, unregisters from MAL)
2. emac_remove() returns
3. devres cleanup runs and finally calls unregister_netdev()
During step 3, the network stack might still process packets, triggering
emac_irq(), emac_poll(), or other handlers that access now-freed hardware
resources (dev->emacp, dev->mal, etc.).
Fix this by replacing devm_register_netdev() with manual register_netdev()
and calling unregister_netdev() at the beginning of emac_remove(), before
any hardware teardown. This ensures the network device is fully stopped and
unregistered before hardware resources are released.
The change is safe because:
- dev->ndev is assigned very early in probe (before any error paths that
could bypass emac_remove)
- platform_set_drvdata() is only called after successful registration, so
emac_remove() only runs for fully registered devices
- unregister_netdev() is idempotent and safe to call on any registered device
Fixes: a4dd8535a527 ("net: ibm: emac: use devm for register_netdev") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlx4_init_user_cqes() fills a scratch buffer with the CQE
initialization pattern and then copies from that buffer to userspace.
In the single-copy path, the copy length is array_size(entries,
cqe_size), but the scratch buffer is allocated with PAGE_SIZE. GCC 10
does not carry the branch invariant strongly enough through the object
size checks and falsely triggers __bad_copy_from().
Size the scratch buffer to the actual copy length for the active path,
keep array_size() for the single-copy case, and retain a WARN_ON_ONCE()
guard for the PAGE_SIZE invariant before allocating the buffer.
Fixes: f69bf5dee7ef ("net/mlx4: Use array_size() helper in copy_to_user()") Signed-off-by: Yao Sang <sangyao@kylinos.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
HanQuan [Thu, 4 Jun 2026 14:46:25 +0000 (14:46 +0000)]
net: add pskb_may_pull() to skb_gro_receive_list()
skb_gro_receive_list() calls skb_pull(skb, skb_gro_offset(skb)) without
first ensuring the data is in the linear area via pskb_may_pull(). When
the skb arrives via napi_gro_frags(), skb_headlen can be 0 (all data in
page fragments) while skb_gro_offset is non-zero (after IP+TCP header
parsing). The skb_pull() then decrements skb->len by skb_gro_offset
but skb->data_len stays unchanged, hitting BUG_ON(skb->len < skb->data_len)
in __skb_pull().
The UDP fraglist GRO path already contains this guard at
udp_offload.c:749. Adding it to skb_gro_receive_list() itself provides
centralized protection for all callers (TCP, UDP, and any future
protocols), and ensures the precondition of skb_pull() is satisfied
before it is called.
On pskb_may_pull() failure, set NAPI_GRO_CB(skb)->flush = 1 so the
skb is not held as a new GRO head and is instead delivered through the
normal receive path, matching the UDP handling.
Fixes: 8d95dc474f85 ("net: add code for TCP fraglist GRO") Reported-by: HanQuan <eilaimemedsnaimel@gmail.com> Reported-by: MingXuan <bwnie0730@outlook.com> Signed-off-by: HanQuan <eilaimemedsnaimel@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 5 Jun 2026 11:21:34 +0000 (11:21 +0000)]
tcp: restrict SO_ATTACH_FILTER to priv users
This patch restricts the use of SO_ATTACH_FILTER (cBPF) on TCP sockets
to users with CAP_NET_ADMIN capability.
This blocks potential side-channel attack where an unprivileged application
attaches a filter to leak TCP sequence/acknowledgment numbers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tamir Shahar <tamirthesis@gmail.com> Reported-by: Amit Klein <aksecurity@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shashank Balaji [Mon, 18 May 2026 10:20:00 +0000 (19:20 +0900)]
driver core: platform: set mod_name in driver registration
Pass KBUILD_MODNAME through the driver registration macro so that the
driver core can create the module symlink in sysfs for built-in drivers,
and fixup all callers.
The Rust platform adapter is updated to pass the module name through to
the new parameter.
Tested on qemu with:
- x86 defconfig + CONFIG_RUST
- arm64 defconfig + CONFIG_RUST + CONFIG_CORESIGHT stuff
Shashank Balaji [Mon, 18 May 2026 10:19:59 +0000 (19:19 +0900)]
coresight: pass THIS_MODULE implicitly through a macro
Rename coresight_init_driver() to coresight_init_driver_with_owner() and
replace it with a macro wrapper that passes THIS_MODULE implicitly. This
is in line with what other buses do.
Shashank Balaji [Mon, 1 Jun 2026 10:19:41 +0000 (19:19 +0900)]
kernel: param: initialize module_kset in a pure_initcall
Commit "driver core: platform: set mod_name in driver registration" will
set struct device_driver's mod_name member for platform driver
registration. For a driver to be registered with its mod_name set,
module_kset needs to be initialized, which currently happens in a
subsys_initcall in param_sysfs_init(). The tegra cbb drivers register
themselves before module_kset init, in a core_initcall. This works
currently because lookup_or_create_module_kobject(), which dereferences
module_kset via kset_find_obj(), is not called if mod_name is not set,
which is the case now.
So in preparation for the commit "driver core: platform: set mod_name in
driver registration", move module_kset init to pure_initcall level,
ensuring it happens before tegra cbb driver registration.
Shashank Balaji [Mon, 18 May 2026 10:19:57 +0000 (19:19 +0900)]
soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
Commit "driver core: platform: set mod_name in driver registration" will
set struct device_driver's mod_name member for platform driver
registration. For a driver to be registered with its mod_name set,
module_kset needs to be initialized, which currently happens in a
subsys_initcall in param_sysfs_init(). The tegra cbb drivers register
themselves before module_kset init, in a pure_initcall. This works
currently because lookup_or_create_module_kobject(), which dereferences
module_kset via kset_find_obj(), is not called if mod_name is not set,
which is the case now.
So in preparation for the commit "driver core: platform: set mod_name in
driver registration", move tegra cbb driver registration to
core_initcall level, and commit "kernel: param: initialize module_kset
in a pure_initcall" will move module_kset init to pure_initcall level,
ensuring module_kset init happens before tegra cbb driver registration.
Akhil R [Mon, 18 May 2026 11:40:13 +0000 (17:10 +0530)]
i2c: tegra: Fix NOIRQ suspend/resume
The Tegra I2C driver relies on runtime PM to wake up the controller before
each transfer. However, runtime PM is disabled between the system suspend
and NOIRQ suspend. If an I2C device initiates a transfer during this
window, the I2C controller fails to wake up and the transfer fails. To
handle this, the controller must be kept available for this period to
allow transfers.
Rework the I2C controller's system PM callbacks such that the controller
is resumed from runtime suspend during system suspend and it stays
RPM_ACTIVE throughout the suspend-resume cycle until it is runtime
suspended back in the system resume. The clocks are disabled in NOIRQ
suspend and enabled back in NOIRQ resume by calling the controller's
runtime PM functions directly.
Fixes: 8ebf15e9c869 ("i2c: tegra: Move suspend handling to NOIRQ phase") Assisted-by: Cursor:claude-4.6-opus Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260518114013.62065-5-akhilrajeev@nvidia.com
Akhil R [Mon, 18 May 2026 11:40:12 +0000 (17:10 +0530)]
i2c: tegra: Update Tegra410 I2C timing parameters
Update Tegra410 I2C timing parameters based on hardware characterization
results. This adjusts the fast mode and HS mode settings to be compliant
with the I2C specification.
Al Viro [Tue, 12 May 2026 16:18:21 +0000 (12:18 -0400)]
configfs: mark pinned dentries persistent
on the removal side we can (finally) get rid of __simple_unlink()
and __simple_rmdir() kludges now that dentries in question are
properly marked persistent - simple_unlink() and simple_rmdir()
will do the right thing for those.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 07:19:41 +0000 (03:19 -0400)]
configfs: dentry refcount needs to be pinned only once
currently we have a weird situation where
* symlinks and roots of subtrees created by mkdir are pinned once
* subdirectories of subtrees created by mkdir are pinned twice
* roots of subtrees created by register_{group,subsystem} are pinned
twice.
It makes things harder to follow for no good reason. The goal is to
encapsulate the unbalanced dget/dput into d_{make,discard}_persisitent()
and, preferably, allow a use of simple_recursive_removal() or analogue
thereof. So let's regularize that and pin things only once.
create_default_group() and configfs_register_subsystem() don't need to
keep their reference around on success - configfs_create_dir() has pinned
the sucker already. So we can drop the reference passed to
configfs_create_dir() (via configfs_attach_group(), etc.) both on success
and on failure. On the removal side we no longer have the double references,
so we need an explicit dget() to compensate.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 07:13:37 +0000 (03:13 -0400)]
switch configfs_detach_{group,item}() to passing dentry
... and there's no need to grab/drop it, or check for NULL - none
of the callers would even get there with NULL dentry and all of
them have the sucker pinned
Note that if sd is a directory configfs_dirent, we have sd->s_element
pointing to config_item with item->ci_dentry equal to sd->s_dentry.
Which is the only reason why detach_groups() gets away with using
the latter for locking the inode and the former for removal.
Aren't redundant data structures wonderful, for obfuscation if nothing
else?
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 06:26:41 +0000 (02:26 -0400)]
configfs_remove_dir(), detach_attrs(): switch to passing dentry
... and deal with grabbing/dropping it in the sole caller.
After that configfs_remove_dir() becomes an unconditional call of remove_dir(),
so we can fold them together.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 06:18:38 +0000 (02:18 -0400)]
populate_attrs(): move cleanup to the sole caller
... where it folds with configfs_remove_dir() into a call of
configfs_detach_item(). Note that at the early failure exit
(before we'd added any children) we were not calling detach_attrs()
only because there it would've been a no-op - nothing added,
nothing there to be removed.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 05:25:48 +0000 (01:25 -0400)]
configfs_do_depend_item(): pass configfs_dirent instead of dentry
Again, the only thing it uses the argument for is its ->d_fsdata
and callers already have that - as the matter of fact, they are
passing ->s_dentry of that configfs_dirent, so that the function
could get it back as ->d_fsdata of that. With nothing else in
dentry even looked at...
configfs_dirent in question is a directory one - in this case those
are subdirectories of root (aka roots of "subsystem" trees).
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 05:23:29 +0000 (01:23 -0400)]
configfs_depend_prep(): pass configfs_dirent instead of dentry
Again, the only thing it uses dentry for is dentry->d_fsdata; for the
recursive call the situation is the same as with configfs_detach_prep()
and the same observation about ->s_dentry->d_fsdata applies.
Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 12 May 2026 05:17:13 +0000 (01:17 -0400)]
configfs_detach_prep(): pass configfs_dirent instead of dentry
The only thing it uses the argument for is its ->d_fsdata and
all callers have that already available.
Note that in the recursive call we are dealing with a (sub)directory
configfs_dirent, and for those ->s_dentry->d_fsdata points back
to configfs_dirent itself.
Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 30 May 2026 07:48:34 +0000 (03:48 -0400)]
configfs: fix lockless traversals of ->s_children
Having the parent directory locked protects entries from removal
by another thread, but it does *not* protect cursors from being
moved around by lseek() - or freed, for that matter.
Fixes: 6f6107640625 ("configfs: Introduce configfs_dirent_lock") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Wentao Liang [Sun, 7 Jun 2026 09:03:03 +0000 (09:03 +0000)]
drm/virtio: fix dma_fence refcount leak on error in virtio_gpu_dma_fence_wait()
dma_fence_unwrap_for_each() internally calls dma_fence_unwrap_first()
which does cursor->chain = dma_fence_get(head), taking an extra
reference. On normal loop completion, dma_fence_unwrap_next()
releases this via dma_fence_chain_walk() -> dma_fence_put().
When virtio_gpu_do_fence_wait() fails and the function returns early
from inside the loop, the cursor->chain reference is never released.
This is the only caller in the entire kernel that does an early return
inside dma_fence_unwrap_for_each.
Add dma_fence_put(itr.chain) before the early return.
Cc: stable@vger.kernel.org Fixes: eba57fb5498f ("drm/virtio: Wait for each dma-fence of in-fence array individually") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patch.msgid.link/20260607090303.92423-1-vulab@iscas.ac.cn
i2c: qcom-cci: Fix NULL pointer dereference in cci_remove()
On all modern platforms Qualcomm CCI controller provides two I2C masters,
and on particular boards only one I2C master may be initialized, and in
such cases the device unbinding or driver removal causes a NULL pointer
dereference, because cci_halt() is called for all two I2C masters, but
a completion is initialized only for the single enabled master:
Dmitry Vyukov [Fri, 29 May 2026 15:09:06 +0000 (15:09 +0000)]
firmware_loader: Fix recursive lock in device_cache_fw_images()
A recursive locking deadlock can occur in the firmware loader's power
management notification handler.
During system suspend or hibernation preparation, fw_pm_notify() calls
device_cache_fw_images(). This function acquires fw_lock to set the
firmware cache state to FW_LOADER_START_CACHE and then iterates over all
devices using dpm_for_each_dev() while still holding the lock.
For each device, dev_cache_fw_image() schedules asynchronous work to cache
the firmware. If memory allocation for the async work entry fails (e.g., in
out-of-memory conditions), async_schedule_node_domain() falls back to
executing the work function synchronously in the current thread.
The synchronous execution path (__async_dev_cache_fw_image() ->
cache_firmware() -> request_firmware() -> assign_fw()) attempts to acquire
fw_lock again. Since the current thread already holds fw_lock, this results
in a recursive locking deadlock.
Fix this by releasing fw_lock immediately after updating the cache state
and before calling dpm_for_each_dev(). The lock is only needed to protect
the state update. Concurrent firmware requests will correctly see the
FW_LOADER_START_CACHE state and use the piggyback mechanism, which is
independently protected by its own fwc->name_lock.
stm32f7_i2c_compute_timing() uses i2c_dev->analog_filter to pick
the analog filter delay, but i2c_dev->analog_filter is parsed from
the "i2c-analog-filter" DT property only after the compute_timing
loop in stm32f7_i2c_setup_timing(), so in practice the timing
calculations always ignore the analog filter. On an STM32MP1 board
with clock-frequency = <400000> and i2c-analog-filter set, measured
SCL frequency was ~382 kHz.
This also affects (widens) the computed SDADEL range. At high bus
clock speeds, this can select an SDADEL value that violates tVD;DAT
(data valid time).
Fix by parsing "i2c-analog-filter" before the compute_timing loop.
ASoC: wm_adsp: Fix NULL dereference when removing firmware controls
In wm_adsp_control_remove() check that the priv pointer is not NULL
before attempting to cleanup what it points to.
When cs_dsp creates a control it calls wm_adsp_control_add_cb() so that
wm_adsp can create its own private control data. There are two cases
where private data is not created:
1. The control is a SYSTEM control, so an ALSA control is not created.
2. The codec driver has registered a control_add() callback that
hides the control, so wm_adsp_control_add() is not called.
When cs_dsp_remove destroys its control list it calls
wm_adsp_control_remove() for each control. But wm_adsp_control_remove()
was attempting to cleanup the private data pointed to by cs_ctl->priv
without checking the pointer for NULL.
Carlos Song [Thu, 21 May 2026 06:50:38 +0000 (14:50 +0800)]
i2c: imx: fix clock and pinctrl state inconsistency in runtime PM
In i2c_imx_runtime_suspend(), the clock is disabled before switching
the pinctrl state to sleep. If pinctrl_pm_select_sleep_state() fails,
the runtime suspend is aborted but the clock remains disabled, causing
a system crash when the hardware is subsequently accessed.
Fix this by switching the pinctrl state before disabling the clock so
that a pinctrl failure leaves the clock enabled and the hardware
accessible.
In i2c_imx_runtime_resume(), restore the pinctrl state back to sleep
if clk_enable() fails to keep the consistent.
Fixes: 576eba03c994 ("i2c: imx: switch different pinctrl state in different system power status") Signed-off-by: Carlos Song <carlos.song@nxp.com> Cc: <stable@vger.kernel.org> # v6.14+ Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260521065038.2954998-1-carlos.song@oss.nxp.com
Heiko Carstens [Fri, 5 Jun 2026 15:32:06 +0000 (17:32 +0200)]
s390: Remove GENERIC_LOCKBREAK Kconfig option
s390 selects GENERIC_LOCKBREAK if PREEMPT is enabled. Reason is a historic
18 years old commit [1] which fixed a compile error for PREEMPT enabled
kernels. Back than only PREEMPT_NONE and PREEMPT_VOLUNTARY kernels were
considered to be important for s390. PREEMPT should "just work".
However, since recently PREEMPT is always enabled [2], which also causes
GENERIC_LOCKBREAK to be always enabled. For some workloads this leads to
massive performance degradation; e.g. a simple kernel compile on machines
with many CPUs may take up to four times longer.
To fix this just remove the GENERIC_LOCKBREAK from s390's Kconfig, since
the compile error from 18 years ago does not exist anymore.
[1] commit b6b40c532a36 ("[S390] Define GENERIC_LOCKBREAK.")
[2] commit 7dadeaa6e851 ("sched: Further restrict the preemption modes")
Cc: stable@vger.kernel.org Reported-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
RDMA/srp: bound SRP_RSP sense copy by the received length
srp_process_rsp() copies sense data from rsp->data + resp_data_len,
where resp_data_len is the full 32-bit value supplied by the SRP target
and is never checked against the number of bytes actually received
(wc->byte_len). The copy length is bounded to SCSI_SENSE_BUFFERSIZE, so
at most 96 bytes are copied, but the source offset is not bounded.
A malicious or compromised SRP target on the InfiniBand/RoCE fabric that
the initiator has logged into can return an SRP_RSP with
SRP_RSP_FLAG_SNSVALID set and a large resp_data_len. The receive buffer
is allocated at the target-chosen max_ti_iu_len, so the source of the
sense copy lands past the bytes actually received; with resp_data_len
near 0xFFFFFFFF it is gigabytes past the buffer and the read faults.
Copy the sense data only if it has not been truncated, that is, only if
the response header, the response data, and the sense region fit within
the bytes actually received; otherwise drop the sense and log. The
in-tree iSER and NVMe-RDMA receive paths already bound their parse by
wc->byte_len; this brings ib_srp into line with them.
Fixes: aef9ec39c47f ("IB: Add SCSI RDMA Protocol (SRP) initiator") Link: https://patch.msgid.link/r/20260602220457.2542840-1-michael.bommarito@gmail.com Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
IB/isert: Reject login PDUs shorter than ISER_HEADERS_LEN
In drivers/infiniband/ulp/isert/ib_isert.c, isert_login_recv_done()
computes the login request payload length as wc->byte_len minus
ISER_HEADERS_LEN with no lower bound, and login_req_len is a signed int.
A remote iSER initiator can post a login Send work request carrying
fewer than ISER_HEADERS_LEN (76) bytes, so the subtraction underflows
and login_req_len becomes negative.
isert_rx_login_req() then reads that negative length back into a signed
int, takes size = min(rx_buflen, MAX_KEY_VALUE_PAIRS), and because the
min() is signed it keeps the negative value; the value is then passed as
the memcpy() length and sign-extended to a multi-gigabyte size_t. The
copy into the 8192-byte login->req_buf runs far out of bounds and
faults, crashing the target node. The login phase precedes iSCSI
authentication, so no credentials are required to reach this path.
Reject any login PDU shorter than ISER_HEADERS_LEN before the
subtraction, mirroring the existing early return on a failed work
completion, so login_req_len can never go negative. The upper bound was
already safe: a posted login buffer cannot deliver more than
ISER_RX_PAYLOAD_SIZE, so the difference stays at or below
MAX_KEY_VALUE_PAIRS and the existing min() clamps it; only the missing
lower bound needs to be added.
Fixes: b8d26b3be8b3 ("iser-target: Add iSCSI Extensions for RDMA (iSER) target driver") Link: https://patch.msgid.link/r/20260602194642.2273217-1-michael.bommarito@gmail.com Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pierre Gondois [Thu, 28 May 2026 09:09:06 +0000 (11:09 +0200)]
cpufreq: Use policy->min/max init as QoS request
Modify cpufreq_policy_init_qos() introduced previously to use
policy->min/max set in the driver .init() callback as the initial
values for the policy min/max frequency QoS requests, respectively,
so long as they are different from 0 (which means that they have
been updated by the driver). Update the documentation in accordance
with that code change.
Prior to commit 521223d8b3ec ("cpufreq: Fix initialization of min and
max frequency QoS requests"), drivers were setting policy->min/max and
these values were used as initial policy QoS constraints.
After the above commit, these values are only used temporarily, as
cpufreq_set_policy() ultimately overrides them through:
cpufreq_policy_online()
\-cpufreq_init_policy()
\-cpufreq_set_policy()
\-/* Set policy->min/max */
A subsequent change will restore the previous behavior allowing
drivers to request special min/max QoS frequencies instead of
FREQ_QOS_MIN_DEFAULT_VALUE and FREQ_QOS_MAX_DEFAULT_VALUE, respectively,
if desired. For instance, the CPPC driver wants to advertise the lowest
non-linear frequency that should be used as the initial minimum
frequency QoS request.
However, for this purpose, all drivers setting policy->min/max to
policy->cpuinfo.min/max_freq, respectively, need to be updated so
their initial policy->min/max settings don't limit the frequency
scaling unnecessarily going forward (which would defeat the purpose
of commit 521223d8b3ec), so do that.
This does not actually alter the observed behavior of all of
the drivers in question because setting policy->min/max to
policy->cpuinfo.min/max_freq, respectively, is not necessary or
even useful any more after a previous change ("cpufreq: Set default
policy->min/max values for all drivers").
Pierre Gondois [Thu, 28 May 2026 09:09:04 +0000 (11:09 +0200)]
cpufreq: Set default policy->min/max values for all drivers
Some drivers set policy->min/max in their .init() callback, but
cpufreq_set_policy() will ultimately override them through:
cpufreq_policy_online()
\-cpufreq_init_policy()
\-cpufreq_set_policy()
\-/* Set policy->min/max */
Thus the policy min/max values set by the drivers are only temporary.
There is an exception if CPUFREQ_NEED_INITIAL_FREQ_CHECK is set and
cpufreq_policy_online() calls __cpufreq_driver_target() which invokes
cpufreq_driver->target().
To prepare for a subsequent change that will remove all initialization
of policy->min/max in driver .init() callbacks if the min/max value is
equal to the corresponding cpuinfo.min/max_freq, set default
policy->min/max values in the core for all drivers.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
[ rjw: Edits of the new comment and changelog ] Link: https://patch.msgid.link/20260528090913.2759118-3-pierre.gondois@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This facilitats subsequent changes that will, in
cpufreq_policy_init_qos():
- Set a default policy->min/max value for all policies.
- Use the policy->min/max values set by drivers as initial request
values for policy frequency QoS requests.
No functional change.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
[ rjw: Changelog edits ] Link: https://patch.msgid.link/20260528090913.2759118-2-pierre.gondois@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jason Gunthorpe [Thu, 4 Jun 2026 18:03:13 +0000 (15:03 -0300)]
RDMA: During rereg_mr ensure that REREG_ACCESS is compatible
If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be
re-evaluated to ensure it is properly pinned as RW. Since the umem is
hidden inside each driver's mr struct add a ib_umem_check_rereg() function
that each driver has to call before processing IB_MR_REREG_ACCESS.
mlx4 has to retain its duplicate ib_access_writable check because it
implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items
in place sequentially while the MR is live, so it will continue to not
support this combination.
Wentao Liang [Mon, 8 Jun 2026 07:11:23 +0000 (07:11 +0000)]
i2c: riic: fix refcount leak in riic_i2c_resume_noirq()
When riic_i2c_resume_noirq() is called, it deasserts the reset
using reset_control_deassert(), which for shared resets increments
a reference count. If pm_runtime_force_resume() then fails, the
function returns without calling reset_control_assert() to
decrement the count. This leaves the reset deasserted and the
reference count unbalanced, which can prevent other users of the
shared reset from properly asserting it later.
Fix the leak by calling reset_control_assert() on the error
handling path for a failed pm_runtime_force_resume().
Yun Zhou [Mon, 8 Jun 2026 08:43:34 +0000 (16:43 +0800)]
gpio: mvebu: fix NULL pointer dereference in suspend/resume
mvebu_pwm_suspend() and mvebu_pwm_resume() are called for all GPIO
banks during suspend/resume, but not all banks have PWM functionality.
GPIO banks without PWM have mvchip->mvpwm set to NULL.
Calling mvebu_pwm_suspend() with mvpwm == NULL causes a NULL pointer
dereference when it tries to access mvpwm->blink_select.
Unable to handle kernel NULL pointer dereference at virtual address 00000020 when write
[00000020] *pgd=00000000
Internal error: Oops: 815 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 UID: 0 PID: 406 Comm: sh Not tainted 6.12.74-rt12-yocto-standard-g4e96f98fb7db-dirty #353
Hardware name: Marvell Armada 370/XP (Device Tree)
PC is at regmap_mmio_read+0x38/0x54
LR is at regmap_mmio_read+0x38/0x54
pc : [<c05fd2ac>] lr : [<c05fd2ac>] psr: 200f0013
sp : f0c11d10 ip : 00000000 fp : c100d2f0
r10: c14fb854 r9 : 00000000 r8 : 00000000
r7 : c1799c00 r6 : 00000020 r5 : 00000020 r4 : c179c7c0
r3 : f0a231a0 r2 : 00000020 r1 : 00000020 r0 : 00000000
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 135ec059 DAC: 00000051
Call trace:
regmap_mmio_read from _regmap_bus_reg_read+0x78/0xac
_regmap_bus_reg_read from _regmap_read+0x60/0x154
_regmap_read from regmap_read+0x3c/0x60
regmap_read from mvebu_gpio_suspend+0xa4/0x14c
mvebu_gpio_suspend from dpm_run_callback+0x54/0x180
dpm_run_callback from device_suspend+0x124/0x630
device_suspend from dpm_suspend+0x124/0x270
dpm_suspend from dpm_suspend_start+0x64/0x6c
dpm_suspend_start from suspend_devices_and_enter+0x140/0x8e8
suspend_devices_and_enter from pm_suspend+0x2fc/0x308
pm_suspend from state_store+0x6c/0xc8
state_store from kernfs_fop_write_iter+0x10c/0x1f8
kernfs_fop_write_iter from vfs_write+0x270/0x468
vfs_write from ksys_write+0x70/0xf0
ksys_write from ret_fast_syscall+0x0/0x54
Add a NULL check for mvchip->mvpwm before calling the PWM
suspend/resume functions.
Linus Torvalds [Mon, 8 Jun 2026 14:31:41 +0000 (07:31 -0700)]
Merge tag 'hyperv-fixes-signed-20260607' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- MSHV driver fixes from various people (Anirudh Rayabharam, Can Peng,
Dexuan Cui, Michael Kelley, Jork Loeser, Wei Liu)
- Hyper-V user space tools fixes (Thorsten Blum)
- Allow VMBus to be unloaded after frame buffer is flushed (Michael
Kelley)
* tag 'hyperv-fixes-signed-20260607' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: support 1G hugepages by passing them as 2M-aligned chunks
Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs
mshv: use kmalloc_array in mshv_root_scheduler_init
mshv: Add conditional VMBus dependency
hyperv: Clean up and fix the guest ID comment in hvgdk.h
drm/hyperv: During panic do VMBus unload after frame buffer is flushed
Drivers: hv: vmbus: Provide option to skip VMBus unload on panic
mshv: unmap debugfs stats pages on kexec
mshv: clean up SynIC state on kexec for L1VH
mshv: limit SynIC management to MSHV-owned resources
hv: utils: replace deprecated strcpy with strscpy in kvp_register
hv: utils: handle and propagate errors in kvp_register
mshv: add a missing padding field