net: rfkill: prevent unlimited numbers of rfkill events from being created
Userspace can create an unlimited number of rfkill events if the system
is so configured, while not consuming them from the rfkill file
descriptor, causing a potential out of memory situation. Prevent this
from bounding the number of pending rfkill events at a "large" number
(i.e. 1000) to prevent abuses like this.
Cc: Johannes Berg <johannes@sipsolutions.net> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026033013-disfigure-scroll-e25e@gregkh Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johan Hovold [Fri, 27 Mar 2026 11:32:19 +0000 (12:32 +0100)]
wifi: rt2x00usb: fix devres lifetime
USB drivers bind to USB interfaces and any device managed resources
should have their lifetime tied to the interface rather than parent USB
device. This avoids issues like memory leaks when drivers are unbound
without their devices being physically disconnected (e.g. on probe
deferral or configuration changes).
Fix the USB anchor lifetime so that it is released on driver unbind.
Fixes: 8b4c0009313f ("rt2x00usb: Use usb anchor to manage URB") Cc: stable@vger.kernel.org # 4.7 Cc: Vishal Thanki <vishalthanki@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260327113219.1313748-1-johan@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pengpeng Hou [Mon, 23 Mar 2026 07:45:51 +0000 (15:45 +0800)]
wifi: brcmfmac: validate bsscfg indices in IF events
brcmf_fweh_handle_if_event() validates the firmware-provided interface
index before it touches drvr->iflist[], but it still uses the raw
bsscfgidx field as an array index without a matching range check.
Reject IF events whose bsscfg index does not fit in drvr->iflist[]
before indexing the interface array.
Shawn Lin [Mon, 30 Mar 2026 09:53:21 +0000 (17:53 +0800)]
gpio: rockchip: convert to dynamic GPIO base allocation
This driver is used on device tree based platform. Use dynamic
GPIO numberspace base to suppress the warning:
gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
gpio gpiochip1: Static allocation of GPIO base is deprecated, use dynamic allocation.
gpio gpiochip2: Static allocation of GPIO base is deprecated, use dynamic allocation.
gpio gpiochip3: Static allocation of GPIO base is deprecated, use dynamic allocation.
gpio gpiochip4: Static allocation of GPIO base is deprecated, use dynamic allocation.
dev-err-probe is an overengineered solution to a simple problem. Use a
combination of wait_for_probe() and device_is_bound() to synchronously
wait for the platform device to probe.
dev-err-probe is an overengineered solution to a simple problem. Use a
combination of wait_for_probe() and device_is_bound() to synchronously
wait for the platform device to probe.
dev-err-probe is an overengineered solution to a simple problem. Use a
combination of wait_for_probe() and device_is_bound() to synchronously
wait for the platform device to probe.
Aleksa Sarai [Tue, 31 Mar 2026 14:46:21 +0000 (01:46 +1100)]
dcache: permit dynamic_dname()s up to NAME_MAX
dynamic_dname() has had an implicit limit of 64 characters since it was
introduced in commit c23fbb6bcb3e ("VFS: delay the dentry name
generation on sockets and pipes"), however it seems that this was a
fairly arbitrary number (suspiciously it was double the previously
hardcoded buffer size).
NAME_MAX seems like a more reasonable and consistent limit for d_name
lengths. While we're at it, we can also remove the unnecessary
stack-allocated array and just memmove() the formatted string to the end
of the buffer.
It should also be noted that at least one driver (in particular,
liveupdate's usage of anon_inode for session files) already exceeded
this limit without noticing that readlink(/proc/self/fd/$n) always
returns -ENAMETOOLONG, so this fixes those drivers as well.
Fixes: 0153094d03df ("liveupdate: luo_session: add sessions support") Fixes: c23fbb6bcb3e ("VFS: delay the dentry name generation on sockets and pipes") Signed-off-by: Aleksa Sarai <aleksa@amutable.com> Link: https://patch.msgid.link/20260401-dynamic-dname-name_max-v1-1-8ca20ab2642e@amutable.com Tested-by: Luca Boccassi <luca.boccassi@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
This driver provides support for new way of handling platform events,
through the use of GPIO-signaled ACPI events. This mechanism is used on
Intel client platforms released in 2026 and later, starting with Intel
Nova Lake.
Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260401174526.60881-1-alan.borzeszkowski@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Francesco Lavra [Mon, 30 Mar 2026 16:19:14 +0000 (18:19 +0200)]
pinctrl: mcp23s08: Disable all pin interrupts during probe
A chip being probed may have the interrupt-on-change feature enabled on
some of its pins, for example after a reboot. This can cause the chip to
generate interrupts for pins that don't have a registered nested handler,
which leads to a kernel crash such as below:
This issue has always been present, but has been latent until commit
"f9f4fda15e72" ("pinctrl: mcp23s08: init reg_defaults from HW at probe and
switch cache type"), which correctly removed reg_defaults from the regmap
and as a side effect changed the behavior of the interrupt handler so that
the real value of the MCP_GPINTEN register is now being read from the chip
instead of using a bogus 0 default value; a non-zero value for this
register can trigger the invocation of a nested handler which may not exist
(yet).
Fix this issue by disabling all pin interrupts during initialization.
Fixes: f9f4fda15e72 ("pinctrl: mcp23s08: init reg_defaults from HW at probe and switch cache type") Signed-off-by: Francesco Lavra <flavra@baylibre.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
fs: attr: fix comment formatting and spelling issues
Fix minor comment issues in fs/attr.c reported by checkpatch:
- Wrap long comment lines to comply with the 75-character limit
- Correct spelling of “overriden” to “overridden”
lib/tests/slub_kunit: add a test case for {kmalloc,kfree}_nolock
Testing invocation of {kmalloc,kfree}_nolock() during kmalloc() or
kfree() is tricky, and it is even harder to ensure that slowpaths are
properly tested. Lack of such testing has led to late discovery of
the bug fixed by commit a1e244a9f177 ("mm/slab: use prandom if
!allow_spin").
Add a slub_kunit test that allocates and frees objects in a tight loop
while a perf event triggers interrupts (NMI or hardirq depending on
the arch) on the same task, invoking {kmalloc,kfree}_nolock() from the
overflow handler.
Brian Masney [Fri, 3 Apr 2026 21:12:17 +0000 (17:12 -0400)]
irqchip/irq-pic32-evic: Add __maybe_unused for board_bind_eic_interrupt in COMPILE_TEST
There are a few ifdefs in this driver so that it can be compiled on all
architectures when COMPILE_TEST is set. board_bind_eic_interrupt is
defined in arch/mips/ for normal usage, however when this driver is
compiled with COMPILE_TEST on other architectures, it is defined as a
static variable inside this driver. This causes the following warning:
drivers/irqchip/irq-pic32-evic.c:54:15: warning: variable
'board_bind_eic_interrupt' set but not used [-Wunused-but-set-global]
54 | static void (*board_bind_eic_interrupt)(int irq,
int regset);
| ^
Annotate the static variable with __maybe_unused to avoid having to put
even more ifdefs into this driver.
Fixes: 282f8b547d51d ("irqchip/irq-pic32-evic: Define board_bind_eic_interrupt for !MIPS builds") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Brian Masney <bmasney@redhat.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260403-irq-pic32-evic-unused-v1-1-447cdc0675ec@redhat.com Closes: https://lore.kernel.org/oe-kbuild-all/202603300715.4HuMMAFb-lkp@intel.com/
Hao Li [Fri, 3 Apr 2026 07:37:36 +0000 (15:37 +0800)]
slub: use N_NORMAL_MEMORY in can_free_to_pcs to handle remote frees
Memory hotplug now keeps N_NORMAL_MEMORY up to date correctly, so make
can_free_to_pcs() use it.
As a result, when freeing objects on memoryless nodes, or on nodes that
have memory but only in ZONE_MOVABLE, the objects can be freed to the
sheaf instead of going through the slow path.
Signed-off-by: Hao Li <hao.li@linux.dev> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Link: https://patch.msgid.link/20260403073958.8722-1-hao.li@linux.dev Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Zhengchuan Liang [Sun, 22 Mar 2026 18:46:08 +0000 (11:46 -0700)]
net: af_key: zero aligned sockaddr tail in PF_KEY exports
PF_KEY export paths use `pfkey_sockaddr_size()` when reserving sockaddr
payload space, so IPv6 addresses occupy 32 bytes on the wire. However,
`pfkey_sockaddr_fill()` initializes only the first 28 bytes of
`struct sockaddr_in6`, leaving the final 4 aligned bytes uninitialized.
Not every PF_KEY message is affected. The state and policy dump builders
already zero the whole message buffer before filling the sockaddr
payloads. Keep the fix to the export paths that still append aligned
sockaddr payloads with plain `skb_put()`:
Eric Biggers [Sun, 5 Apr 2026 01:15:13 +0000 (18:15 -0700)]
xfrm: Drop support for HMAC-RIPEMD-160
Drop support for HMAC-RIPEMD-160 from IPsec to reduce the UAPI surface
and simplify future maintenance. It's almost certainly unused.
RIPEMD-160 received some attention in the early 2000s when SHA-* weren't
quite as well established. But it never received much adoption outside
of certain niches such as Bitcoin.
It's actually unclear that Linux + IPsec + HMAC-RIPEMD-160 has *ever*
been used, even historically. When support for it was added in 2003, it
was done so in a "cleanup" commit without any justification [1]. It
didn't actually work until someone happened to fix it 5 years later [2].
That person didn't use or test it either [3]. Finally, also note that
"hmac(rmd160)" is by far the slowest of the algorithms in aalg_list[].
Of course, today IPsec is usually used with an AEAD, such as AES-GCM.
But even for IPsec users still using a dedicated auth algorithm, they
almost certainly aren't using, and shouldn't use, HMAC-RIPEMD-160.
Thus, let's just drop support for it. Note: no kconfig update is
needed, since CRYPTO_RMD160 wasn't actually being selected anyway.
References:
[1] linux-history commit d462985fc1941a47
("[IPSEC]: Clean up key manager algorithm handling.")
[2] linux commit a13366c632132bb9
("xfrm: xfrm_algo: correct usage of RIPEMD-160")
[3] https://lore.kernel.org/all/1212340578-15574-1-git-send-email-rueegsegger@swiss-it.ch
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Miguel Ojeda [Tue, 7 Apr 2026 08:40:11 +0000 (10:40 +0200)]
Merge patch series "rust: bump minimum Rust and `bindgen` versions"
As proposed in the past in e.g. LPC 2025 and the Maintainers Summit [1],
we are going to follow Debian Stable's Rust versions as our minimum
supported version.
Debian Trixie was released with a Rust 1.85.0 toolchain [2], which it
still uses to this day [3] (i.e. no update to Rust 1.85.1).
Debian Trixie was released with `bindgen` 0.71.1, which it also still
uses to this day [4].
Debian Trixie's release happened on 2025-08-09 [5], which means that a
fair amount of time has passed since its release for kernel developers
to upgrade.
There are a few main parts to the series, in this order:
- A few cleanups that can be performed before the bumps.
- The Rust bump (and its cleanups).
- The `bindgen` bump (and its cleanups).
- Documentation updates.
- The `cfi_encoding` patch, added here, which needs the bump.
- The per-version flags support and a Clippy cleanup on top.
struct xfrm_user_report is a __u8 proto field followed by a struct
xfrm_selector which means there is three "empty" bytes of padding, but
the padding is never zeroed before copying to userspace. Fix that up by
zeroing the structure before setting individual member variables.
Cc: stable <stable@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
struct xfrm_usersa_id has a one-byte padding hole after the proto
field, which ends up never getting set to zero before copying out to
userspace. Fix that up by zeroing out the whole structure before
setting individual variables.
Fixes: 3a2dfbe8acb1 ("xfrm: Notify changes in UDP encapsulation via netlink") Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The root cause is a double call to xfrm_pol_hold_rcu() in
xfrm_migrate_policy_find(). The lookup function already returns
a policy with held reference, making the second call redundant.
Remove the redundant xfrm_pol_hold_rcu() call to fix the refcount
imbalance and prevent the memory leak.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Signed-off-by: Kotlyarov Mihail <mihailkotlyarow@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
xfrm: hold dev ref until after transport_finish NF_HOOK
After async crypto completes, xfrm_input_resume() calls dev_put()
immediately on re-entry before the skb reaches transport_finish.
The skb->dev pointer is then used inside NF_HOOK and its okfn,
which can race with device teardown.
Remove the dev_put from the async resumption entry and instead
drop the reference after the NF_HOOK call in transport_finish,
using a saved device pointer since NF_HOOK may consume the skb.
This covers NF_DROP, NF_QUEUE and NF_STOLEN paths that skip
the okfn.
For non-transport exits (decaps, gro, drop) and secondary
async return points, release the reference inline when
async is set.
xfrm: Wait for RCU readers during policy netns exit
xfrm_policy_fini() frees the policy_bydst hash tables after flushing the
policy work items and deleting all policies, but it does not wait for
concurrent RCU readers to leave their read-side critical sections first.
The policy_bydst tables are published via rcu_assign_pointer() and are
looked up through rcu_dereference_check(), so netns teardown must also
wait for an RCU grace period before freeing the table memory.
Fix this by adding synchronize_rcu() before freeing the policy hash tables.
Miguel Ojeda [Sun, 5 Apr 2026 23:53:09 +0000 (01:53 +0200)]
rust: kbuild: allow `clippy::precedence` for Rust < 1.86.0
The Clippy `precedence` lint was extended in Rust 1.85.0 to include
bitmasking and shift operations [1]. However, because it generated
many hits, in Rust 1.86.0 it was split into a new `precedence_bits`
lint which is not enabled by default [2].
In other words, only Rust 1.85 has a different behavior. For instance,
it reports:
warning: operator precedence can trip the unwary
--> drivers/gpu/nova-core/fb/hal/ga100.rs:16:5
|
16 | / u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08()) << FLUSH_SYSMEM_ADDR_SHIFT
17 | | | u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40())
18 | | << FLUSH_SYSMEM_ADDR_SHIFT_HI
| |_________________________________________^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence
= note: `-W clippy::precedence` implied by `-W clippy::all`
= help: to override `-W clippy::all` add `#[allow(clippy::precedence)]`
help: consider parenthesizing your expression
|
16 ~ (u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08()) << FLUSH_SYSMEM_ADDR_SHIFT) | (u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40())
17 + << FLUSH_SYSMEM_ADDR_SHIFT_HI)
|
While so far we try our best to keep all versions Clippy-clean, the
minimum (which is now Rust 1.85.0 after the bump) and the latest stable
are the most important ones; and this may be considered a "false positive"
with respect to the behavior in other versions.
Thus allow this lint for this version using the per-version flags
mechanism introduced in the previous commit.
Miguel Ojeda [Sun, 5 Apr 2026 23:53:08 +0000 (01:53 +0200)]
rust: kbuild: support global per-version flags
Sometimes it is useful to gate global Rust flags per compiler version.
For instance, we may want to disable a lint that has false positives in
a single version [1].
We already had helpers like `rustc-min-version` for that, which we use
elsewhere, but we cannot currently use them for `rust_common_flags`,
which contains the global flags for all Rust code (kernel and host),
because `rustc-min-version` depends on `CONFIG_RUSTC_VERSION`, which
does not exist when `rust_common_flags` is defined.
Thus, to support that, introduce `rust_common_flags_per_version`,
defined after the `include/config/auto.conf` inclusion (where
`CONFIG_RUSTC_VERSION` becomes available), and append it to
`rust_common_flags`, `KBUILD_HOSTRUSTFLAGS` and `KBUILD_RUSTFLAGS`.
In addition, move the expansion of `HOSTRUSTFLAGS` to the same place,
so that users can also override per-version flags [2].
Alice Ryhl [Sun, 5 Apr 2026 23:53:07 +0000 (01:53 +0200)]
rust: declare cfi_encoding for lru_status
By default bindgen will convert 'enum lru_status' into a typedef for an
integer. For the most part, an integer of the same size as the enum
results in the correct ABI, but in the specific case of CFI, that is not
the case. The CFI encoding is supposed to be the same as a struct called
'lru_status' rather than the name of the underlying native integer type.
To fix this, tell bindgen to generate a newtype and set the CFI type
explicitly. Note that we need to set the CFI attribute explicitly as
bindgen is using repr(transparent), which is otherwise identical to the
inner type for ABI purposes.
This allows us to remove the page range helper C function in Binder
without risking a CFI failure when list_lru_walk calls the provided
function pointer.
The --with-attribute-custom-enum argument requires bindgen v0.71 or
greater.
[ In particular, the feature was added in 0.71.0 [1][2].
In addition, `feature(cfi_encoding)` has been available since
Rust 1.71.0 [3].
My testing procedure was to add this to the android17-6.18 branch and
verify that rust_shrink_free_page is successfully called without crash,
and verify that it does in fact crash when the cfi_encoding is set to
other values. Note that I couldn't test this on android16-6.12 as that
branch uses a bindgen version that is too old.
Miguel Ojeda [Sun, 5 Apr 2026 23:53:06 +0000 (01:53 +0200)]
docs: rust: general-information: use real example
Currently the example in the documentation shows a version-based name
for the Kconfig example:
RUSTC_VERSION_MIN_107900
The reason behind it was to possibly avoid repetition in case several
features used the same minimum.
However, we ended up preferring to give them a descriptive name for each
feature added even if that could lead to some repetition. In practice,
the repetition has not happened so far, and even if it does at some point,
it is not a big deal.
Thus replace the example in the documentation with one of our current
examples (after removing previous ones from the bump), to show how they
actually look like, and in case someone `grep`s for it.
In addition, it has the advantage that it shows the `RUSTC_HAS_*`
pattern we follow in `init/Kconfig`, similar to the C side.
The versions provided nowadays by even a distribution like Debian Stable
(and Debian Old Stable) are newer than those mentioned [1].
Thus remove the workaround.
Note that the minimum binutils version in the kernel is still 2.30, so
one could argue part of the note is still relevant, but it is unlikely
a kernel developer using such an old binutils is enabling Rust on a
modern kernel, especially when using distribution toolchains, e.g. the
Rust minimum version is not satisfied by Debian Old Stable.
So we are at the point where keeping the docs short and relevant for
essentially everyone is probably the better trade-off.
Miguel Ojeda [Sun, 5 Apr 2026 23:53:01 +0000 (01:53 +0200)]
docs: rust: quick-start: add Ubuntu 26.04 LTS and remove subsection title
Ubuntu 26.04 LTS (Resolute Raccoon) is scheduled to be released in a few
weeks [1], and it has a recent enough Rust toolchain, just like Ubuntu
25.10 has [2][3].
We could update the title and the paragraph, but to simplify and to
make it more consistent with the other distributions' sections, let's
instead just remove that title. It will also reduce the differences
later on to keep it updated. Eventually, when we remove the remaining
subsection for older LTSs, Ubuntu should be a small section like the
other distributions.
Thus remove the title and add the mention of Ubuntu 26.04 LTS.
Now that the minimum supported Rust version is bumped, bump the versioned
Rust packages [1][2][3][4] to that version for Ubuntu in the Quick
Start guide.
In addition, add "may" to the `RUST_LIB_SRC` line since it does not look
like it is needed from a quick test in a Ubuntu 24.04 LTS container.
Miguel Ojeda [Sun, 5 Apr 2026 23:52:56 +0000 (01:52 +0200)]
rust: kbuild: update `bindgen --rust-target` version and replace comment
As the comment in the `Makefile` explains, previously, we needed to
limit ourselves to the list of Rust versions known by `bindgen` for its
`--rust-target` option [1].
In other words, we needed to consult the versions known by the minimum
version of `bindgen` that we supported.
Now that we bumped the minimum version of `bindgen`, that limitation
does not apply anymore since `bindgen` 0.71.0 [2].
Thus replace the comment and simply write our minimum supported Rust
version there, which is much simpler.
See commit 7a5f93ea5862 ("rust: kbuild: set `bindgen`'s Rust target
version") for more details.
Miguel Ojeda [Sun, 5 Apr 2026 23:52:53 +0000 (01:52 +0200)]
rust: bump `bindgen` minimum supported version to 0.71.1 (Debian Trixie)
As proposed in the past in e.g. LPC 2025 and the Maintainers Summit [1],
we are going to follow Debian Stable's `bindgen` versions as our minimum
supported version.
Debian Trixie was released with `bindgen` 0.71.1, which it still uses
to this day [2].
Debian Trixie's release happened on 2025-08-09 [3], which means that a
fair amount of time has passed since its release for kernel developers
to upgrade.
Thus bump the minimum to the new version.
Then, in later commits, clean up most of the workarounds and other bits
that this upgrade of the minimum allows us.
Ubuntu 25.10 also has a recent enough `bindgen` [4] (even the already
unsupported Ubuntu 25.04 had it), and they also provide versioned packages
with `bindgen` 0.71.1 back to Ubuntu 24.04 LTS [5].
Miguel Ojeda [Sun, 5 Apr 2026 23:52:51 +0000 (01:52 +0200)]
rust: macros: simplify code using `feature(extract_if)`
`feature(extract_if)` [1] was stabilized in Rust 1.87.0 [2], and the last
significant change happened in Rust 1.85.0 [3] when the range parameter
was added.
That is, with our new minimum version, we can start using the feature.
Thus simplify the code using the feature and remove the TODO comment.
Miguel Ojeda [Sun, 5 Apr 2026 23:52:46 +0000 (01:52 +0200)]
rust: remove `RUSTC_HAS_COERCE_POINTEE` and simplify code
With the Rust version bump in place, the `RUSTC_HAS_COERCE_POINTEE`
Kconfig (automatic) option is always true.
Thus remove the option and simplify the code.
In particular, this includes removing our use of the predecessor unstable
features we used with Rust < 1.84.0 (`coerce_unsized`, `dispatch_from_dyn`
and `unsize`).
Miguel Ojeda [Sun, 5 Apr 2026 23:52:45 +0000 (01:52 +0200)]
rust: remove `RUSTC_HAS_SLICE_AS_FLATTENED` and simplify code
With the Rust version bump in place, the `RUSTC_HAS_SLICE_AS_FLATTENED`
Kconfig (automatic) option is always true.
Thus remove the option and simplify the code.
In particular, this includes removing the `slice` module which contained
the temporary slice helpers, i.e. the `AsFlattened` extension trait and
its `impl`s.
Miguel Ojeda [Sun, 5 Apr 2026 23:52:44 +0000 (01:52 +0200)]
rust: simplify `RUSTC_VERSION` Kconfig conditions
With the Rust version bump in place, several Kconfig conditions based on
`RUSTC_VERSION` are always true.
Thus simplify them.
The minimum supported major LLVM version by our new Rust minimum version
is now LLVM 18, instead of LLVM 16. However, there are no possible
cleanups for `RUSTC_LLVM_VERSION`.
Miguel Ojeda [Sun, 5 Apr 2026 23:52:43 +0000 (01:52 +0200)]
rust: allow globally `clippy::incompatible_msrv`
`clippy::incompatible_msrv` is not buying us much, and we discussed
allowing it several times in the past.
For instance, there was recently another patch sent to `allow` it where
needed [1]. While that particular case would not be needed after the
minimum version bump to 1.85.0, it is simpler to just allow it to prevent
future instances.
[ In addition, the lint fired without taking into account the features
that have been enabled in a crate [2]. While this was improved in Rust
1.90.0 [3], it would still fire in a case like this patch. ]
Thus do so, and remove the last instance of locally allowing it we have
in the tree (except the one in the vendored `proc_macro2` crate).
Note that we still keep the `msrv` config option in `clippy.toml` since
that affects other lints as well.
Miguel Ojeda [Sun, 5 Apr 2026 23:52:41 +0000 (01:52 +0200)]
rust: bump Rust minimum supported version to 1.85.0 (Debian Trixie)
As proposed in the past in e.g. LPC 2025 and the Maintainers Summit [1],
we are going to follow Debian Stable's Rust versions as our minimum
supported version.
Debian Trixie was released with a Rust 1.85.0 toolchain [2], which it
still uses to this day [3] (i.e. no update to Rust 1.85.1).
Debian Trixie's release happened on 2025-08-09 [4], which means that a
fair amount of time has passed since its release for kernel developers
to upgrade.
Thus bump the minimum to the new version.
Then, in later commits, clean up most of the workarounds and other bits
that this upgrade of the minimum allows us.
pin-init was left as-is since the patches come from upstream. And the
vendored crates are unmodified, since we do not want to change those.
Note that the minimum LLVM major version for Rust 1.85.0 is LLVM 18 (the
Rust upstream binaries use LLVM 19.1.7), thus e.g. `RUSTC_LLVM_VERSION`
tests can also be updated, but there are no suitable ones to simplify.
Ubuntu 25.10 also has a recent enough Rust toolchain [5], and they also
provide versioned packages with a Rust 1.85.1 toolchain even back to
Ubuntu 22.04 LTS [6].
Miguel Ojeda [Sun, 5 Apr 2026 23:52:39 +0000 (01:52 +0200)]
rust: kbuild: remove unneeded old `allow`s for generated layout tests
The issue that required `allow`s for `cfg(test)` code generated by
`bindgen` for layout testing was fixed back in `bindgen` 0.60.0 [1],
so it could have been removed even before the version bump, but it does
not hurt.
Commit 8cf5b3f83614 ("Revert "kbuild, rust: use -fremap-path-prefix
to make paths relative"") removed `--remap-path-prefix` from the build
system, so the workarounds are not needed anymore.
Thus remove them.
Note that the flag has landed again in parallel in this cycle in
commit dda135077ecc ("rust: build: remap path to avoid absolute path"),
together with `--remap-path-scope=macro` [1]. However, they are gated on
`rustc-option-yn, --remap-path-scope=macro`, which means they are both
only passed starting with Rust 1.95.0 [2]:
`--remap-path-scope` is only stable in Rust 1.95, so use `rustc-option`
to detect its presence. This feature has been available as
`-Zremap-path-scope` for all versions that we support; however due to
bugs in the Rust compiler, it does not work reliably until 1.94. I opted
to not enable it for 1.94 as it's just a single version that we missed.
In turn, that means the workarounds removed here should not be needed
again (even with the flag added again above), since:
- `rustdoc` now recognizes the `--remap-path-prefix` flag since Rust
1.81.0 [3] (even if it is still an unstable feature [4]).
- The Internal Compiler Error [5] that the comment mentions was fixed in
Rust 1.87.0 [6]. We tested that was the case in a previous version
of this series by making the workaround conditional [7][8].
...which are both older versions than Rust 1.95.0.
We will still need to skip `--remap-path-scope` for `rustdoc` though,
since `rustdoc` does not support that one yet [4].
Jouni Högander [Fri, 27 Mar 2026 11:45:53 +0000 (13:45 +0200)]
drm/i915/psr: Do not use pipe_src as borders for SU area
This far using crtc_state->pipe_src as borders for Selective Update area
haven't caused visible problems as drm_rect_width(crtc_state->pipe_src) ==
crtc_state->hw.adjusted_mode.crtc_hdisplay and
drm_rect_height(crtc_state->pipe_src) ==
crtc_state->hw.adjusted_mode.crtc_vdisplay when pipe scaling is not
used. On the other hand using pipe scaling is forcing full frame updates and all the
Selective Update area calculations are skipped. Now this improper usage of
crtc_state->pipe_src is causing following warnings:
"drm/i915/dsc: Add helper for writing DSC Selective Update ET parameters"
These warnings are seen when DSC and pipe scaling are enabled
simultaneously. This is because on full frame update SU area is improperly
set as pipe_src which is not aligned with DSC slice height.
Fix these by creating local rectangle using
crtc_state->hw.adjusted_mode.crtc_hdisplay and
crtc_state->hw.adjusted_mode.crtc_vdisplay. Use this local rectangle as
borders for SU area.
Arthur Husband [Mon, 6 Apr 2026 22:23:35 +0000 (15:23 -0700)]
ata: ahci: force 32-bit DMA for JMicron JMB582/JMB585
The JMicron JMB585 (and JMB582) SATA controllers advertise 64-bit DMA
support via the S64A bit in the AHCI CAP register, but their 64-bit DMA
implementation is defective. Under sustained I/O, DMA transfers targeting
addresses above 4GB silently corrupt data -- writes land at incorrect
memory addresses with no errors logged.
The failure pattern is similar to the ASMedia ASM1061
(commit 20730e9b2778 ("ahci: add 43-bit DMA address quirk for ASMedia
ASM1061 controllers")), which also falsely advertised full 64-bit DMA
support. However, the JMB585 requires a stricter 32-bit DMA mask rather
than 43-bit, as corruption occurs with any address above 4GB.
On the Minisforum N5 Pro specifically, the combination of the JMB585's
broken 64-bit DMA with the AMD Family 1Ah (Strix Point) IOMMU causes
silent data corruption that is only detectable via checksumming
filesystems (BTRFS/ZFS scrub). The corruption occurs when 32-bit IOVA
space is exhausted and the kernel transparently switches to 64-bit DMA
addresses.
Add device-specific PCI ID entries for the JMB582 (0x0582) and JMB585
(0x0585) before the generic JMicron class match, using a new board type
that combines AHCI_HFLAG_IGN_IRQ_IF_ERR (preserving existing behavior)
with AHCI_HFLAG_32BIT_ONLY to force 32-bit DMA masks.
Signed-off-by: Arthur Husband <artmoty@gmail.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
selftests/nolibc: don't skip tests for unimplemented syscalls anymore
The automatic skipping of tests on ENOSYS returns was introduced in
commit 349afc8a52f8 ("selftests/nolibc: skip tests for unimplemented
syscalls"). It handled the fact that nolibc would return ENOSYS for many
syscall wrappers on riscv32.
Nowadays nolibc handles all these correctly, so this logic is not used
anymore. To make missing nolibc functionality more obvious fail the
tests again if something is not implemented.
selftests/nolibc: explicitly handle ENOSYS from ptrace()
The automatic ENOSYS handling in EXPECT_SYSER() is about to be removed.
ptrace() will return legitimately return ENOSYS on qemu-user, so handle
it explicitly.
The standard syscall() function or macro uses the libc return value
convention. Errors returned from the kernel as negative values are
stored in errno and -1 is returned. Users who want to avoid using
errno don't have a way to call raw syscalls and check the returned
error.
Add a new macro _syscall() which works like the standard syscall()
but passes through the return value from the kernel unchanged.
The naming scheme and return values match the named _sys_foo()
system call wrappers already part of nolibc.
tools/nolibc: move the call to __sysret() into syscall()
__sysret() transforms the return value from the kernel into the libc
return value convention. There is no reason for it to be called in the
middle of the internals of the syscall() implementation macros.
Move the call up, directly into syscall(), to make the code simpler.
tools/nolibc: rename the internal macros used in syscall()
These macros are the internal implementation of syscall().
They can not be used by users. Align them with the standard naming
scheme for internal symbols.
The current name also prevents the addition of an application-usable
_syscall() symbol.
to_ratio() computes BW_SHIFT-scaled bandwidth ratios from u64 period and
runtime values, but it returns unsigned long. tg_rt_schedulable() also
stores the current group limit and the accumulated child sum in unsigned
long.
On 32-bit builds, large bandwidth ratios can be truncated and the RT
group sum can wrap when enough siblings are present. That can let an
overcommitted RT hierarchy pass the schedulability check, and it also
narrows the helper result for other callers.
Return u64 from to_ratio() and use u64 for the RT group totals so
bandwidth ratios are preserved and compared at full width on both 32-bit
and 64-bit builds.
Fixes: b40b2e8eb521 ("sched: rt: multi level group constraints") Assisted-by: Codex:GPT-5 Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260403210014.2713404-1-joseph.salisbury@oracle.com
When retry_aligned_read() encounters an overlapped stripe, it releases
the stripe via raid5_release_stripe() which puts it on the lockless
released_stripes llist. In the next raid5d loop iteration,
release_stripe_list() drains the stripe onto handle_list (since
STRIPE_HANDLE is set by the original IO), but retry_aligned_read()
runs before handle_active_stripes() and removes the stripe from
handle_list via find_get_stripe() -> list_del_init(). This prevents
handle_stripe() from ever processing the stripe to resolve the
overlap, causing an infinite loop and soft lockup.
Fix this by using __release_stripe() with temp_inactive_list instead
of raid5_release_stripe() in the failure path, so the stripe does not
go through the released_stripes llist. This allows raid5d to break out
of its loop, and the overlap will be resolved when the stripe is
eventually processed by handle_stripe().
Zide Chen [Fri, 13 Mar 2026 17:40:50 +0000 (10:40 -0700)]
perf/x86/intel/uncore: Remove extra double quote mark
The third argument in INTEL_UNCORE_FR_EVENT_DESC() is subject to
__stringify(), and the extra double quote marks can result in the
expansion "3.814697266e-6" in the sysfs knobs, instead of
3.814697266e-6.
This is incorrect, though it may still work for perf, e.g.
perf stat -e uncore_iio_free_running_0/bw_in_port0/
Fixes: d8987048f665 ("perf/x86/intel/uncore: Support IIO free-running counters on DMR") Closes: https://lore.kernel.org/all/20251231224233.113839-1-zide.chen@intel.com/ Reported-by: Chun-Tse Shao <ctshao@google.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chun-Tse Shao <ctshao@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://patch.msgid.link/20260313174050.171704-5-zide.chen@intel.com
Zide Chen [Fri, 13 Mar 2026 17:40:49 +0000 (10:40 -0700)]
perf/x86/intel/uncore: Fix die ID init and look up bugs
In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
uncore_device_to_die() may return -1 when all CPUs associated
with the UBOX device are offline.
Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:
- The current code breaks out of the loop. This is incorrect because
pci_get_device() does not guarantee iteration in domain or bus order,
so additional UBOX devices may be skipped during the scan.
- Returning -EINVAL is incorrect, since marking offline buses with
die_id == -1 is expected and should not be treated as an error.
Separately, when NUMA is disabled on a NUMA-capable platform,
pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
to return -1 for all PCI devices. As a result,
spr_update_device_location(), used on Intel SPR and EMR, ignores the
corresponding PMON units and does not add them to the RB tree.
Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
from the UBOX GIDNIDMAP register and works regardless of whether NUMA
is enabled in Linux. This requires snbep_pci2phy_map_init() to be
added in spr_uncore_pci_init().
Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
NUMA is expected to be enabled.
Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info") Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR") Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Steve Wahl <steve.wahl@hpe.com> Link: https://patch.msgid.link/20260313174050.171704-4-zide.chen@intel.com
Zide Chen [Fri, 13 Mar 2026 17:40:48 +0000 (10:40 -0700)]
perf/x86/intel/uncore: Skip discovery table for offline dies
This warning can be triggered if NUMA is disabled and the system
boots with fewer CPUs than the number of CPUs in die 0.
WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore]
Currently, the discovery table continues to be parsed even if all CPUs
in the associated die are offline. This can lead to an array overflow
at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may
trigger the warning above or cause other issues.
Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables") Reported-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Steve Wahl <steve.wahl@hpe.com> Link: https://patch.msgid.link/20260313174050.171704-3-zide.chen@intel.com
Zide Chen [Fri, 13 Mar 2026 17:40:47 +0000 (10:40 -0700)]
perf/x86/intel/uncore: Fix iounmap() leak on global_init failure
Kernel test robot reported:
Unverified Error/Warning (likely false positive, kindly check if
interested):
arch/x86/events/intel/uncore_discovery.c:293:2-8:
ERROR: missing iounmap; ioremap on line 288 and execution via
conditional on line 292
If domain->global_init() fails in __parse_discovery_table(), the
ioremap'ed MMIO region is not released before returning, resulting
in an MMIO mapping leak.
Fixes: b575fc0e3357 ("perf/x86/intel/uncore: Add domain global init callback") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://patch.msgid.link/20260313174050.171704-2-zide.chen@intel.com
Yu Kuai [Fri, 27 Mar 2026 14:07:29 +0000 (22:07 +0800)]
md: wake raid456 reshape waiters before suspend
During raid456 reshape, direct IO across the reshape position can sleep
in raid5_make_request() waiting for reshape progress while still
holding an active_io reference. If userspace then freezes reshape and
writes md/suspend_lo or md/suspend_hi, mddev_suspend() kills active_io
and waits for all in-flight IO to drain.
This can deadlock: the IO needs reshape progress to continue, but the
reshape thread is already frozen, so the active_io reference is never
dropped and suspend never completes.
raid5_prepare_suspend() already wakes wait_for_reshape for dm-raid. Do
the same for normal md suspend when reshape is already interrupted, so
waiting raid456 IO can abort, drop its reference, and let suspend
finish.
The mdadm test tests/25raid456-reshape-deadlock reproduces the hang.
Xiao Ni [Tue, 24 Mar 2026 07:24:54 +0000 (15:24 +0800)]
md/raid1: serialize overlap io for writemostly disk
Previously, using wait_event() would wake up all waiters simultaneously,
and they would compete for the tree lock. The bio which gets the lock
first will be handled, so the write sequence cannot be guaranteed.
For example:
bio1(100,200)
bio2(150,200)
bio3(150,300)
The write sequence of fast device is bio1,bio2,bio3. But the write sequence
of slow device could be bio1,bio3,bio2 due to lock competition. This causes
data corruption.
Replace waitqueue with a fifo list to guarantee the write sequence. And it
also needs to iterate the list when removing one entry. If not, it may miss
the opportunity to wake up the waiting io.
For example:
bio1(1,3), bio2(2,4)
bio3(5,7), bio4(6,8)
These four bios are in the same bucket. bio1 and bio3 are inserted into
the rbtree. bio2 and bio4 are added to the waiting list and bio2 is the
first one. bio3 returns from slow disk and tries to wake up the waiting
bios. bio2 is removed from the list and will be handled. But bio1 hasn't
finished. So bio2 will be added into waiting list again. Then bio1 returns
from slow disk and wakes up waiting bios. bio4 is removed from the list
and will be handled. Now bio1, bio3 and bio4 all finish and bio2 is left
on the waiting list. So it needs to iterate the waiting list to wake up
the right bio.
Yu Kuai [Mon, 23 Mar 2026 05:46:44 +0000 (13:46 +0800)]
md/md-llbitmap: optimize initial sync with write_zeroes_unmap support
For RAID-456 arrays with llbitmap, if all underlying disks support
write_zeroes with unmap, issue write_zeroes to zero all disk data
regions and initialize the bitmap to BitCleanUnwritten instead of
BitUnwritten.
This optimization skips the initial XOR parity building because:
1. write_zeroes with unmap guarantees zeroed reads after the operation
2. For RAID-456, when all data is zero, parity is automatically
consistent (0 XOR 0 XOR ... = 0)
3. BitCleanUnwritten indicates parity is valid but no user data
has been written
The implementation adds two helper functions:
- llbitmap_all_disks_support_wzeroes_unmap(): Checks if all active
disks support write_zeroes with unmap
- llbitmap_zero_all_disks(): Issues blkdev_issue_zeroout() to each
rdev's data region to zero all disks
The zeroing and bitmap state setting happens in llbitmap_init_state()
during bitmap initialization. If any disk fails to zero, we fall back
to BitUnwritten and normal lazy recovery.
This significantly reduces array initialization time for RAID-456
arrays built on modern NVMe SSDs or other devices that support
write_zeroes with unmap.
Yu Kuai [Mon, 23 Mar 2026 05:46:43 +0000 (13:46 +0800)]
md/md-llbitmap: add CleanUnwritten state for RAID-5 proactive parity building
Add new states to the llbitmap state machine to support proactive XOR
parity building for RAID-5 arrays. This allows users to pre-build parity
data for unwritten regions before any user data is written.
New states added:
- BitNeedSyncUnwritten: Transitional state when proactive sync is triggered
via sysfs on Unwritten regions.
- BitSyncingUnwritten: Proactive sync in progress for unwritten region.
- BitCleanUnwritten: XOR parity has been pre-built, but no user data
written yet. When user writes to this region, it transitions to BitDirty.
New actions added:
- BitmapActionProactiveSync: Trigger for proactive XOR parity building.
- BitmapActionClearUnwritten: Convert CleanUnwritten/NeedSyncUnwritten/
SyncingUnwritten states back to Unwritten before recovery starts.
State flows:
- Current (lazy): Unwritten -> (write) -> NeedSync -> (sync) -> Dirty -> Clean
- New (proactive): Unwritten -> (sysfs) -> NeedSyncUnwritten -> (sync) -> CleanUnwritten
- On write to CleanUnwritten: CleanUnwritten -> (write) -> Dirty -> Clean
- On disk replacement: CleanUnwritten regions are converted to Unwritten
before recovery starts, so recovery only rebuilds regions with user data
A new sysfs interface is added at /sys/block/mdX/md/llbitmap/proactive_sync
(write-only) to trigger proactive sync. This only works for RAID-456 arrays.
Yu Kuai [Mon, 23 Mar 2026 05:46:42 +0000 (13:46 +0800)]
md: add fallback to correct bitmap_ops on version mismatch
If default bitmap version and on-disk version doesn't match, and mdadm
is not the latest version to set bitmap_type, set bitmap_ops based on
the disk version.
Junrui Luo [Sat, 4 Apr 2026 07:44:35 +0000 (15:44 +0800)]
md/raid5: validate payload size before accessing journal metadata
r5c_recovery_analyze_meta_block() and
r5l_recovery_verify_data_checksum_for_mb() iterate over payloads in a
journal metadata block using on-disk payload size fields without
validating them against the remaining space in the metadata block.
A corrupted journal contains payload sizes extending beyond the PAGE_SIZE
boundary can cause out-of-bounds reads when accessing payload fields or
computing offsets.
Add bounds validation for each payload type to ensure the full payload
fits within meta_size before processing.
Gregory Price [Sun, 8 Mar 2026 23:42:02 +0000 (19:42 -0400)]
md/raid0: use kvzalloc/kvfree for strip_zone and devlist allocations
syzbot reported a WARNING at mm/page_alloc.c:__alloc_frozen_pages_noprof()
triggered by create_strip_zones() in the RAID0 driver.
When raid_disks is large, the allocation size exceeds MAX_PAGE_ORDER (4MB
on x86), causing WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER).
Convert the strip_zone and devlist allocations from kzalloc/kzalloc_objs to
kvzalloc/kvzalloc_objs, which first attempts a contiguous allocation with
__GFP_NOWARN and then falls back to vmalloc for large sizes. Convert the
corresponding kfree calls to kvfree.
Both arrays are pure metadata lookup tables (arrays of pointers and zone
descriptors) accessed only via indexing, so they do not require physically
contiguous memory.
erofs: handle 48-bit blocks/uniaddr for extra devices
erofs_init_device() only reads blocks_lo and uniaddr_lo from the
on-disk device slot, ignoring blocks_hi and uniaddr_hi that were
introduced alongside the 48-bit block addressing feature.
For the primary device (dif0), erofs_read_superblock() already handles
this correctly by combining blocks_lo with blocks_hi when 48-bit
layout is enabled. But the same logic was not applied to extra
devices.
With a 48-bit EROFS image using extra devices whose uniaddr or blocks
exceed 32-bit range, the truncated values cause erofs_map_dev() to
compute wrong physical addresses, leading to silent data corruption.
Fix this by reading blocks_hi and uniaddr_hi in erofs_init_device()
when 48-bit layout is enabled, consistent with the primary device
handling. Also fix the erofs_deviceslot on-disk definition where
blocks_hi was incorrectly declared as __le32 instead of __le16.
DRBD used a custom mechanism to mark netlink attributes as "mandatory":
bit 14 of nla_type was repurposed as DRBD_GENLA_F_MANDATORY. Attributes
sent from userspace that had this bit present and that were unknown
to the kernel would lead to an error.
Since commit ef6243acb478 ("genetlink: optionally validate strictly/dumps"),
the generic netlink layer rejects unknown top-level attributes when
strict validation is enabled. DRBD never opted out of strict
validation, so unknown top-level attributes are already rejected by
the netlink core.
The mandatory flag mechanism was required for nested attributes, because
these are parsed liberally, silently dropping attributes unknown to the
kernel.
This prepares for the move to a new YNL-based family, which will use the
now-default strict parsing.
The current family is not expected to gain any new attributes, which
makes this change safe.
Old userspace that still sets bit 14 is unaffected: nla_type()
strips it before __nla_validate_parse() performs attribute validation,
so the bit never reaches DRBD.
Remove all references to the mandatory flag in DRBD.
selftests: mptcp: join: recreate signal endp with same ID
In this "delete re-add signal" MPTCP Join subtest, the endpoint linked
to the initial subflow is removed, but readded once with different ID.
It appears that there was an issue when reusing the same ID, recently
fixed by commit d191101dee25 ("mptcp: pm: in-kernel: always set ID as
avail when rm endp"). The test then now reuses the same ID the first
time, but continue to use another one (88) the second time.
Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked()
and tcp_splice_read() to check whether to stop receiving. And use this
helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code.
Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gang Yan [Fri, 3 Apr 2026 11:29:28 +0000 (13:29 +0200)]
mptcp: preserve MSG_EOR semantics in sendmsg path
Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
which marks the end of a record for application-level message boundaries.
Data fragments tagged with MSG_EOR are explicitly marked in the
mptcp_data_frag structure and skb context to prevent unintended
coalescing with subsequent data chunks. This ensures the intent of
applications using MSG_EOR is preserved across MPTCP subflows,
maintaining consistent message segmentation behavior.
Gang Yan [Fri, 3 Apr 2026 11:29:27 +0000 (13:29 +0200)]
mptcp: reduce 'overhead' from u16 to u8
The 'overhead' in struct mptcp_data_frag can safely use u8, as it
represents 'alignment + sizeof(mptcp_data_frag)'. With a maximum
alignment of 7('ALIGN(1, sizeof(long)) - 1'), the overhead is at most
47, well below U8_MAX and validated with BUILD_BUG_ON().
This patch also adds a field named 'unused' for further extensions.
dpaa2: avoid linking objects into multiple modules
Each object file contains information about which module it gets linked
into, so linking the same file into multiple modules now causes a warning:
scripts/Makefile.build:254: drivers/net/ethernet/freescale/dpaa2/Makefile: dpaa2-mac.o is added to multiple modules: fsl-dpaa2-eth fsl-dpaa2-switch
scripts/Makefile.build:254: drivers/net/ethernet/freescale/dpaa2/Makefile: dpmac.o is added to multiple modules: fsl-dpaa2-eth fsl-dpaa2-switch
Change the way that dpaa2 is built by moving the two common files into a
separate module with exported symbols instead.
net: ethernet: ti-cpsw: fix linking built-in code to modules
There are six variants of the cpsw driver, sharing various parts of
the code: davinci-emac, cpsw, cpsw-switchdev, netcp, netcp_ethss and
am65-cpsw-nuss.
I noticed that this means some files can be linked into more than
one loadable module, or even part of vmlinux but also linked into
a loadable module, both of which mess up assumptions of the build
system, and causes warnings:
scripts/Makefile.build:279: cpsw_ale.o is added to multiple modules: ti-am65-cpsw-nuss ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_priv.o is added to multiple modules: ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_sl.o is added to multiple modules: ti-am65-cpsw-nuss ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_ethtool.o is added to multiple modules: ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: davinci_cpdma.o is added to multiple modules: ti_cpsw ti_cpsw_new ti_davinci_emac
Change this back to having separate modules for each portion that
can be linked standalone, exporting symbols as needed:
- ti-cpsw-common.ko now contains both cpsw-common.o and
davinci_cpdma.o as they are always used together
- ti-cpsw-priv.ko contains cpsw_priv.o, cpsw_sl.o and cpsw_ethtool.o,
which are the core of the cpsw and cpsw-new drivers.
- ti-cpsw-sl.ko contains the cpsw-sl.o object and is used on
ti-am65-cpsw-nuss.ko in addition to the two other cpsw variants.
- ti-cpsw-ale.o is the one standalone module that is used by all
except davinci_emac.
Each of these will be built-in if any of its users are built-in, otherwise
it's a loadable module if there is at least one module using it. I did
not bring back the separate Kconfig symbols for this, but just handle
it using Makefile logic.
Note: ideally this is something that Kbuild complains about, but usually
we just notice when something using THIS_MODULE misbehaves in a way that
a user notices.
net: ethernet: ti-cpsw:: rename soft_reset() function
While looking at the glob symbols shared between the cpsw drivers,
I noticed that soft_reset() is the only one that is missing a proper
namespace prefix, and will pollute the kernel namespace, so rename
it to be consistent with the other symbols.
Jakub Kicinski [Fri, 3 Apr 2026 22:05:01 +0000 (15:05 -0700)]
eth: remove the driver for acenic / tigon1&2
The entire git history for this driver looks like tree-wide
and automated cleanups. There's even more coming now with
AI, so let's try to delete it instead.