Always set nf_flow_route tuple out ifindex even if the indev is not one
of the flowtable configured devices since otherwise the outdev lookup in
nf_flow_offload_ip_hook() or nf_flow_offload_ipv6_hook() for
FLOW_OFFLOAD_XMIT_NEIGH flowtable entries will fail.
The above issue occurs in the following configuration since IP6IP6
tunnel does not support flowtable acceleration yet:
$ip addr show
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns1
inet6 2001:db8:1::2/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::211:22ff:fe33:2255/64 scope link tentative proto kernel_ll
valid_lft forever preferred_lft forever
6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:22:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns3
inet6 2001:db8:2::1/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::222:22ff:fe33:2255/64 scope link tentative proto kernel_ll
valid_lft forever preferred_lft forever
7: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000
link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr a85:e732:2c37::
inet6 2002:db8:1::1/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::885:e7ff:fe32:2c37/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ip -6 route show
2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium
Slavin Liu [Fri, 21 Nov 2025 08:52:13 +0000 (16:52 +0800)]
ipvs: fix ipv4 null-ptr-deref in route error path
The IPv4 code path in __ip_vs_get_out_rt() calls dst_link_failure()
without ensuring skb->dev is set, leading to a NULL pointer dereference
in fib_compute_spec_dst() when ipv4_link_failure() attempts to send
ICMP destination unreachable messages.
The issue emerged after commit ed0de45a1008 ("ipv4: recompile ip options
in ipv4_link_failure") started calling __ip_options_compile() from
ipv4_link_failure(). This code path eventually calls fib_compute_spec_dst()
which dereferences skb->dev. An attempt was made to fix the NULL skb->dev
dereference in commit 0113d9c9d1cc ("ipv4: fix null-deref in
ipv4_link_failure"), but it only addressed the immediate dev_net(skb->dev)
dereference by using a fallback device. The fix was incomplete because
fib_compute_spec_dst() later in the call chain still accesses skb->dev
directly, which remains NULL when IPVS calls dst_link_failure().
The crash occurs when:
1. IPVS processes a packet in NAT mode with a misconfigured destination
2. Route lookup fails in __ip_vs_get_out_rt() before establishing a route
3. The error path calls dst_link_failure(skb) with skb->dev == NULL
4. ipv4_link_failure() → ipv4_send_dest_unreach() →
__ip_options_compile() → fib_compute_spec_dst()
5. fib_compute_spec_dst() dereferences NULL skb->dev
Apply the same fix used for IPv6 in commit 326bf17ea5d4 ("ipvs: fix
ipv6 route unreach panic"): set skb->dev from skb_dst(skb)->dev before
calling dst_link_failure().
netfilter: nf_conncount: fix leaked ct in error paths
There are some situations where ct might be leaked as error paths are
skipping the refcounted check and return immediately. In order to solve
it make sure that the check is always called.
Fixes: be102eb6a0e7 ("netfilter: nf_conncount: rework API to use sk_buff directly") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
====================
inet: frags: flush pending skbs in fqdir_pre_exit()
Fix the issue reported by NIPA starting on Sep 18th [1], where
pernet_ops_rwsem is constantly held by a reader, preventing writers
from grabbing it (specifically driver modules from loading).
The fact that reports started around that time seems coincidental.
The issue seems to be skbs queued for defrag preventing conntrack
from exiting.
First patch fixes another theoretical issue, it's mostly a leftover
from an attempt to get rid of the inet_frag_queue refcnt, which
I gave up on (still think it's doable but a bit of a time sink).
Second patch is a minor refactor.
The real fix is in the third patch. It's the simplest fix I can
think of which is to flush the frag queues. Perhaps someone has
a better suggestion?
Last patch adds an explicit warning for conntrack getting stuck,
as this seems like something that can easily happen if bugs sneak in.
The warning will hopefully save us the first 20% of the investigation
effort.
Jakub Kicinski [Sun, 7 Dec 2025 01:09:42 +0000 (17:09 -0800)]
netfilter: conntrack: warn when cleanup is stuck
nf_conntrack_cleanup_net_list() calls schedule() so it does not
show up as a hung task. Add an explicit check to make debugging
leaked skbs/conntack references more obvious.
Jakub Kicinski [Sun, 7 Dec 2025 01:09:41 +0000 (17:09 -0800)]
inet: frags: flush pending skbs in fqdir_pre_exit()
We have been seeing occasional deadlocks on pernet_ops_rwsem since
September in NIPA. The stuck task was usually modprobe (often loading
a driver like ipvlan), trying to take the lock as a Writer.
lockdep does not track readers for rwsems so the read wasn't obvious
from the reports.
On closer inspection the Reader holding the lock was conntrack looping
forever in nf_conntrack_cleanup_net_list(). Based on past experience
with occasional NIPA crashes I looked thru the tests which run before
the crash and noticed that the crash follows ip_defrag.sh. An immediate
red flag. Scouring thru (de)fragmentation queues reveals skbs sitting
around, holding conntrack references.
The problem is that since conntrack depends on nf_defrag_ipv6,
nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its
netns exit hooks run _after_ conntrack's netns exit hook.
Flush all fragment queue SKBs during fqdir_pre_exit() to release
conntrack references before conntrack cleanup runs. Also flush
the queues in timer expiry handlers when they discover fqdir->dead
is set, in case packet sneaks in while we're running the pre_exit
flush.
The commit under Fixes is not exactly the culprit, but I think
previously the timer firing would eventually unblock the spinning
conntrack.
Jakub Kicinski [Sun, 7 Dec 2025 01:09:40 +0000 (17:09 -0800)]
inet: frags: add inet_frag_queue_flush()
Instead of exporting inet_frag_rbtree_purge() which requires that
caller takes care of memory accounting, add a new helper. We will
need to call it from a few places in the next patch.
Jakub Kicinski [Sun, 7 Dec 2025 01:09:39 +0000 (17:09 -0800)]
inet: frags: avoid theoretical race in ip_frag_reinit()
In ip_frag_reinit() we want to move the frag timeout timer into
the future. If the timer fires in the meantime we inadvertently
scheduled it again, and since the timer assumes a ref on frag_queue
we need to acquire one to balance things out.
This is technically racy, we should have acquired the reference
_before_ we touch the timer, it may fire again before we take the ref.
Avoid this entire dance by using mod_timer_pending() which only modifies
the timer if its pending (and which exists since Linux v2.6.30)
Note that this was the only place we ever took a ref on frag_queue
since Eric's conversion to RCU. So we could potentially replace
the whole refcnt field with an atomic flag and a bit more RCU.
Alexey Simakov [Fri, 5 Dec 2025 15:58:16 +0000 (18:58 +0300)]
broadcom: b44: prevent uninitialized value usage
On execution path with raised B44_FLAG_EXTERNAL_PHY, b44_readphy()
leaves bmcr value uninitialized and it is used later in the code.
Add check of this flag at the beginning of the b44_nway_reset() and
exit early of the function with restarting autonegotiation if an
external PHY is used.
Fixes: 753f492093da ("[B44]: port to native ssb support") Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20251205155815.4348-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
caoping [Thu, 4 Dec 2025 09:10:58 +0000 (01:10 -0800)]
net/handshake: restore destructor on submit failure
handshake_req_submit() replaces sk->sk_destruct but never restores it when
submission fails before the request is hashed. handshake_sk_destruct() then
returns early and the original destructor never runs, leaking the socket.
Restore sk_destruct on the error path.
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: caoping <caoping@cmss.chinamobile.com> Link: https://patch.msgid.link/20251204091058.1545151-1-caoping@cmss.chinamobile.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The outermost OVS_ACTION_ATTR_PUSH_NSH attribute is OK'ed by the
nla_for_each_nested() inside __ovs_nla_copy_actions(). The innermost
OVS_NSH_KEY_ATTR_BASE/MD1/MD2 are OK'ed by the nla_for_each_nested()
inside nsh_key_put_from_nlattr(). But nothing checks if the attribute
in the middle is OK. We don't even check that this attribute is the
OVS_KEY_ATTR_NSH. We just do a double unwrap with a pair of nla_data()
calls - first time directly while calling validate_push_nsh() and the
second time as part of the nla_for_each_nested() macro, which isn't
safe, potentially causing invalid memory access if the size of this
attribute is incorrect. The failure may not be noticed during
validation due to larger netlink buffer, but cause trouble later during
action execution where the buffer is allocated exactly to the size:
BUG: KASAN: slab-out-of-bounds in nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch]
Read of size 184 at addr ffff88816459a634 by task a.out/22624
Let's add some checks that the attribute is properly sized and it's
the only one attribute inside the action. Technically, there is no
real reason for OVS_KEY_ATTR_NSH to be there, as we know that we're
pushing an NSH header already, it just creates extra nesting, but
that's how uAPI works today. So, keeping as it is.
Paolo Abeni [Fri, 5 Dec 2025 18:55:17 +0000 (19:55 +0100)]
mptcp: avoid deadlock on fallback while reinjecting
Jakub reported an MPTCP deadlock at fallback time:
WARNING: possible recursive locking detected
6.18.0-rc7-virtme #1 Not tainted
--------------------------------------------
mptcp_connect/20858 is trying to acquire lock: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
but task is already holding lock: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
other info that might help us debug this:
Possible unsafe locking scenario:
The packet scheduler could attempt a reinjection after receiving an
MP_FAIL and before the infinite map has been transmitted, causing a
deadlock since MPTCP needs to do the reinjection atomically from WRT
fallback.
Address the issue explicitly avoiding the reinjection in the critical
scenario. Note that this is the only fallback critical section that
could potentially send packets and hit the double-lock.
Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/mptcp-dbg/results/412720/1-mptcp-join-sh/stderr Fixes: f8a1d9b18c5e ("mptcp: make fallback action and fallback decision atomic") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-4-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 5 Dec 2025 18:55:16 +0000 (19:55 +0100)]
mptcp: schedule rtx timer only after pushing data
The MPTCP protocol usually schedule the retransmission timer only
when there is some chances for such retransmissions to happen.
With a notable exception: __mptcp_push_pending() currently schedule
such timer unconditionally, potentially leading to unnecessary rtx
timer expiration.
The issue is present since the blamed commit below but become easily
reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization
in __mptcp_check_push()")
Fixes: 33d41c9cd74c ("mptcp: more accurate timeout") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-3-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: the userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoints dumps.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Before this patch, the kernel was saving any flags set by the userspace,
even unknown ones. This doesn't cause critical issues because the kernel
is only looking at specific ones. But on the other hand, endpoints dumps
could tell the userspace some recent flags seem to be supported on older
kernel versions.
Instead, ignore all unknown flags when parsing them. By doing that, the
userspace can continue to set unsupported flags, but it has a way to
verify what is supported by the kernel.
Note that it sounds better to continue accepting unsupported flags not
to change the behaviour, but also that eases things on the userspace
side by adding "optional" endpoint types only supported by newer kernel
versions without having to deal with the different kernel versions.
A note for the backports: there will be conflicts in mptcp.h on older
versions not having the mentioned flags, the new line should still be
added last, and the '5' needs to be adapted to have the same value as
the last entry.
Jakub Kicinski [Sun, 7 Dec 2025 00:47:40 +0000 (16:47 -0800)]
ynl: add regen hint to new headers
Recent commit 68e83f347266 ("tools: ynl-gen: add regeneration comment")
added a hint how to regenerate the code to the headers. Update
the new headers from this release cycle to also include it.
Eric Biggers [Thu, 4 Dec 2025 05:44:17 +0000 (21:44 -0800)]
mptcp: select CRYPTO_LIB_UTILS instead of CRYPTO
Since the only crypto functions used by the mptcp code are the SHA-256
library functions and crypto_memneq(), select only the options needed
for those: CRYPTO_LIB_SHA256 and CRYPTO_LIB_UTILS.
Previously, CRYPTO was selected instead of CRYPTO_LIB_UTILS. That does
pull in CRYPTO_LIB_UTILS as well, but it's unnecessarily broad.
Years ago, the CRYPTO_LIB_* options were visible only when CRYPTO. That
may be another reason why CRYPTO is selected here. However, that was
fixed years ago, and the libraries can now be selected directly.
Mateusz Guzik [Wed, 3 Dec 2025 10:01:22 +0000 (11:01 +0100)]
af_unix: annotate unix_gc_lock with __cacheline_aligned_in_smp
Otherwise the lock is susceptible to ever-changing false-sharing due to
unrelated changes. This in particular popped up here where an unrelated
change improved performance:
https://lore.kernel.org/oe-lkp/202511281306.51105b46-lkp@intel.com/
Stabilize it with an explicit annotation which also has a side effect
of furher improving scalability:
> in our oiginal report, 284922f4c5 has a 6.1% performance improvement comparing
> to parent 17d85f33a8.
> we applied your patch directly upon 284922f4c5. as below, now by
> "284922f4c5 + your patch"
> we observe a 12.8% performance improvements (still comparing to 17d85f33a8).
Note nothing was done for the other fields, so some fluctuation is still
possible.
Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251203100122.291550-1-mjguzik@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Wed, 3 Dec 2025 00:30:24 +0000 (16:30 -0800)]
bnxt_en: Fix XDP_TX path
For XDP_TX action in bnxt_rx_xdp(), clearing of the event flags is not
correct. __bnxt_poll_work() -> bnxt_rx_pkt() -> bnxt_rx_xdp() may be
looping within NAPI and some event flags may be set in earlier
iterations. In particular, if BNXT_TX_EVENT is set earlier indicating
some XDP_TX packets are ready and pending, it will be cleared if it is
XDP_TX action again. Normally, we will set BNXT_TX_EVENT again when we
successfully call __bnxt_xmit_xdp(). But if the TX ring has no more
room, the flag will not be set. This will cause the TX producer to be
ahead but the driver will not hit the TX doorbell.
For multi-buf XDP_TX, there is no need to clear the event flags and set
BNXT_AGG_EVENT. The BNXT_AGG_EVENT flag should have been set earlier in
bnxt_rx_pkt().
The visible symptom of this is that the RX ring associated with the
TX XDP ring will eventually become empty and all packets will be dropped.
Because this condition will cause the driver to not refill the RX ring
seeing that the TX ring has forever pending XDP_TX packets.
The fix is to only clear BNXT_RX_EVENT when we have successfully
called __bnxt_xmit_xdp().
Fixes: 7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX ring structures") Reported-by: Pavel Dubovitsky <pdubovitsky@meta.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251203003024.2246699-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tim Hostetler [Tue, 2 Dec 2025 20:02:07 +0000 (20:02 +0000)]
gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call
commit 46e7860ef941 ("gve: Move ptp_schedule_worker to gve_init_clock")
moved the first invocation of the AQ command REPORT_NIC_TIMESTAMP to
gve_probe(). However, gve_init_clock() invoking REPORT_NIC_TIMESTAMP is
not valid until after gve_probe() invokes the AQ command
CONFIGURE_DEVICE_RESOURCES.
Failure to do so results in the following error:
gve 0000:00:07.0: failed to read NIC clock -11
This was missed earlier because the driver under test was loaded at
runtime instead of boot-time. The boot-time driver had already
initialized the device, causing the runtime driver to successfully call
gve_init_clock() incorrectly.
Fixes: 46e7860ef941 ("gve: Move ptp_schedule_worker to gve_init_clock") Reviewed-by: Ankit Garg <nktgrg@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251202200207.1434749-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
René Rebe [Tue, 2 Dec 2025 18:41:37 +0000 (19:41 +0100)]
r8169: fix RTL8117 Wake-on-Lan in DASH mode
Wake-on-Lan does currently not work for r8169 in DASH mode, e.g. the
ASUS Pro WS X570-ACE with RTL8168fp/RTL8117.
Fix by not returning early in rtl_prepare_power_down when dash_enabled.
While this fixes WoL, it still kills the OOB RTL8117 remote management
BMC connection. Fix by not calling rtl8168_driver_stop if WoL is enabled.
Jakub Kicinski [Fri, 5 Dec 2025 01:53:52 +0000 (17:53 -0800)]
Merge branch 'mlxsw-three-m-router-fixes'
Petr Machata says:
====================
mlxsw: Three (m)router fixes
This patchset contains two fixes in mlxsw Spectrum router code, and one for
the Spectrum multicast router code. Please see the individual patches for
more details.
====================
Ido Schimmel [Tue, 2 Dec 2025 17:44:13 +0000 (18:44 +0100)]
mlxsw: spectrum_mr: Fix use-after-free when updating multicast route stats
Cited commit added a dedicated mutex (instead of RTNL) to protect the
multicast route list, so that it will not change while the driver
periodically traverses it in order to update the kernel about multicast
route stats that were queried from the device.
One instance of list entry deletion (during route replace) was missed
and it can result in a use-after-free [1].
Fix by acquiring the mutex before deleting the entry from the list and
releasing it afterwards.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum]
Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043
Fixes: f38656d06725 ("mlxsw: spectrum_mr: Protect multicast route list with a lock") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/f996feecfd59fde297964bfc85040b6d83ec6089.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We sometimes observe use-after-free when dereferencing a neighbour [1].
The problem seems to be that the driver stores a pointer to the
neighbour, but without holding a reference on it. A reference is only
taken when the neighbour is used by a nexthop.
Fix by simplifying the reference counting scheme. Always take a
reference when storing a neighbour pointer in a neighbour entry. Avoid
taking a referencing when the neighbour is used by a nexthop as the
neighbour entry associated with the nexthop already holds a reference.
Tested by running the test that uncovered the problem over 300 times.
Without this patch the problem was reproduced after a handful of
iterations.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x2d4/0x310
Read of size 8 at addr ffff88817f8e3420 by task ip/3929
Last potentially related work creation:
kasan_save_stack+0x30/0x50
kasan_record_aux_stack+0x8c/0xa0
kvfree_call_rcu+0x93/0x5b0
mlxsw_sp_router_neigh_event_work+0x67d/0x860
process_one_work+0xd57/0x1390
worker_thread+0x4d6/0xd40
kthread+0x355/0x5b0
ret_from_fork+0x1d4/0x270
ret_from_fork_asm+0x11/0x20
Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/92d75e21d95d163a41b5cea67a15cd33f547cba6.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Tue, 2 Dec 2025 17:44:11 +0000 (18:44 +0100)]
mlxsw: spectrum_router: Fix possible neighbour reference count leak
mlxsw_sp_router_schedule_work() takes a reference on a neighbour,
expecting a work item to release it later on. However, we might fail to
schedule the work item, in which case the neighbour reference count will
be leaked.
Fix by taking the reference just before scheduling the work item. Note
that mlxsw_sp_router_schedule_work() can receive a NULL neighbour
pointer, but neigh_clone() handles that correctly.
Spotted during code review, did not actually observe the reference count
leak.
Fixes: 151b89f6025a ("mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ec2934ae4aca187a8d8c9329a08ce93cca411378.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thorsten Blum [Tue, 2 Dec 2025 17:27:44 +0000 (18:27 +0100)]
net: phy: marvell-88q2xxx: Fix clamped value in mv88q2xxx_hwmon_write
The local variable 'val' was never clamped to -75000 or 180000 because
the return value of clamp_val() was not used. Fix this by assigning the
clamped value back to 'val', and use clamp() instead of clamp_val().
Cc: stable@vger.kernel.org Fixes: a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251202172743.453055-3-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerd Bayer [Tue, 2 Dec 2025 11:12:57 +0000 (12:12 +0100)]
net/mlx5: Fix double unregister of HCA_PORTS component
Clear hca_devcom_comp in device's private data after unregistering it in
LAG teardown. Otherwise a slightly lagging second pass through
mlx5_unload_one() might try to unregister it again and trip over
use-after-free.
On s390 almost all PCI level recovery events trigger two passes through
mxl5_unload_one() - one through the poll_health() method and one through
mlx5_pci_err_detected() as callback from generic PCI error recovery.
While testing PCI error recovery paths with more kernel debug features
enabled, this issue reproducibly led to kernel panics with the following
call chain:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
Fault in home space mode while using kernel ASCE.
AS:00000000705c4007 R3:0000000000000024
Oops: 0038 ilc:3 [#1]SMP
ipvlan: Ignore PACKET_LOOPBACK in handle_mode_l2()
Packets with pkt_type == PACKET_LOOPBACK are captured by
handle_frame() function, but they don't have L2 header.
We should not process them in handle_mode_l2().
This doesn't affect old L2 functionality, since handling
was anyway incorrect.
Handle them the same way as in br_handle_frame():
just pass the skb.
To observe invalid behaviour, just start "ping -b" on bcast address
of port-interface.
Daniel Golle [Tue, 2 Dec 2025 09:57:21 +0000 (09:57 +0000)]
net: dsa: mxl-gsw1xx: fix SerDes RX polarity
According to MaxLinear engineer Benny Weng the RX lane of the SerDes
port of the GSW1xx switches is inverted in hardware, and the
SGMII_PHY_RX0_CFG2_INVERT bit is set by default in order to compensate
for that. Hence also set the SGMII_PHY_RX0_CFG2_INVERT bit by default in
gsw1xx_pcs_reset().
Ivan Galkin [Tue, 2 Dec 2025 09:07:42 +0000 (10:07 +0100)]
net: phy: RTL8211FVD: Restore disabling of PHY-mode EEE
When support for RTL8211F(D)(I)-VD-CG was introduced in commit bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG")
the implementation assumed that this PHY model doesn't have the
control register PHYCR2 (Page 0xa43 Address 0x19). This
assumption was based on the differences in CLKOUT configurations
between RTL8211FVD and the remaining RTL8211F PHYs. In the latter
commit 2c67301584f2
("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not present")
this assumption was expanded to the PHY-mode EEE.
I performed tests on RTL8211FI-VD-CG and confirmed that disabling
PHY-mode EEE works correctly and is uniform with other PHYs
supported by the driver. To validate the correctness,
I contacted Realtek support. Realtek confirmed that PHY-mode EEE on
RTL8211F(D)(I)-VD-CG is configured via Page 0xa43 Address 0x19 bit 5.
Moreover, Realtek informed me that the most recent datasheet
for RTL8211F(D)(I)-VD-CG v1.1 is incomplete and the naming of
control registers is partly inconsistent. The errata I
received from Realtek corrects the naming as follows:
This information confirms that the supposedly missing control register,
PHYCR2, exists in the RTL8211F(D)(I)-VD-CG under the same address and
the same name. It controls widely the same configs as other PHYs from
the RTL8211F series (e.g. PHY-mode EEE). Clock out configuration is an
exception.
Given all this information, restore disabling of the PHY-mode EEE.
Fixes: 2c67301584f2 ("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not present") Signed-off-by: Ivan Galkin <ivan.galkin@axis.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251202-phy_eee-v1-1-fe0bf6ab3df0@axis.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Moshe Shemesh [Mon, 1 Dec 2025 15:13:27 +0000 (17:13 +0200)]
net/mlx5: make enable_mpesw idempotent
The enable_mpesw() function returns -EINVAL if ldev->mode is not
MLX5_LAG_MODE_NONE. This means attempting to enable MPESW mode when it's
already enabled will fail. In contrast, disable_mpesw() properly checks
if the mode is MLX5_LAG_MODE_MPESW before proceeding, making it
naturally idempotent and safe to call multiple times.
Fix enable_mpesw() to return success if mpesw is already enabled.
Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1764602008-1334866-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jamal Hadi Salim [Fri, 28 Nov 2025 15:19:19 +0000 (10:19 -0500)]
net/sched: ets: Always remove class from active list before deleting in ets_qdisc_change
zdi-disclosures@trendmicro.com says:
The vulnerability is a race condition between `ets_qdisc_dequeue` and
`ets_qdisc_change`. It leads to UAF on `struct Qdisc` object.
Attacker requires the capability to create new user and network namespace
in order to trigger the bug.
See my additional commentary at the end of the analysis.
// (1) this lock is preventing .change handler (`ets_qdisc_change`)
//to race with .dequeue handler (`ets_qdisc_dequeue`)
sch_tree_lock(sch);
for (i = nbands; i < oldbands; i++) {
if (i >= q->nstrict && q->classes[i].qdisc->q.qlen)
list_del_init(&q->classes[i].alist);
qdisc_purge_queue(q->classes[i].qdisc);
}
WRITE_ONCE(q->nbands, nbands);
for (i = nstrict; i < q->nstrict; i++) {
if (q->classes[i].qdisc->q.qlen) {
// (2) the class is added to the q->active
list_add_tail(&q->classes[i].alist, &q->active);
q->classes[i].deficit = quanta[i];
}
}
WRITE_ONCE(q->nstrict, nstrict);
memcpy(q->prio2band, priomap, sizeof(priomap));
for (i = 0; i < q->nbands; i++)
WRITE_ONCE(q->classes[i].quantum, quanta[i]);
for (i = oldbands; i < q->nbands; i++) {
q->classes[i].qdisc = queues[i];
if (q->classes[i].qdisc != &noop_qdisc)
qdisc_hash_add(q->classes[i].qdisc, true);
}
// (3) the qdisc is unlocked, now dequeue can be called in parallel
// to the rest of .change handler
sch_tree_unlock(sch);
ets_offload_change(sch);
for (i = q->nbands; i < oldbands; i++) {
// (4) we're reducing the refcount for our class's qdisc and
// freeing it
qdisc_put(q->classes[i].qdisc);
// (5) If we call .dequeue between (4) and (5), we will have
// a strong UAF and we can control RIP
q->classes[i].qdisc = NULL;
WRITE_ONCE(q->classes[i].quantum, 0);
q->classes[i].deficit = 0;
gnet_stats_basic_sync_init(&q->classes[i].bstats);
memset(&q->classes[i].qstats, 0, sizeof(q->classes[i].qstats));
}
return 0;
}
Comment:
This happens because some of the classes have their qdiscs assigned to
NULL, but remain in the active list. This commit fixes this issue by always
removing the class from the active list before deleting and freeing its
associated qdisc
Reproducer Steps
(trimmed version of what was sent by zdi-disclosures@trendmicro.com)
1. Maher Azzouzi working with Trend Micro Zero Day Initiative was reported as
the person who found the issue. I requested to get a proper email to add to the
reported-by tag but got no response. For this reason i will credit the person
i exchanged emails with i.e zdi-disclosures@trendmicro.com
2. Neither i nor Victor who did a much more thorough testing was able to
reproduce a UAF with the PoC or other approaches we tried. We were both able to
reproduce a null ptr deref. After exchange with zdi-disclosures@trendmicro.com
they sent a small change to be made to the code to add an extra delay which
was able to simulate the UAF. i.e, this:
qdisc_put(q->classes[i].qdisc);
mdelay(90);
q->classes[i].qdisc = NULL;
I was informed by Thomas Gleixner(tglx@linutronix.de) that adding delays was
acceptable approach for demonstrating the bug, quote:
"Adding such delays is common exploit validation practice"
The equivalent delay could happen "by virt scheduling the vCPU out, SMIs,
NMIs, PREEMPT_RT enabled kernel"
3. I asked the OP to test and report back but got no response and after a
few days gave up and proceeded to submit this fix.
Fixes: de6d25924c2a ("net/sched: sch_ets: don't peek at classes beyond 'nbands'") Reported-by: zdi-disclosures@trendmicro.com Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/20251128151919.576920-1-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shaurya Rane [Sat, 29 Nov 2025 09:37:18 +0000 (15:07 +0530)]
net/hsr: fix NULL pointer dereference in prp_get_untagged_frame()
prp_get_untagged_frame() calls __pskb_copy() to create frame->skb_std
but doesn't check if the allocation failed. If __pskb_copy() returns
NULL, skb_clone() is called with a NULL pointer, causing a crash:
Wang Liang [Sat, 29 Nov 2025 04:13:15 +0000 (12:13 +0800)]
netrom: Fix memory leak in nr_sendmsg()
syzbot reported a memory leak [1].
When function sock_alloc_send_skb() return NULL in nr_output(), the
original skb is not freed, which was allocated in nr_sendmsg(). Fix this
by freeing it before return.
Wei Fang [Fri, 28 Nov 2025 02:59:15 +0000 (10:59 +0800)]
net: fec: ERR007885 Workaround for XDP TX path
The ERR007885 will lead to a TDAR race condition for mutliQ when the
driver sets TDAR and the UDMA clears TDAR simultaneously or in a small
window (2-4 cycles). And it will cause the udma_tx and udma_tx_arbiter
state machines to hang. Therefore, the commit 53bb20d1faba ("net: fec:
add variable reg_desc_active to speed things up") and the commit a179aad12bad ("net: fec: ERR007885 Workaround for conventional TX") have
added the workaround to fix the potential issue for the conventional TX
path. Similarly, the XDP TX path should also have the potential hang
issue, so add the workaround for XDP TX path.
Linus Torvalds [Thu, 4 Dec 2025 01:24:33 +0000 (17:24 -0800)]
Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
Linus Torvalds [Thu, 4 Dec 2025 00:54:54 +0000 (16:54 -0800)]
Merge tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
test_progs runner (Alexis Lothoré)
- Convert selftests/bpf/test_xsk to test_progs runner (Bastien
Curutchet)
- Replace bpf memory allocator with kmalloc_nolock() in
bpf_local_storage (Amery Hung), and in bpf streams and range tree
(Puranjay Mohan)
- Introduce support for indirect jumps in BPF verifier and x86 JIT
(Anton Protopopov) and arm64 JIT (Puranjay Mohan)
- Remove runqslower bpf tool (Hoyeon Lee)
- Fix corner cases in the verifier to close several syzbot reports
(Eduard Zingerman, KaFai Wan)
- Several improvements in deadlock detection in rqspinlock (Kumar
Kartikeya Dwivedi)
- Implement "jmp" mode for BPF trampoline and corresponding
DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)
- Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
Chaignon)
- Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
Borkmann)
- Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)
- Optimize bpf_map_update_elem() for map-in-map types (Ritesh
Oedayrajsingh Varma)
- Introduce overwrite mode for BPF ring buffer (Xu Kuohai)
* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
bpf: optimize bpf_map_update_elem() for map-in-map types
bpf: make kprobe_multi_link_prog_run always_inline
selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
selftests/bpf: remove test_tc_edt.sh
selftests/bpf: integrate test_tc_edt into test_progs
selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
selftests/bpf: Add success stats to rqspinlock stress test
rqspinlock: Precede non-head waiter queueing with AA check
rqspinlock: Disable spinning for trylock fallback
rqspinlock: Use trylock fallback when per-CPU rqnode is busy
rqspinlock: Perform AA checks immediately
rqspinlock: Enclose lock/unlock within lock entry acquisitions
bpf: Remove runqslower tool
selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
bpf: Disable file_alloc_security hook
bpf: check for insn arrays in check_ptr_alignment
bpf: force BPF_F_RDONLY_PROG on insn array creation
bpf: Fix exclusive map memory leak
selftests/bpf: Make CS length configurable for rqspinlock stress test
selftests/bpf: Add lock wait time stats to rqspinlock stress test
...
Linus Torvalds [Wed, 3 Dec 2025 23:50:11 +0000 (15:50 -0800)]
Merge tag 'linux_kselftest-kunit-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- Make filter parameters configurable via Kconfig
- Add description of kunit.enable parameter to documentation
* tag 'linux_kselftest-kunit-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Make filter parameters configurable via Kconfig
Documentation: kunit: add description of kunit.enable parameter
Linus Torvalds [Wed, 3 Dec 2025 23:08:18 +0000 (15:08 -0800)]
Merge tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- Add basic test for trace_marker_raw file to tracing selftest
- Fix invalid array access in printf dma_map_benchmark selftest
- Add tprobe enable/disable testcase to tracing selftest
- Update fprobe selftest for ftrace based fprobe
* tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: tracing: Update fprobe selftest for ftrace based fprobe
selftests: tracing: Add tprobe enable/disable testcase
selftests/run_kselftest.sh: exit with error if tests fail
selftests/dma: fix invalid array access in printf
selftests/tracing: Add basic test for trace_marker_raw file
Linus Torvalds [Wed, 3 Dec 2025 22:42:21 +0000 (14:42 -0800)]
Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
[ Editor's note: the core of this actually came in early through a
shared branch and a few other trees - Linus ]
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: add target to build a cpio containing modules
initramfs: add gen_init_cpio to hostprogs unconditionally
kbuild: allow architectures to override CC_CAN_LINK
init: deduplicate cc-can-link.sh invocations
kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
scripts: headers_install.sh: Remove two outdated config leak ignore entries
scripts/clang-tools: Handle included .c files in gen_compile_commands
kbuild: uapi: Drop types.h check from headers_check.pl
kbuild: Rename Makefile.extrawarn to Makefile.warn
MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
kbuild: uapi: reuse KBUILD_USERCFLAGS
kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
kbuild: Use objtree for module signing key path
btrfs: send: make use of -fms-extensions for defining struct fs_path
Linus Torvalds [Wed, 3 Dec 2025 22:16:49 +0000 (14:16 -0800)]
Merge tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io),
and it is also used by the Rust compiler itself. While the amount
of code is substantial, there should not be many updates needed for
these crates, and even if there are, they should not be too big,
e.g. +7k -3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided
scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for
doctests.
Examples (i.e. doctests) may want to do things like show public
items and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code
as possible (this is part of why running Clippy on doctests is
important for us, e.g. for safety comments, which userspace Rust
does not support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension
trait and a new 'fmt!' macro which replaces the previous 'core'
import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' least significant bits of the
wrapped type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>::new::<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can also be constructed from simple non-constant expressions
or, for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations
(with both their backing type and other 'Bounded's with a
compatible backing type), casts to change the backing type,
extending/shrinking and infallible/fallible conversions from/to
primitives as applicable.
It enables to use just an immutable tree reference where
appropriate. The existing fully-featured mutable cursor is renamed
to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version
in Linux follows Debian Stable releases, with Debian 13 being the
first one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to
contribute for a few years now, so it is a no-op change in
practice.
And a few other cleanups and improvements"
* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
rust: macros: support `proc-macro2`, `quote` and `syn`
rust: syn: enable support in kbuild
rust: syn: add `README.md`
rust: syn: remove `unicode-ident` dependency
rust: syn: add SPDX License Identifiers
rust: syn: import crate
rust: quote: enable support in kbuild
rust: quote: add `README.md`
rust: quote: add SPDX License Identifiers
rust: quote: import crate
rust: proc-macro2: enable support in kbuild
rust: proc-macro2: add `README.md`
rust: proc-macro2: remove `unicode_ident` dependency
rust: proc-macro2: add SPDX License Identifiers
rust: proc-macro2: import crate
rust: kbuild: support using libraries in `rustc_procmacro`
rust: kbuild: support skipping flags in `rustc_test_library`
rust: kbuild: add proc macro library support
rust: kbuild: simplify `--cfg` handling
rust: kbuild: introduce `core-flags` and `core-skip_flags`
...
Linus Torvalds [Wed, 3 Dec 2025 21:46:48 +0000 (13:46 -0800)]
Merge tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
Linus Torvalds [Wed, 3 Dec 2025 21:25:39 +0000 (13:25 -0800)]
Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
Linus Torvalds [Wed, 3 Dec 2025 21:04:07 +0000 (13:04 -0800)]
Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
Linus Torvalds [Wed, 3 Dec 2025 20:50:10 +0000 (12:50 -0800)]
Merge tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Rescuer affinity management: Affinity is now updated only when
detached using wq_unbound_cpumask consistently. DISASSOCIATED workers
also follow unbound cpumask changes to avoid breaking CPU isolation
- Rescuer cleanups preparing for fetching work items one by one from
pool list: Work assignment factored out, optimized to skip pwqs no
longer needing rescue, and shutdown logic simplified
* tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Don't rely on wq->rescuer to stop rescuer
workqueue: Only assign rescuer work when really needed
workqueue: Factor out assign_rescuer_work()
workqueue: Init rescuer's affinities as wq_unbound_cpumask
workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes
workqueue: Update the rescuer's affinity only when it is detached
workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex
Linus Torvalds [Wed, 3 Dec 2025 20:42:36 +0000 (12:42 -0800)]
Merge tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Allow creaing nbcon console drivers with an unsafe write_atomic()
callback that can only be called by the final nbcon_atomic_flush_unsafe().
Otherwise, the driver would rely on the kthread.
It is going to be used as the-best-effort approach for an
experimental nbcon netconsole driver, see
In an ideal world, all networking drivers would be fixed first and
the atomic flush would be blocked only in NMI context. But it brings
the question how reliable networking drivers are when the system is
in a bad state. They might block flushing more reliable serial
consoles which are more suitable for serious debugging anyway.
- Allow to use the last 4 bytes of the printk ring buffer.
- Prevent queuing IRQ work and block printk kthreads when consoles are
suspended. Otherwise, they create non-necessary churn or even block
the suspend.
- Release console_lock() between each record in the kthread used for
legacy consoles on RT. It might significantly speed up the boot.
- Release nbcon context between each record in the atomic flush. It
prevents stalls of the related printk kthread after it has lost the
ownership in the middle of a record
- Add support for NBCON consoles into KDB
- Add %ptsP modifier for printing struct timespec64 and use it where
possible
- Misc code clean up
* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
printk: Use console_is_usable on console_unblank
arch: um: kmsg_dump: Use console_is_usable
drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
lib/vsprintf: Unify FORMAT_STATE_NUM handlers
printk: Avoid irq_work for printk_deferred() on suspend
printk: Avoid scheduling irq_work on suspend
printk: Allow printk_trigger_flush() to flush all types
tracing: Switch to use %ptSp
scsi: snic: Switch to use %ptSp
scsi: fnic: Switch to use %ptSp
s390/dasd: Switch to use %ptSp
ptp: ocp: Switch to use %ptSp
pps: Switch to use %ptSp
PCI: epf-test: Switch to use %ptSp
net: dsa: sja1105: Switch to use %ptSp
mmc: mmc_test: Switch to use %ptSp
media: av7110: Switch to use %ptSp
ipmi: Switch to use %ptSp
igb: Switch to use %ptSp
e1000e: Switch to use %ptSp
...
Linus Torvalds [Wed, 3 Dec 2025 20:41:00 +0000 (12:41 -0800)]
Merge tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull lkmm documentation update from Paul McKenney:
- Sort the memory-barriers.txt file's wait_event* and wait_on_bit* list
alphabetically
* tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
memory-barriers.txt: Sort wait_event* and wait_on_bit* list alphabetically
Linus Torvalds [Wed, 3 Dec 2025 20:18:07 +0000 (12:18 -0800)]
Merge tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Properly handle SRCU readers within IRQ disabled sections in tiny
SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write
side (using full RCU grace period ordering instead of simply
full ordering) as compared to "traditional" non-fast SRCU. But
those srcu-fast flavours are going to be optimized in two
different ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next
cycle. Only the new interfaces are added for now, along with
related torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering)
without waiting for the read side to tell which to use.
This also optimizes the read side altogether with moving flavour
debug checks under debug config and with removing a costly RmW
operation on their first call.
- Make some diagnostic functions tracing safe
Refscale:
- Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code
Miscellanous:
- In order to prepare the layout for nohz_full work deferral to user
exit, the context tracking state must shrink the counter of
transitions to/from RCU not watching. The only possible hazard is
to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN
warnings. On recent discussions, we also concluded that all those
WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
comments. Something to be expected for the next cycle
- Provide a script to apply several configs to several commits with
torture
- Allow torture to reuse a build directory in order to save needless
rebuild time
- Various cleanups"
* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
refscale: Add SRCU-fast-updown readers
refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
srcu: Create an SRCU-fast-updown API
refscale: Do not disable interrupts for tests involving local_bh_enable()
refscale: Add non-atomic per-CPU increment readers
refscale: Add this_cpu_inc() readers
refscale: Add preempt_disable() readers
refscale: Add local_bh_disable() readers
refscale: Add local_irq_disable() and local_irq_save() readers
torture: Permit negative kvm.sh --kconfig numberic arguments
srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
rcu: Mark diagnostic functions as notrace
rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
rcutorture: Permit kvm-again.sh to re-use the build directory
torture: Add kvm-series.sh to test commit/scenario combination
rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
locktorture: Fix memory leak in param_set_cpumask()
doc: Update for SRCU-fast definitions and initialization
...
Linus Torvalds [Wed, 3 Dec 2025 19:53:47 +0000 (11:53 -0800)]
Merge tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- mempool_alloc_bulk() support for upcoming users in the block layer
that need to allocate multiple objects at once with the mempool's
guaranteed progress semantics, which is not achievable with an
allocation single objects in a loop. Along with refactoring and
various improvements (Christoph Hellwig)
- Preparations for the upcoming separation of struct slab from struct
page, mostly by removing the struct folio layer, as the purpose of
struct folio has shifted since it became used in slab code (Matthew
Wilcox)
- Modernisation of slab's boot param API usage, which removes some
unexpected parsing corner cases (Petr Tesarik)
- Refactoring of freelist_aba_t (now struct freelist_counters) and
associated functions for double cmpxchg, enabled by -fms-extensions
(Vlastimil Babka)
- Cleanups and improvements related to sheaves caching layer, that were
part of the full conversion to sheaves, which is planned for the next
release (Vlastimil Babka)
* tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (42 commits)
slab: Remove unnecessary call to compound_head() in alloc_from_pcs()
mempool: clarify behavior of mempool_alloc_preallocated()
mempool: drop the file name in the top of file comment
mempool: de-typedef
mempool: remove mempool_{init,create}_kvmalloc_pool
mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
mempool: add mempool_{alloc,free}_bulk
mempool: factor out a mempool_alloc_from_pool helper
slab: Remove references to folios from virt_to_slab()
kasan: Remove references to folio in __kasan_mempool_poison_object()
memcg: Convert mem_cgroup_from_obj_folio() to mem_cgroup_from_obj_slab()
mempool: factor out a mempool_adjust_gfp helper
mempool: add error injection support
mempool: improve kerneldoc comments
mm: improve kerneldoc comments for __alloc_pages_bulk
fault-inject: make enum fault_flags available unconditionally
usercopy: Remove folio references from check_heap_object()
slab: Remove folio references from kfree_nolock()
slab: Remove folio references from kfree_rcu_sheaf()
slab: Remove folio references from build_detached_freelist()
...
Linus Torvalds [Wed, 3 Dec 2025 19:34:28 +0000 (11:34 -0800)]
Merge tag 'docs-6.19' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This has been another busy cycle for documentation, with a lot of
build-system thrashing. That work should slow down from here on out.
- The various scripts and tools for documentation were spread out in
several directories; now they are (almost) all coalesced under
tools/docs/. The holdout is the kernel-doc script, which cannot be
easily moved without some further thought.
- As the amount of Python code increases, we are accumulating modules
that are imported by multiple programs. These modules have been
pulled together under tools/lib/python/ -- at least, for
documentation-related programs. There is other Python code in the
tree that might eventually want to move toward this organization.
- The Perl kernel-doc.pl script has been removed. It is no longer
used by default, and nobody has missed it, least of all anybody who
actually had to look at it.
- The docs build was controlled by a complex mess of makefilese that
few dared to touch. Mauro has moved that logic into a new program
(tools/docs/sphinx-build-wrapper) that, with any luck at all, will
be far easier to understand and maintain.
- The get_feat.pl program, used to access information under
Documentation/features/, has been rewritten in Python, bringing an
end to the use of Perl in the docs subsystem.
- The top-level README file has been reorganized into a more
reader-friendly presentation.
- A lot of Chinese translation additions
- Typo fixes and documentation updates as usual"
* tag 'docs-6.19' of git://git.lwn.net/linux: (164 commits)
docs: makefile: move rustdoc check to the build wrapper
README: restructure with role-based documentation and guidelines
docs: kdoc: various fixes for grammar, spelling, punctuation
docs: kdoc_parser: use '@' for Excess enum value
docs: submitting-patches: Clarify that removal of Acks needs explanation too
docs: kdoc_parser: add data/function attributes to ignore
docs: MAINTAINERS: update Mauro's files/paths
docs/zh_CN: Add wd719x.rst translation
docs/zh_CN: Add libsas.rst translation
get_feat.pl: remove it, as it got replaced by get_feat.py
Documentation/sphinx/kernel_feat.py: use class directly
tools/docs/get_feat.py: convert get_feat.pl to Python
Documentation/admin-guide: fix typo and comment in cscope example
docs/zh_CN: Add data-integrity.rst translation
docs/zh_CN: Add blk-mq.rst translation
docs/zh_CN: Add block/index.rst translation
docs/zh_CN: Update the Chinese translation of kbuild.rst
docs: bring some order to our Python module hierarchy
docs: Move the python libraries to tools/lib/python
Documentation/kernel-parameters: Move the kernel build options
...
Linus Torvalds [Wed, 3 Dec 2025 19:28:38 +0000 (11:28 -0800)]
Merge tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Rewrite memcpy_sglist from scratch
- Add on-stack AEAD request allocation
- Fix partial block processing in ahash
Algorithms:
- Remove ansi_cprng
- Remove tcrypt tests for poly1305
- Fix EINPROGRESS processing in authenc
- Fix double-free in zstd
Drivers:
- Use drbg ctr helper when reseeding xilinx-trng
- Add support for PCI device 0x115A to ccp
- Add support of paes in caam
- Add support for aes-xts in dthev2
Others:
- Use likely in rhashtable lookup
- Fix lockdep false-positive in padata by removing a helper"
* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
crypto: zstd - fix double-free in per-CPU stream cleanup
crypto: ahash - Zero positive err value in ahash_update_finish
crypto: ahash - Fix crypto_ahash_import with partial block data
crypto: lib/mpi - use min() instead of min_t()
crypto: ccp - use min() instead of min_t()
hwrng: core - use min3() instead of nested min_t()
crypto: aesni - ctr_crypt() use min() instead of min_t()
crypto: drbg - Delete unused ctx from struct sdesc
crypto: testmgr - Add missing DES weak and semi-weak key tests
Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
crypto: scatterwalk - Fix memcpy_sglist() to always succeed
crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
crypto: tcrypt - Remove unused poly1305 support
crypto: ansi_cprng - Remove unused ansi_cprng algorithm
crypto: asymmetric_keys - fix uninitialized pointers with free attribute
KEYS: Avoid -Wflex-array-member-not-at-end warning
crypto: ccree - Correctly handle return of sg_nents_for_len
crypto: starfive - Correctly handle return of sg_nents_for_len
crypto: iaa - Fix incorrect return value in save_iaa_wq()
crypto: zstd - Remove unnecessary size_t cast
...
Linus Torvalds [Wed, 3 Dec 2025 19:19:34 +0000 (11:19 -0800)]
Merge tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull IPE udates from Fan Wu:
"The primary change is the addition of support for the AT_EXECVE_CHECK
flag. This allows interpreters to signal the kernel to perform IPE
security checks on script files before execution, extending IPE
enforcement to indirectly executed scripts.
Update documentation for it, and also fix a comment"
* tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: Update documentation for script enforcement
ipe: Add AT_EXECVE_CHECK support for script enforcement
ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
Linus Torvalds [Wed, 3 Dec 2025 19:08:03 +0000 (11:08 -0800)]
Merge tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Bug fixes:
- defer credentials checking from the bprm_check_security hook to the
bprm_creds_from_file security hook
- properly ignore IMA policy rules based on undefined SELinux labels
IMA policy rule extensions:
- extend IMA to limit including file hashes in the audit logs
(dont_audit action)
- define a new filesystem subtype policy option (fs_subtype)
Misc:
- extend IMA to support in-kernel module decompression by deferring
the IMA signature verification in kernel_read_file() to after the
kernel module is decompressed"
* tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Handle error code returned by ima_filter_rule_match()
ima: Access decompressed kernel module to verify appended signature
ima: add fs_subtype condition for distinguishing FUSE instances
ima: add dont_audit action to suppress audit actions
ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
Linus Torvalds [Wed, 3 Dec 2025 18:52:01 +0000 (10:52 -0800)]
Merge tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Consolidate the loops in __audit_inode_child() to improve performance
When logging a child inode in __audit_inode_child(), we first run
through the list of recorded inodes looking for the parent and then
we repeat the search looking for a matching child entry. This pull
request consolidates both searches into one pass through the recorded
inodes, resuling in approximately a 50% reduction in audit overhead.
See the commit description for the testing details.
- Combine kmalloc()/memset() into kzalloc() in audit_krule_to_data()
- Comment fixes
* tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: merge loops in __audit_inode_child()
audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data()
audit: fix comment misindentation in audit.h
Linus Torvalds [Wed, 3 Dec 2025 18:45:47 +0000 (10:45 -0800)]
Merge tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Improve the granularity of SELinux labeling for memfd files
Currently when creating a memfd file, SELinux treats it the same as
any other tmpfs, or hugetlbfs, file. While simple, the drawback is
that it is not possible to differentiate between memfd and tmpfs
files.
This adds a call to the security_inode_init_security_anon() LSM hook
and wires up SELinux to provide a set of memfd specific access
controls, including the ability to control the execution of memfds.
As usual, the commit message has more information.
- Improve the SELinux AVC lookup performance
Adopt MurmurHash3 for the SELinux AVC hash function instead of the
custom hash function currently used. MurmurHash3 is already used for
the SELinux access vector table so the impact to the code is minimal,
and performance tests have shown improvements in both hash
distribution and latency.
See the commit message for the performance measurments.
- Introduce a Kconfig option for the SELinux AVC bucket/slot size
While we have the ability to grow the number of AVC hash buckets
today, the size of the buckets (slot size) is fixed at 512. This pull
request makes that slot size configurable at build time through a new
Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS.
* tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: improve bucket distribution uniformity of avc_hash()
selinux: Move avtab_hash() to a shared location for future reuse
selinux: Introduce a new config to make avc cache slot size adjustable
memfd,selinux: call security_inode_init_security_anon()
Linus Torvalds [Wed, 3 Dec 2025 17:53:48 +0000 (09:53 -0800)]
Merge tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Rework the LSM initialization code
What started as a "quick" patch to enable a notification event once
all of the individual LSMs were initialized, snowballed a bit into a
30+ patch patchset when everything was done. Most of the patches, and
diffstat, is due to splitting out the initialization code into
security/lsm_init.c and cleaning up some of the mess that was there.
While not strictly necessary, it does cleanup the code signficantly,
and hopefully makes the upkeep a bit easier in the future.
Aside from the new LSM_STARTED_ALL notification, these changes also
ensure that individual LSM initcalls are only called when the LSM is
enabled at boot time. There should be a minor reduction in boot times
for those who build multiple LSMs into their kernels, but only enable
a subset at boot.
It is worth mentioning that nothing at present makes use of the
LSM_STARTED_ALL notification, but there is work in progress which is
dependent upon LSM_STARTED_ALL.
- Make better use of the seq_put*() helpers in device_cgroup
* tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits)
lsm: use unrcu_pointer() for current->cred in security_init()
device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
lsm: add a LSM_STARTED_ALL notification event
lsm: consolidate all of the LSM framework initcalls
selinux: move initcalls to the LSM framework
ima,evm: move initcalls to the LSM framework
lockdown: move initcalls to the LSM framework
apparmor: move initcalls to the LSM framework
safesetid: move initcalls to the LSM framework
tomoyo: move initcalls to the LSM framework
smack: move initcalls to the LSM framework
ipe: move initcalls to the LSM framework
loadpin: move initcalls to the LSM framework
lsm: introduce an initcall mechanism into the LSM framework
lsm: group lsm_order_parse() with the other lsm_order_*() functions
lsm: output available LSMs when debugging
lsm: cleanup the debug and console output in lsm_init.c
lsm: add/tweak function header comment blocks in lsm_init.c
lsm: fold lsm_init_ordered() into security_init()
lsm: cleanup initialize_lsm() and rename to lsm_init_single()
...
Linus Torvalds [Wed, 3 Dec 2025 17:45:23 +0000 (09:45 -0800)]
Merge tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull trusted key updates from Jarkko Sakkinen:
- Remove duplicate 'tpm2_hash_map' in favor of 'tpm2_find_hash_alg()'
- Fix a memory leak on failure paths of 'tpm2_load_cmd'
* tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Fix a memory leak in tpm2_load_cmd
KEYS: trusted: Replace a redundant instance of tpm2_hash_map
Linus Torvalds [Wed, 3 Dec 2025 17:41:04 +0000 (09:41 -0800)]
Merge tag 'keys-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys update from Jarkko Sakkinen:
"This contains only three fixes"
* tag 'keys-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
keys: Fix grammar and formatting in 'struct key_type' comments
keys: Replace deprecated strncpy in ecryptfs_fill_auth_tok
keys: Remove redundant less-than-zero checks
Linus Torvalds [Wed, 3 Dec 2025 17:23:25 +0000 (09:23 -0800)]
Merge tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- Preparations to the use of nolibc in UML:
- Cleanup of sparse warnings
- Library mode without _start()
- More consistency when disabling errno
- Unconditional installation of all architecture support files
- Always 64-bit wide ino_t and off_t
- Various cleanups and bug fixes
* tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (25 commits)
selftests/nolibc: error out on linker warnings
selftests/nolibc: use lld to link loongarch binaries
tools/nolibc: remove more __nolibc_enosys() fallbacks
tools/nolibc: remove now superfluous overflow check in llseek
tools/nolibc: use 64-bit off_t
tools/nolibc: prefer the llseek syscall
tools/nolibc: handle 64-bit off_t for llseek
tools/nolibc: use 64-bit ino_t
tools/nolibc: avoid using plain integer as NULL pointer
tools/nolibc: add support for fchdir()
tools/nolibc: clean up outdated comments in generic arch.h
tools/nolibc: make the "headers" target install all supported archs
tools/nolibc: add the more portable inttypes.h
tools/nolibc: provide the portable sys/select.h
tools/nolibc: add missing memchr() to string.h
tools/nolibc: fix misleading help message regarding installation path
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add option to disable runtime
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: implement %m if errno is not defined
...
dt-bindings: thermal: qcom-tsens: Remove invalid tab character
Commit 1ee90870ce79 ("dt-bindings: thermal: tsens: Add QCS8300
compatible") uses a tab character which is illegal in YAML (at the
beginning of a line). The original patch was correct, so this got
corrupted when applied.
Yanzhu Huang [Wed, 5 Nov 2025 23:26:15 +0000 (23:26 +0000)]
ipe: Update documentation for script enforcement
This patch adds explanation of script enforcement mechanism in admin
guide documentation. Describes how IPE supports integrity enforcement
for indirectly executed scripts through the AT_EXECVE_CHECK flag, and
how this differs from kernel enforcement for compiled executables.
Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@kernel.org>
Yanzhu Huang [Wed, 5 Nov 2025 23:26:14 +0000 (23:26 +0000)]
ipe: Add AT_EXECVE_CHECK support for script enforcement
This patch adds a new ipe_bprm_creds_for_exec() hook that integrates
with the AT_EXECVE_CHECK mechanism. To enable script enforcement,
interpreters need to incorporate the AT_EXECVE_CHECK flag when
calling execveat() on script files before execution.
When a userspace interpreter calls execveat() with the AT_EXECVE_CHECK
flag, this hook triggers IPE policy evaluation on the script file. The
hook only triggers IPE when bprm->is_check is true, ensuring it's
being called from an AT_EXECVE_CHECK context. It then builds an
evaluation context for an IPE_OP_EXEC operation and invokes IPE policy.
The kernel returns the policy decision to the interpreter, which can
then decide whether to proceed with script execution.
This extends IPE enforcement to indirectly executed scripts, permitting
trusted scripts to execute while denying untrusted ones.
Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@kernel.org>
Linus Torvalds [Wed, 3 Dec 2025 03:00:26 +0000 (19:00 -0800)]
Merge tag 'random-6.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Dynamically allocate cpumasks off of the stack if the kernel is
configured for a lot of CPUs, to handle a -Wframe-larger-than case
- The removal of next_pseudo_random32() after the last user was
switched over to the prandom interface
- The removal of get_random_u{8,16,32,64}_wait() functions, as there
were no users of those at all
- Some house keeping changes - a few grammar cleanups in the
comments, system_unbound_wq was renamed to system_dfl_wq, and
static_key_initialized no longer needs to be checked
* tag 'random-6.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: complete sentence of comment
random: drop check for static_key_initialized
random: remove unused get_random_var_wait functions
random: replace use of system_unbound_wq with system_dfl_wq
random: use offstack cpumask when necessary
prandom: remove next_pseudo_random32
media: vivid: use prandom
random: add missing words in function comments
Linus Torvalds [Wed, 3 Dec 2025 02:53:50 +0000 (18:53 -0800)]
Merge tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers:
"This is a core arm64 change. However, I was asked to take this because
most uses of kernel-mode FPSIMD are in crypto or CRC code.
In v6.8, the size of task_struct on arm64 increased by 528 bytes due
to the new 'kernel_fpsimd_state' field. This field was added to allow
kernel-mode FPSIMD code to be preempted.
Unfortunately, 528 bytes is kind of a lot for task_struct. This
regression in the task_struct size was noticed and reported.
Recover that space by making this state be allocated on the stack at
the beginning of each kernel-mode FPSIMD section.
To make it easier for all the users of kernel-mode FPSIMD to do that
correctly, introduce and use a 'scoped_ksimd' abstraction"
* tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits)
lib/crypto: arm64: Move remaining algorithms to scoped ksimd API
lib/crypto: arm/blake2b: Move to scoped ksimd API
arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
arm64/fpu: Enforce task-context only for generic kernel mode FPU
net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
arm64/xorblocks: Switch to 'ksimd' scoped guard API
crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
raid6: Move to more abstract 'ksimd' guard API
crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
...
Linus Torvalds [Wed, 3 Dec 2025 02:26:54 +0000 (18:26 -0800)]
Merge tag 'libcrypto-at-least-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull 'at_least' array size update from Eric Biggers:
"C supports lower bounds on the sizes of array parameters, using the
static keyword as follows: 'void f(int a[static 32]);'. This allows
the compiler to warn about a too-small array being passed.
As discussed, this reuse of the 'static' keyword, while standard, is a
bit obscure. Therefore, add an alias 'at_least' to compiler_types.h.
Then, add this 'at_least' annotation to the array parameters of
various crypto library functions"
* tag 'libcrypto-at-least-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: sha2: Add at_least decoration to fixed-size array params
lib/crypto: sha1: Add at_least decoration to fixed-size array params
lib/crypto: poly1305: Add at_least decoration to fixed-size array params
lib/crypto: md5: Add at_least decoration to fixed-size array params
lib/crypto: curve25519: Add at_least decoration to fixed-size array params
lib/crypto: chacha: Add at_least decoration to fixed-size array params
lib/crypto: chacha20poly1305: Statically check fixed array lengths
compiler_types: introduce at_least parameter decoration pseudo keyword
wifi: iwlwifi: trans: rename at_least variable to min_mode
Linus Torvalds [Wed, 3 Dec 2025 02:24:35 +0000 (18:24 -0800)]
Merge tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull AES-GCM optimizations from Eric Biggers:
"More optimizations and cleanups for the x86_64 AES-GCM code:
- Add a VAES+AVX2 optimized implementation of AES-GCM. This is very
helpful on CPUs that have VAES but not AVX512, such as AMD Zen 3.
- Make the VAES+AVX512 optimized implementation of AES-GCM handle
large amounts of associated data efficiently.
- Remove the "avx10_256" implementation of AES-GCM. It's superseded
by the VAES+AVX2 optimized implementation.
- Rename the "avx10_512" implementation to "avx512"
Overall, this fills in a gap where AES-GCM wasn't fully optimized on
some recent CPUs. It also drops code that won't be as useful as
initially expected due to AVX10/256 being dropped from the AVX10 spec"
* tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
crypto: x86/aes-gcm-vaes-avx2 - initialize full %rax return register
crypto: x86/aes-gcm - optimize long AAD processing with AVX512
crypto: x86/aes-gcm - optimize AVX512 precomputation of H^2 from H^1
crypto: x86/aes-gcm - revise some comments in AVX512 code
crypto: x86/aes-gcm - reorder AVX512 precompute and aad_update functions
crypto: x86/aes-gcm - clean up AVX512 code to assume 512-bit vectors
crypto: x86/aes-gcm - rename avx10 and avx10_512 to avx512
crypto: x86/aes-gcm - remove VAES+AVX10/256 optimized code
crypto: x86/aes-gcm - add VAES+AVX2 optimized code
Linus Torvalds [Wed, 3 Dec 2025 02:20:06 +0000 (18:20 -0800)]
Merge tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library test updates from Eric Biggers:
- Add KUnit test suites for SHA-3, BLAKE2b, and POLYVAL. These are the
algorithms that have new crypto library interfaces this cycle.
- Remove the crypto_shash POLYVAL tests. They're no longer needed
because POLYVAL support was removed from crypto_shash. Better POLYVAL
test coverage is now provided via the KUnit test suite.
* tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
crypto: testmgr - Remove polyval tests
lib/crypto: tests: Add KUnit tests for POLYVAL
lib/crypto: tests: Add additional SHAKE tests
lib/crypto: tests: Add SHA3 kunit tests
lib/crypto: tests: Add KUnit tests for BLAKE2b
Linus Torvalds [Wed, 3 Dec 2025 02:01:03 +0000 (18:01 -0800)]
Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
"This is the main crypto library pull request for 6.19. It includes:
- Add SHA-3 support to lib/crypto/, including support for both the
hash functions and the extendable-output functions. Reimplement the
existing SHA-3 crypto_shash support on top of the library.
This is motivated mainly by the upcoming support for the ML-DSA
signature algorithm, which needs the SHAKE128 and SHAKE256
functions. But even on its own it's a useful cleanup.
This also fixes the longstanding issue where the
architecture-optimized SHA-3 code was disabled by default.
- Add BLAKE2b support to lib/crypto/, and reimplement the existing
BLAKE2b crypto_shash support on top of the library.
This is motivated mainly by btrfs, which supports BLAKE2b
checksums. With this change, all btrfs checksum algorithms now have
library APIs. btrfs is planned to start just using the library
directly.
This refactor also improves consistency between the BLAKE2b code
and BLAKE2s code. And as usual, it also fixes the issue where the
architecture-optimized BLAKE2b code was disabled by default.
- Add POLYVAL support to lib/crypto/, replacing the existing POLYVAL
support in crypto_shash. Reimplement HCTR2 on top of the library.
This simplifies the code and improves HCTR2 performance. As usual,
it also makes the architecture-optimized code be enabled by
default. The generic implementation of POLYVAL is greatly improved
as well.
- Clean up the BLAKE2s code
- Add FIPS self-tests for SHA-1, SHA-2, and SHA-3"
* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (37 commits)
fscrypt: Drop obsolete recommendation to enable optimized POLYVAL
crypto: polyval - Remove the polyval crypto_shash
crypto: hctr2 - Convert to use POLYVAL library
lib/crypto: x86/polyval: Migrate optimized code into library
lib/crypto: arm64/polyval: Migrate optimized code into library
lib/crypto: polyval: Add POLYVAL library
crypto: polyval - Rename conflicting functions
lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs
lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value
lib/crypto: x86/blake2s: Improve readability
lib/crypto: x86/blake2s: Use local labels for data
lib/crypto: x86/blake2s: Drop check for nblocks == 0
lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit
lib/crypto: arm, arm64: Drop filenames from file comments
lib/crypto: arm/blake2s: Fix some comments
crypto: s390/sha3 - Remove superseded SHA-3 code
crypto: sha3 - Reimplement using library API
crypto: jitterentropy - Use default sha3 implementation
lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
lib/crypto: sha3: Support arch overrides of one-shot digest functions
...
Linus Torvalds [Wed, 3 Dec 2025 01:49:12 +0000 (17:49 -0800)]
Merge tag 'thermal-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add Nova Lake processor support to the Intel thermal drivers and
DPTF code, update thermal control documentation, simplify the ACPI
DPTF code related to thermal control, add QCS8300 compatible to the
tsens thermal DT bindings, add DT bindings for NXP i.MX91 thermal
module and add support for it to the imx91 thermal driver, update a
few other thermal drivers and fix a format string issue in a thermal
utility:
- Add Nova Lake processor thermal device to the int340x
processor_thermal driver, add DLVR support for Nova Lake to it, add
Nova Lake support to the ACPI DPTF code, document thermal
throttling on Intel platforms, and update workload type hint
interface documentation (Srinivas Pandruvada)
- Remove int340x thermal scan handler from the ACPI DPTF code because
it turned out to be unnecessary (Slawomir Rosek)
- Clean up the Intel int340x thermal driver (Kaushlendra Kumar)
- Document the RZ/V2H TSU DT bindings (Ovidiu Panait)
- Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
Pallikunhi)
- Document R-Car Gen4 and RZ/G2 support in driver comment (Marek
Vasut)
- Convert to DEFINE_SIMPLE_DEV_PM_OPS() in R-Car [Gen3] (Geert
Uytterhoeven)
- Fix format string bug in thermal-engine (Malaya Kumar Rout)
- Make ipq5018 tsens standalone compatible (George Moussalem)
- Add the QCS8300 compatible for QCom Tsens (Gaurav Kohli)
- Add support for the NXP i.MX91 thermal module, including the DT
bindings (Pengfei Li)"
* tag 'thermal-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unit
dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal module
dt-bindings: thermal: tsens: Add QCS8300 compatible
dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatible
tools/thermal/thermal-engine: Fix format string bug in thermal-engine
docs: driver-api/thermal/intel_dptf: Add new workload type hint
thermal/drivers/rcar_gen3: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
thermal/drivers/rcar: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
Documentation: thermal: Document thermal throttling on Intel platforms
ACPI: DPTF: Support Nova Lake
thermal: intel: int340x: Add DLVR support for Nova Lake
thermal: int340x: processor_thermal: Add Nova Lake processor thermal device
thermal: intel: int340x: Replace sprintf() with sysfs_emit()
thermal: intel: int340x: Use symbolic constant for UUID comparison
thermal/drivers/rcar_gen3: Document R-Car Gen4 and RZ/G2 support in driver comment
dt-bindings: thermal: qcom-tsens: document the Kaanapali Temperature Sensor
dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2H TSU
ACPI: DPTF: Remove int340x thermal scan handler
thermal: intel: Select INT340X_THERMAL from INTEL_SOC_DTS_THERMAL
Linus Torvalds [Wed, 3 Dec 2025 01:31:22 +0000 (17:31 -0800)]
Merge tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"There are quite a few interesting things here, including new hardware
support, new features, some bug fixes and documentation updates. In
addition, there are a usual bunch of minor fixes and cleanups all
over.
In the new hardware support category, there are intel_pstate and
intel_rapl driver updates to support new processors, Panther Lake,
Wildcat Lake, Noval Lake, and Diamond Rapids in the OOB mode, OPP and
bandwidth allocation support in the tegra186 cpufreq driver, and
JH7110S SOC support in dt-platdev cpufreq.
The new features are the PM QoS CPU latency limit for suspend-to-idle,
the netlink support for the energy model management, support for
terminating system suspend via a wakeup event during the sync of file
systems, configurable number of hibernation compression threads, the
runtime PM auto-cleanup macros, and the "poweroff" PM event that is
expected to be used during system shutdown.
Bugs are mostly fixed in cpuidle governors, but there are also fixes
elsewhere, like in the amd-pstate cpufreq driver.
Documentation updates include, but are not limited to, a new doc on
debugging shutdown hangs, cross-referencing fixes and cleanups in the
intel_pstate documentation, and updates of comments in the core
hibernation code.
Specifics:
- Introduce and document a QoS limit on CPU exit latency during
wakeup from suspend-to-idle (Ulf Hansson)
- Add support for building libcpupower statically (Zuo An)
- Add support for sending netlink notifications to user space on
energy model updates (Changwoo Mini, Peng Fan)
- Minor improvements to the Rust OPP interface (Tamir Duberstein)
- Fixes to scope-based pointers in the OPP library (Viresh Kumar)
- Use residency threshold in polling state override decisions in the
menu cpuidle governor (Aboorva Devarajan)
- Add sanity check for exit latency and target residency in the
cpufreq core (Rafael Wysocki)
- Use this_cpu_ptr() where possible in the teo governor (Christian
Loehle)
- Rework the handling of tick wakeups in the teo cpuidle governor to
increase the likelihood of stopping the scheduler tick in the cases
when tick wakeups can be counted as non-timer ones (Rafael Wysocki)
- Fix a reverse condition in the teo cpuidle governor and drop a
misguided target residency check from it (Rafael Wysocki)
- Clean up multiple minor defects in the teo cpuidle governor (Rafael
Wysocki)
- Update header inclusion to make it follow the Include What You Use
principle (Andy Shevchenko)
- Enable MSR-based RAPL PMU support in the intel_rapl power capping
driver and arrange for using it on the Panther Lake and Wildcat
Lake processors (Kuppuswamy Sathyanarayanan)
- Add support for Nova Lake and Wildcat Lake processors to the
intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
Pandruvada)
- Add OPP and bandwidth support for Tegra186 (Aaron Kling)
- Optimizations for parameter array handling in the amd-pstate
cpufreq driver (Mario Limonciello)
- Fix for mode changes with offline CPUs in the amd-pstate cpufreq
driver (Gautham Shenoy)
- Preserve freq_table_sorted across suspend/hibernate in the cpufreq
core (Zihuan Zhang)
- Adjust energy model rules for Intel hybrid platforms in the
intel_pstate cpufreq driver and improve printing of debug messages
in it (Rafael Wysocki)
- Replace deprecated strcpy() in cpufreq_unregister_governor()
(Thorsten Blum)
- Fix duplicate hyperlink target errors in the intel_pstate cpufreq
driver documentation and use :ref: directive for internal linking
in it (Swaraj Gaikwad, Bagas Sanjaya)
- Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
driver (Kuppuswamy Sathyanarayanan)
- Use mutex guard for driver locking in the intel_pstate driver and
eliminate some code duplication from it (Rafael Wysocki)
- Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
Kumar)
- Minor improvements to various cpufreq drivers (Christian Marangi,
Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)
- Replace snprintf() with scnprintf() in show_trace_dev_match()
(Kaushlendra Kumar)
- Fix typos in runtime.c comments (Malaya Kumar Rout)
- Move governor.h from devfreq under include/linux/ and rename to
devfreq-governor.h to allow devfreq governor definitions in out of
drivers/devfreq/ (Dmitry Baryshkov)
- Use min() to improve readability in tegra30-devfreq.c (Thorsten
Blum)
- Fix potential use-after-free issue of OPP handling in
hisi_uncore_freq.c (Pengjie Zhang)
- Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
governor_simpleondemand.c in devfreq (Riwen Lu)"
* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
cpuidle: Warn instead of bailing out if target residency check fails
cpuidle: Update header inclusion
Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
sched: idle: Respect the CPU system wakeup QoS limit for s2idle
pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
pmdomain: Respect the CPU system wakeup QoS limit for s2idle
PM: QoS: Introduce a CPU system wakeup QoS limit
cpuidle: governors: teo: Add missing space to the description
PM: hibernate: Extra cleanup of comments in swap handling code
PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
PM / devfreq: hisi: Fix potential UAF in OPP handling
PM / devfreq: Move governor.h to a public header location
powercap: intel_rapl: Enable MSR-based RAPL PMU support
powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
PM: sleep: Add support for wakeup during filesystem sync
cpufreq: ACPI: Replace udelay() with usleep_range()
...
Linus Torvalds [Wed, 3 Dec 2025 01:24:03 +0000 (17:24 -0800)]
Merge tag 'acpi-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These add Microsoft fan extensions support to the ACPI fan driver, fix
a bug in ACPICA, update other ACPI drivers (processor, time and alarm
device), update ACPI power management code and ACPI device properties
management, and fix an ACPI utility:
- Avoid walking the ACPI namespace in the AML interpreter if the
starting node cannot be determined (Cryolitia PukNgae)
- Use min() instead of min_t() in the ACPI device properties handling
code to avoid discarding significant bits (David Laight)
- Fix potential fwnode refcount leak in
acpi_fwnode_graph_parse_endpoint() that may prevent the parent
fwnode from being released (Haotian Zhang)
- Rework acpi_graph_get_next_endpoint() to use ACPI functions only,
remove unnecessary conditionals from it to make it easier to
follow, and make acpi_get_next_subnode() static (Sakari Ailus)
- Drop unused function acpi_get_lps0_constraint(), make some
Low-Power S0 callback functions for suspend-to-idle static, and
rearrange the code retrieving Low-Power S0 constraints so it only
runs when the constraints are actually used (Rafael Wysocki)
- Drop redundant locking from the ACPI battery driver (Rafael
Wysocki)
- Improve runtime PM in the ACPI time and alarm device (TAD) driver
using guard macros and rearrange code related to runtime PM in
acpi_tad_remove() (Rafael Wysocki)
- Add support for Microsoft fan extensions to the ACPI fan driver
along with notification support and work around a 64-bit firmware
bug in that driver (Armin Wolf)
- Use ACPI_FREE() to free ACPI buffer in the ACPI DPTF code
(Kaushlendra Kumar)
- Fix a memory leak and a resource leak in the ACPI pfrut utility
(Malaya Kumar Rout)
- Replace `core::mem::zeroed` with `pin_init::zeroed` in the ACPI
Rust code (Siyuan Huang)
- Update the ACPI code to use the new style of allocating workqueues
and new global workqueues (Marco Crivellari)
- Fix two spelling mistakes in the ACPI code (Chu Guangqing)
- Fix ISAPNP to generate uevents to auto-load modules (René Rebe)
- Relocate the state flags initialization in the ACPI processor idle
driver and drop redundant C-state count checks from it (Huisong Li)
- Fix map_x2apic_id() in the ACPI processor core driver for
amd-pstate on am4 (René Rebe)"
* tag 'acpi-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits)
ACPI: PM: Fix a spelling mistake
ACPI: LPSS: Fix a spelling mistake
ACPI: processor_core: fix map_x2apic_id for amd-pstate on am4
ACPICA: Avoid walking the Namespace if start_node is NULL
ACPI: tools: pfrut: fix memory leak and resource leak in pfrut.c
ACPI: property: use min() instead of min_t()
PNP: Fix ISAPNP to generate uevents to auto-load modules
ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
ACPI: DPTF: Use ACPI_FREE() for ACPI buffer deallocation
ACPI: processor: idle: Drop redundant C-state count checks
ACPI: thermal: Add WQ_PERCPU to alloc_workqueue() users
ACPI: OSL: Add WQ_PERCPU to alloc_workqueue() users
ACPI: EC: Add WQ_PERCPU to alloc_workqueue() users
ACPI: OSL: replace use of system_wq with system_percpu_wq
ACPI: scan: replace use of system_unbound_wq with system_dfl_wq
ACPI: fan: Add support for Microsoft fan extensions
ACPI: fan: Add hwmon notification support
ACPI: fan: Add basic notification support
ACPI: TAD: Improve runtime PM using guard macros
ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
...
Linus Torvalds [Wed, 3 Dec 2025 01:03:55 +0000 (17:03 -0800)]
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"These are the arm64 updates for 6.19.
The biggest part is the Arm MPAM driver under drivers/resctrl/.
There's a patch touching mm/ to handle spurious faults for huge pmd
(similar to the pte version). The corresponding arm64 part allows us
to avoid the TLB maintenance if a (huge) page is reused after a write
fault. There's EFI refactoring to allow runtime services with
preemption enabled and the rest is the usual perf/PMU updates and
several cleanups/typos.
Summary:
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support
for NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory managemennt:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
Documentation/arm64: Fix the typo of register names
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
...
Linus Torvalds [Wed, 3 Dec 2025 00:37:00 +0000 (16:37 -0800)]
Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration
of hotplug memory, allowing with and without memmap_on_memory
support. This makes the way memory hotplug is handled on s390 much
more similar to other architectures
- Remove compat support. There shouldn't be any compat user space
around anymore, therefore get rid of a lot of code which also doesn't
need to be tested anymore
- Add stackprotector support. GCC 16 will get new compiler options,
which allow to generate code required for kernel stackprotector
support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This
removes a lot of duplicated code. The new driver is also extendable
and allows to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area
and the kernel image are within the same 4GB area. This eliminates
the need of weak per cpu variables. Get rid of
ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
s390/entry: Use lay instead of aghik
s390/vdso: Get rid of -m64 flag handling
s390/vdso: Rename vdso64 to vdso
s390: Rename head64.S to head.S
s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
s390: Add stackprotector support
s390/modules: Simplify module_finalize() slightly
s390: Remove KMSG_COMPONENT macro
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
s390/ap: Restrict driver_override versus apmask and aqmask use
s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
s390/ap: Support driver_override for AP queue devices
s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
s390/debug: Update description of resize operation
s390/syscalls: Switch to generic system call table generation
s390/syscalls: Remove system call table pointer from thread_struct
s390/uapi: Remove 31 bit support from uapi header files
s390: Remove compat support
tools: Remove s390 compat support
...
Linus Torvalds [Tue, 2 Dec 2025 22:48:08 +0000 (14:48 -0800)]
Merge tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU feature updates from Dave Hansen:
"The biggest thing of note here is Linear Address Space Separation
(LASS). It represents the first time I can think of that the
upper=>kernel/lower=>user address space convention is actually
recognized by the hardware on x86. It ensures that userspace can not
even get the hardware to _start_ page walks for the kernel address
space. This, of course, is a really nice generic side channel defense.
This is really only a down payment on LASS support. There are still
some details to work out in its interaction with EFI calls and
vsyscall emulation. For now, LASS is disabled if either of those
features is compiled in (which is almost always the case).
There's also one straggler commit in here which converts an
under-utilized AMD CPU feature leaf into a generic Linux-defined leaf
so more feature can be packed in there.
Summary:
- Enable Linear Address Space Separation (LASS)
- Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined"
* tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Enable LASS during CPU initialization
selftests/x86: Update the negative vsyscall tests to expect a #GP
x86/traps: Communicate a LASS violation in #GP message
x86/kexec: Disable LASS during relocate kernel
x86/alternatives: Disable LASS when patching kernel code
x86/asm: Introduce inline memcpy and memset
x86/cpu: Add an LASS dependency on SMAP
x86/cpufeatures: Enumerate the LASS feature bits
x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
Linus Torvalds [Tue, 2 Dec 2025 22:16:42 +0000 (14:16 -0800)]
Merge tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Dave Hansen:
"The most significant are some changes to ensure that symbols exported
for KVM are used only by KVM modules themselves, along with some
related cleanups.
In true x86/misc fashion, the other patch is completely unrelated and
just enhances an existing pr_warn() to make it clear to users how they
have tainted their kernel when something is mucking with MSRs.
Summary:
- Make MSR-induced taint easier for users to track down
- Restrict KVM-specific exports to KVM itself"
* tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible
x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs"
x86/mtrr: Drop unnecessary export of "mtrr_state"
x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base"
x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
Linus Torvalds [Tue, 2 Dec 2025 22:03:05 +0000 (14:03 -0800)]
Merge tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Dave HansenL
"The main content here is adding support for the new EUPDATESVN SGX
ISA. Before this, folks who updated microcode had to reboot before
enclaves could attest to the new microcode. The new functionality lets
them do this without a reboot.
The rest are some nice, but relatively mundane comment and kernel-doc
fixups.
Summary:
- Allow security version (SVN) updates so enclaves can attest to new
microcode
- Fix kernel docs typos"
* tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute
x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h
x86/sgx: Document structs and enums with '@', not '%'
x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler
x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run"
x86/sgx: Enable automatic SVN updates for SGX enclaves
x86/sgx: Implement ENCLS[EUPDATESVN]
x86/sgx: Define error codes for use by ENCLS[EUPDATESVN]
x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag
x86/sgx: Introduce functions to count the sgx_(vepc_)open()
Linus Torvalds [Tue, 2 Dec 2025 21:32:52 +0000 (13:32 -0800)]
Merge tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Use the proper accessors when reading CR3 as part of the page level
transitions (5-level to 4-level, the use case being kexec) so that
only the physical address in CR3 is picked up and not flags which are
above the physical mask shift
- Clean up and unify __phys_addr_symbol() definitions
* tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: Fix page table access in 5-level to 4-level paging transition
x86/boot: Fix page table access in 5-level to 4-level paging transition
x86/mm: Unify __phys_addr_symbol()
Linus Torvalds [Tue, 2 Dec 2025 21:27:09 +0000 (13:27 -0800)]
Merge tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU mitigation updates from Borislav Petkov:
- Convert the tsx= cmdline parsing to use early_param()
- Cleanup forward declarations gunk in bugs.c
* tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Get rid of the forward declarations
x86/tsx: Get the tsx= command line parameter with early_param()
x86/tsx: Make tsx_ctrl_state static
Linus Torvalds [Tue, 2 Dec 2025 21:07:53 +0000 (13:07 -0800)]
Merge tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Largely cleanups along with a change to save XSS to the GHCB
(Guest-Host Communication Block) in SEV-ES guests so that the
hypervisor can determine the guest's XSAVES buffer size properly
and thus support shadow stacks in AMD confidential guests
* tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cc: Fix enum spelling to fix kernel-doc warnings
x86/boot: Drop unused sev_enable() fallback
x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled()
x86/sev: Include XSS value in GHCB CPUID request
x86/boot: Move boot_*msr helpers to asm/shared/msr.h
Linus Torvalds [Tue, 2 Dec 2025 20:17:47 +0000 (12:17 -0800)]
Merge tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- The mandatory pile of cleanups the cat drags in every merge window
* tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Clean up whitespace in a20.c
x86/mm: Delete disabled debug code
x86/{boot,mtrr}: Remove unused function declarations
x86/percpu: Use BIT_WORD() and BIT_MASK() macros
x86/cpufeatures: Correct LKGS feature flag description
x86/idtentry: Add missing '*' to kernel-doc lines
Linus Torvalds [Tue, 2 Dec 2025 19:55:58 +0000 (11:55 -0800)]
Merge tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Add support for AMD's Smart Data Cache Injection feature which allows
for direct insertion of data from I/O devices into the L3 cache, thus
bypassing DRAM and saving its bandwidth; the resctrl side of the
feature allows the size of the L3 used for data injection to be
controlled
- Add Intel Clearwater Forest to the list of CPUs which support
Sub-NUMA clustering
- Other fixes and cleanups
* tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Update bit_usage to reflect io_alloc
fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks
fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID
fs/resctrl: Introduce interface to display io_alloc CBMs
fs/resctrl: Add user interface to enable/disable io_alloc feature
fs/resctrl: Introduce interface to display "io_alloc" support
x86,fs/resctrl: Implement "io_alloc" enable/disable handlers
x86,fs/resctrl: Detect io_alloc feature
x86/resctrl: Add SDCIAE feature in the command line options
x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement
fs/resctrl: Consider sparse masks when initializing new group's allocation
x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
Linus Torvalds [Tue, 2 Dec 2025 19:35:49 +0000 (11:35 -0800)]
Merge tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
- Add microcode staging support on Intel: it moves the sole microcode
blobs loading to a non-critical path so that microcode loading
latencies are kept at minimum. The actual "directing" the hardware to
load microcode is the only step which is done on the critical path.
This scheme is also opportunistic as in: on a failure, the machinery
falls back to normal loading
- Add the capability to the AMD side of the loader to select one of two
per-family/model/stepping patches: one is pre-Entrysign and the other
is post-Entrysign; with the goal to take care of machines which
haven't updated their BIOS yet - something they should absolutely do
as this is the only proper Entrysign fix
- Other small cleanups and fixlets
* tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Mark early_parse_cmdline() as __init
x86/microcode/AMD: Select which microcode patch to load
x86/microcode/intel: Enable staging when available
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Establish staging control logic
x86/microcode: Introduce staging step to reduce late-loading time
x86/cpu/topology: Make primary thread mask available with SMP=n
Pavel Begunkov [Sun, 30 Nov 2025 23:35:17 +0000 (23:35 +0000)]
net: page_pool: sanitise allocation order
We're going to give more control over rx buffer sizes to user space, and
since we can't always rely on driver validation, let's sanitise it in
page_pool_init() as well. Note that we only need to reject over
MAX_PAGE_ORDER allocations for normal page pools, as current memory
providers don't need to use the buddy allocator and must check the order
on init.i
Pavel Begunkov [Sun, 30 Nov 2025 23:35:16 +0000 (23:35 +0000)]
net: page pool: xa init with destroy on pp init
The free_ptr_ring label path initialises ->dma_mapped xarray but doesn't
destroy it in case of an error. That's not a real problem since init
itself doesn't do anything requiring destruction, but still match it
with xa_destroy() to silence warnings.
Linus Torvalds [Tue, 2 Dec 2025 19:04:37 +0000 (11:04 -0800)]
Merge tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- The second part of the AMD MCA interrupts rework after the
last-minute show-stopper from the last merge window was sorted out.
After this, the AMD MCA deferred errors, thresholding and corrected
errors interrupt handlers use common MCA code and are tightly
integrated into the core MCA code, thereby getting rid of
considerable duplication. All culminating into allowing CMCI error
thresholding storms to be detected at AMD too, using the common
infrastructure
- Add support for two new MCA bank bits on AMD Zen6 which denote
whether the error address logged is a system physical address, which
obviates the need for it to be translated before further error
recovery can be done
* tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Handle AMD threshold interrupt storms
x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems
x86/mce: Add support for physical address valid bit
x86/mce: Save and use APEI corrected threshold limit
x86/mce/amd: Define threshold restart function for banks
x86/mce/amd: Remove redundant reset_block()
x86/mce/amd: Support SMCA Corrected Error Interrupt
x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
x86/mce: Unify AMD DFR handler with MCA Polling
x86/mce: Unify AMD THR handler with MCA Polling