Li RongQing [Tue, 7 Apr 2026 02:27:30 +0000 (22:27 -0400)]
devlink: Fix incorrect skb socket family dumping
The devlink_fmsg_dump_skb function was incorrectly using the socket
type (sk->sk_type) instead of the socket family (sk->sk_family)
when filling the "family" field in the fast message dump.
This patch fixes this to properly display the socket family.
Jiexun Wang [Tue, 7 Apr 2026 08:00:14 +0000 (16:00 +0800)]
af_unix: read UNIX_DIAG_VFS data under unix_state_lock
Exact UNIX diag lookups hold a reference to the socket, but not to
u->path. Meanwhile, unix_release_sock() clears u->path under
unix_state_lock() and drops the path reference after unlocking.
Read the inode and device numbers for UNIX_DIAG_VFS while holding
unix_state_lock(), then emit the netlink attribute after dropping the
lock.
This keeps the VFS data stable while the reply is being built.
Fixes: 5f7b0569460b ("unix_diag: Unix inode info NLA") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260407080015.1744197-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Li RongQing [Tue, 31 Mar 2026 05:32:45 +0000 (01:32 -0400)]
scsi: qla2xxx: Use nr_cpu_ids instead of NR_CPUS for qp_cpu_map allocation
Change the memory allocation for qp_cpu_map to use the actual number of
CPUs ('nr_cpu_ids') instead of the maximum possible CPUs ('NR_CPUS').
This saves memory on systems where the maximum CPU limit is much higher
than the active CPU count.
Jakub Kicinski [Thu, 9 Apr 2026 02:32:05 +0000 (19:32 -0700)]
Merge branch 'mptcp-autotune-related-improvement'
Matthieu Baerts says:
====================
mptcp: autotune related improvement
Here are two patches from Paolo that have been crafted a couple of
months ago, but needed more validation because they were indirectly
causing instabilities in the sefltests. The root cause has been fixed in
'net' recently in commit 8c09412e584d ("selftests: mptcp: more stable
simult_flows tests").
These patches refactor the receive space and RTT estimator, overall
making DRS more correct while avoiding receive buffer drifting to
tcp_rmem[2], which in turn makes the throughput more stable and less
bursty, especially with high bandwidth and low delay environments.
Note that the first patch addresses a very old issue. 'net-next' is
targeted because the change is quite invasive and based on a recent
backlog refactor. The 'Fixes' tag is then there more as a FYI, because
backporting this patch will quickly be blocked due to large conflicts.
====================
Paolo Abeni [Tue, 7 Apr 2026 08:45:17 +0000 (10:45 +0200)]
mptcp: better mptcp-level RTT estimator
The current MPTCP-level RTT estimator has several issues. On high speed
links, the MPTCP-level receive buffer auto-tuning happens with a
frequency well above the TCP-level's one. That in turn can cause
excessive/unneeded receive buffer increase.
On such links, the initial rtt_us value is considerably higher than the
actual delay, and the current mptcp_rcv_space_adjust() updates
msk->rcvq_space.rtt_us with a period equal to the such field previous
value. If the initial rtt_us is 40ms, its first update will happen after
40ms, even if the subflows see actual RTT orders of magnitude lower.
Additionally:
- setting the msk RTT to the maximum among all the subflows RTTs makes
DRS constantly overshooting the rcvbuf size when a subflow has
considerable higher latency than the other(s).
- during unidirectional bulk transfers with multiple active subflows,
the TCP-level RTT estimator occasionally sees considerably higher
value than the real link delay, i.e. when the packet scheduler reacts
to an incoming ACK on given subflow pushing data on a different
subflow.
- currently inactive but still open subflows (i.e. switched to backup
mode) are always considered when computing the msk-level RTT.
Address the all the issues above with a more accurate RTT estimation
strategy: the MPTCP-level RTT is set to the minimum of all the subflows
actually feeding data into the MPTCP receive buffer, using a small
sliding window.
While at it, also use EWMA to compute the msk-level scaling_ratio, to
that MPTCP can avoid traversing the subflow list is
mptcp_rcv_space_adjust().
Use some care to avoid updating msk and ssk level fields too often.
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-1-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Revert "mptcp: add needs_id for netlink appending addr"
This commit was originally adding the ability to add MPTCP endpoints
with ID 0 by accident. The in-kernel PM, handling MPTCP endpoints at the
net namespace level, is not supposed to handle endpoints with such ID,
because this ID 0 is reserved to the initial subflow, as mentioned in
the MPTCPv1 protocol [1], a per-connection setting.
Note that 'ip mptcp endpoint add id 0' stops early with an error, but
other tools might still request the in-kernel PM to create MPTCP
endpoints with this restricted ID 0.
In other words, it was wrong to call the mptcp_pm_has_addr_attr_id
helper to check whether the address ID attribute is set: if it was set
to 0, a new MPTCP endpoint would be created with ID 0, which is not
expected, and might cause various issues later.
mptcp: fix slab-use-after-free in __inet_lookup_established
The ehash table lookups are lockless and rely on
SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
during RCU read-side critical sections. Both tcp_prot and
tcpv6_prot have their slab caches created with this flag
via proto_register().
However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
tcpv6_prot_override during inet_init() (fs_initcall, level 5),
before inet6_init() (module_init/device_initcall, level 6) has
called proto_register(&tcpv6_prot). At that point,
tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
remains NULL permanently.
This causes MPTCP v6 subflow child sockets to be allocated via
kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
when these sockets are freed without SOCK_RCU_FREE (which is
cleared for child sockets by design), the memory can be
immediately reused. Concurrent ehash lookups under
rcu_read_lock can then access freed memory, triggering a
slab-use-after-free in __inet_lookup_established.
Fix this by splitting the IPv6-specific initialization out of
mptcp_subflow_init() into a new mptcp_subflow_v6_init(), called
from mptcp_proto_v6_init() before protocol registration. This
ensures tcpv6_prot_override.slab correctly inherits the
SLAB_TYPESAFE_BY_RCU slab cache.
sk_clone() initializes sk_tx_queue_mapping via sk_tx_queue_clear()
but does not initialize sk_rx_queue_mapping. Since this field is in
the sk_dontcopy region, it is neither copied from the parent socket
by sock_copy() nor zeroed by sk_prot_alloc() (called without
__GFP_ZERO from sk_clone).
Commit 03cfda4fa6ea ("tcp: fix another uninit-value
(sk_rx_queue_mapping)") attempted to fix this by introducing
sk_mark_napi_id_set() with force_set=true in tcp_child_process().
However, sk_mark_napi_id_set() -> sk_rx_queue_set() only writes
when skb_rx_queue_recorded(skb) is true. If the 3-way handshake
ACK arrives through a device that does not record rx_queue (e.g.
loopback or veth), sk_rx_queue_mapping remains uninitialized.
When a subsequent data packet arrives with a recorded rx_queue,
sk_mark_napi_id() -> sk_rx_queue_update() reads the uninitialized
field for comparison (force_set=false path), triggering KMSAN.
This was reproduced by establishing a TCP connection over loopback
(which does not call skb_record_rx_queue), then attaching a BPF TC
program on lo ingress to set skb->queue_mapping on data packets:
selftests: forwarding: lib: rewrite processing of command line arguments
The piece of code which processes the command line arguments and
populates NETIFS based on them is really unobvious. Rewrite it so that
the intention is clear and the code is easy to follow.
Ian Rogers [Wed, 8 Apr 2026 20:38:58 +0000 (13:38 -0700)]
perf data: Clean up use_stdio and structures
use_stdio was associated with struct perf_data and not perf_data_file
meaning there was implicit use of fd rather than fptr that may not be
safe. For example, in perf_data_file__write. Reorganize perf_data_file
to better abstract use_stdio, add kernel-doc and more consistently use
perf_data__ accessors so that use_stdio is better respected.
Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
So avoid these subtleties and instead reuse the gnu_basename() function
we had in srcline.c, renaming it to perf_basename() and replace
basename() calls with it, simplifying several cases by removing now
needless strdups.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
"
Could a malformed perf.data file provide out-of-bounds values for cpu and
domain?
These variables are read directly from the file and used as indices for
cd_map and cd_map[cpu]->domains without any validation against
env->nr_cpus_avail or max_sched_domains.
Similar to the issue above, this is an existing lack of validation that
becomes apparent when looking at the allocation boundaries.
"
Validate it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Wed, 8 Apr 2026 17:31:58 +0000 (14:31 -0300)]
perf tools: Make more global variables static
`make check` will run sparse on the perf code base. A frequent warning
is "warning: symbol '...' was not declared. Should it be static?" Go
through and make global definitions without declarations static.
In some cases it is deliberate due to dlsym accessing the symbol, this
change doesn't clean up the missing declarations for perf test suites.
Sometimes things can opportunistically be made const.
Making somethings static exposed unused functions warnings, so
restructuring of ifdefs was necessary for that.
These changes reduce the size of the perf binary by 568 bytes.
Committer notes:
Refreshed the patch, the original one fell thru the cracks, updated the
size reduction.
Remove the trace-event-scripting.c changes, break the build, noticed
with container builds and with sashiko:
perf util: Kill die() prototype, dead for a long time
In fef2a735167a827a ("perf tools: Kill die()") the die() function was
removed, but not the prototype in util.h, now when building with
LIBPERL=1, during a 'make -C tools/perf build-test' routine test, it is
failing as perl likes die() calls and then this clashes with this
remnant, remove it.
Fixes: fef2a735167a827a ("perf tools: Kill die()") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
net: phy: realtek: get rid of magic numbers in rtl8201_config_intr()
Replace the magic numbers with defines. Register names were obtained from
publicly available documentation[1]. This should make it clear what's going
on in the code.
1. RTL8201F/RTL8201FL/RTL8201FN Rev. 1.4 Datasheet Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Nicolai Buchwitz nb@tipi-net.de Link: https://patch.msgid.link/20260406201222.1043396-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: txgbe: leave space for null terminators on property_entry
Lists of struct property_entry are supposed to be terminated with an
empty property, this driver currently seems to be allocating exactly the
amount of entry used.
Change the struct definition to leave an extra element for all
property_entry.
Fixes: c3e382ad6d15 ("net: txgbe: Add software nodes to support phylink") Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com> Tested-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20260405222013.5347-1-fabio.baltieri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Xiuwei [Mon, 30 Mar 2026 01:49:52 +0000 (09:49 +0800)]
scsi: sd: fix missing put_disk() when device_add(&disk_dev) fails
If device_add(&sdkp->disk_dev) fails, put_device() runs
scsi_disk_release(), which frees the scsi_disk but leaves the gendisk
referenced. The device_add_disk() error path in sd_probe() calls
put_disk(gd); call put_disk(gd) here to mirror that cleanup.
Fixes: 265dfe8ebbab ("scsi: sd: Free scsi_disk device via put_device()") Cc: stable@vger.kernel.org Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Link: https://patch.msgid.link/20260330014952.152776-1-yangxiuwei@kylinos.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This code can lead to an out-of-bounds access of the dev->_tx[] array
when is_input is true. In such a case, the packet is on the RX path and
skb->queue_mapping contains the RX queue index of the ingress device. If
the ingress device has more RX queues than the egress device (dev) has
TX queues, skb_get_queue_mapping(skb) will exceed dev->num_tx_queues.
Add a check to avoid this situation since skb_get_tx_queue() does not
clamp the index. This issue has also revealed that per queue visibility
cannot be accurate and will be replaced later as a new feature.
While at it, add missing lock around qdisc_qstats_qlen_backlog(). The
function __ioam6_fill_trace_data() is called from both softirq and
process contexts, hence the use of spin_lock_bh() here.
Fixes: b63c5478e9cb ("ipv6: ioam: Support for Queue depth data field") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20260403214418.2233266-2-kuba@kernel.org/ Signed-off-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260404134137.24553-1-justin.iurman@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xiang Mei [Sat, 4 Apr 2026 22:04:12 +0000 (15:04 -0700)]
bonding: remove unused bond_is_first_slave and bond_is_last_slave macros
Since commit 2884bf72fb8f ("net: bonding: fix use-after-free in
bond_xmit_broadcast()"), bond_is_last_slave() was only used in
bond_xmit_broadcast(). After the recent fix replaced that usage with
a simple index comparison, bond_is_last_slave() has no remaining
callers. bond_is_first_slave() likewise has no callers.
Taniya Das [Fri, 3 Apr 2026 14:10:54 +0000 (16:10 +0200)]
clk: qcom: gcc: Add multiple global clock controller driver for Nord SoC
The global clock controller on the Nord SoC is partitioned into
GCC, SE_GCC, NE_GCC, and NW_GCC. Introduce driver support for each
of these controllers.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
[Shawn: Drop include of <linux/of.h> as the driver doesn't use any OF APIs] Co-developed-by: Shawn Guo <shengchao.guo@oss.qualcomm.com> Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260403-nord-clks-v1-6-018af14979fd@oss.qualcomm.com
[bjorn: Added missing .use_rpm to gcc_nord_desc] Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Jakub Kicinski [Thu, 9 Apr 2026 01:58:08 +0000 (18:58 -0700)]
Merge tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) Fix ancient sparse warnings in nf conntrack nat modules, from
Sun Jian.
2) Fix typo in enum description, from Jelle van der Waa.
3) remove redundant refetch of netns pointer in nf_conntrack_sip.
4) add a deprecation warning for dccp match.
We can extend the deadline later if needed, but plan atm is to
remove the feature.
5) remove nf_conntrack_h323 debug code that can read out-of-bounds
with malformed messages. This code was commented out, but better
remove this.
6+7) add more netlink policy validations in netfilter.
This could theoretically cause issues when a client sends e.g.
unsupported feature flags that were previously ignored, so we
may have to relax some changes. For now, try to be stricter and
reject upfront.
8+9) minor code cleanup in nft_set_pipapo (an nftables set backend).
10) Add nftables matching support fro double-tagged vlan and pppoe
frames, from Pablo Neira Ayuso.
11) Fix up indentation of debug messages in nf_conntrack_h323 conntrack
helper, from David Laight.
12) Add a helper to iterate to next flow action and bail out if the
maximum number of actions is reached, also from Pablo.
* tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it
netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined
netfilter: nft_meta: add double-tagged vlan and pppoe support
netfilter: nft_set_pipapo_avx2: remove redundant loop in lookup_slow
netfilter: nft_set_pipapo: increment data in one step
netfilter: nf_tables: add netlink policy based cap on registers
netfilter: add more netlink-based policy range checks
netfilter: nf_conntrack_h323: remove unreliable debug code in decode_octstr
netfilter: add deprecation warning for dccp support
netfilter: nf_conntrack_sip: remove net variable shadowing
netfilter: nf_tables: Fix typo in enum description
netfilter: use function typedefs for __rcu NAT helper hook pointers
====================
Taniya Das [Fri, 3 Apr 2026 14:10:51 +0000 (16:10 +0200)]
dt-bindings: clock: qcom: Add Nord Global Clock Controller
Add device tree bindings for the global clock controller on Qualcomm
Nord platform. The global clock controller on Nord SoC is divided into
multiple clock controllers (GCC,SE_GCC,NE_GCC and NW_GCC). Add each of
the bindings to define the clock controllers.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260403-nord-clks-v1-3-018af14979fd@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Jakub Kicinski [Thu, 9 Apr 2026 01:56:17 +0000 (18:56 -0700)]
Merge tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A few last-minute fixes:
- rfkill: prevent boundless event list
- rt2x00: fix USB resource management
- brcmfmac: validate firmware IDs
- brcmsmac: fix DMA free size
* tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
net: rfkill: prevent unlimited numbers of rfkill events from being created
wifi: rt2x00usb: fix devres lifetime
wifi: brcmfmac: validate bsscfg indices in IF events
wifi: brcmsmac: Fix dma_free_coherent() size
====================
1) Clear trailing padding in build_polexpire() to prevent
leaking unititialized memory. From Yasuaki Torimaru.
2) Fix aevent size calculation when XFRMA_IF_ID is used.
From Keenan Dong.
3) Wait for RCU readers during policy netns exit before
freeing the policy hash tables.
4) Fix dome too eaerly dropped references on the netdev
when uding transport mode. From Qi Tang.
5) Fix refcount leak in xfrm_migrate_policy_find().
From Kotlyarov Mihail.
6) Fix two fix info leaks in build_report() and
in build_mapping(). From Greg Kroah-Hartman.
7) Zero aligned sockaddr tail in PF_KEY exports.
From Zhengchuan Liang.
* tag 'ipsec-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
net: af_key: zero aligned sockaddr tail in PF_KEY exports
xfrm_user: fix info leak in build_report()
xfrm_user: fix info leak in build_mapping()
xfrm: fix refcount leak in xfrm_migrate_policy_find
xfrm: hold dev ref until after transport_finish NF_HOOK
xfrm: Wait for RCU readers during policy netns exit
xfrm: account XFRMA_IF_ID in aevent size calculation
xfrm: clear trailing padding in build_polexpire()
====================
Li Tian [Mon, 6 Apr 2026 01:53:44 +0000 (09:53 +0800)]
scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
The storvsc driver has become stricter in handling SRB status codes
returned by the Hyper-V host. When using Virtual Fibre Channel (vFC)
passthrough, the host may return SRB_STATUS_DATA_OVERRUN for
PERSISTENT_RESERVE_IN commands if the allocation length in the CDB does
not match the host's expected response size.
Currently, this status is treated as a fatal error, propagating
Host_status=0x07 [DID_ERROR] to the SCSI mid-layer. This causes
userspace storage utilities (such as sg_persist) to fail with transport
errors, even when the host has actually returned the requested
reservation data in the buffer.
Refactor the existing command-specific workarounds into a new helper
function, storvsc_host_mishandles_cmd(), and add PERSISTENT_RESERVE_IN
to the list of commands where SRB status errors should be suppressed for
vFC devices. This ensures that the SCSI mid-layer processes the returned
data buffer instead of terminating the command.
Signed-off-by: Li Tian <litian@redhat.com> Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Link: https://patch.msgid.link/20260406015344.12566-1-litian@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
1) Update outdated comment in xfrm_dst_check().
From kexinsun.
2) Drop support for HMAC-RIPEMD-160 from IPsec.
From Eric Biggers.
* tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Drop support for HMAC-RIPEMD-160
xfrm: update outdated comment
====================
Jakub Kicinski [Thu, 9 Apr 2026 01:50:27 +0000 (18:50 -0700)]
Merge tag 'batadv-net-pullrequest-20260408' of https://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are two batman-adv bugfixes:
- reject oversized global TT response buffers, by Ruide Cao
- hold claim backbone gateways by reference, by Haoze Xie
* tag 'batadv-net-pullrequest-20260408' of https://git.open-mesh.org/linux-merge:
batman-adv: hold claim backbone gateways by reference
batman-adv: reject oversized global TT response buffers
====================
Jakub Kicinski [Thu, 9 Apr 2026 01:48:44 +0000 (18:48 -0700)]
Merge tag 'nf-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
I only included crash fixes, as we're closer to a release, rest will
be handled via -next.
1) Fix a NULL pointer dereference in ip_vs_add_service error path, from
Weiming Shi, bug added in 6.2 development cycle.
2) Don't leak kernel data bytes from allocator to userspace: nfnetlink_log
needs to init the trailing NLMSG_DONE terminator. From Xiang Mei.
3) xt_multiport match lacks range validation, bogus userspace request will
cause out-of-bounds read. From Ren Wei.
4) ip6t_eui64 match must reject packets with invalid mac header before
calling eth_hdr. Make existing check unconditional. From Zhengchuan
Liang.
5) nft_ct timeout policies are free'd via kfree() while they may still
be reachable by other cpus that process a conntrack object that
uses such a timeout policy. Existing reaping of entries is not
sufficient because it doesn't wait for a grace period. Use kfree_rcu().
From Tuan Do.
6/7) Make nfnetlink_queue hash table per queue. As-is we can hit a page
fault in case underlying page of removed element was free'd. Per-queue
hash prevents parallel lookups. This comes with a test case that
demonstrates the bug, from Fernando Fernandez Mancera.
* tag 'nf-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_queue.sh: add a parallel stress test
netfilter: nfnetlink_queue: make hash table per queue
netfilter: nft_ct: fix use-after-free in timeout object destroy
netfilter: ip6t_eui64: reject invalid MAC header for all packets
netfilter: xt_multiport: validate range encoding in checkentry
netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
ipvs: fix NULL deref in ip_vs_add_service error path
====================
Jakub Kicinski [Thu, 9 Apr 2026 01:44:37 +0000 (18:44 -0700)]
Merge branch 'rxrpc-miscellaneous-fixes'
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here are some fixes for rxrpc:
(1) Fix key quota calculation.
(2) Fix a memory leak.
(3) Fix rxrpc_new_client_call_for_sendmsg() to substitute NULL for an
empty key.
Might want to remove this substitution entirely or handle it in
rxrpc_init_client_call_security() instead.
(4) Fix deletion of call->link to be RCU safe.
(5) Fix missing bounds checks when parsing RxGK tickets.
(6) Fix use of wrong skbuff to get challenge serial number. Also actually
substitute the newer response skbuff and release the older one.
(7) Fix unexpected RACK timer warning to report old mode.
(8) Fix call key refcount leak.
(9) Fix the interaction of jumbograms with Tx window space, setting the
request-ack flag when the window space is getting low, typically
because each jumbogram take a big bite out of the window and fewer UDP
packets get traded.
(10) Don't call rxrpc_put_call() with a NULL pointer.
(11) Reject undecryptable rxkad response tickets by checking result of
decryption.
(12) Fix buffer bounds calculation in the RESPONSE authenticator parser.
(13) Fix oversized response length check.
(14) Fix refcount leak on multiple setting of server keyring.
(15) Fix checks made by RXRPC_SECURITY_KEY and RXRPC_SECURITY_KEYRING (both
should be allowed).
(16) Fix lack of result checking on calls to crypto_skcipher_en/decrypt().
(17) Fix token_len limit check in rxgk_verify_response().
(18) Fix rxgk context leak in rxgk_verify_response().
(19) Fix read beyond end of buffer in rxgk_do_verify_authenticator().
(20) Fix parsing of RESPONSE packet on a connection that has already been set
from a prior response.
(21) Fix size of buffers used for rendering addresses into for procfiles.
====================
rxrpc: proc: size address buffers for %pISpc output
The AF_RXRPC procfs helpers format local and remote socket addresses into
fixed 50-byte stack buffers with "%pISpc".
That is too small for the longest current-tree IPv6-with-port form the
formatter can produce. In lib/vsprintf.c, the compressed IPv6 path uses a
dotted-quad tail not only for v4mapped addresses, but also for ISATAP
addresses via ipv6_addr_is_isatap().
is possible with the current formatter. That is 50 visible characters, so
51 bytes including the trailing NUL, which does not fit in the existing
char[50] buffers used by net/rxrpc/proc.c.
Size the buffers from the formatter's maximum textual form and switch the
call sites to scnprintf().
Changes since v1:
- correct the changelog to cite the actual maximum current-tree case
explicitly
- frame the proof around the ISATAP formatting path instead of the earlier
mapped-v4 example
Fixes: 75b54cb57ca3 ("rxrpc: Add IPv6 support") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Anderson Nascimento <anderson@allelesecurity.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-22-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Jie [Wed, 8 Apr 2026 12:12:48 +0000 (13:12 +0100)]
rxrpc: only handle RESPONSE during service challenge
Only process RESPONSE packets while the service connection is still in
RXRPC_CONN_SERVICE_CHALLENGING. Check that state under state_lock before
running response verification and security initialization, then use a local
secured flag to decide whether to queue the secured-connection work after
the state transition. This keeps duplicate or late RESPONSE packets from
re-running the setup path and removes the unlocked post-transition state
test.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jie Wang <jiewang2024@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-21-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 8 Apr 2026 12:12:45 +0000 (13:12 +0100)]
rxrpc: Fix integer overflow in rxgk_verify_response()
In rxgk_verify_response(), there's a potential integer overflow due to
rounding up token_len before checking it, thereby allowing the length check to
be bypassed.
Fix this by checking the unrounded value against len too (len is limited as
the response must fit in a single UDP packet).
Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-18-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 8 Apr 2026 12:12:44 +0000 (13:12 +0100)]
rxrpc: Fix missing error checks for rxkad encryption/decryption failure
Add error checking for failure of crypto_skcipher_en/decrypt() to various
rxkad function as the crypto functions can fail with ENOMEM at least.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-17-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 8 Apr 2026 12:12:43 +0000 (13:12 +0100)]
rxrpc: Fix key/keyring checks in setsockopt(RXRPC_SECURITY_KEY/KEYRING)
An AF_RXRPC socket can be both client and server at the same time. When
sending new calls (ie. it's acting as a client), it uses rx->key to set the
security, and when accepting incoming calls (ie. it's acting as a server),
it uses rx->securities.
setsockopt(RXRPC_SECURITY_KEY) sets rx->key to point to an rxrpc-type key
and setsockopt(RXRPC_SECURITY_KEYRING) sets rx->securities to point to a
keyring of rxrpc_s-type keys.
Now, it should be possible to use both rx->key and rx->securities on the
same socket - but for userspace AF_RXRPC sockets rxrpc_setsockopt()
prevents that.
Fix this by:
(1) Remove the incorrect check rxrpc_setsockopt(RXRPC_SECURITY_KEYRING)
makes on rx->key.
(2) Move the check that rxrpc_setsockopt(RXRPC_SECURITY_KEY) makes on
rx->key down into rxrpc_request_key().
(3) Remove rxrpc_request_key()'s check on rx->securities.
This (in combination with a previous patch) pushes the checks down into the
functions that set those pointers and removes the cross-checks that prevent
both key and keyring being set.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Anderson Nascimento <anderson@allelesecurity.com>
cc: Luxiao Xu <rakukuip@gmail.com>
cc: Yuan Tan <yuantan098@gmail.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxgk_verify_response() decodes auth_len from the packet and is supposed
to verify that it fits in the remaining bytes. The existing check is
inverted, so oversized RESPONSE authenticators are accepted and passed
to rxgk_decrypt_skb(), which can later reach skb_to_sgvec() with an
impossible length and hit BUG_ON(len).
Decoded from the original latest-net reproduction logs with
scripts/decode_stacktrace.sh:
rxgk_verify_authenticator() copies auth_len bytes into a temporary
buffer and then passes p + auth_len as the parser limit to
rxgk_do_verify_authenticator(). Since p is a __be32 *, that inflates the
parser end pointer by a factor of four and lets malformed RESPONSE
authenticators read past the kmalloc() buffer.
Decoded from the original latest-net reproduction logs with
scripts/decode_stacktrace.sh:
rxkad_decrypt_ticket() decrypts the RXKAD response ticket and then
parses the buffer as plaintext without checking whether
crypto_skcipher_decrypt() succeeded.
A malformed RESPONSE can therefore use a non-block-aligned ticket
length, make the decrypt operation fail, and still drive the ticket
parser with attacker-controlled bytes.
Check the decrypt result and abort the connection with RXKADBADTICKET
when ticket decryption fails.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Yuqi Xu <xuyuqiabc@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-12-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Douya Le [Wed, 8 Apr 2026 12:12:38 +0000 (13:12 +0100)]
rxrpc: Only put the call ref if one was acquired
rxrpc_input_packet_on_conn() can process a to-client packet after the
current client call on the channel has already been torn down. In that
case chan->call is NULL, rxrpc_try_get_call() returns NULL and there is
no reference to drop.
The client-side implicit-end error path does not account for that and
unconditionally calls rxrpc_put_call(). This turns a protocol error
path into a kernel crash instead of rejecting the packet.
Only drop the call reference if one was actually acquired. Keep the
existing protocol error handling unchanged.
Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Douya Le <ldy3087146292@gmail.com> Co-developed-by: Yuan Tan <tanyuan98@gmail.com> Signed-off-by: Yuan Tan <tanyuan98@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ao Zhou <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-11-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Dionne [Wed, 8 Apr 2026 12:12:37 +0000 (13:12 +0100)]
rxrpc: Fix to request an ack if window is limited
Peers may only send immediate acks for every 2 UDP packets received.
When sending a jumbogram, it is important to check that there is
sufficient window space to send another same sized jumbogram following
the current one, and request an ack if there isn't. Failure to do so may
cause the call to stall waiting for an ack until the resend timer fires.
Where jumbograms are in use this causes a very significant drop in
performance.
Fixes: fe24a5494390 ("rxrpc: Send jumbo DATA packets") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-10-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxrpc: Fix key reference count leak from call->key
When creating a client call in rxrpc_alloc_client_call(), the code obtains
a reference to the key. This is never cleaned up and gets leaked when the
call is destroyed.
Fix this by freeing call->key in rxrpc_destroy_call().
Before the patch, it shows the key reference counter elevated:
rxrpc: Fix rack timer warning to report unexpected mode
rxrpc_rack_timer_expired() clears call->rack_timer_mode to OFF before
the switch. The default case warning therefore always prints OFF and
doesn't identify the unexpected timer mode.
Log the saved mode value instead so the warning reports the actual
unexpected rack timer mode.
Fixes: 7c482665931b ("rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985]") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-8-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxrpc: Fix use of wrong skb when comparing queued RESP challenge serial
In rxrpc_post_response(), the code should be comparing the challenge serial
number from the cached response before deciding to switch to a newer
response, but looks at the newer packet private data instead, rendering the
comparison always false.
Fix this by switching to look at the older packet.
Fix further[1] to substitute the new packet in place of the old one if
newer and also to release whichever we don't use.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Link: https://patch.msgid.link/20260408121252.2249051-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleh Konko [Wed, 8 Apr 2026 12:12:33 +0000 (13:12 +0100)]
rxrpc: Fix RxGK token loading to check bounds
rxrpc_preparse_xdr_yfs_rxgk() reads the raw key length and ticket length
from the XDR token as u32 values and passes each through round_up(x, 4)
before using the rounded value for validation and allocation. When the raw
length is >= 0xfffffffd, round_up() wraps to 0, so the bounds check and
kzalloc both use 0 while the subsequent memcpy still copies the original
~4 GiB value, producing a heap buffer overflow reachable from an
unprivileged add_key() call.
Fix this by:
(1) Rejecting raw key lengths above AFSTOKEN_GK_KEY_MAX and raw ticket
lengths above AFSTOKEN_GK_TOKEN_MAX before rounding, consistent with
the caps that the RxKAD path already enforces via AFSTOKEN_RK_TIX_MAX.
(2) Sizing the flexible-array allocation from the validated raw key
length via struct_size_t() instead of the rounded value.
(3) Caching the raw lengths so that the later field assignments and
memcpy calls do not re-read from the token, eliminating a class of
TOCTOU re-parse.
The control path (valid token with lengths within bounds) is unaffected.
Fixes: 0ca100ff4df6 ("rxrpc: Add YFS RxGK (GSSAPI) security class") Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 8 Apr 2026 12:12:32 +0000 (13:12 +0100)]
rxrpc: Fix call removal to use RCU safe deletion
Fix rxrpc call removal from the rxnet->calls list to use list_del_rcu()
rather than list_del_init() to prevent stuffing up reading
/proc/net/rxrpc/calls from potentially getting into an infinite loop.
This, however, means that list_empty() no longer works on an entry that's
been deleted from the list, making it harder to detect prior deletion. Fix
this by:
Firstly, make rxrpc_destroy_all_calls() only dump the first ten calls that
are unexpectedly still on the list. Limiting the number of steps means
there's no need to call cond_resched() or to remove calls from the list
here, thereby eliminating the need for rxrpc_put_call() to check for that.
rxrpc_put_call() can then be fixed to unconditionally delete the call from
the list as it is the only place that the deletion occurs.
Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 8 Apr 2026 12:12:31 +0000 (13:12 +0100)]
rxrpc: Fix anonymous key handling
In rxrpc_new_client_call_for_sendmsg(), a key with no payload is meant to
be substituted for a NULL key pointer, but the variable this is done with
is subsequently not used.
Fix this by using "key" rather than "rx->key" when filling in the
connection parameters.
Note that this only affects direct use of AF_RXRPC; the kAFS filesystem
doesn't use sendmsg() directly and so bypasses the issue. Further,
AF_RXRPC passes a NULL key in if no key is set, so using an anonymous key
in that manner works. Since this hasn't been noticed to this point, it
might be better just to remove the "key" variable and the code that sets it
- and, arguably, rxrpc_init_client_call_security() would be a better place
to handle it.
Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 8 Apr 2026 12:12:29 +0000 (13:12 +0100)]
rxrpc: Fix key quota calculation for multitoken keys
In the rxrpc key preparsing, every token extracted sets the proposed quota
value, but for multitoken keys, this will overwrite the previous proposed
quota, losing it.
Fix this by adding to the proposed quota instead.
Fixes: 8a7a3eb4ddbe ("KEYS: RxRPC: Use key preparsing") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Wed, 8 Apr 2026 19:12:42 +0000 (21:12 +0200)]
selftests/bpf: Add tests for ld_{abs,ind} failure path in subprogs
Extend the verifier_ld_ind BPF selftests with subprogs containing
ld_{abs,ind} and craft the test in a way where the invalid register
read is rejected in the fixed case. Also add a success case each,
and add additional coverage related to the BTF return type enforcement.
Daniel Borkmann [Wed, 8 Apr 2026 19:12:41 +0000 (21:12 +0200)]
bpf: Remove static qualifier from local subprog pointer
The local subprog pointer in create_jt() and visit_abnormal_return_insn()
was declared static.
It is unconditionally assigned via bpf_find_containing_subprog() before
every use. Thus, the static qualifier serves no purpose and rather creates
confusion. Just remove it.
Fixes: e40f5a6bf88a ("bpf: correct stack liveness for tail calls") Fixes: 493d9e0d6083 ("bpf, x86: add support for indirect jumps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20260408191242.526279-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Wed, 8 Apr 2026 19:12:40 +0000 (21:12 +0200)]
bpf: Fix ld_{abs,ind} failure path analysis in subprogs
Usage of ld_{abs,ind} instructions got extended into subprogs some time
ago via commit 09b28d76eac4 ("bpf: Add abnormal return checks."). These
are only allowed in subprograms when the latter are BTF annotated and
have scalar return types.
The code generator in bpf_gen_ld_abs() has an abnormal exit path (r0=0 +
exit) from legacy cBPF times. While the enforcement is on scalar return
types, the verifier must also simulate the path of abnormal exit if the
packet data load via ld_{abs,ind} failed.
This is currently not the case. Fix it by having the verifier simulate
both success and failure paths, and extend it in similar ways as we do
for tail calls. The success path (r0=unknown, continue to next insn) is
pushed onto stack for later validation and the r0=0 and return to the
caller is done on the fall-through side.
Eric Biggers [Sat, 4 Apr 2026 20:30:03 +0000 (13:30 -0700)]
scsi: iscsi_tcp: Remove unneeded selections of CRYPTO and CRYPTO_MD5
As far as I can tell, CRYPTO_MD5 has been unnecessary here ever since it
was added by commit c899e4ef96f0 ("[SCSI] open-iscsi/linux-iscsi-5
Initiator: Kconfig update") in 2005.
CRYPTO was needed until commit 92186c1455a2 ("scsi: iscsi_tcp: Switch to
using the crc32c library"), but is no longer needed.
selftests/sched_ext: Improve runner error reporting for invalid arguments
Report an error for './runner foo' (positional arg instead of -t) and
for './runner -t foo' when the filter matches no tests. Previously both
cases produced no error output.
Pre-scan the test list before the main loop so the error is reported
immediately, avoiding spurious SKIP output from '-s' when no tests
match.
selftests/bpf: Add test to ensure kprobe_multi is not sleepable
Add a selftest to ensure that kprobe_multi programs cannot be attached
using the BPF_F_SLEEPABLE flag. This test succeeds when the kernel
rejects attachment of kprobe_multi when the BPF_F_SLEEPABLE flag is set.
Extract bpf_get_linfo_file_line as its own function so that the logic to
obtain the file, line, and line number for a given program can be shared
in subsequent patches.
ecryptfs: keep the lower iattr contained in truncate_upper
Currently the two callers of truncate_upper handle passing information
very differently. ecryptfs_truncate passes a zeroed lower_ia and expects
truncate_upper to fill it in from the upper ia created just for that,
while ecryptfs_setattr passes a fully initialized lower_ia copied from
the upper one. Both of them then call notify_change on the lower_ia.
Switch to only passing the upper ia, and derive the lower ia from it
inside truncate_upper, and call notify_change inside the function itself.
Because the old name is misleading now, rename the resulting function to
__ecryptfs_truncate as it deals with both the lower and upper inodes.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
ecryptfs: combine the two ATTR_SIZE blocks in ecryptfs_setattr
Simplify the logic in ecryptfs_setattr by combining the two ATTR_SIZE
blocks. This initializes lower_ia before the size check, which is
obviously correct as the size check doesn't look at it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
Use a few strategic gotos to reduce indentation and keep the main flow
outside of branches. Switch all touched comments to normal kernel style
and avoid breaks in printed strings for all the code touched.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
KVM: SEV: Move SEV-specific VM initialization to sev.c
Move SEV+ VM initialization to sev.c (as sev_vm_init()) so that
kvm_sev_info (and all usage) can be gated on CONFIG_KVM_AMD_SEV=y without
needing more #ifdefs. As a bonus, isolating the logic will make it easier
to harden the flow, e.g. to WARN if the vm_type is unknown.
No functional change intended (SEV, SEV_ES, and SNP VM types are only
supported if CONFIG_KVM_AMD_SEV=y).
KVM: SEV: Move standard VM-scoped helpers to detect SEV+ guests to sev.c
Now that all external usage of the VM-scoped APIs to detect SEV+ guests is
gone, drop the stubs provided for CONFIG_KVM_AMD_SEV=n builds and bury the
"standard" APIs in sev.c.
KVM: SEV: Document the SEV-ES check when querying SMM support as "safe"
Use the "unsafe" API to check for an SEV-ES+ guest when determining whether
or not SMBASE is a supported MSR, i.e. whether or not emulated SMM is
supported. This will eventually allow adding lockdep assertings to the
APIs for detecting SEV+ VMs without triggering "real" false positives.
While svm_has_emulated_msr() doesn't hold kvm->lock, i.e. can get both
false positives *and* false negatives, both are completely fine, as the
only time the result isn't stable is when userspace is the sole consumer
of the result. I.e. userspace can confuse itself, but that's it.
KVM: SEV: Add quad-underscore version of VM-scoped APIs to detect SEV+ guests
Add "unsafe" quad-underscore versions of the SEV+ guest detectors in
anticipation of hardening the APIs via lockdep assertions. This will allow
adding exceptions for usage that is known to be safe in advance of the
lockdep assertions.
Use a pile of underscores to try and communicate that use of the "unsafe"
shouldn't be done lightly.
KVM: SEV: Provide vCPU-scoped accessors for detecting SEV+ guests
Provide vCPU-scoped accessors for detecting if the vCPU belongs to an SEV,
SEV-ES, or SEV-SNP VM, partly to dedup a small amount of code, but mostly
to better document which usages are "safe". Generally speaking, using the
VM-scoped sev_guest() and friends outside of kvm->lock is unsafe, as they
can get both false positives and false negatives.
But for vCPUs, the accessors are guaranteed to provide a stable result as
KVM disallows initialization SEV+ state after vCPUs are created. I.e.
operating on a vCPU guarantees the VM can't "become" an SEV+ VM, and that
it can't revert back to a "normal" VM.
This will also allow dropping the stubs for the VM-scoped accessors, as
it's relatively easy to eliminate usage of the accessors from common SVM
once the vCPU-scoped checks are out of the way.
KVM: SEV: Lock all vCPUs for the duration of SEV-ES VMSA synchronization
Lock and unlock all vCPUs in a single batch when synchronizing SEV-ES VMSAs
during launch finish, partly to dedup the code by a tiny amount, but mostly
so that sev_launch_update_vmsa() uses the same logic/flow as all other SEV
ioctls that lock all vCPUs.
KVM: SEV: Lock all vCPUs when synchronzing VMSAs for SNP launch finish
Lock all vCPUs when synchronizing and encrypting VMSAs for SNP guests, as
allowing userspace to manipulate and/or run a vCPU while its state is being
synchronized would at best corrupt vCPU state, and at worst crash the host
kernel.
Opportunistically assert that vcpu->mutex is held when synchronizing its
VMSA (the SEV-ES path already locks vCPUs).
This commit fixes inconsistent quirk names and also fixes all the
checkpatch.pl issues alongside inconsistent code, it also adds static
asserts to assert struct sizes at compile time.
Vidya Sagar [Tue, 24 Mar 2026 19:09:58 +0000 (00:39 +0530)]
PCI: tegra194: Add core monitor clock support
Add support for Tegra PCIe core clock monitoring. Monitoring tracks rate
changes that may occur due to link speed changes and is useful for
detecting core clock changes not initiated by software. Parse the monitor
clock from device tree and enable it when present.
Vidya Sagar [Tue, 24 Mar 2026 19:09:57 +0000 (00:39 +0530)]
dt-bindings: PCI: tegra194: Add monitor clock support
Tegra supports PCIe core clock monitoring for any rate changes that may be
happening because of the link speed changes. This is useful in tracking
any changes in the core clock that are not initiated by the software.
Vidya Sagar [Tue, 24 Mar 2026 19:09:56 +0000 (00:39 +0530)]
PCI: tegra194: Enable hardware hot reset mode in Endpoint mode
When PCIe link goes down, hardware can retrain the link and try to link up.
To enable this feature, program the APPL_CTRL register with hardware hot
reset with immediate LTSSM enable mode when the controller is operating in
endpoint mode.
Vidya Sagar [Tue, 24 Mar 2026 19:09:54 +0000 (00:39 +0530)]
PCI: tegra194: Remove IRQF_ONESHOT flag during Endpoint interrupt registration
The Tegra PCIe Endpoint controller has a single interrupt line that is
shared between multiple interrupt sources:
1. PCIe link state events (link up, hot reset done)
2. Configuration space events (Bus Master Enable changes)
3. DMA completion events
The interrupt is currently registered with IRQF_ONESHOT, which keeps the
interrupt line masked until the threaded handler completes. That blocks
processing of DMA completion events (and other sources) while the
threaded handler runs.
Removing IRQF_ONESHOT is safe for the following reasons:
1. The hard IRQ handler (tegra_pcie_ep_hard_irq) properly acknowledges and
clears all interrupt status bits in hardware before returning. This
prevents interrupt storms and ensures the interrupt controller can
re-enable the interrupt line immediately.
2. The follow-up commit adds handling in the hard IRQ for DMA completion
events. Dropping IRQF_ONESHOT is required so the line is unmasked
after the hard IRQ returns and those events can be serviced without
being blocked by the threaded handler.
3. The threaded handler (tegra_pcie_ep_irq_thread) only processes link-up
notifications and LTR message sending. These operations don't conflict
with DMA interrupt processing and don't require the interrupt line to
remain masked.
This change enables both DMA driver and Endpoint controller driver to share
the interrupt line without blocking each other.
Vidya Sagar [Tue, 24 Mar 2026 19:09:53 +0000 (00:39 +0530)]
PCI: tegra194: Calibrate pipe to UPHY for Endpoint mode
Calibrate 'Pipe to Universal PHY(UPHY)' (P2U) for the Endpoint controller
to request UPHY PLL rate change to 2.5GT/s (Gen 1) during initialization.
This helps to reset stale PLL state from the previous bad link state.