git.ipfire.org Git - thirdparty/linux.git/log

smb: smbdirect: add some logging to SMBDIRECT_CHECK_STATUS_{WARN,DISCONNECT}()

This should make it easier to analyze any possible problems.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: smbdirect: introduce smbdirect_socket.logging infrastructure

This will be used by client and server in order to keep controlling
the logging when we move to shared functions.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: smbdirect: let smbdirect.h include #include <linux/types.h>

This will make it easier to use.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: server: avoid double-free in smb_direct_free_sendmsg after smb_direct_flush_send_list()

smb_direct_flush_send_list() already calls smb_direct_free_sendmsg(),
so we should not call it again after post_sendmsg()
moved it to the batch list.

Reported-by: Ruikai Peng <ruikai@pwno.io>
Closes: https://lore.kernel.org/linux-cifs/CAFD3drNOSJ05y3A+jNXSDxW-2w09KHQ0DivhxQ_pcc7immVVOQ@mail.gmail.com/
Fixes: 34abd408c8ba ("smb: server: make use of smbdirect_socket.send_io.bcredits")
Cc: stable@kernel.org
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Ruikai Peng <ruikai@pwno.io>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: security@kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Tested-by: Ruikai Peng <ruikai@pwno.io>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: avoid double-free in smbd_free_send_io() after smbd_send_batch_flush()

smbd_send_batch_flush() already calls smbd_free_send_io(),
so we should not call it again after smbd_post_send()
moved it to the batch list.

Reported-by: Ruikai Peng <ruikai@pwno.io>
Closes: https://lore.kernel.org/linux-cifs/CAFD3drNOSJ05y3A+jNXSDxW-2w09KHQ0DivhxQ_pcc7immVVOQ@mail.gmail.com/
Fixes: 21538121efe6 ("smb: client: make use of smbdirect_socket.send_io.bcredits")
Cc: stable@kernel.org
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Ruikai Peng <ruikai@pwno.io>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: security@kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Tested-by: Ruikai Peng <ruikai@pwno.io>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix use-after-free from async crypto on Qualcomm crypto engine

ksmbd_crypt_message() sets a NULL completion callback on AEAD requests
and does not handle the -EINPROGRESS return code from async hardware
crypto engines like the Qualcomm Crypto Engine (QCE). When QCE returns
-EINPROGRESS, ksmbd treats it as an error and immediately frees the
request while the hardware DMA operation is still in flight. The DMA
completion callback then dereferences freed memory, causing a NULL
pointer crash:

  pc : qce_skcipher_done+0x24/0x174
  lr : vchan_complete+0x230/0x27c
  ...
  el1h_64_irq+0x68/0x6c
  ksmbd_free_work_struct+0x20/0x118 [ksmbd]
  ksmbd_exit_file_cache+0x694/0xa4c [ksmbd]

Use the standard crypto_wait_req() pattern with crypto_req_done() as
the completion callback, matching the approach used by the SMB client
in fs/smb/client/smb2ops.c. This properly handles both synchronous
engines (immediate return) and async engines (-EINPROGRESS followed
by callback notification).

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Link: https://github.com/openwrt/openwrt/issues/21822
Signed-off-by: Joshua Klinesmith <joshuaklinesmith@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix mechToken leak when SPNEGO decode fails after token alloc

The kernel ASN.1 BER decoder calls action callbacks incrementally as it
walks the input.  When ksmbd_decode_negTokenInit() reaches the mechToken
[2] OCTET STRING element, ksmbd_neg_token_alloc() allocates
conn->mechToken immediately via kmemdup_nul().  If a later element in
the same blob is malformed, then the decoder will return nonzero after
the allocation is already live.  This could happen if mechListMIC [3]
overrunse the enclosing SEQUENCE.

decode_negotiation_token() then sets conn->use_spnego = false because
both the negTokenInit and negTokenTarg grammars failed.  The cleanup at
the bottom of smb2_sess_setup() is gated on use_spnego:

if (conn->use_spnego && conn->mechToken) {
kfree(conn->mechToken);
conn->mechToken = NULL;
}

so the kfree is skipped, causing the mechToken to never be freed.

This codepath is reachable pre-authentication, so untrusted clients can
cause slow memory leaks on a server without even being properly
authenticated.

Fix this up by not checking check for use_spnego, as it's not required,
so the memory will always be properly freed.  At the same time, always
free the memory in ksmbd_conn_free() incase some other failure path
forgot to free it.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: <stable@kernel.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: require 3 sub-authorities before reading sub_auth[2]

parse_dacl() compares each ACE SID against sid_unix_NFS_mode and on
match reads sid.sub_auth[2] as the file mode. If sid_unix_NFS_mode is
the prefix S-1-5-88-3 with num_subauth = 2 then compare_sids() compares
only min(num_subauth, 2) sub-authorities so a client SID with
num_subauth = 2 and sub_auth = {88, 3} will match.

If num_subauth = 2 and the ACE is placed at the very end of the security
descriptor, sub_auth[2] will be 4 bytes past end_of_acl. The
out-of-band bytes will then be masked to the low 9 bits and applied as
the file's POSIX mode, probably not something that is good to have
happen.

Fix this up by forcing the SID to actually carry a third sub-authority
before reading it at all.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: <stable@kernel.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: validate EaNameLength in smb2_get_ea()

smb2_get_ea() reads ea_req->EaNameLength from the client request and
passes it directly to strncmp() as the comparison length without
verifying that the length of the name really is the size of the input
buffer received.

Fix this up by properly checking the size of the name based on the value
received and the overall size of the request, to prevent a later
strncmp() call to use the length as a "trusted" size of the buffer.
Without this check, uninitialized heap values might be slowly leaked to
the client.

Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: <stable@kernel.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: Remove unnecessary selection of CRYPTO_ECB

Since the SMB server never uses any ecb(...) algorithm from the
crypto_skcipher API, selecting CRYPTO_ECB is unnecessary.

Remove it along with the unused CRYPTO_BLK_* constants.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: validate owner of durable handle on reconnect

Currently, ksmbd does not verify if the user attempting to reconnect
to a durable handle is the same user who originally opened the file.
This allows any authenticated user to hijack an orphaned durable handle
by predicting or brute-forcing the persistent ID.

According to MS-SMB2, the server MUST verify that the SecurityContext
of the reconnect request matches the SecurityContext associated with
the existing open.
Add a durable_owner structure to ksmbd_file to store the original opener's
UID, GID, and account name. and catpure the owner information when a file
handle becomes orphaned. and implementing ksmbd_vfs_compare_durable_owner()
to validate the identity of the requester during SMB2_CREATE (DHnC).

Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2")
Reported-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Reported-by: Navaneeth K <knavaneeth786@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix use-after-free in __ksmbd_close_fd() via durable scavenger

When a durable file handle survives session disconnect (TCP close without
SMB2_LOGOFF), session_fd_check() sets fp->conn = NULL to preserve the
handle for later reconnection. However, it did not clean up the byte-range
locks on fp->lock_list.

Later, when the durable scavenger thread times out and calls
__ksmbd_close_fd(NULL, fp), the lock cleanup loop did:

    spin_lock(&fp->conn->llist_lock);

This caused a slab use-after-free because fp->conn was NULL and the
original connection object had already been freed by
ksmbd_tcp_disconnect().

The root cause is asymmetric cleanup: lock entries (smb_lock->clist) were
left dangling on the freed conn->lock_list while fp->conn was nulled out.

To fix this issue properly, we need to handle the lifetime of
smb_lock->clist across three paths:
- Safely skip clist deletion when list is empty and fp->conn is NULL.
- Remove the lock from the old connection's lock_list in
   session_fd_check()
- Re-add the lock to the new connection's lock_list in
   ksmbd_reopen_durable_fd().

Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2")
Co-developed-by: munan Huang <munanevil@gmail.com>
Signed-off-by: munan Huang <munanevil@gmail.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: ipc: use kzalloc_flex and __counted_by

The former is just a nice macro and the latter allows runtime analysis
of the allocation and its size.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: move filesystem_vol_info into common/fscc.h

The structure definition on the server side is specified in MS-CIFS
2.2.8.2.3, but we should instead refer to MS-FSCC 2.5.9, just as the
client side does.

Modify the following places:

  - smb3_fs_vol_info -> filesystem_vol_info
  - SerialNumber -> VolumeSerialNumber
  - VolumeLabelSize -> VolumeLabelLength

Then move it into common header file.

Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Steve French <stfrench@microsoft.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: move file_basic_info into common/fscc.h

This struct definition is specified in MS-FSCC, so move them into fscc.h.

Modify the following places:

- smb2_file_basic_info -> file_basic_info
- Pad1 -> Pad

Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: move some definitions from common/smb2pdu.h into common/fscc.h

These definitions are specified in MS-FSCC, so move them into fscc.h.

Only add some documentation references, no other changes.

Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Steve French <stfrench@microsoft.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge branch 'bpf-fix-short-ipv4-ipv6-handling-in-test_run_skb'

Sun Jian says:

====================
bpf: fix short IPv4/IPv6 handling in test_run_skb

bpf_prog_test_run_skb() may access IPv4/IPv6 network headers based on
skb->protocol even when the provided test input only contains an
Ethernet header.

Fix it by rejecting such short IPv4/IPv6 inputs before accessing the
L3 headers, and add a selftest that exercises the reported
bpf_skb_adjust_room() path on ETH_HLEN-sized IPv4/IPv6 EtherType
inputs.

Changes in v4:
- Split the selftests into a separate patch.
- Rework the selftest to actually execute a BPF program calling
bpf_skb_adjust_room().
- Reuse a single struct ethhdr eth_hlen and initialize h_proto from
the test case table.
- Add the Fixes tag to the test_run.c patch.

Link: https://lore.kernel.org/bpf/CABFUUZF_CWQZrRk=L9cNxO=8Z4iSgGfXi3J=hpzeyTKDbfE2-w@mail.gmail.com/T/#mfabfe7e86bb30c0141fbc9f751b8b1cb07767f01
====================

Link: https://patch.msgid.link/20260408034623.180320-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: cover short IPv4/IPv6 inputs with adjust_room

Add a selftest covering ETH_HLEN-sized IPv4/IPv6 EtherType inputs for
bpf_prog_test_run_skb().

Reuse a single zero-initialized struct ethhdr eth_hlen and set
eth_hlen.h_proto from the per-test h_proto field.

Also add a dedicated tc_adjust_room program and route the short
IPv4/IPv6 cases to it, so the selftest actually exercises the
bpf_skb_adjust_room() path from the report.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260408034623.180320-3-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: reject short IPv4/IPv6 inputs in bpf_prog_test_run_skb

bpf_prog_test_run_skb() calls eth_type_trans() first and then uses
skb->protocol to initialize sk family and address fields for the test
run.

For IPv4 and IPv6 packets, it may access ip_hdr(skb) or ipv6_hdr(skb)
even when the provided test input only contains an Ethernet header.

Reject the input earlier if the Ethernet frame carries IPv4/IPv6
EtherType but the L3 header is too short.

Fold the IPv4/IPv6 header length checks into the existing protocol
switch and return -EINVAL before accessing the network headers.

Fixes: fa5cb548ced6 ("bpf: Setup socket family and addresses in bpf_prog_test_run_skb")
Reported-by: syzbot+619b9ef527f510a57cfc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=619b9ef527f510a57cfc
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260408034623.180320-2-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'net-fix-skb_ext-build_bug_on-failures-with-gcov'

Konstantin Khorenko says:

====================
net: fix skb_ext BUILD_BUG_ON failures with GCOV

This mini-series fixes build failures in net/core/skbuff.c when the
kernel is built with CONFIG_GCOV_PROFILE_ALL=y.

This is part of a larger effort to add -fprofile-update=atomic to
global CFLAGS_GCOV (posted earlier as a combined series):
  https://lore.kernel.org/lkml/20260401142020.1434243-1-khorenko@virtuozzo.com/T/#t

That combined series was split per subsystem as requested by Jakub.
The companion patches are:

- iommu: use __always_inline for amdv1pt_install_leaf_entry()
   (sent to iommu maintainers)
- gcov: add -fprofile-update=atomic globally (sent to gcov/kbuild
   maintainers, depends on this series and the iommu patch)

Patch 1/2 fixes a pre-existing build failure with CONFIG_GCOV_PROFILE_ALL:
GCOV counters prevent GCC from constant-folding the skb_ext_total_length()
loop.  It also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
from d6e5794b06c0: that guard was a precaution in case KCOV instrumentation
also prevented constant folding, but KCOV's -fsanitize-coverage=trace-pc
does not interfere with GCC's constant folding (verified experimentally
with GCC 14.2 and GCC 16.0.1), so the guard is unnecessary.

Patch 2/2 is an additional fix needed when -fprofile-update=atomic is
added to CFLAGS_GCOV: __no_profile on the __always_inline function alone
is insufficient because after inlining, the code resides in the caller's
profiled body.  The caller (skb_extensions_init) needs __no_profile and
noinline to prevent re-exposure to GCOV instrumentation.
====================

Link: https://patch.msgid.link/20260410162150.3105738-1-khorenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add noinline __init __no_profile to skb_extensions_init() for GCOV compatibility

With -fprofile-update=atomic in global CFLAGS_GCOV, GCC still cannot
constant-fold the skb_ext_total_length() loop when it is inlined into a
profiled caller.  The existing __no_profile on skb_ext_total_length()
itself is insufficient because after __always_inline expansion the code
resides in the caller's body, which still carries GCOV instrumentation.

Mark skb_extensions_init() with __no_profile so the BUILD_BUG_ON checks
can be evaluated at compile time.  Also mark it noinline to prevent the
compiler from inlining it into skb_init() (which lacks __no_profile),
which would re-expose the function body to GCOV instrumentation.

Add __init since skb_extensions_init() is only called from __init
skb_init().  Previously it was implicitly inlined into the .init.text
section; with noinline it would otherwise remain in permanent .text,
wasting memory after boot.

Build-tested with both CONFIG_GCOV_PROFILE_ALL=y and
CONFIG_KCOV_INSTRUMENT_ALL=y.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Link: https://patch.msgid.link/20260410162150.3105738-3-khorenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fix skb_ext_total_length() BUILD_BUG_ON with CONFIG_GCOV_PROFILE_ALL

When CONFIG_GCOV_PROFILE_ALL=y is enabled, the kernel fails to build:

  In file included from <command-line>:
  In function 'skb_extensions_init',
      inlined from 'skb_init' at net/core/skbuff.c:5214:2:
  ././include/linux/compiler_types.h:706:45: error: call to
    '__compiletime_assert_1490' declared with attribute error:
    BUILD_BUG_ON failed: skb_ext_total_length() > 255

CONFIG_GCOV_PROFILE_ALL adds -fprofile-arcs -ftest-coverage
-fno-tree-loop-im to CFLAGS globally. GCC inserts branch profiling
counters into the skb_ext_total_length() loop and, combined with
-fno-tree-loop-im (which disables loop invariant motion), cannot
constant-fold the result.
BUILD_BUG_ON requires a compile-time constant and fails.

The issue manifests in kernels with 5+ SKB extension types enabled
(e.g., after addition of SKB_EXT_CAN, SKB_EXT_PSP). With 4 extensions
GCC can still unroll and fold the loop despite GCOV instrumentation;
with 5+ it gives up.

Mark skb_ext_total_length() with __no_profile to prevent GCOV from
inserting counters into this function. Without counters the loop is
"clean" and GCC can constant-fold it even with -fno-tree-loop-im active.
This allows BUILD_BUG_ON to work correctly while keeping GCOV profiling
for the rest of the kernel.

This also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
introduced by d6e5794b06c0. That guard was added as a precaution because
KCOV instrumentation was also suspected of inhibiting constant folding.
However, KCOV uses -fsanitize-coverage=trace-pc, which inserts
lightweight trace callbacks that do not interfere with GCC's constant
folding or loop optimization passes. Only GCOV's -fprofile-arcs combined
with -fno-tree-loop-im actually prevents the compiler from evaluating
the loop at compile time. The guard is therefore unnecessary and can be
safely removed.

Fixes: 96ea3a1e2d31 ("can: add CAN skb extension infrastructure")
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Reviewed-by: Thomas Weissschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260410162150.3105738-2-khorenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipa-v5-2-support'

Luca Weiss says:

====================
IPA v5.2 support

Add support for IPA v5.2 which can be found in the Milos SoC.
====================

Link: https://patch.msgid.link/20260410-ipa-v5-2-v2-0-778422a05060@fairphone.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: add IPA v5.2 configuration data

Add the configuration data required for IPA v5.2, which is used in
the Qualcomm Milos SoC.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://patch.msgid.link/20260410-ipa-v5-2-v2-2-778422a05060@fairphone.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: qcom,ipa: add Milos compatible

Add support for the Milos SoC, which uses IPA v5.2.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://patch.msgid.link/20260410-ipa-v5-2-v2-1-778422a05060@fairphone.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers

The PPE enforces output frame size limits via per-tag-layer VLAN_MTU
registers that the driver never initializes. The hardware defaults do
not account for PPPoE overhead, causing the PPE to punt encapsulated
frames back to the CPU instead of forwarding them.

Initialize the registers at PPE start and on MTU changes using the
maximum GMAC MTU. This is a conservative approximation -- the actual
per-PPE requirement depends on egress path, but using the global
maximum ensures the limits are never too small.

Fixes: ba37b7caf1ed2 ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/ec995ab8ce8be423267a1cc093147a74d2eb9d82.1775789829.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pppox: convert pppox_sk() to use container_of()

Use container_of() macro instead of direct pointer casting to get the
pppox_sock from a sock pointer.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260410054954.114031-2-qingfang.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pppox: remove sk_pppox() helper

The sk member can be directly accessed from struct pppox_sock without
relying on type casting. Remove the sk_pppox() helper and update all
call sites to use po->sk directly.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260410054954.114031-1-qingfang.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtc: abx80x: Disable alarm feature if no interrupt attached

Commit 795cda8338ea ("rtc: interface: Fix long-standing race when setting
alarm") exposed an issue where the rtc-abx80x driver does not clear the
alarm feature bit, but instead relies on the set_alarm operation to return
invalid.

For example, when a RTC_UIE_ON ioctl is handled, it should abort at the
feature validation. Instead, it proceeds to the rtc_timer_enqueue(),
which used to return an error from the set_alarm call. However,
following the race condition handling, which likely should not be
discarding predecing errors, a success condition is returned to the
ioctl() caller. This results in (for example):
hwclock: select() to /dev/rtc0 to wait for clock tick timed out

Notwithstanding the validity of the race condition handling, if an interrupt
wasn't specified, or could not be attached, the driver should clear the
alarm feature bit.

Fixes: 718a820a303c ("rtc: abx80x: add alarm support")
Signed-off-by: Anthony Pighin <anthony.pighin@nokia.com>
Link: https://patch.msgid.link/BN0PR08MB69510928028C933749F4139383D1A@BN0PR08MB6951.namprd08.prod.outlook.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

selftests/bpf: Use memfd_create instead of shm_open in cgroup_iter_memcg

Replace shm_open/shm_unlink with memfd_create in the shmem subtest.
shm_open requires /dev/shm to be mounted, which is not always available
in test environments, causing the test to fail with ENOENT.
memfd_create creates an anonymous shmem-backed fd without any filesystem
dependency while exercising the same shmem accounting path.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20260412210636.47516-1-alexei.starovoitov@gmail.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

Merge branch 'mlx5-misc-fixes-2026-04-09'

Tariq Toukan says:

====================
mlx5 misc fixes 2026-04-09

This small patchset provides misc bug fixes from Gal to the mlx5 Eth
driver.
====================

Link: https://patch.msgid.link/20260409202852.158059-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: IPsec, fix ASO poll timeout with read_poll_timeout_atomic()

The do-while poll loop uses jiffies for its timeout:
expires = jiffies + msecs_to_jiffies(10);

jiffies is sampled at an arbitrary point within the current tick, so the
first partial tick contributes anywhere from a full tick down to nearly
zero real time. For small msecs_to_jiffies() results this is
significant, the effective poll window can be much shorter than the
requested 10ms, and in the worst case the loop exits after a single
iteration (e.g., when HZ=100), well before the device has delivered the
CQE.

Replace the loop with read_poll_timeout_atomic(), which counts elapsed
time via udelay() accounting rather than jiffies, guaranteeing the full
poll window regardless of HZ.

Additionally, read_poll_timeout_atomic() executes the poll operation one
more time after the timeout has expired, giving the CQE a final chance
to be detected. The old do-while loop could exit without a final poll if
the timeout expired during the udelay() between iterations.

Fixes: 76e463f6508b ("net/mlx5e: Overcome slow response for first IPsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260409202852.158059-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix features not applied during netdev registration

mlx5e_fix_features() returns early when the netdevice is not present.
This is correct during profile transitions where priv is cleared, but it
also incorrectly blocks feature fixups during register_netdev(), when
the device is also not yet present.

It is not trivial to distinguish between both cases as we cannot use
priv to carry state, and in both cases reg_state == NETREG_REGISTERED.

Force a netdev features update after register_netdev() completes, where
the device is present and fix_features() can actually work.

This is not a pretty solution, as it results in an additional features
update call (register_netdevice() already calls
__netdev_update_features() internally), but it is the simplest,
cleanest, and most robust way I found to fix this issue after multiple
attempts.

This fixes an issue on systems where CQE compression is enabled by
default, RXHASH remains enabled after registration despite the two
features being mutually exclusive.

Fixes: ab4b01bfdaa6 ("net/mlx5e: Verify dev is present for fix features ndo")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260409202852.158059-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2026-04-09

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add icm_mng_function_id_mode cap bit
  net/mlx5: Rename MLX5_PF page counter type to MLX5_SELF
  net/mlx5: Add vhca_id_type bit to alias context
  mlx5: Remove redundant iseg base
====================

Link: https://patch.msgid.link/20260409110431.154894-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: fix buffer size clamping order

In vsock_update_buffer_size(), the buffer size was being clamped to the
maximum first, and then to the minimum. If a user sets a minimum buffer
size larger than the maximum, the minimum check overrides the maximum
check, inverting the constraint.

This breaks the intended socket memory boundaries by allowing the
vsk->buffer_size to grow beyond the configured vsk->buffer_max_size.

Fix this by checking the minimum first, and then the maximum. This
ensures the buffer size never exceeds the buffer_max_size.

Fixes: b9f2b0ffde0c ("vsock: handle buffer_size sockopts in the core")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/180118C5-8BCF-4A63-A305-4EE53A34AB9C@doyensec.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-reduce-sk_filter-and-friends-bloat'

Eric Dumazet says:

====================
net: reduce sk_filter() (and friends) bloat

Some functions return an error by value, and a drop_reason
by an output parameter. This extra parameter can force stack canaries.

A drop_reason is enough and more efficient.

This series reduces bloat by 678 bytes on x86_64:

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.final
add/remove: 0/0 grow/shrink: 3/18 up/down: 79/-757 (-678)
Function                                     old     new   delta
vsock_queue_rcv_skb                           50      79     +29
ipmr_cache_report                           1290    1315     +25
ip6mr_cache_report                          1322    1347     +25
tcp_v6_rcv                                  3169    3167      -2
packet_rcv_spkt                              329     327      -2
unix_dgram_sendmsg                          1731    1726      -5
netlink_unicast                              957     945     -12
netlink_dump                                1372    1359     -13
sk_filter_trim_cap                           889     858     -31
netlink_broadcast_filtered                  1633    1595     -38
tcp_v4_rcv                                  3152    3111     -41
raw_rcv_skb                                  122      80     -42
ping_queue_rcv_skb                           109      61     -48
ping_rcv                                     215     162     -53
rawv6_rcv_skb                                278     224     -54
__sk_receive_skb                             690     632     -58
raw_rcv                                      591     527     -64
udpv6_queue_rcv_one_skb                      935     869     -66
udp_queue_rcv_one_skb                        919     853     -66
tun_net_xmit                                1146    1074     -72
sock_queue_rcv_skb_reason                    166      76     -90
Total: Before=29722890, After=29722212, chg -0.00%

Future conversions from sock_queue_rcv_skb() to sock_queue_rcv_skb_reason()
can be done later.
====================

Link: https://patch.msgid.link/20260409145625.2306224-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: change sk_filter_trim_cap() to return a drop_reason by value

Current return value can be replaced with the drop_reason,
reducing kernel bloat:

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/11 up/down: 32/-603 (-571)
Function                                     old     new   delta
tcp_v6_rcv                                  3135    3167     +32
unix_dgram_sendmsg                          1731    1726      -5
netlink_unicast                              957     945     -12
netlink_dump                                1372    1359     -13
sk_filter_trim_cap                           882     858     -24
tcp_v4_rcv                                  3143    3111     -32
__pfx_tcp_filter                              32       -     -32
netlink_broadcast_filtered                  1633    1595     -38
sock_queue_rcv_skb_reason                    126      76     -50
tun_net_xmit                                1127    1074     -53
__sk_receive_skb                             690     632     -58
udpv6_queue_rcv_one_skb                      935     869     -66
udp_queue_rcv_one_skb                        919     853     -66
tcp_filter                                   154       -    -154
Total: Before=29722783, After=29722212, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: change tcp_filter() to return the reason by value

sk_filter_trim_cap() will soon return the reason by value,
do the same for tcp_filter().

Note:

tcp_filter() is no longer inlined. Following patch will inline it again.

$ scripts/bloat-o-meter -t vmlinux.4 vmlinux.5
add/remove: 2/0 grow/shrink: 0/2 up/down: 186/-43 (143)
Function                                     old     new   delta
tcp_filter                                     -     154    +154
__pfx_tcp_filter                               -      32     +32
tcp_v4_rcv                                  3152    3143      -9
tcp_v6_rcv                                  3169    3135     -34
Total: Before=29722640, After=29722783, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: change sk_filter_reason() to return the reason by value

sk_filter_trim_cap will soon return the reason by value,
do the same for sk_filter_reason().

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-21 (-21)
Function                                     old     new   delta
sock_queue_rcv_skb_reason                    128     126      -2
tun_net_xmit                                1146    1127     -19
Total: Before=29722661, After=29722640, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: always set reason in sk_filter_trim_cap()

sk_filter_trim_cap() will soon return the drop reason by value.

Make sure *reason is cleared when no error is returned,
to ease this conversion.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7)
Function old new delta
sk_filter_trim_cap 889 882 -7
Total: Before=29722668, After=29722661, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: change sock_queue_rcv_skb_reason() to return a drop_reason

Change sock_queue_rcv_skb_reason() to return the drop_reason directly
instead of using a reference.

This is part of an effort to remove stack canaries and reduce bloat.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 3/7 up/down: 79/-301 (-222)
Function                                     old     new   delta
vsock_queue_rcv_skb                           50      79     +29
ipmr_cache_report                           1290    1315     +25
ip6mr_cache_report                          1322    1347     +25
packet_rcv_spkt                              329     327      -2
sock_queue_rcv_skb_reason                    166     128     -38
raw_rcv_skb                                  122      80     -42
ping_queue_rcv_skb                           109      61     -48
ping_rcv                                     215     162     -53
rawv6_rcv_skb                                278     224     -54
raw_rcv                                      591     527     -64
Total: Before=29722890, After=29722668, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-support-for-pic64-hpsc-hx-mdio-controller'

Charles Perry says:

====================
Add support for PIC64-HPSC/HX MDIO controller

This series adds a driver for the two MDIO controllers of PIC64-HPSC/HX.
The hardware supports C22 and C45 but only C22 is implemented for now.

This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely different
with pic64hpsc, hence the need for a separate driver.

The documentation recommends an input clock of 156.25MHz and a prescaler of
39, which yields an MDIO clock of 1.95MHz.

This was tested on Microchip HB1301 evalkit which has a VSC8574 and a
VSC8541. I've tested with bus frequencies of 0.6, 1.95 and 2.5 MHz.

This series also adds a PHY write barrier when disabling PHY interrupts as
discussed in: https://lore.kernel.org/acvUqDgepCIScs8M@shell.armlinux.org.uk
====================

Link: https://patch.msgid.link/20260408131821.1145334-1-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: add a PHY write barrier when disabling interrupts

MDIO bus controllers are not required to wait for write transactions to
complete before returning as synchronization is often achieved by polling
status bits.

This can cause issues when disabling interrupts since an interrupt could
fire before the interrupt handler is unregistered and there's no status
bit to poll.

Add a phy_write_barrier() function and use it in phy_disable_interrupts()
to fix this issue. The write barrier just reads an MII register and
discards the value, which is enough to guarantee that previous writes have
completed.

Signed-off-by: Charles Perry <charles.perry@microchip.com>
Link: https://patch.msgid.link/20260408131821.1145334-4-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: add a driver for PIC64-HPSC/HX MDIO controller

This adds an MDIO driver for PIC64-HPSC/HX. The hardware supports C22
and C45 but only C22 is implemented in this commit.

This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely
different with pic64hpsc, hence the need for a separate driver.

The documentation recommends an input clock of 156.25MHz and a prescaler
of 39, which yields an MDIO clock of 1.95MHz.

The hardware supports an interrupt pin or a "TRIGGER" bit that can be
polled to signal transaction completion. This commit uses polling.

This was tested on Microchip HB1301 evalkit with a VSC8574 and a
VSC8541.

Signed-off-by: Charles Perry <charles.perry@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260408131821.1145334-3-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: document Microchip PIC64-HPSC/HX MDIO controller

This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely different
with pic64hpsc, hence the need for separate documentation.

The hardware supports C22 and C45.

The documentation recommends an input clock of 156.25MHz and a prescaler
of 39, which yields an MDIO clock of 1.95MHz.

The hardware supports an interrupt pin to signal transaction completion
which is not strictly needed as the software can also poll a "TRIGGER"
bit for this.

Signed-off-by: Charles Perry <charles.perry@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260408131821.1145334-2-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fix a return path in get_phy_c45_ids()

The return value of phy_c45_probe_present() is stored in "ret", not
"phy_reg", fix this. "phy_reg" always has a positive value if we reach
this return path (since it would have returned earlier otherwise), which
means that the original goal of the patch of not considering -ENODEV
fatal wasn't achieved.

Fixes: 17b447539408 ("net: phy: c45 scanning: Don't consider -ENODEV fatal")
Signed-off-by: Charles Perry <charles.perry@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260409133654.3203336-1-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtc: ntxec: fix OF node reference imbalance

The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.

Fix this by using the intended helper for reusing OF nodes.

Fixes: 435af89786c6 ("rtc: New driver for RTC in Netronix embedded controller")
Cc: stable@vger.kernel.org # 5.13
Cc: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260407122717.2676774-1-johan@kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

rtc: pic32: allow driver to be compiled with COMPILE_TEST

This driver currently only supports builds against a PIC32 target. Now
that commit ed65ae9f6c6b ("rtc: pic32: update include to use pic32.h
from platform_data") is merged, it's possible to compile this driver on
other architectures.

To avoid future breakage of this driver in the future, let's update the
Kconfig so that it can be built with COMPILE_TEST enabled on all
architectures.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://patch.msgid.link/20260222-rtc-pic32-v1-1-3f8eb654a34d@redhat.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

rtc: ti-k3: Add support to resume from IO DDR low power mode

Restore the RTC HW context which may be lost when system enters
certain low power mode (IO+DDR mode).
Check if the RTC registers are locked which would indicate loss of
context (reset) and restore the context as needed.

Signed-off-by: Akashdeep Kaur <a-kaur@ti.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Link: https://patch.msgid.link/20260313111740.1492519-1-a-kaur@ti.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

dt-bindings: net: dsa: nxp,sja1105: make spi-cpol optional for sja1110

Currently, the binding requires 'spi-cpha' for SJA1105 and 'spi-cpol'
for SJA1110.

However, the SJA1110 supports both SPI modes 0 and 2. Mode 2
(cpha=0, cpol=1) is used by the NXP LX2160 Bluebox 3.

On the SolidRun i.MX8DXL HummingBoard Telematics, mode 0 is stable,
while forcing mode 2 introduces CRC errors especially during bursts.

Drop the requirement on spi-cpol for SJA1110.

Fixes: af2eab1a8243 ("dt-bindings: net: nxp,sja1105: document spi-cpol/cpha")
Signed-off-by: Josua Mayer <josua@solid-run.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409-imx8dxl-sr-som-v2-1-83ff20629ba0@solid-run.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeon_ep: Remove unnecessary semicolons in octep_oq_drop_rx()

Remove unnecessary semicolons in octep_oq_drop_rx().

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.x90@mail.toshiba>
Link: https://patch.msgid.link/1775711291-13938-1-git-send-email-nobuhiro.iwamatsu.x90@mail.toshiba
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'more-fixes-for-the-ipa-driver'

Luca Weiss says:

====================
More fixes for the IPA driver

Two more fixes for the Qualcomm IPA driver.
====================

Link: https://patch.msgid.link/20260409-ipa-fixes-v1-0-a817c30678ac@fairphone.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: Fix decoding EV_PER_EE for IPA v5.0+

Initially 'reg' and 'val' are assigned from HW_PARAM_2.

But since IPA v5.0+ takes EV_PER_EE from HW_PARAM_4 (instead of
NUM_EV_PER_EE from HW_PARAM_2), we not only need to re-assign 'reg' but
also read the register value of that register into 'val' so that
reg_decode() works on the correct value.

Fixes: f651334e1ef5 ("net: ipa: add HW_PARAM_4 GSI register")
Link: https://sashiko.dev/#/patchset/20260403-milos-ipa-v1-0-01e9e4e03d3e%40fairphone.com?part=2
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-ipa-fixes-v1-2-a817c30678ac@fairphone.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: Fix programming of QTIME_TIMESTAMP_CFG

The 'val' variable gets overwritten multiple times, discarding previous
values. Looking at the git log shows these should be combined with |=
instead.

Fixes: 9265a4f0f0b4 ("net: ipa: define even more IPA register fields")
Link: https://sashiko.dev/#/patchset/20260403-milos-ipa-v1-0-01e9e4e03d3e%40fairphone.com?part=4
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-ipa-fixes-v1-1-a817c30678ac@fairphone.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Linux 7.0

ppp: require CAP_NET_ADMIN in target netns for unattached ioctls

/dev/ppp open is currently authorized against file->f_cred->user_ns,
while unattached administrative ioctls operate on current->nsproxy->net_ns.

As a result, a local unprivileged user can create a new user namespace
with CLONE_NEWUSER, gain CAP_NET_ADMIN only in that new user namespace,
and still issue PPPIOCNEWUNIT, PPPIOCATTACH, or PPPIOCATTCHAN against
an inherited network namespace.

Require CAP_NET_ADMIN in the user namespace that owns the target network
namespace before handling unattached PPP administrative ioctls.

This preserves normal pppd operation in the network namespace it is
actually privileged in, while rejecting the userns-only inherited-netns
case.

Fixes: 273ec51dd7ce ("net: ppp_generic - introduce net-namespace functionality v2")
Signed-off-by: Taegu Ha <hataegu0826@gmail.com>
Link: https://patch.msgid.link/20260409071117.4354-1-hataegu0826@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge patch series "bpf: Fix OOB in pcpu_init_value and add a test"

xulang <xulang@uniontech.com> says:
====================

Fix OOB read when copying element from a BPF_MAP_TYPE_CGROUP_STORAGE
map to another pcpu map with the same value_size that is not rounded
up to 8 bytes, and add a test case to reproduce the issue.

The root cause is that pcpu_init_value() uses copy_map_value_long() which
rounds up the copy size to 8 bytes, but CGROUP_STORAGE map values are not
8-byte aligned (e.g., 4-byte). This causes a 4-byte OOB read when
the copy is performed.
====================

Link: https://lore.kernel.org/r/7653EEEC2BAB17DF+20260402073948.2185396-1-xulang@uniontech.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add test for cgroup storage OOB read

Add a test case to reproduce the out-of-bounds read issue when copying
from a cgroup storage map to a pcpu map with a value_size not rounded
up to 8 bytes.

The test creates:
1. A CGROUP_STORAGE map with 4-byte value (not 8-byte aligned)
2. A LRU_PERCPU_HASH map with 4-byte value (same size)

When a socket is created in the cgroup, the BPF program triggers
bpf_map_update_elem() which calls copy_map_value_long(). This function
rounds up the copy size to 8 bytes, but the cgroup storage buffer is
only 4 bytes, causing an OOB read (before the fix).

Signed-off-by: Lang Xu <xulang@uniontech.com>
Link: https://lore.kernel.org/r/D63BF0DBFF1EA122+20260402074236.2187154-2-xulang@uniontech.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix OOB in pcpu_init_value

An out-of-bounds read occurs when copying element from a
BPF_MAP_TYPE_CGROUP_STORAGE map to another pcpu map with the
same value_size that is not rounded up to 8 bytes.

The issue happens when:
1. A CGROUP_STORAGE map is created with value_size not aligned to
8 bytes (e.g., 4 bytes)
2. A pcpu map is created with the same value_size (e.g., 4 bytes)
3. Update element in 2 with data in 1

pcpu_init_value assumes that all sources are rounded up to 8 bytes,
and invokes copy_map_value_long to make a data copy, However, the
assumption doesn't stand since there are some cases where the source
may not be rounded up to 8 bytes, e.g., CGROUP_STORAGE, skb->data.
the verifier verifies exactly the size that the source claims, not
the size rounded up to 8 bytes by kernel, an OOB happens when the
source has only 4 bytes while the copy size(4) is rounded up to 8.

Fixes: d3bec0138bfb ("bpf: Zero-fill re-used per-cpu map element")
Reported-by: Kaiyan Mei <kaiyanm@hust.edu.cn>
Closes: https://lore.kernel.org/all/14e6c70c.6c121.19c0399d948.Coremail.kaiyanm@hust.edu.cn/
Link: https://lore.kernel.org/r/420FEEDDC768A4BE+20260402074236.2187154-1-xulang@uniontech.com
Signed-off-by: Lang Xu <xulang@uniontech.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'net-rds-fix-use-after-free-in-rds-ib-for-non-init-namespaces'

Allison Henderson says:

====================
net/rds: Fix use-after-free in RDS/IB for non-init namespaces

This series fixes syzbot bug da8e060735ae02c8f3d1
https://syzkaller.appspot.com/bug?extid=da8e060735ae02c8f3d1

The report finds a use-after-free bug where ib connections access an
invalid network namespace after it has been freed.  The stack is:

    rds_rdma_cm_event_handler_cmn
      rds_conn_path_drop
        rds_destroy_pending
          check_net()  <-- use-after-free

This is initially introduced in:
d5a8ac28a7ff ("RDS-TCP: Make RDS-TCP work correctly when it is set up
in a netns other than init_net").

Here, we made RDS aware of the namespace by storing a net pointer in
each connection.  But it is not explicitly restricted to init_net in
the case of ib. The RDS/TCP transport has its own pernet exit handler
(rds_tcp_exit_net) that destroys connections when a namespace is torn
down. But RDS/IB does not support more than the initial namespace and
has no such handler. The initial namespace is statically allocated,
and never torn down, so it always has at least one reference.

Allowing non init namespaces that do not have a persistent reference
means that when their refcounts drop to zero, they are released through
cleanup_net(). Which would call any registered pernet clean up handlers
if it had any, but since they don't in this case, the extra
rds_connections remain with stale c_net pointers.  Which are then
accessed later causing the use-after-free bug.

So, the simple fix is to disallow more than the initial namespace
to be created in the case of ib connections.

Fixes are ported from UEK patches found here:

  https://github.com/oracle/linux-uek/commit/8ed9a82376b7
  Patch 1 is a prerequisite optimization to rds_ib_laddr_check() that
  avoids excessive rdma_bind_addr() calls during transport probing by
  first checking rds_ib_get_device().  This is needed because patch 2
  adds a namespace check at the top of the same function.

    UEK: 8ed9a82376b7 ("rds: ib: Optimize rds_ib_laddr_check")

  https://github.com/oracle/linux-uek/commit/bd9489a08004
  Patch 2 restricts RDS/IB to the initial network namespace.  It adds
  checks in both rds_ib_laddr_check() and rds_set_transport() to reject
  IB use from non-init namespaces with -EPROTOTYPE.  This prevents the
  use-after-free by ensuring IB connections cannot exist in namespaces
  that may be torn down.

    UEK: bd9489a08004 ("net/rds: Restrict use of RDS/IB to the initial
    network namespace")

Questions, comments and feedback appreciated!
====================

Link: https://patch.msgid.link/20260408080420.540032-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rds: Restrict use of RDS/IB to the initial network namespace

Prevent using RDS/IB in network namespaces other than the initial one.
The existing RDS/IB code will not work properly in non-initial network
namespaces.

Fixes: d5a8ac28a7ff ("RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net")
Reported-by: syzbot+da8e060735ae02c8f3d1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=da8e060735ae02c8f3d1
Signed-off-by: Greg Jumper <greg.jumper@oracle.com>
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260408080420.540032-3-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rds: Optimize rds_ib_laddr_check

rds_ib_laddr_check() creates a CM_ID and attempts to bind the address
in question to it. This in order to qualify the allegedly local
address as a usable IB/RoCE address.

In the field, ExaWatcher runs rds-ping to all ports in the fabric from
all local ports. This using all active ToS'es. In a full rack system,
we have 14 cell servers and eight db servers. Typically, 6 ToS'es are
used. This implies 528 rds-ping invocations per ExaWatcher's "RDSinfo"
interval.

Adding to this, each rds-ping invocation creates eight sockets and
binds the local address to them:

socket(AF_RDS, SOCK_SEQPACKET, 0)       = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 4
bind(4, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 5
bind(5, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 6
bind(6, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 7
bind(7, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 8
bind(8, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 9
bind(9, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0
socket(AF_RDS, SOCK_SEQPACKET, 0)       = 10
bind(10, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.36.2")}, 16) = 0

So, at every interval ExaWatcher executes rds-ping's, 4224 CM_IDs are
allocated, considering this full-rack system. After the a CM_ID has
been allocated, rdma_bind_addr() is called, with the port number being
zero. This implies that the CMA will attempt to search for an un-used
ephemeral port. Simplified, the algorithm is to start at a random
position in the available port space, and then if needed, iterate
until an un-used port is found.

The book-keeping of used ports uses the idr system, which again uses
slab to allocate new struct idr_layer's. The size is 2092 bytes and
slab tries to reduce the wasted space. Hence, it chooses an order:3
allocation, for which 15 idr_layer structs will fit and only 1388
bytes are wasted per the 32KiB order:3 chunk.

Although this order:3 allocation seems like a good space/speed
trade-off, it does not resonate well with how it used by the CMA. The
combination of the randomized starting point in the port space (which
has close to zero spatial locality) and the close proximity in time of
the 4224 invocations of the rds-ping's, creates a memory hog for
order:3 allocations.

These costly allocations may need reclaims and/or compaction. At
worst, they may fail and produce a stack trace such as (from uek4):

[<ffffffff811a72d5>] __inc_zone_page_state+0x35/0x40
[<ffffffff811c2e97>] page_add_file_rmap+0x57/0x60
[<ffffffffa37ca1df>] remove_migration_pte+0x3f/0x3c0 [ksplice_6cn872bt_vmlinux_new]
[<ffffffff811c3de8>] rmap_walk+0xd8/0x340
[<ffffffff811e8860>] remove_migration_ptes+0x40/0x50
[<ffffffff811ea83c>] migrate_pages+0x3ec/0x890
[<ffffffff811afa0d>] compact_zone+0x32d/0x9a0
[<ffffffff811b00ed>] compact_zone_order+0x6d/0x90
[<ffffffff811b03b2>] try_to_compact_pages+0x102/0x270
[<ffffffff81190e56>] __alloc_pages_direct_compact+0x46/0x100
[<ffffffff8119165b>] __alloc_pages_nodemask+0x74b/0xaa0
[<ffffffff811d8411>] alloc_pages_current+0x91/0x110
[<ffffffff811e3b0b>] new_slab+0x38b/0x480
[<ffffffffa41323c7>] __slab_alloc+0x3b7/0x4a0 [ksplice_s0dk66a8_vmlinux_new]
[<ffffffff811e42ab>] kmem_cache_alloc+0x1fb/0x250
[<ffffffff8131fdd6>] idr_layer_alloc+0x36/0x90
[<ffffffff8132029c>] idr_get_empty_slot+0x28c/0x3d0
[<ffffffff813204ad>] idr_alloc+0x4d/0xf0
[<ffffffffa051727d>] cma_alloc_port+0x4d/0xa0 [rdma_cm]
[<ffffffffa0517cbe>] rdma_bind_addr+0x2ae/0x5b0 [rdma_cm]
[<ffffffffa09d8083>] rds_ib_laddr_check+0x83/0x2c0 [ksplice_6l2xst5i_rds_rdma_new]
[<ffffffffa05f892b>] rds_trans_get_preferred+0x5b/0xa0 [rds]
[<ffffffffa05f09f2>] rds_bind+0x212/0x280 [rds]
[<ffffffff815b4016>] SYSC_bind+0xe6/0x120
[<ffffffff815b4d3e>] SyS_bind+0xe/0x10
[<ffffffff816b031a>] system_call_fastpath+0x18/0xd4

To avoid these excessive calls to rdma_bind_addr(), we optimize
rds_ib_laddr_check() by simply checking if the address in question has
been used before. The rds_rdma module keeps track of addresses
associated with IB devices, and the function rds_ib_get_device() is
used to determine if the address already has been qualified as a valid
local address. If not found, we call the legacy rds_ib_laddr_check(),
now renamed to rds_ib_laddr_check_cm().

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260408080420.540032-2-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-hamradio-fix-missing-input-validation-in-bpqether-and-scc'

Mashiro Chen says:

====================
net: hamradio: fix missing input validation in bpqether and scc

This series fixes two missing input validation bugs in the hamradio
drivers. Both patches were reviewed by Joerg Reuter (hamradio
maintainer).
====================

Link: https://patch.msgid.link/20260409024927.24397-1-mashiro.chen@mailbox.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hamradio: scc: validate bufsize in SIOCSCCSMEM ioctl

The SIOCSCCSMEM ioctl copies a scc_mem_config from user space and
assigns its bufsize field directly to scc->stat.bufsize without any
range validation:

scc->stat.bufsize = memcfg.bufsize;

If a privileged user (CAP_SYS_RAWIO) sets bufsize to 0, the receive
interrupt handler later calls dev_alloc_skb(0) and immediately writes
a KISS type byte via skb_put_u8() into a zero-capacity socket buffer,
corrupting the adjacent skb_shared_info region.

Reject bufsize values smaller than 16; this is large enough to hold
at least one KISS header byte plus useful data.

Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
Acked-by: Joerg Reuter <jreuter@yaina.de>
Link: https://patch.msgid.link/20260409024927.24397-3-mashiro.chen@mailbox.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hamradio: bpqether: validate frame length in bpq_rcv()

The BPQ length field is decoded as:

  len = skb->data[0] + skb->data[1] * 256 - 5;

If the sender sets bytes [0..1] to values whose combined value is
less than 5, len becomes negative.  Passing a negative int to
skb_trim() silently converts to a huge unsigned value, causing the
function to be a no-op.  The frame is then passed up to AX.25 with
its original (untrimmed) payload, delivering garbage beyond the
declared frame boundary.

Additionally, a negative len corrupts the 64-bit rx_bytes counter
through implicit sign-extension.

Add a bounds check before pulling the length bytes: reject frames
where len is negative or exceeds the remaining skb data.

Acked-by: Joerg Reuter <jreuter@yaina.de>
Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
Link: https://patch.msgid.link/20260409024927.24397-2-mashiro.chen@mailbox.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: Fix reg_bounds to match new tnum-based refinement

Commit efc11a667878 ("bpf: Improve bounds when tnum has a single
possible value") improved the bounds refinement to detect when the tnum
and u64 range overlap in a single value (and the bounds can thus be set
to that value).

Eduard then noticed that it broke the slow-mode reg_bounds selftests
because they don't have an equivalent logic and are therefore unable to
refine the bounds as much as the verifier. The following test case
illustrates this.

  ACTUAL   TRUE1:  scalar(u64=0xffffffff00000000,u32=0,s64=0xffffffff00000000,s32=0)
  EXPECTED TRUE1:  scalar(u64=[0xfffffffe00000001; 0xffffffff00000000],u32=0,s64=[0xfffffffe00000001; 0xffffffff00000000],s32=0)
  [...]
  #323/1007 reg_bounds_gen_consts_s64_s32/(s64)[0xfffffffe00000001; 0xffffffff00000000] (s32)<op> S64_MIN:FAIL

with the verifier logs:

  [...]
  19: w0 = w6                 ; R0=scalar(smin=0,smax=umax=0xffffffff,
                                          var_off=(0x0; 0xffffffff))
                                R6=scalar(smin=0xfffffffe00000001,smax=0xffffffff00000000,
                                          umin=0xfffffffe00000001,umax=0xffffffff00000000,
                                          var_off=(0xfffffffe00000000; 0x1ffffffff))
  20: w0 = w7                 ; R0=0 R7=0x8000000000000000
  21: if w6 == w7 goto pc+3
  [...]
  from 21 to 25: [...]
  25: w0 = w6                 ; R0=0 R6=0xffffffff00000000
                              ;         ^
                              ;         unexpected refined value
  26: w0 = w7                 ; R0=0 R7=0x8000000000000000
  27: exit

When w6 == w7 is true, the verifier can deduce that the R6's tnum is
equal to (0xfffffffe00000000; 0x100000000) and then use that information
to refine the bounds: the tnum only overlap with the u64 range in
0xffffffff00000000. The reg_bounds selftest doesn't know about tnums
and therefore fails to perform the same refinement.

This issue happens when the tnum carries information that cannot be
represented in the ranges, as otherwise the selftest could reach the
same refined value using just the ranges. The tnum thus needs to
represent non-contiguous values (ex., R6's tnum above, after the
condition). The only way this can happen in the reg_bounds selftest is
at the boundary between the 32 and 64bit ranges. We therefore only need
to handle that case.

This patch fixes the selftest refinement logic by checking if the u32
and u64 ranges overlap in a single value. If so, the ranges can be set
to that value. We need to handle two cases: either they overlap in
umin64...

  u64 values
  matching u32 range:     xxx        xxx        xxx        xxx
                      |--------------------------------------|
  u64 range:          0                xxxxx                 UMAX64

or in umax64:

  u64 values
  matching u32 range:     xxx        xxx        xxx        xxx
                      |--------------------------------------|
  u64 range:          0          xxxxx                       UMAX64

To detect the first case, we decrease umax64 to the maximum value that
matches the u32 range. If that happens to be umin64, then umin64 is the
only overlap. We proceed similarly for the second case, increasing
umin64 to the minimum value that matches the u32 range.

Note this is similar to how the verifier handles the general case using
tnum, but we don't need to care about a single-value overlap in the
middle of the range. That case is not possible when comparing two
ranges.

This patch also adds two test cases reproducing this bug as part of the
normal test runs (without SLOW_TESTS=1).

Fixes: efc11a667878 ("bpf: Improve bounds when tnum has a single possible value")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Closes: https://lore.kernel.org/bpf/4e6dd64a162b3cab3635706ae6abfdd0be4db5db.camel@gmail.com/
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ada9UuSQi2SE2IfB@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

net: rose: reject truncated CLEAR_REQUEST frames in state machines

All five ROSE state machines (states 1-5) handle ROSE_CLEAR_REQUEST
by reading the cause and diagnostic bytes directly from skb->data[3]
and skb->data[4] without verifying that the frame is long enough:

rose_disconnect(sk, ..., skb->data[3], skb->data[4]);

The entry-point check in rose_route_frame() only enforces
ROSE_MIN_LEN (3 bytes), so a remote peer on a ROSE network can
send a syntactically valid but truncated CLEAR_REQUEST (3 or 4
bytes) while a connection is open in any state. Processing such a
frame causes a one- or two-byte out-of-bounds read past the skb
data, leaking uninitialized heap content as the cause/diagnostic
values returned to user space via getsockopt(ROSE_GETCAUSE).

Add a single length check at the rose_process_rx_frame() dispatch
point, before any state machine is entered, to drop frames that
carry the CLEAR_REQUEST type code but are too short to contain the
required cause and diagnostic fields.

Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
Link: https://patch.msgid.link/20260408172551.281486-1-mashiro.chen@mailbox.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i3c: mipi-i3c-hci: fix IBI payload length calculation for final status

In DMA mode, the IBI status descriptor encodes the payload using
CHUNKS (number of chunks) and DATA_LENGTH (valid bytes in the last
chunk). All preceding chunks are implicitly full-sized.

The current code accumulates full chunk sizes for non-final status
descriptors, but for the final status descriptor it only adds
DATA_LENGTH. This ignores the contribution of the preceding full
chunks described by the same final status entry.

As a result, the computed IBI payload length is truncated whenever
the final status spans multiple chunks. For example, with a chunk
size of 4 bytes, CHUNKS=2 and DATA_LENGTH=1 should result in a total
payload size of 5 bytes, but the current code reports only 1 byte.

Fix the calculation by adding the size of (CHUNKS - 1) full chunks
plus DATA_LENGTH for the last chunk.

Fixes: 9ad9a52cce28 ("i3c/master: introduce the mipi-i3c-hci driver")
Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260407-i3c-hci-dma-v2-1-a583187b9d22@aspeedtech.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

Merge branch 'net-enetc-improve-statistics-for-v1-and-add-statistics-for-v4'

Wei Fang says:

====================
net: enetc: improve statistics for v1 and add statistics for v4

For ENETC v1, some standardized statistics were redundantly included in
the unstructured statistics, so remove these duplicated entries.
Previously, the unstructured statistics only contained eMAC data and
did not include pMAC data; add pMAC statistics to ensure completeness.

For ENETC v4, the driver previously reported MAC statistics only for the
internal ENETC (Pseudo MAC). Extend the implementation to provide
additional statistics for both the internal ENETC and the standalone
ENETC.
====================

Link: https://patch.msgid.link/20260408055849.1314033-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: add unstructured counters for ENETC v4

Like ENETC v1, ENETC v4 also has many non-standard counters, so these
counters are added to improve statistical coverage.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-6-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: add unstructured pMAC counters for ENETC v1

The ENETC v1 has two MACs (eMAC and pMAC) to support preemption. The
existing unstructured counters include the eMAC counters, but not the
pMAC counters. So add pMAC counters to improve statistical coverage.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-5-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: remove standardized counters from enetc_pm_counters

The standardized counters are already exposed via the get_pause_stats(),
get_rmon_stats(), get_eth_ctrl_stats() and get_eth_mac_stats()
interfaces. Keeping the same counters in enetc_pm_counters results in
redundant output.

Remove these standardized counters from enetc_pm_counters and rely on
the existing statistics interfaces to report them.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: show RX drop counters only for assigned RX rings

For ENETC v1, each SI provides 16 RBDCR registers for RX ring drop
counters, but this does not imply that an SI actually owns 16 RX rings.
The ENETC hardware supports a total of 16 RX rings, which are assigned
to 3 SIs (1 PSI and 2 VSIs), so each SI is assigned fewer than 16 RX
rings.

The current implementation always reports 16 RX drop counters per SI,
leading to redundant output for SIs with fewer RX rings. Update the
logic to display drop counters only for the RX rings that are actually
assigned to the SI.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: add support for the standardized counters

ENETC v4 provides 64-bit counters for IEEE 802.3 basic and mandatory
managed objects, the IETF Management Information Database (MIB) package
(RFC2665), and Remote Network Monitoring (RMON) statistics. In addition,
some ENETCs support preemption, so these ENETCs have two MACs: MAC 0 is
the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). Both MACs
support these statistics.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: Add tests for non-arena/arena operations

Add a selftest that ensures instructions with arena source and
non-arena destination registers are accepted by the verifier.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20260412174546.18684-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Allow instructions with arena source and non-arena dest registers

The compiler sometimes stores the result of a PTR_TO_ARENA and SCALAR
operation into the scalar register rather than the pointer register.
Relax the verifier to allow operations between a source arena register
and a destination non-arena register, marking the destination's value
as a PTR_TO_ARENA.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Song Liu <song@kernel.org>
Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.")
Link: https://lore.kernel.org/r/20260412174546.18684-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-add-the-missing-fsession'

Menglong Dong says:

====================
bpf: add the missing fsession

Add the missing fsession attach type to the BPF docs, verifier log and
bpftool.

Changes since v2:
- replace "FENTRY/FEXIT/FSESSION" with "Tracing" in the 1st patch
- v2: https://lore.kernel.org/all/20260408062109.386083-1-dongml2@chinatelecom.cn/

Changes since v1:
- add a missing FSESSION in bpf_check_attach_target() in the 1st patch
- v1: https://lore.kernel.org/all/20260408031416.266229-1-dongml2@chinatelecom.cn/
====================

Link: https://patch.msgid.link/20260412060346.142007-1-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpftool: add missing fsession to the usage and docs of bpftool

Add the fsession attach type to the usage of bpftool in do_help().
Meanwhile, add it to the bash-completion and bpftool-prog.rst too.

Acked-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260412060346.142007-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

docs/bpf: add missing fsession attach type to docs

Add the fsession attach type to program_types.rst and drgn.rst.

Acked-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260412060346.142007-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add missing fsession to the verifier log

The fsession attach type is missed in the verifier log in
check_get_func_ip(), bpf_check_attach_target() and check_attach_btf_id().
Update them to make the verifier log proper. Meanwhile, update the
corresponding selftests.

Acked-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260412060346.142007-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-split-verifier-c'

Alexei Starovoitov says:

====================
v3->v4: Restore few minor comments and undo few function moves
v2->v3: Actually restore comments lost in patch 3
(instead of adding them to patch 4)
v1->v2: Restore comments lost in patch 3

verifier.c is huge. Split it into logically independent pieces.
No functional changes.
The diff is impossible to review over email.
'git show' shows minimal actual changes. Only plenty of moved lines.
Such split may cause backport headaches.
We should have split it long ago.
Even after split verifier.c is still 20k lines,
but further split is harder.
====================

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260412152936.54262-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move BTF checking logic into check_btf.c

BTF validation logic is independent from the main verifier.
Move it into check_btf.c

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260412152936.54262-7-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move backtracking logic to backtrack.c

Move precision propagation and backtracking logic to backtrack.c
to reduce verifier.c size.

No functional changes.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260412152936.54262-6-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move state equivalence logic to states.c

verifier.c is huge. Move is_state_visited() to states.c,
so that all state equivalence logic is in one file.

Mechanical move. No functional changes.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260412152936.54262-5-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move check_cfg() into cfg.c

verifier.c is huge. Move check_cfg(), compute_postorder(),
compute_scc() into cfg.c

Mechanical move. No functional changes.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260412152936.54262-4-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move compute_insn_live_regs() into liveness.c

verifier.c is huge. Move compute_insn_live_regs() into liveness.c.

Mechanical move. No functional changes.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260412152936.54262-3-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move fixup/post-processing logic from verifier.c into fixups.c

verifier.c is huge. Split fixup/post-processing logic that runs after
the verifier accepted the program into fixups.c.

Mechanical move. No functional changes.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260412152936.54262-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

gre: Count GRE packet drops

GRE is silently dropping packets without updating statistics.

In case of drop, increment rx_dropped counter to provide visibility into
packet loss. For the case where no GRE protocol handler is registered,
use rx_nohandler.

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260409090945.1542440-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bpf-fix-sock_ops_get_sk-same-register-oob-read-in-sock_ops-and-add-selftest'

Jiayuan Chen says:

====================
bpf: Fix SOCK_OPS_GET_SK same-register OOB read in sock_ops and add selftest

When a BPF sock_ops program accesses ctx fields with dst_reg == src_reg,
the SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() macros fail to zero the
destination register in the !fullsock / !locked_tcp_sock path, leading to
OOB read (GET_SK) and kernel pointer leak (GET_FIELD).

Patch 1: Fix both macros by adding BPF_MOV64_IMM(si->dst_reg, 0) in the
!fullsock landing pad.
Patch 2: Add selftests covering same-register and different-register cases
for both GET_SK and GET_FIELD.

[1] https://lore.kernel.org/bpf/6fe1243e-149b-4d3b-99c7-fcc9e2f75787@std.uestc.edu.cn/T/#u
====================

Link: https://patch.msgid.link/20260407022720.162151-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: Add tests for sock_ops ctx access with same src/dst register

Add selftests to verify SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() correctly
return NULL/zero when dst_reg == src_reg and is_fullsock == 0.

Three subtests are included:
- get_sk: ctx->sk with same src/dst register (SOCK_OPS_GET_SK)
- get_field: ctx->snd_cwnd with same src/dst register (SOCK_OPS_GET_FIELD)
- get_sk_diff_reg: ctx->sk with different src/dst register (baseline)

Each BPF program uses inline asm (__naked) to force specific register
allocation, reads is_fullsock first, then loads the field using the same
(or different) register. The test triggers TCP_NEW_SYN_RECV via a TCP
handshake and checks that the result is NULL/zero when is_fullsock == 0.

Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260407022720.162151-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bpf: Fix same-register dst/src OOB read and pointer leak in sock_ops

When a BPF sock_ops program accesses ctx fields with dst_reg == src_reg,
the SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() macros fail to zero the
destination register in the !fullsock / !locked_tcp_sock path.

Both macros borrow a temporary register to check is_fullsock /
is_locked_tcp_sock when dst_reg == src_reg, because dst_reg holds the
ctx pointer. When the check is false (e.g., TCP_NEW_SYN_RECV state with
a request_sock), dst_reg should be zeroed but is not, leaving the stale
ctx pointer:

- SOCK_OPS_GET_SK: dst_reg retains the ctx pointer, passes NULL checks
   as PTR_TO_SOCKET_OR_NULL, and can be used as a bogus socket pointer,
   leading to stack-out-of-bounds access in helpers like
   bpf_skc_to_tcp6_sock().

- SOCK_OPS_GET_FIELD: dst_reg retains the ctx pointer which the
   verifier believes is a SCALAR_VALUE, leaking a kernel pointer.

Fix both macros by:
- Changing JMP_A(1) to JMP_A(2) in the fullsock path to skip the
   added instruction.
- Adding BPF_MOV64_IMM(si->dst_reg, 0) after the temp register
   restore in the !fullsock path, placed after the restore because
   dst_reg == src_reg means we need src_reg intact to read ctx->temp.

Fixes: fd09af010788 ("bpf: sock_ops ctx access may stomp registers in corner case")
Fixes: 84f44df664e9 ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg")
Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Closes: https://lore.kernel.org/bpf/6fe1243e-149b-4d3b-99c7-fcc9e2f75787@std.uestc.edu.cn/T/#u
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260407022720.162151-2-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

perf sample: Fix documentation typo

s/PEF/PERF/

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Documentation: core-api: real-time: correct spelling

Fix typo "excpetion" with "exception".

Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260411155120.233357-1-hsukrut3@gmail.com>

doc: Add CPU Isolation documentation

nohz_full was introduced in v3.10 in 2013, which means this
documentation is overdue for 13 years.

Fortunately Paul wrote a part of the needed documentation a while ago,
especially concerning nohz_full in Documentation/timers/no_hz.rst and
also about per-CPU kthreads in
Documentation/admin-guide/kernel-per-CPU-kthreads.rst

Introduce a new page that gives an overview of CPU isolation in general.

Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260402094749.18879-1-frederic@kernel.org>

Merge tag 'edac_urgent_for_7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:

- Fix the error path ordering when the driver-private descriptor
allocation fails

* tag 'edac_urgent_for_7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/mc: Fix error path ordering in edac_mc_alloc()

NFC: digital: Bounds check NFC-A cascade depth in SDD response handler

The NFC-A anti-collision cascade in digital_in_recv_sdd_res() appends 3
or 4 bytes to target->nfcid1 on each round, but the number of cascade
rounds is controlled entirely by the peer device. The peer sets the
cascade tag in the SDD_RES (deciding 3 vs 4 bytes) and the
cascade-incomplete bit in the SEL_RES (deciding whether another round
follows).

ISO 14443-3 limits NFC-A to three cascade levels and target->nfcid1 is
sized accordingly (NFC_NFCID1_MAXSIZE = 10), but nothing in the driver
actually enforces this. This means a malicious peer can keep the
cascade running, writing past the heap-allocated nfc_target with each
round.

Fix this by rejecting the response when the accumulated UID would exceed
the buffer.

Commit e329e71013c9 ("NFC: nci: Bounds check struct nfc_target arrays")
fixed similar missing checks against the same field on the NCI path.

Cc: Simon Horman <horms@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Thierry Escande <thierry.escande@linux.intel.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Fixes: 2c66daecc409 ("NFC Digital: Add NFC-A technology support")
Cc: stable <stable@kernel.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026040913-figure-seducing-bd3f@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net_sched: fix skb memory leak in deferred qdisc drops

When the network stack cleans up the deferred list via qdisc_run_end(),
it operates on the root qdisc. If the root qdisc do not implement the
TCQ_F_DEQUEUE_DROPS flag the packets queue to free are never freed and
gets stranded on the child's local to_free list.

Fix this by making qdisc_dequeue_drop() aware of the root qdisc. It
fetches the root qdisc and check for the TCQ_F_DEQUEUE_DROPS flag. If
the flag is present, the packet is appended directly to the root's
to_free list. Otherwise, drop it directly as it was done before the
optimization was implemented.

Fixes: a6efc273ab82 ("net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codel")
Reported-by: Damilola Bello <damilola@aterlo.com>
Closes: https://lore.kernel.org/netdev/CAPgFtOLaedBMU0f_BxV2bXftTJSmJr018Q5uozOo5vVo6b9tjw@mail.gmail.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260408100044.4530-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-add-support-for-disabling-autonomous-eee'

Nicolai Buchwitz says:

====================
net: phy: add support for disabling autonomous EEE

Some PHYs implement autonomous EEE where the PHY manages EEE
independently, preventing the MAC from controlling LPI signaling.
This conflicts with MACs that implement their own LPI control.

This series adds a .disable_autonomous_eee callback to struct phy_driver
and calls it from phy_support_eee(). When a MAC indicates it supports
EEE, the PHY's autonomous EEE is automatically disabled. The setting is
persisted across suspend/resume by re-applying it in phy_init_hw() after
soft reset, following the same pattern suggested by Russell King for PHY
tunables [1].

Patch 1 adds the phylib infrastructure.
Patch 2 implements it for Broadcom BCM54xx (AutogrEEEn).
Patch 3 converts the Realtek RTL8211F, which previously unconditionally
disabled PHY-mode EEE in config_init.

This came up while adding EEE support to the Cadence macb driver (used
on Raspberry Pi 5 with a BCM54210PE PHY). The PHY's AutogrEEEn mode
prevented the MAC from tracking LPI state. The Realtek RTL8211F has
the same pattern, unconditionally disabling PHY-mode EEE with the
comment "Disable PHY-mode EEE so LPI is passed to the MAC".

Other BCM54xx PHYs likely have the same AutogrEEEn register layout,
but I only have access to the BCM54210PE/BCM54213PE datasheets. It
would be appreciated if Florian or others could confirm which other
BCM54xx variants share this register so we can wire them up too.

Tested on Raspberry Pi CM4 (bcmgenet + BCM54210PE),
Raspberry Pi CM5 (Cadence GEM + BCM54210PE) and
Raspberry Pi 5 (Cadence GEM + BCM54213PE).

[1] https://lore.kernel.org/netdev/acuwvoydmJusuj9x@shell.armlinux.org.uk/
====================

Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-0-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: convert RTL8211F to .disable_autonomous_eee

The RTL8211F previously unconditionally disabled PHY-mode EEE in
config_init. Convert this to use the new .disable_autonomous_eee
callback so it is only disabled when the MAC indicates EEE support
via phy_support_eee().

This preserves PHY-autonomous EEE for MACs that do not support EEE,
while still disabling it when the MAC manages LPI.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-3-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: broadcom: implement .disable_autonomous_eee for BCM54xx

Implement the .disable_autonomous_eee callback for the BCM54210E.

In AutogrEEEn mode the PHY manages EEE autonomously. Clearing the
AutogrEEEn enable bit in MII_BUF_CNTL_0 switches the PHY to Native
EEE mode.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-2-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>