git.ipfire.org Git - thirdparty/linux.git/log

Merge branch 'enic-sr-iov-v2-preparatory-infrastructure'

Satish Kharat says:

====================
enic: SR-IOV V2 preparatory infrastructure

This is the first of four series adding SR-IOV V2 support to the enic
driver for Cisco VIC 14xx/15xx adapters.

The existing V1 SR-IOV implementation has VFs that interact directly
with the VIC firmware, leaving the PF driver with no visibility or
control over VF behavior. V2 introduces a PF-mediated model where VFs
communicate with the PF through a mailbox over a dedicated admin
channel. This brings enic in line with the standard Linux SR-IOV
model, enabling full PF management of VFs via ip link (MAC, VLAN,
link state, spoofchk, trust, and per-VF statistics).

This preparatory series adds detection and resource helper code with
no functional change to existing driver behavior:

  - Extend BAR resource discovery for admin channel resources
  - Register the V2 VF PCI device ID
  - Detect VF type (V1/V2/usNIC) from SR-IOV PCI capability
  - Make enic_dev_enable/disable ref-counted for shared use by data
    path and admin channel
  - Add type-aware resource allocation for admin WQ/RQ/CQ/INTR
  - Detect presence of admin channel resources at probe time

Tested on VIC 14xx and 15xx series adapters with V2 VFs under KVM
(sriov_numvfs, VF passthrough, ip link VF configuration, VF traffic).

Based in part on initial work by Christian Benvenuti.
====================

Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: detect admin channel resources for SR-IOV

Check for the presence of admin channel BAR resources
(RES_TYPE_ADMIN_WQ, ADMIN_RQ, ADMIN_CQ, SRIOV_INTR) during resource
discovery. Set has_admin_channel when all four are available.

Use ARRAY_SIZE(enic->admin_cq) for the admin CQ count check since the
driver allocates two admin CQs (one for WQ completions, one for RQ
completions) and both must be backed by hardware resources.

Add admin WQ, RQ, CQ and INTR fields to struct enic for use by the
upcoming admin channel open/close paths.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-6-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: add type-aware alloc for WQ, RQ, CQ and INTR resources

The existing vnic_wq_alloc(), vnic_rq_alloc(), vnic_cq_alloc() and
vnic_intr_alloc() hardcode data-path resource types (RES_TYPE_WQ,
RES_TYPE_RQ, RES_TYPE_CQ, RES_TYPE_INTR_CTRL). The upcoming admin
channel uses different BAR resource types (RES_TYPE_ADMIN_WQ/RQ/CQ,
RES_TYPE_SRIOV_INTR) for its queues.

Add _with_type() variants that accept an explicit resource type
parameter. Refactor the original functions as thin wrappers that
pass the default data-path type. No functional change.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-5-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: make enic_dev_enable/disable ref-counted

Both the data path (ndo_open/ndo_stop) and the upcoming admin channel
need to enable and disable the vNIC device independently. Without
reference counting, closing the admin channel while the netdev is up
would inadvertently disable the entire device.

Add an enable_count to struct enic, protected by the existing
devcmd_lock. enic_dev_enable() issues CMD_ENABLE_WAIT only on the
first caller (0 -> 1 transition), and enic_dev_disable() issues
CMD_DISABLE only when the last caller releases (1 -> 0 transition).

Also check the return value of enic_dev_enable() in enic_open() and
fail the open if the firmware enable command fails. Without this check,
a failed enable leaves enable_count at zero while the interface appears
up, which can cause a later admin channel enable/disable cycle to
incorrectly disable the hardware under the active data path.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-4-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: detect SR-IOV VF type from PCI capability

Read the VF device ID from the SR-IOV PCI capability at probe time to
determine whether the PF is configured for V1, USNIC, or V2 virtual
functions. Store the result in enic->vf_type for use by subsequent
SR-IOV operations.

The VF type is a firmware-configured property (set via UCSM, CIMC,
Intersight etc) that is immutable from the driver's perspective. Only
PFs are probed for this capability; VFs and dynamic vnics skip
detection.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-3-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: add V2 SR-IOV VF device ID

Register the V2 VF PCI device ID (0x02b7) so the driver binds to V2
virtual functions created via sriov_configure. Update enic_is_sriov_vf()
to recognize V2 VFs alongside the existing V1 type.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-2-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: extend resource discovery for SR-IOV admin channel

VIC firmware exposes admin channel resources (WQ, RQ, CQ) for PF-VF
communication when SR-IOV is active. Add the corresponding resource
type definitions and teach the discovery and access functions to
handle them.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-1-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: orphan PPP over Ethernet driver

We haven't seen activities from Michal Ostrowski for quite a long time.
The last commit from him is fb64bb560e18 ("PPPoE: Fix flush/close
races."), which was in 2009. Email to mostrows@earthlink.net also
bounces.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260401022842.15082-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-microchip-add-downshift-support-for-lan88xx'

Nicolai Buchwitz says:

====================
net: phy: microchip: add downshift support for LAN88xx

Add standard ETHTOOL_PHY_DOWNSHIFT tunable support for the Microchip
LAN88xx PHY, following the same pattern used by Marvell and other PHY
drivers.

Ethernet cables with faulty or missing pairs (specifically C and D)
can successfully auto-negotiate 1000BASE-T but fail to establish a
stable link. The LAN88xx PHY supports automatic downshift to
100BASE-TX after a configurable number of failed attempts (2-5).

Patch 1 adds the get/set tunable implementation.
Patch 2 enables downshift by default with a count of 2. The setting is
stored in the driver's private data so that user changes via ethtool are
preserved across suspend/resume cycles.

Based on an earlier downstream implementation by Phil Elwell.

Tested on Raspberry Pi 3B+ (LAN7515/LAN88xx).
====================

Link: https://patch.msgid.link/20260401123848.696766-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: microchip: enable downshift by default on LAN88xx

Enable auto-downshift from 1000BASE-T to 100BASE-TX after 2 failed
auto-negotiation attempts by default. This ensures that links with
faulty or missing cable pairs (C and D) fall back to 100Mbps without
requiring userspace configuration.

The downshift count is stored in the driver's private data and applied
in config_init, so user changes via ethtool are preserved across
suspend/resume cycles.

Users can override or disable downshift at runtime:

ethtool --set-phy-tunable eth0 downshift off

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260401123848.696766-3-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: microchip: add downshift tunable support for LAN88xx

Implement the standard ETHTOOL_PHY_DOWNSHIFT tunable for the LAN88xx
PHY. This allows runtime configuration of the auto-downshift feature
via ethtool:

ethtool --set-phy-tunable eth0 downshift on count 3

The LAN88xx PHY supports downshifting from 1000BASE-T to 100BASE-TX
after 2-5 failed auto-negotiation attempts. Valid count values are
2, 3, 4 and 5.

This is based on an earlier downstream implementation by Phil Elwell.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260401123848.696766-2-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8152: Add helper functions for SRAM2

Add the following helper functions for SRAM2 access to simplify the code
and improve readability:

- sram2_write() - write data to SRAM2 address
- sram2_read() - read data from SRAM2 address
- sram2_write_w0w1() - read-modify-write operation

Signed-off-by: Chih Kai Hsu <hsu.chih.kai@realtek.com>
Reviewed-by: Hayes Wang <hayeswang@realtek.com>
Link: https://patch.msgid.link/20260401115542.34601-1-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: bcm84881: add LED framework support for BCM84891/BCM84892

Expose LED1 and LED2 pins via the PHY LED framework. Each pin has a
source mask (MASK_LOW + MASK_EXT registers) selecting which hardware
events light it, plus a CTL field in the shared 0xA83B register
(RMW; LED4 is firmware-controlled per the datasheet).

Hardware can offload per-speed link triggers (1000/2500/5000/10000),
RX/TX activity, and force-on. LINK_100 is accepted only alongside
LINK_1000: source bit 4 lights at both speeds and 100-alone isn't
representable, so the unrepresentable case falls to software.

The chip has five LED pins; only LED1/LED2 are exposed here as those
are the only ones characterized on tested hardware. LED4 is firmware-
controlled regardless of strap configuration.

Tested on TRENDnet TEG-S750 (LED1/LED2 wired to an antiparallel
bicolor LED): brightness_set via sysfs; netdev trigger offloaded=1
with amber lit at 100M/1G/2.5G and green lit at 10G via respective
link_* modes; LED off immediately on cable unplug with no software
involvement.

Signed-off-by: Daniel Wagner <wagner.daniel.t@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260401114931.3091818-1-wagner.daniel.t@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'macvlan-broadcast-delivery-changes'

Eric Dumazet says:

====================
macvlan: broadcast delivery changes

First patch adds data-race annotations.

Second patch changes macvlan_broadcast_enqueue() to return
early if the queue is full.
====================

Link: https://patch.msgid.link/20260401103809.3038139-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macvlan: avoid spinlock contention in macvlan_broadcast_enqueue()

Under high stress, we spend a lot of time cloning skbs,
then acquiring a spinlock, then freeing the clone because
the queue is full.

Add a shortcut to avoid these costs under pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260401103809.3038139-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macvlan: annotate data-races around port->bc_queue_len_used

port->bc_queue_len_used is read and written locklessly,
add READ_ONCE()/WRITE_ONCE() annotations.

While WRITE_ONCE() in macvlan_fill_info() is not yet needed,
it is a prereq for future RTNL avoidance.

Fixes: d4bff72c8401 ("macvlan: Support for high multicast packet rate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260401103809.3038139-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: qat - add support for zstd

Add support for the ZSTD algorithm for QAT GEN4, GEN5 and GEN6 via the
acomp API.

For GEN4 and GEN5, compression is performed in hardware using LZ4s, a
QAT-specific variant of LZ4. The compressed output is post-processed to
generate ZSTD sequences, and the ZSTD library is then used to produce
the final ZSTD stream via zstd_compress_sequences_and_literals(). Only
inputs between 8 KB and 512 KB are offloaded to the device. The minimum
size restriction will be relaxed once polling support is added. The
maximum size is limited by the use of pre-allocated per-CPU scratch
buffers. On these generations, only compression is offloaded to hardware;
decompression always falls back to software.

For GEN6, both compression and decompression are offloaded to the
accelerator, which natively supports the ZSTD algorithm. There is no
limit on the input buffer size supported. However, since GEN6 is limited
to a history size of 64 KB, decompression of frames compressed with a
larger history falls back to software.

Since GEN2 devices do not support ZSTD or LZ4s, add a mechanism that
prevents selecting GEN2 compression instances for ZSTD or LZ4s when a
GEN2 plug-in card is present on a system with an embedded GEN4, GEN5 or
GEN6 device.

In addition, modify the algorithm registration logic to allow
registering the correct implementation, i.e. LZ4s based for GEN4 and
GEN5 or native ZSTD for GEN6.

Co-developed-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Laurent M Coquerel <laurent.m.coquerel@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - use swab32 macro

Replace __builtin_bswap32() with swab32 in icp_qat_hw_20_comp.h to fix
the following build errors on architectures without native byte-swap
support:

   alpha-linux-ld: drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.o: in function `adf_gen4_build_decomp_block':
   drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:141:(.text+0xeec): undefined reference to `__bswapsi2'
   alpha-linux-ld: drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:141:(.text+0xef8): undefined reference to `__bswapsi2'
   alpha-linux-ld: drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.o: in function `adf_gen4_build_comp_block':
   drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:57:(.text+0xf64): undefined reference to `__bswapsi2'
   alpha-linux-ld: drivers/crypto/intel/qat/qat_common/icp_qat_hw_20_comp.h:57:(.text+0xf7c): undefined reference to `__bswapsi2'

Fixes: 5b14b2b307e4 ("crypto: qat - enable deflate for QAT GEN4")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603290259.Ig9kDOmI-lkp@intel.com/
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: img-hash - drop redundant return variable

In img_hash_digest(), remove the redundant return variable 'err' and
return img_hash_handle_queue() directly.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: img-hash - use list_first_entry_or_null to simplify digest

Use list_first_entry_or_null() to simplify img_hash_digest() and remove
the now-unused local 'struct img_hash_dev *' variables. Use 'ctx->hdev'
when calling img_hash_handle_queue() instead of 'tctx->hdev'.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cryptomgr - Select algorithm types only when CRYPTO_SELFTESTS

Enabling any template selects CRYPTO_MANAGER, which causes
CRYPTO_MANAGER2 to enable itself, which selects every algorithm type
option. However, pulling in all algorithm types is needed only when the
self-tests are enabled. So condition the selections accordingly.

To make this possible, also add the missing selections to various
symbols that were relying on transitive selections via CRYPTO_MANAGER.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: aspeed - Use memcpy_from_sglist() in aspeed_ahash_dma_prepare()

Replace scatterwalk_map_and_copy() with memcpy_from_sglist() in
aspeed_ahash_dma_prepare(). The latter provides a simpler interface
without requiring a direction parameter, making the code easier to
read and less error-prone.

No functional change intended.

Signed-off-by: Paul Louvel <paul.louvel@bootlin.com>
Reviewed-by: Neal Liu <neal_liu@aspeedtech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: kconfig - fix typos in atmel-ecc and atmel-sha204a help

s/Microhip/Microchip/

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: x86 - Remove des and des3_ede code

Since DES and Triple DES are obsolete, there is very little point in
maintining architecture-optimized code for them. Remove it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: sparc - Remove des and des3_ede code

Since DES and Triple DES are obsolete, there is very little point in
maintining architecture-optimized code for them. Remove it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: s390 - Remove des and des3_ede code

Since DES and Triple DES are obsolete, there is very little point in
maintining architecture-optimized code for them. Remove it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: rng - Don't pull in DRBG when CRYPTO_FIPS=n

crypto_stdrng_get_bytes() is now always available:

    - When CRYPTO_FIPS=n it is an inline function that always calls into
      the always-built-in drivers/char/random.c.

    - When CRYPTO_FIPS=y it is an inline function that calls into either
      random.c or crypto/rng.c, depending on the value of fips_enabled.
      The former is again always built-in.  The latter is built-in as
      well in this case, due to CRYPTO_FIPS=y.

Thus, the CRYPTO_RNG_DEFAULT symbol is no longer needed.  Remove it.

This makes it so that CRYPTO_DRBG_MENU (and hence also CRYPTO_DRBG,
CRYPTO_JITTERENTROPY, and CRYPTO_LIB_SHA3) no longer gets unnecessarily
pulled into CRYPTO_FIPS=n kernels.  I.e. CRYPTO_FIPS=n kernels are no
longer bloated with code that is relevant only to FIPS certifications.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: fips - Depend on CRYPTO_DRBG=y

Currently, the callers of crypto_stdrng_get_bytes() do 'select
CRYPTO_RNG_DEFAULT', which does 'select CRYPTO_DRBG_MENU'.

However, due to the change in how crypto_stdrng_get_bytes() is
implemented, CRYPTO_DRBG_MENU is now needed only when CRYPTO_FIPS.

But, 'select CRYPTO_DRBG_MENU if CRYPTO_FIPS' would cause a recursive
dependency, since CRYPTO_FIPS 'depends on CRYPTO_DRBG'.

Solve this by just making CRYPTO_FIPS depend on CRYPTO_DRBG=y (rather
than CRYPTO_DRBG i.e. CRYPTO_DRBG=y || CRYPTO_DRBG=m). The distros that
use CRYPTO_FIPS=y already set CRYPTO_DRBG=y anyway, which makes sense.

This makes the CRYPTO_RNG_DEFAULT symbol (and its corresponding
selection of CRYPTO_DRBG_MENU) unnecessary. A later commit removes it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: rng - Make crypto_stdrng_get_bytes() use normal RNG in non-FIPS mode

"stdrng" is needed only in "FIPS mode". Therefore, make
crypto_stdrng_get_bytes() delegate to either the normal Linux RNG or to
"stdrng", depending on the current mode.

This will eliminate the need to built the SP800-90A DRBG and its
dependencies into CRYPTO_FIPS=n kernels.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: rng - Unexport "default RNG" symbols

Now that crypto_default_rng, crypto_get_default_rng(), and
crypto_put_default_rng() have no users outside crypto/rng.c itself,
unexport them and make them static.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

net: tipc: Use crypto_stdrng_get_bytes()

Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: intel/keembay-ocs-ecc - Use crypto_stdrng_get_bytes()

Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: hisilicon/hpre - Use crypto_stdrng_get_bytes()

Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: geniv - Use crypto_stdrng_get_bytes()

Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ecc - Use crypto_stdrng_get_bytes()

Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: dh - Use crypto_stdrng_get_bytes()

Replace the sequence of crypto_get_default_rng(),
crypto_rng_get_bytes(), and crypto_put_default_rng() with the equivalent
helper function crypto_stdrng_get_bytes().

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: rng - Add crypto_stdrng_get_bytes()

All callers of crypto_get_default_rng() use the following sequence:

    crypto_get_default_rng()
    crypto_rng_get_bytes(crypto_default_rng, ...)
    crypto_put_default_rng()

While it may have been intended that callers amortize the cost of
getting and putting the "default RNG" (i.e. "stdrng") over multiple
calls, in practice that optimization is never used.  The callers just
want a function that gets random bytes from the "stdrng".

Therefore, add such a function: crypto_stdrng_get_bytes().

Importantly, this decouples the callers from the crypto_rng API.  That
allows a later commit to make this function simply call
get_random_bytes_wait() unless the kernel is in "FIPS mode".

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: iaa - fix per-node CPU counter reset in rebalance_wq_table()

The cpu counter used to compute the IAA device index is reset to zero
at the start of each NUMA node iteration. This causes CPUs on every
node to map starting from IAA index 0 instead of continuing from the
previous node's last index. On multi-node systems, this results in all
nodes mapping their CPUs to the same initial set of IAA devices,
leaving higher-indexed devices unused.

Move the cpu counter initialization before the for_each_node_with_cpus()
loop so that the IAA index computation accumulates correctly across all
nodes.

Fixes: 714ca27e9bf4 ("crypto: iaa - Optimize rebalance_wq_table()")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - replace scnprintf() with sysfs_emit()

Replace scnprintf() with sysfs_emit() in the three RAS error counter
sysfs show callbacks. sysfs_emit() is the recommended API for sysfs show
functions as per Documentation/filesystems/sysfs.rst; it enforces the
PAGE_SIZE limit implicitly, removing the need to pass it explicitly.

Signed-off-by: Atharv Dubey <atharvd440@gmail.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - fix type mismatch in RAS sysfs show functions

ADF_RAS_ERR_CTR_READ() expands to atomic_read(), which returns int.
The local variable 'counter' was declared as 'unsigned long', causing
a type mismatch on the assignment. The format specifier '%ld' was
consequently wrong in two ways: wrong length modifier and wrong
signedness.

Use int to match the return type of atomic_read() and update the
format specifier to '%d' accordingly.

Fixes: 532d7f6bc458 ("crypto: qat - add error counters")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - fix compression instance leak

qat_comp_alg_init_tfm() acquires a compression instance via
qat_compression_get_instance_node() before calling qat_comp_build_ctx()
to initialize the compression context. If qat_comp_build_ctx() fails, the
function returns an error without releasing the compression instance,
causing a resource leak.

When qat_comp_build_ctx() fails, release the compression instance with
qat_compression_put_instance() and clear the context to avoid leaving a
stale reference to the released instance.

The issue was introduced when build_deflate_ctx() (which always returned
void) was replaced by qat_comp_build_ctx() (which can return an error)
without adding error handling for the failure path.

Fixes: cd0e7160f80f ("crypto: qat - refactor compression template logic")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Laurent M Coquerel <laurent.m.coquerel@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - use acomp_tfm_ctx()

Replace the usage of crypto_acomp_tfm() followed by crypto_tfm_ctx()
with a single call to the equivalent acomp_tfm_ctx().

This does not introduce any functional changes.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Laurent M Coquerel <laurent.m.coquerel@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Replace snprintf("%s") with strscpy

Replace snprintf("%s") with the faster and more direct strscpy().

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: hifn_795x - Replace snprintf("%s") with strscpy

Replace snprintf("%s", ...) with the faster and more direct strscpy().
Check if the return value is less than 0 to detect string truncation.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - disable 420xx AE cluster when lead engine is fused off

The get_ae_mask() function only disables individual engines based on
the fuse register, but engines are organized in clusters of 4. If the
lead engine of a cluster is fused off, the entire cluster must be
disabled.

Replace the single bitmask inversion with explicit test_bit() checks
on the lead engine of each group, disabling the full ADF_AE_GROUP
when the lead bit is set.

Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Fixes: fcf60f4bcf54 ("crypto: qat - add support for 420xx devices")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - disable 4xxx AE cluster when lead engine is fused off

The get_ae_mask() function only disables individual engines based on
the fuse register, but engines are organized in clusters of 4. If the
lead engine of a cluster is fused off, the entire cluster must be
disabled.

Replace the single bitmask inversion with explicit test_bit() checks
on the lead engine of each group, disabling the full ADF_AE_GROUP
when the lead bit is set.

Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Fixes: 8c8268166e834 ("crypto: qat - add qat_4xxx driver")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - Add test vectors for authenc(hmac(md5),cbc(aes))

Test vectors were generated starting from existing CBC(AES) test vectors
(RFC3602, NIST SP800-38A) and adding HMAC(MD5) computed with Python
script. Then, the results were double-checked on Mediatek MT7981 (safexcel)
and NXP P2020 (talitos). Both platforms pass self-tests.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: af_alg - limit RX SG extraction by receive buffer budget

Make af_alg_get_rsgl() limit each RX scatterlist extraction to the
remaining receive buffer budget.

af_alg_get_rsgl() currently uses af_alg_readable() only as a gate
before extracting data into the RX scatterlist. Limit each extraction
to the remaining af_alg_rcvbuf(sk) budget so that receive-side
accounting matches the amount of data attached to the request.

If skcipher cannot obtain enough RX space for at least one chunk while
more data remains to be processed, reject the recvmsg call instead of
rounding the request length down to zero.

Fixes: e870456d8e7c8d57c059ea479b5aadbb55ff4c3a ("crypto: algif_skcipher - overhaul memory management")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Douya Le <ldy3087146292@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Merge tag 'v7.0-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Add missing async markers to tegra

- Fix long hmac key DMA handling in caam

- Fix spurious ENOSPC errors in deflate

- Fix SG chaining in af_alg

- Do not use in-place process in algif_aead

- Fix out-of-place destination overflow in authencesn

* tag 'v7.0-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: authencesn - Do not place hiseq at end of dst for out-of-place decryption
  crypto: algif_aead - Revert to operating out-of-place
  crypto: af-alg - fix NULL pointer dereference in scatterwalk
  crypto: deflate - fix spurious -ENOSPC
  crypto: caam - fix overflow on long hmac keys
  crypto: caam - fix DMA corruption on long hmac keys
  crypto: tegra - Add missing CRYPTO_ALG_ASYNC

lib/crc: arm64: Simplify intrinsics implementation

NEON intrinsics are useful because they remove the need for manual
register allocation, and the resulting code can be re-compiled and
optimized for different micro-architectures, and shared between arm64
and 32-bit ARM.

However, the strong typing of the vector variables can lead to
incomprehensible gibberish, as is the case with the new CRC64
implementation. To address this, let's repaint all variables as
uint64x2_t to minimize the number of vreinterpretq_xxx() calls, and to
be able to rely on the ^ operator for exclusive OR operations. This
makes the code much more concise and readable.

While at it, wrap the calls to vmull_p64() et al in order to have a more
consistent calling convention, and encapsulate any remaining
vreinterpret() calls that are still needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-11-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crc: arm64: Use existing macros for kernel-mode FPU cflags

Use the existing CC_FPU_CFLAGS and CC_NO_FPU_CFLAGS to pass the
appropriate compiler command line options for building kernel mode NEON
intrinsics code. This is tidier, and will make it easier to reuse the
code for 32-bit ARM.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-9-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crc: arm64: Drop unnecessary chunking logic from crc64

On arm64, kernel mode NEON executes with preemption enabled, so there is
no need to chunk the input by hand.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-8-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crc: arm64: Assume a little-endian kernel

Since support for big-endian arm64 kernels was removed, the CPU_LE()
macro now unconditionally emits the code it is passed, and the CPU_BE()
macro now unconditionally discards the code it is passed.

Simplify the assembly code in lib/crc/arm64/ accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401004431.151432-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

Merge branch 'task-local-data-bug-fixes-and-improvement'

Amery Hung says:

====================
Task local data bug fixes and improvement

This patchset fixed three task local data bugs, improved the
memory allocation code, and dropped unnecessary TLD_READ_ONCE. Please
find the detail in each patch's commit msg.

One thing worth mentioning is that Patch 3 allows us to renable task
local data selftests as the library now always calls aligned_alloc()
with size matching alignment under default configuration.

v1 -> v2
- Fix potential memory leak
- Drop TLD_READ_ONCE()
Link: https://lore.kernel.org/bpf/20260326052437.590158-1-ameryhung@gmail.com/
====================

Link: https://patch.msgid.link/20260331213555.1993883-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Improve task local data documentation and fix potential memory leak

If TLD_FREE_DATA_ON_THREAD_EXIT is not enabled in a translation unit
that calls __tld_create_key() first, another translation unit that
enables it will not get the auto cleanup feature as pthread key is only
created once when allocation metadata. Fix it by always try to create
the pthread key when __tld_create_key() is called.

Also improve the documentation:
- Discourage user from using different options in different translation
units
- Specify calling tld_free() before thread exit as undefined behavior

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Remove TLD_READ_ONCE() in the user space header

TLD_READ_ONCE() is redundant as the only reference passed to it is
defined as _Atomic. The load is guaranteed to be atomic in C11 standard
(6.2.6.1). Drop the macro.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Make sure TLD_DEFINE_KEY runs first

Without specifying constructor priority of the hidden constructor
function defined by TLD_DEFINE_KEY, __tld_create_key(..., dyn_data =
false) may run after tld_get_data() called from other constructors.
Threads calling tld_get_data() before __tld_create_key(..., dyn_data
= false) will not allocate enough memory for all TLDs and later result
in OOB access. Therefore, set it to the lowest value available to
users. Note that lower means higher priority and 0-100 is reserved to
the compiler.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Simplify task_local_data memory allocation

Simplify data allocation by always using aligned_alloc() and passing
size_pot, size rounded up to the closest power of two to alignment.

Currently, aligned_alloc(page_size, size) is only intended to be used
with memory allocators that can fulfill the request without rounding
size up to page_size to conserve memory. This is enabled by defining
TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to
the limitation of UPTR where only a page can be pinned to the kernel.
Otherwise, malloc(size * 2) is used to allocate memory for data.

However, we don't need to call aligned_alloc(page_size, size) to get
a contiguous memory of size bytes within a page. aligned_alloc(size_pot,
...) will also do the trick. Therefore, just use aligned_alloc(size_pot,
...) universally.

As for the size argument, create a new option,
TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size.
This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing
memory allocators with low overhead aligned_alloc() to not waste memory.
To enable this, users need to make sure it is not an undefined behavior
for the memory allocator to have size not being an integral multiple of
alignment.

Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC
used to always waste size-byte of memory due to malloc(size * 2).
Now the worst case becomes size - 1 and the best case is 0 when the size
is already a power of two.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix task_local_data data allocation size

Currently, when allocating memory for data, size of tld_data_u->start
is not taken into account. This may cause OOB access. Fixed it by adding
the non-flexible array part of tld_data_u.

Besides, explicitly align tld_data_u->data to 8 bytes in case some
fields are added before data in the future. It could break the
assumption that every data field is 8 byte aligned and
sizeof(tld_data_u) will no longer be equal to
offsetof(struct tld_data_u, data), which we use interchangeably.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq()

handle_percpu_devid_irq() is a version of handle_percpu_irq() but with the
addition of a pointer to a per-CPU devid.

However, handle_percpu_irq() invokes add_interrupt_randomness(), while
handle_percpu_devid_irq() currently does not.

Add the missing add_interrupt_randomness(), as it is needed when per-CPU
interrupts with devid's are used in VMs for interrupts from the hypervisor.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260402202400.1707-2-mhklkml@zohomail.com

arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06

Update the definition of SMIDR_EL1 in the sysreg definition to reflect the
information in DD0601 2025-06. This includes somewhat more generic ways of
describing the sharing of SMCUs, more information on supported priorities
and provides additional resolution for describing affinity groups.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Merge branch 'libbpf-clarify-raw-address-single-kprobe-attach-behavior'

Hoyeon Lee says:

====================
libbpf: clarify raw-address single kprobe attach behavior

Today libbpf documents single-kprobe attach through func_name, with an
optional offset. For the PMU-based path, func_name = NULL with an
absolute address in offset already works as well, but that is not
described in the API.

This patchset clarifies this behavior. First commit fixes kprobe
and uprobe attach error handling to use direct error codes. Next adds
kprobe API comments for the raw-address form and rejects it explicitly
for legacy tracefs/debugfs kprobes. Last adds PERF and LINK selftests
for the raw-address form, and checks that LEGACY rejects it.
---
Changes in v7:
- Change selftest line wrapping and assertions

Changes in v6:
- Split the kprobe/uprobe direct error-code fix into a separate patch

Changes in v5:
- Add kprobe API docs, use -EOPNOTSUPP, and switch selftests to LIBBPF_OPTS

Changes in v4:
- Inline raw-address error formatting and remove the probe_target buffer

Changes in v3:
- Drop bpf_kprobe_opts.addr and reuse offset when func_name is NULL
- Make legacy tracefs/debugfs kprobes reject the raw-address form
- Update selftests to cover PERF/LINK raw-address attach and LEGACY reject

Changes in v2:
- Fix line wrapping and indentation
====================

Link: https://patch.msgid.link/20260401143116.185049-1-hoyeon.lee@suse.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add test for raw-address single kprobe attach

Currently, attach_probe covers manual single-kprobe attaches by
func_name, but not the raw-address form that the PMU-based
single-kprobe path can accept.

This commit adds PERF and LINK raw-address coverage. It resolves
SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address
in bpf_kprobe_opts.offset with func_name = NULL, and verifies that
kprobe and kretprobe are still triggered. It also verifies that LEGACY
rejects the same form.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-4-hoyeon.lee@suse.com

libbpf: Clarify raw-address single kprobe attach behavior

bpf_program__attach_kprobe_opts() documents single-kprobe attach
through func_name, with an optional offset. For the PMU-based path,
func_name = NULL with an absolute address in offset already works as
well, but that is not described in the API.

This commit clarifies this existing non-legacy behavior. For PMU-based
attach, callers can use func_name = NULL with an absolute address in
offset as the raw-address form. For legacy tracefs/debugfs kprobes,
reject this form explicitly.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-3-hoyeon.lee@suse.com

libbpf: Use direct error codes for kprobe/uprobe attach

perf_event_open_probe() and perf_event_{k,u}probe_open_legacy() helpers
are returning negative error codes directly on failure. This commit
changes bpf_program__attach_{k,u}probe_opts() to use those return
values directly instead of re-reading possibly changed errno.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-2-hoyeon.lee@suse.com

accel: ethosu: Add hardware dependency hint

The Ethos-U NPU is only available on ARM systems, so add a hardware
dependency hint to prevent this driver from being needlessly
included in kernels built for other architectures.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: https://patch.msgid.link/20260401122323.6127a77c@endymion
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

libbpf: Fix BTF handling in bpf_program__clone()

Align bpf_program__clone() with bpf_object_load_prog() by gating
BTF func/line info on FEAT_BTF_FUNC kernel support, and resolve
caller-provided prog_btf_fd before checking obj->btf so that callers
with their own BTF can use clone() even when the object has no BTF
loaded.

While at it, treat func_info and line_info fields as atomic groups
to prevent mismatches between pointer and count from different sources.

Move bpf_program__clone() to libbpf 1.8.

Fixes: 970bd2dced35 ("libbpf: Introduce bpf_program__clone()")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260401151640.356419-1-mykyta.yatsenko5@gmail.com

arm64: mm: Remove pmd_sect() and pud_sect()

The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() only considers it a section if PTE_VALID
is set, whereas pXd_leaf() permits both "valid" and "present-invalid"
types.

Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().

There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: mm: Handle invalid large leaf mappings correctly

It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.

It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:

[   15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[   15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[   15.513762] Mem abort info:
[   15.527245]   ESR = 0x0000000096000046
[   15.548553]   EC = 0x25: DABT (current EL), IL = 32 bits
[   15.572146]   SET = 0, FnV = 0
[   15.592141]   EA = 0, S1PTW = 0
[   15.612694]   FSC = 0x06: level 2 translation fault
[   15.640644] Data abort info:
[   15.661983]   ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[   15.694875]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[   15.723740]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[   15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[   15.855046] Internal error: Oops: 0000000096000046 [#1]  SMP
[   15.886394] Modules linked in:
[   15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[   15.935258] Hardware name: linux,dummy-virt (DT)
[   15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[   16.006163] lr : swiotlb_bounce+0xf4/0x158
[   16.024145] sp : ffff80008000b8f0
[   16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[   16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[   16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[   16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[   16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[   16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[   16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[   16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[   16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[   16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[   16.349071] Call trace:
[   16.360143]  __pi_memcpy_generic+0x128/0x22c (P)
[   16.380310]  swiotlb_tbl_map_single+0x154/0x2b4
[   16.400282]  swiotlb_map+0x5c/0x228
[   16.415984]  dma_map_phys+0x244/0x2b8
[   16.432199]  dma_map_page_attrs+0x44/0x58
[   16.449782]  virtqueue_map_page_attrs+0x38/0x44
[   16.469596]  virtqueue_map_single_attrs+0xc0/0x130
[   16.490509]  virtnet_rq_alloc.isra.0+0xa4/0x1fc
[   16.510355]  try_fill_recv+0x2a4/0x584
[   16.526989]  virtnet_open+0xd4/0x238
[   16.542775]  __dev_open+0x110/0x24c
[   16.558280]  __dev_change_flags+0x194/0x20c
[   16.576879]  netif_change_flags+0x24/0x6c
[   16.594489]  dev_change_flags+0x48/0x7c
[   16.611462]  ip_auto_config+0x258/0x1114
[   16.628727]  do_one_initcall+0x80/0x1c8
[   16.645590]  kernel_init_freeable+0x208/0x2f0
[   16.664917]  kernel_init+0x24/0x1e0
[   16.680295]  ret_from_fork+0x10/0x20
[   16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[   16.723106] ---[ end trace 0000000000000000 ]---
[   16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[   16.818966] PHYS_OFFSET: 0xfff1000080000000
[   16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[   16.862904] Memory Limit: none
[   16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.

The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.

But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result of this, ptdump wasn't even able to
show the invalid large leaf mappings; it looked like they were valid
which made this super fun to debug. the ptdump code is core-mm and
pmd_table() is arm64-specific so we can't use the same trick to solve
that.

But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.

Additionally, from inspection kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().

The transitional page tables component was also similarly broken; it
creates a copy of the kernel page tables, making RO leaf mappings RW in
the process. It also makes invalid (but-not-none) pte mappings valid.
But it was not doing this for large leaf mappings. This could have
resulted in crashes at kexec- or hibernate-time. This code is fixed to
flip "present-invalid" mappings back to "present-valid" at all levels.

Finally, I have hardened split_pmd()/split_pud() so that if it is passed
a "present-invalid" leaf, it will maintain that property in the split
leaves, since I wasn't able to convince myself that it would only ever
be called for "present-valid" leaves.

Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: mm: Fix rodata=full block mapping support for realm guests

Commit a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") enabled the linear map to be mapped by block/cont while
still allowing granular permission changes on BBML2_NOABORT systems by
lazily splitting the live mappings. This mechanism was intended to be
usable by realm guests since they need to dynamically share dma buffers
with the host by "decrypting" them - which for Arm CCA, means marking
them as shared in the page tables.

However, it turns out that the mechanism was failing for realm guests
because realms need to share their dma buffers (via
__set_memory_enc_dec()) much earlier during boot than
split_kernel_leaf_mapping() was able to handle. The report linked below
showed that GIC's ITS was one such user. But during the investigation I
found other callsites that could not meet the
split_kernel_leaf_mapping() constraints.

The problem is that we block map the linear map based on the boot CPU
supporting BBML2_NOABORT, then check that all the other CPUs support it
too when finalizing the caps. If they don't, then we stop_machine() and
split to ptes. For safety, split_kernel_leaf_mapping() previously
wouldn't permit splitting until after the caps were finalized. That
ensured that if any secondary cpus were running that didn't support
BBML2_NOABORT, we wouldn't risk breaking them.

I've fix this problem by reducing the black-out window where we refuse
to split; there are now 2 windows. The first is from T0 until the page
allocator is inititialized. Splitting allocates memory for the page
allocator so it must be in use. The second covers the period between
starting to online the secondary cpus until the system caps are
finalized (this is a very small window).

All of the problematic callers are calling __set_memory_enc_dec() before
the secondary cpus come online, so this solves the problem. However, one
of these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.

I've added warnings and return an error if any attempt is made to split
in the black-out windows.

Note there are other issues which prevent booting all the way to user
space, which will be fixed in subsequent patches.

Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

eventpoll: defer struct eventpoll free to RCU grace period

In certain situations, ep_free() in eventpoll.c will kfree the epi->ep
eventpoll struct while it still being used by another concurrent thread.
Defer the kfree() to an RCU callback to prevent UAF.

Fixes: f2e467a48287 ("eventpoll: Fix semi-unbounded recursion")
Signed-off-by: Nicholas Carlini <nicholas@carlini.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

accel/ivpu: Trigger recovery on TDR with OS scheduling

With OS scheduling mode the driver cannot determine which context
caused the timeout, so context abort cannot be used. Instead of
queuing context_abort_work, directly trigger full device recovery
when a job timeout (TDR) occurs in OS scheduling mode.

Fixes: ade00a6c903f ("accel/ivpu: Perform engine reset instead of device recovery on TDR")
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260402125526.845210-1-karol.wachowski@linux.intel.com

sched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU

Since commit 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for
trampoline.c"), the BPF prolog (__bpf_prog_enter) calls migrate_disable()
only when CONFIG_PREEMPT_RCU is enabled, via rcu_read_lock_dont_migrate().
Without CONFIG_PREEMPT_RCU, the prolog never touches migration_disabled,
so migration_disabled == 1 always means the task is truly
migration-disabled regardless of whether it is the current task.

The old unconditional p == current check was a false negative in this
case, potentially allowing a migration-disabled task to be dispatched to
a remote CPU and triggering scx_error in task_can_run_on_remote_rq().

Only apply the p == current disambiguation when CONFIG_PREEMPT_RCU is
enabled, where the ambiguity with the BPF prolog still exists.

Fixes: 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for trampoline.c")
Cc: stable@vger.kernel.org # v6.18+
Link: https://lore.kernel.org/lkml/20250821090609.42508-8-dongml2@chinatelecom.cn/
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations

Description:
- Commit b82f0759346617b2 ("drm/amd/display: Migrate DIO registers access
   from hwseq to dio component") moved DIO_MEM_PWR_CTRL register access
   behind the new dio abstraction layer but only created the dio object for
   DCN 4.01. On all other generations (DCN 10/20/21/201/30/301/302/303/
   31/314/315/316/32/321/35/351/36), the dio pointer is NULL, causing the
   register write to be silently skipped.

   This results in AFMT HDMI memory not being powered on during init_hw,
   which can cause HDMI audio failures and display issues on affected
   hardware including Renoir/Cezanne (DCN 2.1) APUs that use dcn10_init_hw.

   Call dcn10_dio_construct() in each older DCN generation's resource.c
   to create the dio object, following the same pattern as DCN 4.01. This
   ensures the dio pointer is non-NULL and the mem_pwr_ctrl callback works
   through the dio abstraction for all DCN generations.

Fixes: b82f07593466 ("drm/amd/display: Migrate DIO registers access from hwseq to dio component.")
Reviewed-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

sched_ext: Fix missing warning in scx_set_task_state() default case

In scx_set_task_state(), the default case was setting the
warn flag, but then returning immediately. This is problematic
because the only purpose of the warn flag is to trigger
WARN_ONCE, but the early return prevented it from ever firing,
leaving invalid task states undetected and untraced.

To fix this, a WARN_ONCE call is now added directly in the
default case.

The fix addresses two aspects:

- Guarantees the invalid task states are properly logged
   and traced.

- Provides a distinct warning message
   ("sched_ext: Invalid task state") specifically for
   states outside the defined scx_task_state enum values,
   making it easier to distinguish from other transition
   warnings.

This ensures proper detection and reporting of invalid states.

Signed-off-by: Samuele Mariotti <smariotti@disroot.org>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd

Pull smb server fix from Steve French:

- Fix out of bound write

* tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
ksmbd: fix OOB write in QUERY_INFO for compound requests

ata: libata-transport: remove static variable ata_scsi_transport_template

Simplify the code by making struct ata_scsi_transportt public, instead
of using separate variable ata_scsi_transport_template.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: libata-transport: split struct ata_internal

There's no need for an umbrella struct, so remove it. It's also a
prerequisite for making the embedded struct scsi_transport_template
public.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: libata-transport: use static struct ata_transport_internal to simplify match functions

Both matching functions can make use of static struct
ata_transport_internal. This eliminates the dependency on static
variable ata_scsi_transport_template, and it allows to remove helper
to_ata_internal(). Small drawback is that a forward declaration of
both functions is needed.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

fuse: support FSCONFIG_SET_FD for "fd" option

This is not only cleaner to use in userspace (no need to sprintf the fd to
a string) but also allows userspace to detect that the devfd can be closed
after the fsconfig call.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

fuse: clean up device cloning

- fuse_mutex is not needed for device cloning, because fuse_dev_install()
   uses cmpxcg() to set fud->fc, which prevents races between clone/mount
   or clone/clone.  This makes the logic simpler

- Drop fc->dev_count.  This is only used to check in release if the device
   is the last clone, but checking list_empty(&fc->devices) is equivalent
   after removing the released device from the list.  Removing the fuse_dev
   before calling fuse_abort_conn() is okay, since the processing and io
   lists are now empty for this device.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ata: libata-transport: inline ata_attach|release_transport

Both functions are helpers which are used only once. So remove them and
merge their code into libata_transport_init() and libata_transport_exit()
respectively.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: libata-transport: instantiate struct ata_internal statically

Struct ata_internal is only instantiated once, in module init code.
So we can also instantiate it statically, which allows simplifying
the code.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

fuse: don't require /dev/fuse fd to be kept open during mount

With the new mount API the sequence of syscalls would be:

fs_fd = fsopen("fuse", 0);
snprintf(opt, sizeof(opt), "%i", devfd);
fsconfig(fs_fd, FSCONFIG_SET_STRING, "fd", opt, 0);
/* ... */
fsconfig(fs_fd, FSCONFIG_CMD_CREATE, 0, 0, 0);

Current mount code just stores the value of devfd in the fs_context and
uses it in during FSCONFIG_CMD_CREATE, which is inelegant.

Instead grab a reference to the underlying fuse_dev, and use that during
the filesystem creation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fuse: add refcount to fuse_dev

This will make it possible to grab the fuse_dev and subsequently release
the file that it came from.

In the above case, fud->fc will be set to FUSE_DEV_FC_DISCONNECTED to
indicate that this is no longer a functional device.

When trying to assign an fc to such a disconnected fuse_dev, the fc is set
to the disconnected state.

Use atomic operations xchg() and cmpxchg() to prevent races.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fuse: create fuse_dev on /dev/fuse open instead of mount

Allocate struct fuse_dev when opening the device.  This means that unlike
before, ->private_data is always set to a valid pointer.

The use of USE_DEV_SYNC_INIT magic pointer for the private_data is now
replaced with a simple bool sync_init member.

If sync INIT is not set, I/O on the device returns error before mount.
Keep this behavior by checking for the ->fc member.  If fud->fc is set, the
mount has succeeded.  Testing this used READ_ONCE(file->private_data) and
smp_mb() to try and provide the necessary semantics.  Switch this to
smp_store_release() and smp_load_acquire().

Setting fud->fc is protected by fuse_mutex, this is unchanged.

Will need this later so the /dev/fuse open file reference is not held
during FSCONFIG_CMD_CREATE.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

fuse: check connection state on notification

Check if the connection is fully initialized and connected before trying to
process a notification form the fuse server.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fuse: fuse_dev_ioctl_clone() should wait for device file to be initialized

Use fuse_get_dev() not __fuse_get_dev() on the old fd, since in the case of
synchronous INIT the caller will want to wait for the device file to be
available for cloning, just like I/O wants to wait instead of returning an
error.

Fixes: dfb84c330794 ("fuse: allow synchronous FUSE_INIT")
Cc: stable@vger.kernel.org # v6.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Merge branch 'net-stmmac-tso-fixes-cleanups'

Russell King says:

====================
net: stmmac: TSO fixes/cleanups

This is a more refined version of the previous patch series fixing
and cleaning up the TSO code.

I'm not sure whether "TSO" or "GSO" should be used to describe this
feature - although it primarily handles TCP, dwmac4 appears to also
be able to handle UDP.

In essence, this series adds a .ndo_features_check() method to handle
whether TSO/GSO can be used for a particular skbuff - checking which
queue the skbuff is destined for and whether that has TBS available
which precludes TSO being enabled on that channel.

I'm also adding a check that the header is smaller than 1024 bytes,
as documented in those sources which have TSO support - this is due
to the hardware buffering the header in "TSO memory" which I guess
is limited to 1KiB. I expect this test never to trigger, but if
the headers ever exceed that size, the hardware will likely fail.
While IPv4 headers are unlikely to be anywhere near this, there is
nothing in the protocol which prevents IPv6 headers up to 64KiB.

As we now have a .ndo_features_check() method, I'm moving the VLAN
insertion for TSO packets into core code by unpublishing the VLAN
insertion features when we use TSO. Another move is for checksumming,
which is required for TSO, but stmmac's requirements for offloading
checksums are more strict - and this seems to be a bug in the TSO
path.

I've changed the hardware initialisation to always enable TSO support
on the channels even if the user requests TSO/GSO to be disabled -
this fixes another issue as pointed out by Jakub in a previous review.

I'm moving the setup of the GSO features, cleaning those up, and
adding a warning if platform glue requests this to be enabled but the
hardware has no support. Hopefully this will never trigger if everyone
got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL
value.

Finally, moving the "TSO supported" message to the new
stmmac_set_gso_features() function so keep all this TSO stuff together.
====================

Link: https://patch.msgid.link/aczHVF04LIGq_lYO@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: move "TSO supported" message to stmmac_set_gso_features()

Move the "TSO supported" message to stmmac_set_gso_features() so that
we group all probe-time TSO stuff in one place.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu8-0000000Eau5-3Zne@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: check txpbl for TSO

Documentation states that TxPBL must be >= 4 to allow TSO support, but
the driver doesn't check this. TxPBL comes from the platform glue code
or DT. Add a check with a warning if platform glue code attempts to
enable TSO support with TxPBL too low.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu3-0000000Eatz-39ts@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add warning when TSO is requested but unsupported

Add a warning message if TSO is requested by the platform glue code but
the core wasn't configured for TSO.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pty-0000000Eatt-2TjZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: make stmmac_set_gso_features() more readable

Make stmmac_set_gso_features() more readable by adding some whitespace
and getting rid of the indentation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptt-0000000Eatn-1ziK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: split out gso features setup

Move the GSO features setup into a separate function, co-loated with
other GSO/TSO support.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pto-0000000Eath-1VDH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: simplify GSO/TSO test in stmmac_xmit()

The test in stmmac_xmit() to see whether we should pass the skbuff to
stmmac_tso_xmit() is more complex than it needs to be. This test can
be simplified by storing the mask of GSO types that we will pass, and
setting it according to the enabled features.

Note that "tso" is a mis-nomer since commit b776620651a1 ("net:
stmmac: Implement UDP Segmentation Offload"). Also note that this
commit controls both via the TSO feature. We preserve this behaviour
in this commit.

Also, this commit unconditionally accessed skb_shinfo(skb)->gso_type
for all frames, even when skb_is_gso() was false. This access is
eliminated.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptj-0000000Eatb-11zK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: move check for hardware checksum supported

Add a check in .ndo_features_check() to indicate whether hardware
checksum can be performed on the skbuff. Where hardware checksum is
not supported - either because the channel does not support Tx COE
or the skb isn't suitable (stmmac uses a tighter test than
can_checksum_protocol()) we also need to disable TSO, which will be
done by harmonize_features() in net/core/dev.c

This fixes a bug where a channel which has COE disabled may still
receive TSO skbuffs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pte-0000000EatU-0ILt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: move TSO VLAN tag insertion to core code

stmmac_tso_xmit() checks whether the skbuff is trying to offload
vlan tag insertion to hardware, which from the comment in the code
appears to be buggy when the TSO feature is used.

Rather than stmmac_tso_xmit() inserting the VLAN tag, handle this
in stmmac_features_check() which will then use core net code to
handle this. See net/core/dev.c::validate_xmit_skb()

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptY-0000000EatO-42Qv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add GSO MSS checks

Add GSO MSS checks to stmmac_features_check().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptT-0000000EatI-3feh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add TSO check for header length

According to the STM32MP151 documentation which covers dwmac v4.2, the
hardware TSO feature can handle header lengths up to a maximum of 1023
bytes.

Add a .ndo_features_check() method implementation to check the header
length meets these requirements, otherwise fall back to software GSO.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptO-0000000EatC-39il@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add stmmac_tso_header_size()

We will need to compute the size of the protocol headers in two places,
so move this into a separate function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptJ-0000000Eat5-2ZlA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>