git.ipfire.org Git - thirdparty/linux.git/log

crypto: drbg - Change DRBG_MAX_REQUESTS to 4096

Currently a formal reseed happens only after each 1048576 requests.

That's quite a high number. Let's follow the example of BoringSSL and
use a more conservative value of 4096.

Note that in practice this makes little difference, now that we're
including 32 bytes from get_random_bytes() in the additional input on
every request anyway, which is a de facto reseed.

But for the same reason, we might as well decrease the actual reseed
interval to something more reasonable.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Include get_random_bytes() output in additional input

Woodage & Shumow (2018) (https://eprint.iacr.org/2018/349.pdf) showed
that contrary to the claims made by NIST in SP800-90A, HMAC_DRBG doesn't
satisfy the formal definition of forward secrecy (i.e. "backtracking
resistance") when it's called with an empty additional input string.

The actual attack seems pretty benign, as it doesn't actually give the
attacker any previous RNG output, but rather just allows them to test
whether their guess of the previous block of RNG output is correct.
Regardless, it's an annoying design flaw, and it's yet another example
of why NIST's DRBGs aren't all that great.

Meanwhile, the kernel's HMAC_DRBG code also tries to reseed itself
automatically after random.c has reseeded itself.  But the
implementation is buggy, as it just checks whether 300 seconds have
elapsed, rather than looking at the actual generation counter.

Let's just follow the example of BoringSSL and use the conservative
approach of always including 32 bytes of "regular" random data in the
additional input string.  This fixes both issues described above.

This does reduce performance.  But this should be tolerable, since:

  - Due to earlier changes, the kernel code that was previously using
    drbg.c regardless of FIPS mode is now using it only in FIPS mode.

  - The additional input string is processed only once per request.  So
    if a lot of bytes are generated at once, the cost is amortized.

  - The NIST DRBGs are notoriously slow anyway.

Note that this fix should have no impact (either positive or negative)
on FIPS 140 certifiability.  From FIPS's point of view the code added by
this commit simply doesn't matter: it adds zero entropy to something
that doesn't need to contain entropy.

Fixes: 541af946fe13 ("crypto: drbg - SP800-90A Deterministic Random Bit Generator")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Simplify "uninstantiate" logic

drbg_kcapi_seed() calls drbg_uninstantiate() only to free drbg->jent and
set drbg->instantiated = false.  However, the latter is necessary only
because drbg_kcapi_seed() sets drbg->instantiated = true too early.  Fix
that, then just inline the freeing of drbg->jent.

Then, simplify the actual "uninstantiate" in drbg_kcapi_exit().  Just
free drbg->jent (note that this is a no-op on error and null pointers),
then memzero_explicit() the entire drbg_state.

Note that in reality the memzero_explicit() is redundant, since the
crypto_rng API zeroizes the memory anyway.  But the way SP800-90A is
worded, it's easy to imagine that someone assessing conformance with it
would be looking for code in drbg.c that says it does an "Uninstantiate"
and does the zeroization.  So it's probably worth keeping it somewhat
explicit, even though that means double zeroization in practice.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fold drbg_prepare_hrng() into drbg_kcapi_seed()

Fold drbg_prepare_hrng() into its only caller.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Separate "reseed" case in drbg_kcapi_seed()

Clearly separate the code for the "reseed" and "instantiate" cases,
since what they need to do is quite different.

Note that the mutex guard makes the mutex start being held during the
call to drbg_uninstantiate(), which is fine.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fold drbg_instantiate() into drbg_kcapi_seed()

Fold drbg_instantiate() into its only caller.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Put rng_alg methods in logical order

Put the DRBG implementation of the rng_alg methods in the order in which
they're called (cra_init => set_ent => seed => generate => cra_exit) so
that it's easier to understand the flow.

Also rename drbg_kcapi_random to drbg_kcapi_generate, and
drbg_kcapi_cleanup to drbg_kcapi_exit, so they match the method names.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Simplify drbg_generate_long() and fold into caller

Simplify drbg_generate_long() to use a more straightforward loop, and
then fold it into its only caller. No functional change.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Eliminate use of 'drbg_string' and lists

Use straightforward (buffer, len) parameters instead of struct
drbg_string or lists of strings. This simplifies the code considerably.

For now struct drbg_string is still used in crypto_drbg_ctr_df(), so
move its definition to crypto/df_sp80090a.h.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Consolidate "instantiate" logic and remove drbg_state::C

Currently some of the steps of "Instantiate the DRBG" occur in a
convoluted way in different places in the call stack:

drbg_instantiate()

    => drbg_seed(reseed=0) sets the raw HMAC key drbg_state::C to its
       correct initial value, and it sets the state value
       drbg_state::V to an *incorrect* initial value.

        => drbg_hmac_update(reseed=0) overwrites drbg_state::V with the
           correct initial value, then prepares the hmac_sha512_key
           drbg_state::key from the initial raw HMAC key drbg_state::C.

Later, each time the HMAC key is updated, drbg_hmac_update() also uses
drbg_state::C to temporarily store the new raw key.

Simplify all of this by:

    - Making drbg_instantiate() set the correct initial values of
      drbg_state::V and drbg_state::key.

    - Converting drbg_hmac_update() to generate the raw key in a
      temporary on-stack array instead of drbg_state::C.

    - Removing drbg_state::C.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Move module aliases to end of file

Move the MODULE_ALIAS_CRYPTO lines in the middle of the file to the end
so that they are in the usual place and next to the other one.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Install separate seed functions for pr and nopr

Set rng_alg::seed to different functions for the prediction-resistant
and non-prediction-resistant algorithms, so that the function does not
need to parse the algorithm name to figure out which algorithm it is.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove drbg_core

Now that none of the information in struct drbg_core is used, remove it.

The null-ity of the pointer drbg_state::core was used to keep track of
whether the DRBG has been instantiated. Replace it with a boolean.

No functional change.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Use HMAC-SHA512 library API

Since the HMAC algorithm is now fixed at HMAC-SHA512, just use the
HMAC-SHA512 library API. This is simpler and more efficient.

Remove error-handling code that is no longer needed.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Embed V and C into struct drbg_state

Now that the sizes of V and C are known at compile time, embed them into
struct drbg_state rather than using separate allocations.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Move fixed values into constants

Since only one drbg_core remains, the state length, block length, and
security strength are now fixed values. Moreover, the maximum request
length, maximum additional data length, and maximum number of requests
were all already fixed values.

Simplify the code by just using #defines for all these fixed values.

In drbg_seed_from_random(), take advantage of the constant to define the
array size. Remove assertions that are no longer useful.

In the case of drbg_blocklen() and drbg_statelen(), replace these with a
single value DRBG_STATE_LEN, as for HMAC_DRBG they are the same thing.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - De-virtualize drbg_state_ops

Now that there's only one set of state operations, use direct calls to
those operations.

No change in behavior. In particular, drbg_alloc_state() doesn't change
behavior, because the only remaining drbg_core uses HMAC_DRBG.
drbg_uninstantiate() doesn't change behavior, because a NULL d_ops
implied NULL priv_data which makes a drbg_fini_hash_kernel() a no-op.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Simplify algorithm registration

Now that "drbg_pr_hmac_sha512" and "drbg_nopr_hmac_sha512" are the only
crypto_rng algorithms left in crypto/drbg.c, simplify the algorithm
registration logic to register these more directly without relying on
the drbg_cores[] array (which will be removed).

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove support for HMAC-SHA256 and HMAC-SHA384

Remove support for the HMAC-SHA256 and HMAC-SHA384 variants of
HMAC_DRBG, leaving only the HMAC-SHA512 variant of HMAC_DRBG.

HMAC-SHA512 is already the default.  The default did used to be
HMAC-SHA256, but several years ago it was upgraded to HMAC-SHA512 "to
support compliance with SP800-90B and SP800-90C".  Given that the point
of crypto/drbg.c is compliance with those standards, and there's also no
technical reason to prefer HMAC-SHA384 in this situation even if
acceptable, there's really no point in offering anything else.

Note: now that only HMAC-SHA512 remains, a lot of unnecessary
abstractions can be removed.  A later commit will do that.  This commit
just straightforwardly removes the HMAC-SHA256 and HMAC-SHA384 code.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - Update test for drbg_nopr_hmac_sha512

Synchronize the drbg_nopr_hmac_sha512 test vector with the first test
vector from the latest ACVP json files, so that both of the DRBG test
vectors are pulled from a consistent source.

Note that the new test vector has a nonempty personalization string.
That should be helpful as well: Some FIPS labs require this, due to
their interpretation of SP800-90A 11.3.2 which says that a
"representative" value of the personalization string must be tested.

It also now does an explicit reseed, which makes it clearer that the
requirement to test "Reseed" is met, without having to interpret the
additional input processing as covering that.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - Add test for drbg_pr_hmac_sha512

Add a test for drbg_pr_hmac_sha512, which previously didn't have a test.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Flatten the DRBG menu

Now that the menuconfig CRYPTO_DRBG_MENU has no options in it other than
the hidden symbol CRYPTO_DRBG, remove it and move CRYPTO_DRBG to its
parent menu. Give CRYPTO_DRBG an appropriate prompt and help text.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove support for HASH_DRBG

Remove the support for HASH_DRBG.  It's likely unused code, seeing as
HMAC_DRBG is always enabled and prioritized over it unless
NETLINK_CRYPTO is used to change the algorithm priorities.

There's also no compelling reason to support more than one of
[HMAC_DRBG, HASH_DRBG, CTR_DRBG].  By definition, callers cannot tell
any difference in their outputs.  And all are FIPS-certifiable, which is
the only point of the kernel's NIST DRBGs anyway.

Switching to HASH_DRBG doesn't seem all that compelling, either.  For
one, it's more complex than HMAC_DRBG.

Thus, let's just drop HASH_DRBG support and focus on HMAC_DRBG.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove support for CTR_DRBG

Remove the support for CTR_DRBG.  It's likely unused code, seeing as
HMAC_DRBG is always enabled and prioritized over it unless
NETLINK_CRYPTO is used to change the algorithm priorities.

There's also no compelling reason to support more than one of
[HMAC_DRBG, HASH_DRBG, CTR_DRBG].  By definition, callers cannot tell
any difference in their outputs.  And all are FIPS-certifiable, which is
the only point of the kernel's NIST DRBGs anyway.

Switching to CTR_DRBG doesn't seem all that compelling, either.  While
it's often the fastest NIST DRBG, it has several disadvantages:

- CTR_DRBG uses AES.  Some platforms don't have AES acceleration at all,
  causing a fallback to the table-based AES code which is very slow and
  can be vulnerable to cache-timing attacks.  In contrast, HMAC_DRBG
  uses primitives that are consistently constant-time.

- CTR_DRBG is usually considered to be somewhat less cryptographically
  robust than HMAC_DRBG.  Granted, HMAC_DRBG isn't all that great
  either, e.g. given the negative result from Woodage & Shumow (2018)
  (https://eprint.iacr.org/2018/349.pdf), but that can be worked around.

- CTR_DRBG is more complex than HMAC_DRBG, risking bugs.  Indeed, while
  reviewing the CTR_DRBG code, I found two bugs, including one where it
  can return success while leaving the output buffer uninitialized.

- The kernel's implementation of CTR_DRBG uses an "ctr(aes)"
  crypto_skcipher and relies on it returning the next counter value.
  That's fragile, and indeed historically many "ctr(aes)"
  crypto_skcipher implementations haven't done that.  E.g. see
  commit 511306b2d075 ("crypto: arm/aes-ce - update IV after partial final CTR block"),
  commit fa5fd3afc7e6 ("crypto: arm64/aes-blk - update IV after partial final CTR block"),
  commit 371731ec2179 ("crypto: atmel-aes - Fix saving of IV for CTR mode"),
  commit 25baaf8e2c93 ("crypto: crypto4xx - fix ctr-aes missing output IV"),
  commit 334d37c9e263 ("crypto: caam - update IV using HW support"),
  commit 0a4491d3febe ("crypto: chelsio - count incomplete block in IV"),
  commit e8e3c1ca57d4 ("crypto: s5p - update iv after AES-CBC op end").

  I.e., there were many years where the kernel's CTR_DRBG code (if it
  were to have actually been used) repeated outputs on some platforms.

  AES-CTR also uses a 128-bit counter, which creates overflow edge cases
  that are sometimes gotten wrong.  E.g. see commit 009b30ac7444
  ("crypto: vmx - CTR: always increment IV as quadword").

So, while switching to CTR_DRBG for performance reasons isn't completely
out of the question (notably BoringSSL uses it), it would take quite a
bit more work to create a solid implementation of it in the kernel,
including a more solid implementation of AES-CTR itself (in lib/crypto/,
with a scalar bit-sliced fallback, etc).  Since HMAC_DRBG has always
been the default NIST DRBG variant in the kernel and is in a better
state, let's just standardize on it for now.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove import of crypto_cipher functions

The inclusion of <crypto/internal/cipher.h> and the import of the
internal crypto namespace became unnecessary in commit ba0570bdf1d9
("crypto: drbg - Replace AES cipher calls with library calls").

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fold include/crypto/drbg.h into crypto/drbg.c

include/crypto/drbg.h no longer contains anything that is used
externally to crypto/drbg.c. Therefore, fold it into crypto/drbg.c.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove obsolete FIPS 140-2 continuous test

FIPS 140-2 required that a continuous test for repeated outputs be done
on both "Approved RNGs" and "Non-Approved RNGs".

That's apparently why crypto/drbg.c does such a test on the bytes it
pulls from get_random_bytes(), despite get_random_bytes() being a
"Non-Approved RNG" that is credited with zero entropy for FIPS purposes.
(From FIPS's point of view, the "Approved RNG" is jitterentropy.)

FIPS 140-3 "modernized" the continuous RNG test requirements.  They're
now a bit more sophisticated, requiring both an "Adaptive Proportion
Test" and a "Repetition Count Test".

At the same time, FIPS 140-3 doesn't require continuous RNG tests on
"Non-Approved RNGs" if a "vetted conditioning component" is used.  The
SP800-90A DRBGs are exactly such a vetted conditioning component, by
their design.  (In the case of HASH_DRBG and CTR_DRBG, the derivation
function does have to be implemented.  But the kernel does that.)

In other words: from FIPS 140-3's point of view, get_random_bytes()
still produces zero entropy, but the way the DRBG combines those bytes
with the jitterentropy bytes preserves all the "approved" entropy from
jitterentropy.  Thus no test for get_random_bytes() is required.

Seeing as FIPS 140-2 certificates stopped being issued in 2021 in favor
of FIPS 140-3, this means this code is obsolete.  Remove it.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove unhelpful helper functions

Fold the contents of the inline functions crypto_drbg_get_bytes_addtl(),
crypto_drbg_get_bytes_addtl_test(), and crypto_drbg_reset_test() into
their only caller in drbg_cavs_test(). It ends up being much simpler.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove broken commented-out code

This commented-out code doesn't compile. Even if it did, it wouldn't
actually do what it was apparently intended to do, seeing as the "test"
for "drbg_pr_hmac_sha512" and "drbg_pr_ctr_aes256" is alg_test_null().

Just delete it to avoid keeping broken code around, and so that there
isn't any perceived need to try to update it as the DRBG code is
refactored.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Remove always-enabled symbol CRYPTO_DRBG_HMAC

The kconfig symbol CRYPTO_DRBG_HMAC is always enabled when
CRYPTO_DRBG_MENU is enabled, and all checks for CRYPTO_DRBG_HMAC are in
code conditional on CRYPTO_DRBG_MENU. Thus, the only purpose of the
CRYPTO_DRBG_HMAC symbol is to select CRYPTO_HMAC and CRYPTO_SHA512.

Move those two selections to CRYPTO_DRBG_MENU, remove the checks for
CRYPTO_DRBG_HMAC, and remove the CRYPTO_DRBG_HMAC symbol itself.

Note that this also fixes an issue where CRYPTO_HMAC and CRYPTO_SHA512
were unnecessarily being forced to built-in when CRYPTO_DRBG=m.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fix the fips_enabled priority boost

When fips_enabled=1, it seems to have been intended for one of the
algorithms defined in crypto/drbg.c to be the highest priority "stdrng"
algorithm, so that it is what is used by "stdrng" users.

However, the code only boosts the priority to 400, which is less than
the priority 500 used in drivers/crypto/caam/caamprng.c. Thus, the CAAM
RNG could be used instead.

Fix this by boosting the priority by 2000 instead of 200.

Fixes: 541af946fe13 ("crypto: drbg - SP800-90A Deterministic Random Bit Generator")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fix drbg_max_addtl() on 64-bit kernels

On 64-bit kernels, drbg_max_addtl() returns 2**35 bytes.  That's too
large, for two reasons:

1. SP800-90A says the maximum limit is 2**35 *bits*, not 2**35 bytes.
   So the implemented limit has confused bits and bytes.

2. When drbg_kcapi_hash() calls crypto_shash_update() on the additional
   information string, the length is implicitly cast to 'unsigned int'.
   That truncates the additional information string to U32_MAX bytes.

Fix the maximum additional information string length to always be
U32_MAX - 1, causing an error to be returned for any longer lengths.

Fixes: 541af946fe13 ("crypto: drbg - SP800-90A Deterministic Random Bit Generator")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fix ineffective sanity check

Fix drbg_healthcheck_sanity() to correctly check the return value of
drbg_generate(). drbg_generate() returns 0 on success, or a negative
errno value on failure. drbg_healthcheck_sanity() incorrectly assumed
that it returned a positive value on success.

This didn't make the sanity check fail, but it made it ineffective.

Fixes: cde001e4c3c3 ("crypto: rng - RNGs must return 0 in success case")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fix misaligned writes in CTR_DRBG and HASH_DRBG

drbg_cpu_to_be32() is being used to do a plain write to a byte array,
which doesn't have any alignment guarantee. This can cause a misaligned
write. Replace it with the correct function, put_unaligned_be32().

Fixes: 72f3e00dd67e ("crypto: drbg - replace int2byte with cpu_to_be")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fix returning success on failure in CTR_DRBG

drbg_ctr_generate() sometimes returns success when it fails, leaving the
output buffer uninitialized. Fix it.

Fixes: cde001e4c3c3 ("crypto: rng - RNGs must return 0 in success case")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dt-bindings: crypto: qcom-qce: Document the Glymur crypto engine

Document the crypto engine on Glymur platform.

Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dt-bindings: trivial-devices: add atmel,atecc608b

Add entry for ATECC608B. Update the ATECC508A comment for consistency.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: atmel-ecc - add support for atecc608b

Tested on hardware with an ATECC608B at 0x60. The device binds
successfully, passes the driver's sanity check, and registers the
ecdh-nist-p256 KPP algorithm.

The hardware ECDH path was also exercised using a minimal KPP test
module, covering private key generation, public key derivation, and
shared secret computation.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Initialize data during __sev_snp_init_locked()

Sashiko notes:

> is the stack variable data left uninitialized when taking the else branch?
> Since data.tio_en is later evaluated unconditionally, could stack garbage
> cause it to evaluate to true, leading to erroneous attempts to allocate
> pages and initialize SEV-TIO on unsupported hardware?

If the firmware is too old to support SEV_INIT_EX, data is left
uninitialized but used in the debug logging about whether TIO is enabled or
not.

Fixes: 4be423572da1 ("crypto/ccp: Implement SEV-TIO PCIe IDE (phase1)")
Reported-by: Sashiko
Assisted-by: Gemini:gemini-3.1-pro-preview
Link: https://sashiko.dev/#/patchset/20260324161301.1353976-1-tycho%40kernel.org
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Check for page allocation failure correctly in TIO

Sashiko notes:

> if __snp_alloc_firmware_pages() returns NULL under memory pressure, is it
> safe to pass it directly to page_address()?
>
> On architectures without HASHED_PAGE_VIRTUAL, page_address(NULL) might
> compute a deterministic but invalid, non-zero virtual address. The
> subsequent if (tio_status) check would then evaluate to true, and
> sev_tsm_init_locked() would dereference the invalid pointer.

Indeed, page_address(NULL) will return non-NULL garbage here. Fix this by
checking the page allocation itself for NULL, not the resulting virtual
address.

Fixes: 4be423572da1 ("crypto/ccp: Implement SEV-TIO PCIe IDE (phase1)")
Reported-by: Sashiko
Assisted-by: Gemini:gemini-3.1-pro-preview
Link: https://sashiko.dev/#/patchset/20260324161301.1353976-1-tycho%40kernel.org
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Fix snp_filter_reserved_mem_regions() off-by-one

Sashiko notes:

> regarding the bounds check in snp_filter_reserved_mem_regions()
> called via walk_iomem_res_desc(): does the check
> if ((range_list->num_elements * 16 + 8) > PAGE_SIZE)
> allow an off-by-one heap buffer overflow?
>
> If range_list->num_elements is 255, 255 * 16 + 8 = 4088, which is <= 4096.
> Writing range->base (8 bytes) fills 4088-4095, but writing range->page_count
> (4 bytes) would write to 4096-4099, overflowing the kzalloc-allocated
> PAGE_SIZE buffer.

Fix this by accounting for the entry about to be written to, in addition to
the entries that are already allocated.

Fixes: 1ca5614b84ee ("crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP")
Reported-by: Sashiko
Assisted-by: Gemini:gemini-3.1-pro-preview
Link: https://sashiko.dev/#/patchset/20260324161301.1353976-1-tycho%40kernel.org
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Reverse the cleanup order in psp_dev_destroy()

Before SNP x86 shutdown [1], all HV_FIXED pages were always leaked on
module unload. Now pages can be reclaimed if they are freed before SNP
shutdown.

The SFS driver does sfs_dev_destroy() -> snp_free_hv_fixed_pages(), marking
the command buffer as free. But this happens after sev_dev_destroy() in
psp_dev_destroy(), so the pages are always leaked.

Rearrange psp_dev_destroy() to destroy things in the reverse order from
psp_init(), so that any dependencies can be unwound accordingly. This lets
SFS free the page and the subsequent SNP shutdown release it.

This was identified with use of Chris Mason's review-prompts:
https://github.com/masoncl/review-prompts

[1]: https://lore.kernel.org/all/20260324161301.1353976-1-tycho@kernel.org/

Fixes: 648dbccc03a0 ("crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver")
Reported-by: review-prompts
Assisted-by: Claude:claude-4.6-opus
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Reviewed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

gpio: amd8111: Drop useless zeros in array initialisation

The compiler fills in zeros as needed, so there is no technical reason
to add explicit zeros at the end of a list initializer. Drop them.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260506144918.2445358-2-u.kleine-koenig@baylibre.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

drm/panthor: Fix missing declaration for panthor_transparent_hugepage

sparse reports:
drivers/gpu/drm/panthor/panthor_drv.c:1805:6: warning: symbol 'panthor_transparent_hugepage' was not declared. Should it be static?

Make it clean.

Signed-off-by: gyeyoung <gye976@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/20260503144234.2150138-1-gye976@gmail.com

x86/boot/e820: Re-enable BIOS fallback if e820 table is empty

In commit:

157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables")

the check on the number of entries in the e820 table was removed. The intention
was to support single-entry maps, but by removing the check entirely, we also
skip the fallback (to, e.g., the BIOS 88h function).

This means that if no E820 map is passed in from the bootloader (which is the
case on some bootloaders, like linld), we end up with an empty memory map, and
the kernel fails to boot (either by deadlocking on OOM, or by failing to
allocate the real mode trampoline, or similar).

Re-instate the check in append_e820_table(), but only check that nr_entries is
non-zero. This allows e820__memory_setup_default() to fall back to other memory
size sources, and doesn't affect e820__memory_setup_extended(), as the latter
ignores the return value from append_e820_table().

In doing so, we also update the return values to be proper error codes, with
-ENOENT for this case (there are no entries), and -EINVAL for the case where an
entry appears invalid. Given none of the callers check the actual value -- just
whether it's nonzero -- this is largely aesthetic in practice.

Tested against linld, and the kernel boots again fine.

[ mingo: Readability edits to the comment and the changelog. ]

Fixes: 157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://patch.msgid.link/20260416065746.1896647-1-david@davidgow.net

xfrm: route MIGRATE notifications to caller's netns

xfrm_send_migrate() in net/xfrm/xfrm_user.c and pfkey_send_migrate()
in net/key/af_key.c both hardcode &init_net for the multicast that
announces a successful XFRM_MSG_MIGRATE / SADB_X_MIGRATE.

XFRM_MSG_MIGRATE arrives on a per-netns NETLINK_XFRM socket, and the
rest of the xfrm/af_key netlink path was made netns-aware in 2008.
The other 14 multicast paths in xfrm_user.c route their event using
xs_net(x), xp_net(xp) or sock_net(skb->sk); only the migrate path
was missed.

Two consequences of the init_net hardcoding:

  1. The notification (selector, old/new endpoint addresses, and the
     km_address) is delivered to listeners on init_net's
     XFRMNLGRP_MIGRATE / pfkey BROADCAST_ALL groups rather than on
     the issuing netns. An IKE daemon running in init_net therefore
     receives migration notifications originating from any other
     netns on the host.

  2. An IKE daemon running inside a non-init netns and subscribed
     to its own XFRMNLGRP_MIGRATE / pfkey groups never receives the
     notification of its own migration. IKEv2 MOBIKE / address-update
     handling inside a netns is silently broken.

Thread struct net through km_migrate() and the xfrm_mgr.migrate
function pointer, drop the &init_net override in xfrm_send_migrate()
and pfkey_send_migrate(), and pass the caller's net (already in
scope in xfrm_migrate() via sock_net(skb->sk)) all the way down.
struct xfrm_mgr is in-tree only and not exported as a stable API,
so the function-pointer signature change is internal.

pfkey_broadcast() is already netns-aware via net_generic(net,
pfkey_net_id) since the pernet conversion. The five other
pfkey_broadcast() callers in af_key.c already pass xs_net(x),
sock_net(sk) or a per-netns net, so this only removes the
&init_net outlier.

Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

drm/gpusvm: Drop redundant @flags.* kernel-doc on struct drm_gpusvm_pages

The kernel-doc block above struct drm_gpusvm_pages duplicates the
descriptions of the bit-flags that live in struct drm_gpusvm_pages_flags
using dotted notation (@flags.migrate_devmem, @flags.unmapped, ...).
That dotted notation is intended for nested anonymous structs/unions that
the parser flattens into the parent's parameter list.  Here, however,
flags is of a named external type, so the parser does not flatten its
members and the dotted entries do not match any member of
drm_gpusvm_pages.  They also duplicate the canonical descriptions already
present in the kernel-doc of struct drm_gpusvm_pages_flags itself.

Drop the five @flags.* lines and replace them with a single @flags entry
that cross-references the type via kernel-doc's "&struct ..." syntax.
This eliminates the redundancy and removes warnings emitted by the new
parameterdescs check in scripts/kernel-doc:

  Excess struct member 'flags.migrate_devmem' description in
   'drm_gpusvm_pages'
  Excess struct member 'flags.unmapped' description in 'drm_gpusvm_pages'
  Excess struct member 'flags.partial_unmap' description in
   'drm_gpusvm_pages'
  Excess struct member 'flags.has_devmem_pages' description in
   'drm_gpusvm_pages'
  Excess struct member 'flags.has_dma_mapping' description in
   'drm_gpusvm_pages'

No functional change.

Assisted-by: Claude:claude-opus-4.6
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260501175956.4054088-1-shuicheng.lin@intel.com

Merge tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix memory leak in connection free

- Fix inherited ACL ACE validation

- Minor cleanup

- Fix for share config

- Fix durable handle cleanup race

- Fix close_file_table_ids in session teardown

- smbdirect fixes:
    - Fix memory region registration
    - Two fixes for out-of-tree builds

* tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: validate inherited ACE SID length
  ksmbd: fix kernel-doc warnings from ksmbd_conn_get/put()
  ksmbd: fail share config requests when path allocation fails
  ksmbd: close durable scavenger races against m_fp_list lookups
  ksmbd: harden file lifetime during session teardown
  ksmbd: centralize ksmbd_conn final release to plug transport leak
  smb: smbdirect: fix MR registration for coalesced SG lists
  smb: smbdirect: introduce and use include/linux/smbdirect.h
  smb: smbdirect: make use of DEFAULT_SYMBOL_NAMESPACE and EXPORT_SYMBOL_GPL

Merge tag 'chrome-platform-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome-platform fix from Tzung-Bi Shih:

- Fix a NULL dereference in cros_ec_typec

* tag 'chrome-platform-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration

OPP: Fix race between OPP addition and lookup

A race exists between dev_pm_opp_add_dynamic() and
dev_pm_opp_find_freq_exact():

  CPU0 (add)                          CPU1 (lookup)
  -------------------------------     ------------------------------
  _opp_add()
    mutex_lock()
    list_add(&new_opp->node, head)
    mutex_unlock()                    _opp_table_find_key()
                                        mutex_lock()
                                        dev_pm_opp_get(opp)
                                          kref_get()
                                        mutex_unlock()
    kref_init(&new_opp->kref)
                                      dev_pm_opp_put()
                                        kref_put_mutex()

The newly added OPP is inserted into the list before its kref is
initialized. A concurrent lookup can find this OPP and increment its
reference count while it is still uninitialized, leading to refcount
corruption and a potential premature free.

Fix this by initializing ->kref and ->opp_table before making the OPP
visible via list_add(). This ensures any concurrent lookup observes a
fully initialized object.

Fixes: 7034764a1e4a (PM / OPP: Add 'struct kref' to struct dev_pm_opp)
Co-developed-by: Ling Xu <ling_ling.xu@unisoc.com>
Signed-off-by: Ling Xu <ling_ling.xu@unisoc.com>
Signed-off-by: Di Shen <di.shen@unisoc.com>
[ Viresh: Updated commit log ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

w5100: remove unused gpio link detection

Since the platform_device support is now gone, nothing ever passes a
valid gpio number, and all the link state handling can go away.

An earlier version of my patch changed this to look up the GPIO descriptor
from devicetree and convert it all to the modern interface, but there
are no users of that binding at the moment.

Remove the gpio handling, which is now one of the last users of the
legacy gpio interface in platform-independent code.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230127095839.3266452-1-arnd@kernel.org/
Link: https://patch.msgid.link/20260505180459.1247690-3-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

w5300: remove unused driver

Unlike w5100, this driver does not support SPI mode or devicetree
bindings, and is hence entirely unusable without third-party board
support patches that likely haven't existed for any recent kernel
version.

Remove the entire driver.

If anyone is in fact using it with their custom board files, they
can bring it back and include an earlier patch I sent to add
DT based probing for the GPIO lines.

Link: https://lore.kernel.org/all/20260427142924.2702598-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260505180459.1247690-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

w5100: remove MMIO support

This driver supports both SPI and MMIO based register access, but only
the former has devicetree support. While MMIO mode would have worked
with old-style board files, those have never defined such a device
upstream.

Remove the MMIO mode, leaving SPI as the only way to use this driver,
but leave it in two loadable modules. More cleanups can be done by
combining the two into one file.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260505180459.1247690-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-improve-representor-lifecycle-and-late-ib-representor-loading'

Tariq Toukan says:

====================
net/mlx5: Improve representor lifecycle and late IB representor loading

This series addresses two problems that have been present for years, and
fixes one representor reload error-unwind case exposed while making the
reload path reusable.

First, there is no coordination between E-Switch reconfiguration and
representor registration. The E-Switch can be mid-way through a mode
change or VF count update while mlx5_ib walks in and registers or
unregisters representors. Nothing stops them. The race window is small
and there is no field report, but it is clearly wrong.

Second, loading mlx5_ib while the device is already in switchdev mode
does not bring up the IB representors. mlx5_eswitch_register_vport_reps()
only stores callbacks; nobody triggers the actual load after registration.

The series fixes the registration race with a per-E-Switch representor
mutex. The lock is introduced first, then LAG shared-FDB and multiport
E-Switch transitions are adjusted so auxiliary device rescans and IB
representor reloads do not hold ldev->lock while taking the representor
lock. This keeps the intermediate commits bisectable before the stricter
E-Switch serialization and lock assertions are enabled.

After the LAG ordering is fixed, all E-Switch reconfiguration paths that
create, destroy, load, or unload representors take the representor mutex.
esw_mode_change() deliberately drops the mutex around
mlx5_rescan_drivers_locked(), because auxiliary probe and remove paths
re-enter mlx5_eswitch_register_vport_reps() and
mlx5_eswitch_unregister_vport_reps() on the same thread.

The shared-FDB peer IB registration path can hold one E-Switch
representor mutex and then register peer representor ops on another
E-Switch. The series annotates that case as nested locking so lockdep can
distinguish it from recursive locking on the same E-Switch.

For the missing IB representors, mlx5_eswitch_register_vport_reps() queues
a work item that acquires the devlink lock and loads all relevant
representors. This is the change that actually fixes the long-standing
bug.

The reload path also learns to track which representor types were loaded by
the current attempt, so an error does not unload representors that were
already active before the retry.

Patch 1 is cleanup. LAG and MPESW had the same representor reload
sequence duplicated in several places and the copies had started to
drift. This consolidates them into one helper.

Patch 2 lets E-Switch workqueue callers choose GFP allocation flags.

Patch 3 adds the per-E-Switch representor lifecycle lock and helper APIs.

Patch 4 adjusts the LAG shared-FDB and multiport E-Switch transitions so
auxiliary device rescans and IB representor reloads run without
ldev->lock held while taking the representor lock.

Patch 5 protects the E-Switch reconfiguration, representor registration
and peer IB representor paths with the representor lock.

Patch 6 fixes representor load error unwind so only representor types
loaded by the current attempt are unloaded on failure.

Patch 7 moves the representor load triggered by
mlx5_eswitch_register_vport_reps() onto the work queue. This is the patch
that fixes IB representors not coming up when mlx5_ib is loaded while the
device is already in switchdev mode.
====================

Link: https://patch.msgid.link/20260503202726.266415-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: E-Switch, load reps via work queue after registration

mlx5_eswitch_register_vport_reps() only installs representor callbacks and
marks the rep type as registered. If the E-Switch is already in switchdev
mode, the newly registered rep type must then be loaded for already enabled
vports.

That load path needs to run under the devlink lock, which is not held by
the auxiliary driver registration context. Queue the reload to the E-Switch
workqueue, whose handler acquires the devlink lock, and load the relevant
representors from there.

Since representor registration runs from sleepable auxiliary-driver
context, queue the late reload with GFP_KERNEL. The functions-change
notifier path remains the GFP_ATOMIC user of mlx5_esw_add_work().

The unregister path is unchanged and still unloads representors
synchronously while tearing down the registered callbacks.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: E-Switch, unwind only newly loaded representor types

__esw_offloads_load_rep() may return success without invoking the
representor load callback when the representor type is already loaded.

On a later load failure, mlx5_esw_offloads_rep_load() unconditionally
unloaded all previously iterated representor types. This could unload
representor types that were already loaded before this load attempt.

Track which representor types were actually loaded by the current call and
unwind only those on error. Also restore the representor state back to
REP_REGISTERED when the load callback itself fails.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: E-Switch, serialize representor lifecycle

Representor callbacks can be registered and unregistered while the
E-Switch is already in switchdev mode, and the same E-Switch may also be
reconfigured by devlink, VF changes and SF changes. Serialize these paths
with the per-E-Switch representor mutex instead of relying on ad-hoc bit
state and wait queues.

Take the representor lock around the mode transition, VF/SF representor
changes and representor ops registration. Keep mode_lock and the
representor lock unnested by using the operation flag while the mode lock
is dropped. During mode changes, drop the representor lock around the
auxiliary bus rescan because driver bind/unbind may register or unregister
representor ops.

Split representor ops registration into locked public wrappers and blocked
internal helpers, clear the ops pointer on unregister, and add nested
wrappers for the shared-FDB master IB path that registers peer
representor ops while another E-Switch representor lock is already held.

On unregister, always call __unload_reps_all_vport() before marking reps
unregistered and clearing rep_ops. The per-representor state check makes
this a no-op for types that were not loaded, so unregister no longer has
to infer load state from esw->mode.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Lag, avoid LAG and representor lock cycles

The LAG shared-FDB and multiport E-Switch transitions rescan auxiliary
devices and reload IB representors while holding ldev->lock. Driver
bind/unbind paths may register or unregister E-Switch representor ops, and
representor load paths may enter LAG code, so holding ldev->lock across
those calls creates lock-order cycles with the E-Switch representor lock.

Keep the devcom component locked for the transition, but drop ldev->lock
before rescanning auxiliary devices or reloading IB representors. Mark the
LAG transition as in progress while the lock is dropped and assert the
devcom lock where the helper relies on it. This preserves LAG serialization
while avoiding ldev->lock nesting under E-Switch representor registration.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: E-Switch, add representor lifecycle lock

Add a per-E-Switch mutex for serializing representor lifecycle work and
provide small helpers for taking and dropping it. Initialize and destroy
the mutex with the E-Switch offloads state.

Add the lock and helper API first. Follow-up patches will take the lock in
the individual representor lifecycle components. This keeps the functional
changes split by component and leaves this patch without intended behavior
change, making the series easier to review and bisectable.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: E-Switch, let esw work callers choose GFP flags

mlx5_esw_add_work() always allocates the queued work item with
GFP_ATOMIC. That is required for the E-Switch functions-change notifier,
but not every caller of this helper will run from atomic context.

Pass an allocation flag to mlx5_esw_add_work() and keep the notifier
caller using GFP_ATOMIC. This allows sleepable callers to use GFP_KERNEL
instead of unnecessarily relying on atomic reserves.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Lag: refactor representor reload handling

Representor reload during LAG/MPESW transitions has to be repeated in
several flows, and each open-coded loop was easy to get out of sync
when adding new flags or tweaking error handling. Move the sequencing
into a single helper so that all call sites share the same ordering
and checks.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'r8152-add-support-for-the-rtl8159-10gbit-usb-ethernet-chip'

Birger Koblitz says:

====================
r8152: Add support for the RTL8159 10Gbit USB Ethernet chip

Add support for the RTL8159, which is a 10GBit USB-Ethernet adapter
chip in the RTL815x family of chips.

The RTL8159 re-uses the frame descriptor format and SRAM2 access introduced
with the RTL8157 as well as most of the setup and PM logic of the RTL8157.

The module was tested with a Lekuo DR59R11 USB-C 10GbE Ethernet Adapter:
[ 2502.906947] usb 2-1: new SuperSpeed USB device number 3 using xhci_hcd
[ 2502.927859] usb 2-1: New USB device found, idVendor=0bda, idProduct=815a, bcdDevice=30.00
[ 2502.927867] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=7
[ 2502.927871] usb 2-1: Product: USB 10/100/1G/2.5G/5G/10G LAN
[ 2502.927873] usb 2-1: Manufacturer: Realtek
[ 2502.927875] usb 2-1: SerialNumber: 000388C9B3B5XXXX
[ 2503.063745] r8152-cfgselector 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
[ 2503.123876] r8152 2-1:1.0: Requesting firmware: rtl_nic/rtl8159-1.fw
[ 2503.126267] r8152 2-1:1.0: PHY firmware installed 0 to be loaded: 20
[ 2503.156265] r8152 2-1:1.0: load rtl8159-1 v1 2026/01/01 successfully
[ 2503.270729] r8152 2-1:1.0 eth0: v1.12.13
[ 2503.289349] r8152 2-1:1.0 enx88c9b3b5xxxx: renamed from eth0
[ 2507.777055] r8152 2-1:1.0 enx88c9b3b5xxxx: carrier on

The RTL8159 adapter was tested against an AQC107 PCIe-card supporting
10GBit/s and an RTL8157 5Gbit USB-Ethernet adapter supporting 5GBit/s for
performance, link speed and EEE negotiation. Using USB3.2 Gen 2 (20GBit) with
the RTL8159 USB adapter and running iperf3 against the AQC107 PCIe
card resulted in 8.96 Gbits/sec transfer speed.

The code is based on the out-of-tree r8152 driver published by Realtek under
the GPL.

The RTL8159 requires firmware for the PHY in order to achieve a 10GBit link
speed. Without firmware, only 5GBit were achieved. The firmware can be
extracted from the out-of-tree r8152 driver-code where it is stored in the
ram17 u8-array. Code is added to use the existing firmware upload mechanism
of the driver for the RTL8157/9 PHY firmware code. The firmware will be
submitted separately to linux-firmware.
====================

Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-0-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8152: Add firmware upload capability for RTL8157/RTL8159

The RTL8159 (RTL_VER_17) requires firmware for its PHY in order to work
at connection speeds > 5GBit. Add support for uploading firmware for
the PHY using the existing rtl8152_apply_firmware() function
in r8157_hw_phy_cfg() and set up the correct names for the firmware
files.

This also adds support for uploading firmware for the RTL8157
(RTL_VER_16) PHY, for which firmware is however not strictly necessary
to work. Still, this allows to upload newer versions of the firmware used
by this chip, e.g. to improve interoperability.

If no firmware is found, both the RTL8157 and the RTL8159 will continue
to work.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Tested-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-3-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8152: Add support for the RTL8159 chip

The RTL8159 re-uses the packet descriptor format introduced with the
RTL8157 and other hardware features of the RTL8157 (RTL_VER_16) such
as the SRAM access. The support therefore consists in expanding the
existing RTL8157 code for initialization and USB power management
to also be used for the RTL8159 (RTL_VER_17).

Most of the additional code is added in r8157_hw_phy_cfg() to configure
the RTL8159 PHY.

Add support for the USB device ID of Realtek RTL8159-based adapters,
for which the product ID is 0x815a. Detect the RTL8159 as RTL_VER_17
and set it up.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Tested-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-2-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8152: Add support for 10Gbit Link Speeds and EEE

The RTL8159 supports 10GBit Link speeds. Add support for this speed
in the setup and setting/getting through ethtool. Also add 10GBit EEE.
Add functionality for setup and ethtool get/set methods.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-1-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: cortina: Drop half-assembled SKB

In gmac_rx() (drivers/net/ethernet/cortina/gemini.c), when
gmac_get_queue_page() returns NULL for the second page of a multi-page
fragment, the driver logs an error and continues — but does not free the
partially assembled skb that was being assembled via napi_build_skb() /
napi_get_frags().

Free the in-progress partially assembled skb via napi_free_frags()
and increase the number of dropped frames appropriately
and assign the skb pointer NULL to make sure it is not lingering
around, matching the pattern already used elsewhere in the driver.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Andreas Haarmann-Thiemann <eitschman@nebelreich.de>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260505-gemini-ethernet-fix-v2-1-997c31d06079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5e-report-more-netdev-stats'

Tariq Toukan says:

====================
net/mlx5e: Report more netdev stats

This series by Gal extends the set of counters reported in netdev stats,
by adding:
- hw_gso_packets/bytes
- RX HW-GRO stats
- TX csum_none
- TX queue stop/wake

It also aligns the tso_bytes/tso_inner_bytes counters with the netdev
stats API and virtio spec definition.
====================

Link: https://patch.msgid.link/20260504183704.272322-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Report stop and wake TX queue stats

Report TX queue stop and wake statistics via the netdev queue stats API
by mapping the existing stopped and wake counters to the stop and wake
fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Report TX csum_none netdev stat

Report TX csum_none statistic via the netdev queue stats API by mapping
the existing csum_none counter to the csum_none field.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Report RX HW-GRO netdev stats

Report RX hardware GRO statistics via the netdev queue stats API by
mapping the existing gro_packets, gro_bytes and gro_skbs counters to the
hw_gro_wire_packets, hw_gro_wire_bytes and hw_gro_packets fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Report hw_gso_packets and hw_gso_bytes netdev stats

Report hardware GSO statistics via the netdev queue stats API by mapping
the existing TSO counters to hw_gso_packets and hw_gso_bytes fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Count full skb length in TSO byte counters

The tso_bytes and tso_inner_bytes counters currently subtract the header
length from skb->len, counting only the payload. This is confusing and
doesn't align with the behavior of other _bytes counters in the driver.

Report the full skb length to align with this expectation.

This also makes our behavior consistent with the netdev stats API and
virtio spec definition.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

spi: s3c64xx: fix all kernel-doc warnings

Add kernel-doc for one struct member and use the correct function name
to eliminate kernel-doc warnings:

Warning: include/linux/platform_data/spi-s3c64xx.h:40 struct member
'polling' not described in 's3c64xx_spi_info'
Warning: include/linux/platform_data/spi-s3c64xx.h:51 expecting prototype
for s3c64xx_spi_set_platdata(). Prototype was for
s3c64xx_spi0_set_platdata() instead

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://patch.msgid.link/20260506175144.449364-1-rdunlap@infradead.org
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge branch 'mptcp-pm-misc-fixes-for-v7-1-rc3'

Matthieu Baerts says:

====================
mptcp: pm: misc. fixes for v7.1-rc3

Here are various fixes, mainly related to ADD_ADDRs:

- Patch 1: save ADD_ADDR for rtx with ID0 when needed. A fix for v6.1.

- Patch 2: remove unneeded exception for ID 0. A fix for v5.10.

- Patches 3-5: fix potential data-race and leaks during ADD_ADDR rtx. A
  fix for v5.10.

- Patch 6: resched blocked ADD_ADDR rtx after a more appropriated
  timeout, not after 15 seconds. A fix for v5.10.

- Patch 7: skip inactive subflows when when looking at the max RTO. A
  fix for v6.18.

- Patch 8: avoid iterating over all subflows when there is no need to. A
  fix for v6.18.

- Patch 9: skip closed subflows when looking at sending MP_PRIO. A fix
  for v5.17.

- Patch 10: properly catch errors when using check_output() in the
  selftests. A fix for v6.9.

- Patch 11: skip the 'unknown' flag test when 'ip mptcp' is used. A fix
  for v6.10.
====================

Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-0-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl

When pm_netlink.sh is executed with '-i', 'ip mptcp' is used instead of
'pm_nl_ctl'. IPRoute2 doesn't support the 'unknown' flag, which has only
been added to 'pm_nl_ctl' for this specific check: to ensure that the
kernel ignores such unsupported flag.

No reason to add this flag to 'ip mptcp'. Then, this check should be
skipped when 'ip mptcp' is used.

Fixes: 0cef6fcac24d ("selftests: mptcp: ip_mptcp option for more scripts")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-11-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: check output: catch cmd errors

Using '${?}' inside the if-statement to check the returned value from
the command that was evaluated as part of the if-statement is not
correct: here, '${?}' will be linked to the previous instruction, not
the one that is expected here (${cmd}).

Instead, simply mark the error, except if an error is expected. If
that's the case, 1 can be passed as the 4th argument of this helper.
Three checks from pm_netlink.sh expect an error.

While at it, improve the error message when the command unexpectedly
fails or succeeds.

Note that we could expect a specific returned value, but the checks
currently expecting an error can be used with 'ip mptcp' or 'pm_nl_ctl',
and these two tools don't return the same error code.

Fixes: 2d0c1d27ea4e ("selftests: mptcp: add mptcp_lib_check_output helper")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-10-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: prio: skip closed subflows

When sending an MP_PRIO, closed subflows need to be skipped.

This fixes the case where the initial subflow got closed, re-opened
later, then an MP_PRIO is needed for the same local address.

Note that explicit MP_PRIO cannot be sent during the 3WHS, so it is fine
to use __mptcp_subflow_active().

Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support")
Cc: stable@vger.kernel.org
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-9-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: return early if no retrans

No need to iterate over all subflows if there is no retransmission
needed.

Exit early in this case then.

Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-8-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: skip inactive subflows

When looking at the maximum RTO amongst the subflows, inactive subflows
were taken into account: that includes stale ones, and the initial one
if it has been already been closed.

Unusable subflows are now simply skipped. Stale ones are used as an
alternative: if there are only stale ones, to take their maximum RTO and
avoid to eventually fallback to net.mptcp.add_addr_timeout, which is set
to 2 minutes by default.

Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-7-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker

When an ADD_ADDR needs to be retransmitted and another one has already
been prepared -- e.g. multiple ADD_ADDRs have been sent in a row and
need to be retransmitted later -- this additional retransmission will
need to wait.

In this case, the timer was reset to TCP_RTO_MAX / 8, which is ~15
seconds. This delay is unnecessary long: it should just be rescheduled
at the next opportunity, e.g. after the retransmission timeout.

Without this modification, some issues can be seen from time to time in
the selftests when multiple ADD_ADDRs are sent, and the host takes time
to process them, e.g. the "signal addresses, ADD_ADDR timeout" MPTCP
Join selftest, especially with a debug kernel config.

Note that on older kernels, 'timeout' is not available. It should be
enough to replace it by one second (HZ).

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-6-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: free sk if last

When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(),
and released at the end.

If at that moment, it was the last reference being held, the sk would
not be freed. sock_put() should then be called instead of __sock_put().

But that's not enough: if it is the last reference, sock_put() will call
sk_free(), which will end up calling sk_stop_timer_sync() on the same
timer, and waiting indefinitely to finish. So it is needed to mark that
the timer is done at the end of the timer handler when it has not been
rescheduled, not to call sk_stop_timer_sync() on "itself".

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-5-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: always decrease sk refcount

When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer().
It should then be released in all cases at the end.

Some (unlikely) checks were returning directly instead of calling
sock_put() to decrease the refcount. Jump to a new 'exit' label to call
__sock_put() (which will become sock_put() in the next commit) to fix
this potential leak.

While at it, drop the '!msk' check which cannot happen because it is
never reset, and explicitly mark the remaining one as "unlikely".

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-4-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: fix potential data-race

This mptcp_pm_add_timer() helper is executed as a timer callback in
softirq context. To avoid any data races, the socket lock needs to be
held with bh_lock_sock().

If the socket is in use, retry again soon after, similar to what is done
with the keepalive timer.

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-3-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: ADD_ADDR rtx: allow ID 0

ADD_ADDR can be sent for the ID 0, which corresponds to the local
address and port linked to the initial subflow.

Indeed, this address could be removed, and re-added later on, e.g. what
is done in the "delete re-add signal" MPTCP Join selftests. So no reason
to ignore it.

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-2-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0

When adding the ADD_ADDR to the list, the address including the IP, port
and ID are copied. On the other hand, when the endpoint corresponds to
the one from the initial subflow, the ID is set to 0, as specified by
the MPTCP protocol.

The issue is that the ID was reset after having copied the ID in the
ADD_ADDR entry. So the retransmission was done, but using a different ID
than the initial one.

Fixes: 8b8ed1b429f8 ("mptcp: pm: reuse ID 0 after delete and re-add")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-1-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: tcp_child_process() related UAF

tcp_child_process( .. child ...) currently calls sock_put(child).

Unfortunately @child (named @nsk in callers) can be used after
this point to send a RST packet.

To fix this UAF, I remove the sock_put() from tcp_child_process()
and let the callers handle this after it is safe.

Remove @rsk variable in tcp_v4_do_rcv() and change tcp_v6_do_rcv()
so that both functions look the same.

Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260505153927.3435532-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: sch_sfq: annotate data-races from sfq_dump_class_stats()

sfq_dump_class_stats() runs locklessly, add needed READ_ONCE()
and WRITE_ONCE() annotations.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260505091133.2452510-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

inetpeer: add a missing read_seqretry() in inet_getpeer()

When performing a lockless lookup over the inet_peer rbtree,
if a matching node is found, inet_getpeer() returns it immediately
without validating the seqlock sequence.

This missing check introduces a race condition:

Trigger Path: When a host receives an incoming fragmented IPv4 packet,
ip4_frag_init() (in net/ipv4/ip_fragment.c) calls inet_getpeer_v4()
to track the peer.

The Race: If the packet is from a new source IP, CPU A acquires the
write_seqlock, allocates a new inet_peer node (p), sets its IP address
(daddr), and links it to the rbtree (rb_link_node).

Uninitialized Access: Due to the lack of memory barriers between
rb_link_node and the initialization of the rest of the struct
(like refcount_set(&p->refcnt, 1)), CPU A can make the node visible
to readers before its refcnt is initialized.
This is especially true on weakly-ordered architectures like ARM64
where the CPU can reorder the memory stores.

Lockless Reader: Concurrently, CPU B processes a second fragmented packet
from the same source IP. CPU B does a lockless lookup, finds the newly
inserted node, and returns it immediately.

Use-After-Free (UAF): CPU B reads p->refcnt as uninitialized garbage
(left over from previous kmalloc-128/192 allocations).
If the garbage is > 0, refcount_inc_not_zero(&p->refcnt) succeeds.
CPU A then executes refcount_set(&p->refcnt, 1), overwriting CPU B's increment.
When CPU B finishes with the fragment queue, it calls inet_putpeer(),
which drops the refcount to 0 and frees the node via RCU.
The node is now freed but remains linked in the rbtree,
resulting in a Use-After-Free in the rbtree.

Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260505133233.3039575-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rtsn: fix mdio_node leak in rtsn_mdio_alloc()

of_get_child_by_name() takes a reference. The rtsn_reset() and
rtsn_change_mode() failure paths jump to out_free_bus and leak
mdio_node.

Add out_put_node to drop it before falling through.

Fixes: b0d3969d2b4d ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN")
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260505123236.406000-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'netdevsim-psp-fix-init-and-uninit-bugs'

Daniel Zahka says:

====================
netdevsim: psp: fix init and uninit bugs

This series has three fixes. The first is a straightforward NULL
pointer dereference that is reachable by creating and destroying some
vfs on a kernel with INET_PSP enabled.

The last two patches deal with nsim_psp_rereg_write(), which is a
debugfs handler that reregisters netdevsim's psp_dev without
aquiescing and disabling tx/rx processing. This was added to enable
some tests in psp.py where a psp device is unregistered while it still
referenced by tcp socket state.

There are two issues with this code:
1. Calls to nsim_psp_uninit() are not properly serialized
2. netdevsim's psp_dev refcount can be released while nsim_do_psp() is
reading from it.
====================

Link: https://patch.msgid.link/20260505-psd-rcu-v1-0-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: psp: rcu protect psp_dev reference

There are two issues with the way psp_dev is used in nsim_do_psp():

1. There is no check for IS_ERR() on the peers psp_dev, before
dereferencing.
2. The refcount on this psp_dev can be dropped by
nsim_psp_rereg_write()

To fix this, we can make netdevsim's reference to its psp_dev an rcu
reference, and then nsim_do_psp() can read the fields it needs from an
rcu critical section.

Fixes: f857478d6206 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-3-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: psp: serialize calls to nsim_psp_uninit()

The debugfs write handler, nsim_psp_rereg_write(), can race against
nsim_destroy() and against itself, causing nsim_psp_uninit() to run
more than once concurrently. Two complementary changes serialize all
callers:

1. Delete the psp_rereg debugfs file from nsim_psp_uninit() before
   doing the actual teardown. debugfs_remove() drains any in-flight
   writers and prevents new ones from starting.

2. Add a mutex around the body of nsim_psp_rereg_write() so that two
   concurrent userspace writers cannot both enter the teardown path
   at once.

The teardown work itself is moved into a new __nsim_psp_uninit() that
the rereg handler calls under the mutex, while the public
nsim_psp_uninit() wraps it with the debugfs_remove()/mutex_destroy()
pair so nsim_destroy() doesn't have to know about the psp internals.

Fixes: f857478d6206 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-2-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: psp: only call nsim_psp_uninit() on PFs

VFs go through nsim_init_netdevsim_vf() which never calls
nsim_psp_init(), so ns->psp.dev stays NULL. nsim_psp_uninit() guards
with !IS_ERR(ns->psp.dev), so destroying a VF reaches
psp_dev_unregister(NULL) and dereferences NULL on the first
mutex_lock(&psd->lock):

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  RIP: 0010:mutex_lock+0x1c/0x30
  Call Trace:
   psp_dev_unregister+0x2a/0x1a0
   nsim_psp_uninit+0x1f/0x40 [netdevsim]
   nsim_destroy+0x61/0x1e0 [netdevsim]
   __nsim_dev_port_del+0x47/0x90 [netdevsim]
   nsim_drv_configure_vfs+0xc9/0x130 [netdevsim]
   nsim_bus_dev_numvfs_store+0x79/0xb0 [netdevsim]

Gate nsim_psp_uninit() on nsim_dev_port_is_pf(), matching the pattern
already used for nsim_exit_netdevsim() and the bpf/ipsec/macsec/queue
teardowns.

Reproducer:
  modprobe netdevsim
  echo "10 1" > /sys/bus/netdevsim/new_device
  echo 1 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs
  devlink dev eswitch set netdevsim/netdevsim10 mode switchdev
  echo 0 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs

Fixes: f857478d6206 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-1-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: fix potential UAF caused by ip6_forward_proxy_check()

ip6_forward_proxy_check() calls pskb_may_pull() which might re-allocate
skb->head.

Reload ipv6_hdr() after the pskb_may_pull() call to avoid using
the freed memory.

Fixes: e21e0b5f19ac ("[IPV6] NDISC: Handle NDP messages to proxied addresses.")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260505130056.2927197-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtase: Fix flow control configuration

The hardware has two sets of registers controlling TX/RX flow control.
The effective flow control state is determined by the logical OR of
these two sets of bits.

RTASE_FORCE_TXFLOW_EN and RTASE_FORCE_RXFLOW_EN in RTASE_CPLUS_CMD are
the bits used by the driver to control TX/RX flow control according to
the ethtool pause configuration.

RTASE_TXFLOW_EN and RTASE_RXFLOW_EN in RTASE_GPHY_STD_00 are another
set of TX/RX flow control enable bits. Clear them by default so they do
not keep flow control enabled independently of the driver setting.

With the RTASE_GPHY_STD_00 bits cleared, the effective flow control
state is controlled through RTASE_CPLUS_CMD, so the ethtool setting can
take effect correctly.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260505064121.31286-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: fix sort order of makefile and config

Recent changes added configs and tests in the wrong spot.

Link: https://lore.kernel.org/20260506170435.34984dfc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'amd-drm-next-7.2-2026-05-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.2-2026-05-06:

amdgpu:
- GFX9 fixes
- Hawaii SMU fixes
- SDMA4 fix
- GART fixes
- Userq fixes
- Finish support for using multiple SDMA queues for TTM operations
- SWSMU updates
- Misc cleanups and fixes
- GC 12.1 updates
- RAS updates
- SMU 15.0.8 updates
- DCN 4.2 updates
- DC type conversion fixes
- Enable DC power module
- Replay/PSR updates
- SMU 13.x updates
- Compute queue quantum MQD updates
- ASPM fix
- GPUVM fixes
- DCE 6 fixes
- Align VKMS with common implementation
- RDNA 4 fix
- DC analog support fixes
- UVD 3 fixes
- TCC harvesting fixes for SI
- GC 11 APU module reload fix
- NBIO 6.3.2 support
- IH 7.1 updates
- DC cursor fixes
- VCN user fence fixes
- JPEG user fence fixes
- DC support for connectors without DDC
- Prefer ROM BAR for default VGA device
- DC bandwidth fixes

amdkfd:
- GPUVM TLB flush fix
- Hotplug fix
- Boundary check fixes
- Misc cleanups and fixes
- SVM fixes
- CRIU fixes

radeon:
- Hawaii SMU fixes
- Misc cleanups and fixes

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260506164726.1733646-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-05-05

1. Fix an IPv6 encapsulation error path that leaked route references
   when UDPv6 ESP decapsulation resolved to an error route.
   From Yilin Zhu.

2. Fix AH with ESN on async crypto paths by accounting for the extra
   high-order sequence number when reconstructing the temporary
   authentication layout in the completion callbacks.
   From Michael Bomarito.

3. Fix XFRM output so it does not overwrite already-correct inner header
   pointers when a tunnel layer such as VXLAN has already saved them.
   The fix comes with new selftests. From Cosmin Ratiu.

4. Add the missing native payload size entry for XFRM_MSG_MAPPING in the
   compat translation path. From Ruijie Li.

5. Harden __xfrm_state_delete() against repeated or inconsistent unhashing
   of state list nodes by keying the removal on actual list membership and
   using delete-and-init helpers. From Michal Kosiorek.

6. Prevent ESP from decrypting shared splice-backed skb fragments in place
   by marking UDP splice frags as shared and forcing copy-on-write in ESP
   input when needed. From Kuan-Ting Chen.

* tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: esp: avoid in-place decrypt on shared skb frags
  xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
  xfrm: provide message size for XFRM_MSG_MAPPING
  xfrm: Don't clobber inner headers when already set
  tools/selftests: Add a VXLAN+IPsec traffic test
  tools/selftests: Use a sensible timeout value for iperf3 client
  xfrm: ah: account for ESN high bits in async callbacks
  ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
====================

Link: https://patch.msgid.link/20260505132326.1362733-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selinux: check for simple types

Validate that the target of AVTAB_TYPE rules and file transitions are
simple types and not attributes.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: merge fuzz, dropped parts due to dependencies]
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: more strict bounds check

Validate the types used in bounds checks.
Replace the usage of BUG(), to avoid halting the system on malformed
polices.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: merge fuzz, fixed typo identified by Smalley]
Signed-off-by: Paul Moore <paul@paul-moore.com>