In the ioctl() path, rtentry_to_fib_config() checks the prefix
length with bad_mask(). Also, fib_magic() always passes the correct
prefix: 32 or ifa->ifa_prefixlen, which is already validated.
Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config().
While at it, 2 direct returns in rtm_to_fib_config() are changed to
goto to match other places in the same function.
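For illustration, the check that now runs only in the rtnetlink path
boils down to the following standalone model (simplified: the kernel
version also takes extack and reports which condition failed):

	#include <stdbool.h>
	#include <stdint.h>

	#define KEYLENGTH 32	/* IPv4 prefix length bound */

	/* Hedged model of fib_valid_key_len(): the prefix length must fit,
	 * and no bits may be set past the prefix. */
	static bool valid_key_len(uint32_t key, uint8_t plen)
	{
		if (plen > KEYLENGTH)
			return false;
		if (plen < KEYLENGTH && (key << plen) != 0)
			return false;
		return true;
	}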
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv4: fib: Make fib_info_hashfn() return struct hlist_head.
Every time fib_info_hashfn() returns a hash value, we fetch
&fib_info_hash[hash].
Let's return the hlist_head pointer from fib_info_hashfn() and
rename it to fib_info_hash_bucket() to match a similar function,
fib_info_laddrhash_bucket().
Note that we need to move the fib_info_hash assignment earlier in
fib_info_hash_move() to use fib_info_hash_bucket() in the for loop.
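Schematically (a hedged sketch with stubbed types;
fib_info_hashfn_result() is a hypothetical stand-in for the actual
hash computation):

	struct hlist_head { void *first; };	/* stubbed kernel type */

	static struct hlist_head fib_info_hash[1 << 8];

	static unsigned int fib_info_hashfn_result(const void *fi)
	{
		return ((unsigned long)fi >> 4) & ((1 << 8) - 1);	/* placeholder */
	}

	/* Before: hash = fib_info_hashfn(fi); head = &fib_info_hash[hash];
	 * after the rename, the helper returns the bucket head directly. */
	static struct hlist_head *fib_info_hash_bucket(const void *fi)
	{
		return &fib_info_hash[fib_info_hashfn_result(fi)];
	}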
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 28 Feb 2025 18:00:07 +0000 (10:00 -0800)]
selftests: net: report output format as TAP 13 in Python tests
The Python lib based tests report that they are producing
"KTAP version 1", but really we aren't making use of any
KTAP features, like subtests. Our output is plain TAP.
Report TAP 13 instead of KTAP 1; this is what the mptcp tests do,
and what NIPA knows how to parse best. For HW testing we need
precise subtest result tracking.
====================
inet: ping: remove extra skb_clone()/consume_skb()
The first patch in the series moves ICMP_EXT_ECHOREPLY handling in
icmp_rcv() to prepare for the second patch.
The second patch removes one skb_clone()/consume_skb() pair
when processing ICMP_EXT_ECHOREPLY packets. Some people
use hundreds of "ping -fq ..." to stress hosts :)
====================
A lot of the stmmac platform code which sets the transmit clock is very
similar: it decodes the speed to the clock rate (125, 25 or 2.5 MHz)
and then sets a clock to that rate.
The DWMAC core appears to have a clock input for the transmit section
called clk_tx_i which requires this rate.
This series moves the code which sets this clock into the core stmmac
code.
Patch 1 adds a hook that platforms can use to configure the clock rate.
Patch 2 adds a generic implementation.
The remainder of the patches convert the glue code for various platforms
to use this new infrastructure.
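For reference, the duplicated decode is essentially the following (a
minimal standalone sketch):

	/* Map link speed (in Mbps) to the clk_tx_i rate the DWMAC core
	 * expects; a simplified model of what each glue driver open-codes. */
	static unsigned long tx_clk_rate_for_speed(int speed)
	{
		switch (speed) {
		case 1000:
			return 125000000;	/* 125 MHz */
		case 100:
			return 25000000;	/* 25 MHz */
		case 10:
			return 2500000;		/* 2.5 MHz */
		default:
			return 0;		/* unknown: leave the clock alone */
		}
	}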
====================
net: stmmac: intel-plat: use generic stmmac_set_clk_tx_rate()
Use the generic stmmac_set_clk_tx_rate() to configure the MAC transmit
clock.
Note that given the current unpatched driver structure,
plat_dat->fix_mac_speed will always be populated with
kmb_eth_fix_mac_speed(), even when no clock is present. We preserve
this behaviour in this patch by always initialising plat_dat->clk_tx_i
and plat_dat->set_clk_tx_rate.
net: stmmac: s32: use generic stmmac_set_clk_tx_rate()
Use the generic stmmac_set_clk_tx_rate() to configure the MAC transmit
clock.
Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0a-0052sq-59@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: starfive: use generic stmmac_set_clk_tx_rate()
Use the generic stmmac_set_clk_tx_rate() to configure the MAC transmit
clock.
Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0V-0052sk-1L@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: provide generic implementation for set_clk_tx_rate method
Provide a generic implementation for the set_clk_tx_rate method
introduced by the previous patch, which is capable of configuring the
MAC transmit clock for 10M, 100M and 1000M speeds for at least MII,
GMII, RGMII and RMII interface modes.
Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0K-0052sY-QF@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Several stmmac sub-drivers which support RGMII follow the same pattern.
They calculate the transmit clock rate, and then call clk_set_rate().
Analysis of several implementation documents suggests that the platform
is responsible for providing the transmit clock to the DWMAC core's
clk_tx_i. The expected rates are 2.5 MHz at 10Mbps, 25 MHz at 100Mbps
and 125 MHz at 1Gbps.
It seems some platforms require this clock to be manually configured,
but there are outputs from the MAC core that indicate the speed, so a
platform may use these to automatically configure the clock. Thus, we
can't just provide one solution to configure this clock rate.
Moreover, the clock may need to be derived from one of several sources
depending on the interface mode.
Provide a platform hook that is passed the transmit clock, interface
mode and speed.
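The hook's shape is roughly as follows (a hedged sketch; the real
member lives in plat_stmmacenet_data and the kernel types are stubbed
here):

	struct clk;			/* opaque clock handle */
	typedef int phy_interface_t;	/* stand-in for the kernel enum */

	struct plat_data_sketch {
		struct clk *clk_tx_i;	/* transmit clock input to the core */
		int (*set_clk_tx_rate)(void *bsp_priv, struct clk *clk_tx_i,
				       phy_interface_t interface, int speed);
	};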
Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0F-0052sS-Lr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Fri, 28 Feb 2025 08:56:49 +0000 (08:56 +0000)]
Merge branch 'mlx5-health-syndrome'
Tariq Toukan says:
====================
mlx5: Trust lockdown health syndrome
This series introduces a new error type in the health syndrome,
specifically for trust lock-down. Additionally, it exposes the CRR bit
in the health buffer, which, when set, indicates that the error cannot
be recovered without a process involving a cold reset. We add the CRR
bit value to the health buffer info log and update it to be logged on
any syndrome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahar Shitrit [Wed, 26 Feb 2025 12:25:43 +0000 (14:25 +0200)]
net/mlx5: Add trust lockdown error to health syndrome print function
Add the new health syndrome value to the hsynd_str() function
to indicate that the device got a trust lockdown fault.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shahar Shitrit [Wed, 26 Feb 2025 12:25:42 +0000 (14:25 +0200)]
net/mlx5: Expose crr in health buffer
Expose the crr bit in the health buffer struct. When set, it indicates
that the error cannot be recovered without a flow involving a cold
reset.
Add its value to the health buffer info log.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moshe Shemesh [Wed, 26 Feb 2025 12:25:41 +0000 (14:25 +0200)]
net/mlx5: Log health buffer data on any syndrome
Currently, health buffer data is logged either when a FW fatal error is
detected or when the miss counter reaches the max misses threshold.
Log the health buffer whenever a new health syndrome is detected.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moshe Shemesh [Wed, 26 Feb 2025 12:25:40 +0000 (14:25 +0200)]
net/mlx5: Avoid reporting two health errors on the same syndrome
In case the health counter has not increased for a few polling
intervals, the miss counter will reach the max misses threshold and a
health report will be triggered for the FW health reporter. If a
syndrome is found on the same health poll, another health report will
be triggered.
Avoid two health reports on the same syndrome by marking this syndrome
as already known.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
selftests/net: deflake GRO tests and fix return value and output
The GRO selftests can flake and have some confusing behavior. These
changes make the tests' output and return value behave as expected,
then deflake the tests.
Kevin Krakauer [Wed, 26 Feb 2025 19:27:25 +0000 (11:27 -0800)]
selftests/net: deflake GRO tests
GRO tests are timing dependent and can easily flake. This is partially
mitigated in gro.sh by giving each subtest 3 chances to pass. However,
this still flakes on some machines. Reduce the flakiness by:
- Bumping retries to 6.
- Setting napi_defer_hard_irqs to 1 to reduce the chance that GRO is
flushed prematurely. This also lets us reduce the gro_flush_timeout
from 1ms to 100us.
Tested: Ran `gro.sh -t large` 1000 times. There were no failures with
this change. Ran inside strace to increase flakiness.
Kevin Krakauer [Wed, 26 Feb 2025 19:27:23 +0000 (11:27 -0800)]
selftests/net: have `gro.sh -t` return a correct exit code
Modify gro.sh to return a useful exit code when the -t flag is used. It
formerly returned 0 no matter what.
Tested: Ran `gro.sh -t large` and verified that test failures return 1. Signed-off-by: Kevin Krakauer <krakauer@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Wed, 26 Feb 2025 18:20:29 +0000 (19:20 +0100)]
geneve: Allow users to specify source port range
Recently, in the case of Cilium, we ran into users on Azure who need to
use tunneling for east/west traffic because they hit IPAM API limits for
Kubernetes Pods if they go with publicly routable IPs for Pods. In case
of tunneling, Cilium supports the option of vxlan or geneve. In order to
spread flows among remote CPUs via RSS, both derive a source port hash
via udp_flow_src_port(), which takes the inner packet's skb->hash into
account.
For clusters with many nodes, this can then hit a new limitation [0]: Today,
the Azure networking stack supports 1M total flows (500k inbound and 500k
outbound) for a VM. [...] Once this limit is hit, other connections are
dropped. [...] Each flow is distinguished by a 5-tuple (protocol, local IP
address, remote IP address, local port, and remote port) information. [...]
For vxlan and geneve, this can create a massive amount of UDP flows which
then run into the limits if stale flows are not evicted fast enough. One
option to mitigate this for vxlan is to narrow the source port range via
IFLA_VXLAN_PORT_RANGE while still being able to benefit from RSS. However,
geneve currently does not have this option and it spreads traffic across
the full source port range of [1, USHRT_MAX]. To overcome this limitation
also for geneve, add an equivalent IFLA_GENEVE_PORT_RANGE setting for users.
Note that struct geneve_config before/after still remains at 2 cachelines
on x86-64. The low/high members of struct ifla_geneve_port_range (which is
uapi exposed) are of type __be16. While they would be perfectly fine to be
of __u16 type, the consensus was that it would be good to be consistent
with the existing struct ifla_vxlan_port_range from a uapi consumer PoV.
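For reference, a sketch of the new structure (the authoritative
definition lives in include/uapi/linux/if_link.h):

	#include <linux/types.h>	/* __be16 */

	struct ifla_geneve_port_range {
		__be16	low;	/* lowest UDP source port, inclusive */
		__be16	high;	/* highest UDP source port, inclusive */
	};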
net/mlx5e: Avoid a hundred -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
So, in this particular case, we create a new `struct mlx5e_umr_wqe_hdr`
to enclose the header part of the flexible structure `struct mlx5e_umr_wqe`.
That is, all the members except the flexible arrays `inline_mtts`,
`inline_klms` and `inline_ksms` in the anonymous union. We then replace
the header part with `struct mlx5e_umr_wqe_hdr hdr;` in `struct
mlx5e_umr_wqe`, and change the type of the object currently causing
trouble, `umr_wqe`, from `struct mlx5e_umr_wqe` to `struct
mlx5e_umr_wqe_hdr` -- this last bit gets rid of the
flex-array-in-the-middle part and avoids the warnings.
Also, no new members should be added to `struct mlx5e_umr_wqe`; instead,
any new members must be included in the header structure `struct
mlx5e_umr_wqe_hdr`. To enforce this, we use `static_assert()`, ensuring
that the memory layout of both the flexible structure and the newly
created header struct remain consistent.
The next step is to refactor the rest of the related code accordingly,
which means adding a bunch of `hdr.` wherever needed.
Lastly, we use `container_of()` whenever we need to retrieve a pointer
to the flexible structure `struct mlx5e_umr_wqe`.
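A minimal standalone model of the pattern (illustrative member names,
not the real mlx5e layout):

	#include <assert.h>
	#include <stddef.h>
	#include <stdint.h>

	struct umr_wqe_hdr {			/* fixed-size header part */
		uint32_t ctrl;
		uint32_t umr_ctrl;
	};

	struct umr_wqe {
		struct umr_wqe_hdr hdr;		/* header enclosed up front */
		uint32_t inline_entries[];	/* flex array stays at the end */
	};

	/* New members must go into the header struct, or this fails to build. */
	static_assert(sizeof(struct umr_wqe_hdr) ==
		      offsetof(struct umr_wqe, inline_entries),
		      "header and flexible structure layouts diverged");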
So, with these changes, fix 125 of the following warnings:
drivers/net/ethernet/mellanox/mlx5/core/en.h:664:48: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/Z76HzPW1dFTLOSSy@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Tue, 25 Feb 2025 21:29:27 +0000 (22:29 +0100)]
netkit: Remove double invocation to clear ipvs property flag
With ipvs_reset() now done unconditionally in skb_scrub_packet(),
we would otherwise call the former twice in netkit_prep_forward().
Thus remove the now-unnecessary explicit call.
Arnd Bergmann [Tue, 25 Feb 2025 20:09:23 +0000 (21:09 +0100)]
net: qed: make 'qed_ll2_ops_pass' as __maybe_unused
gcc warns about unused const variables even in header files when
building with W=1:
In file included from include/linux/qed/qed_rdma_if.h:14,
from drivers/net/ethernet/qlogic/qed/qed_rdma.h:16,
from drivers/net/ethernet/qlogic/qed/qed_cxt.c:23:
include/linux/qed/qed_ll2_if.h:270:33: error: 'qed_ll2_ops_pass' defined but not used [-Werror=unused-const-variable=]
270 | static const struct qed_ll2_ops qed_ll2_ops_pass = {
This one is intentional, so mark it as __maybe_unused so it can be
included from a file that doesn't use this variable.
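In generic form, the fix looks like this (a sketch; the kernel provides
__maybe_unused via <linux/compiler_attributes.h>):

	#define __maybe_unused __attribute__((__unused__))

	/* A const object defined in a header: the annotation keeps
	 * -Wunused-const-variable quiet in files that include the header
	 * but never reference the variable. */
	static const struct example_ops {
		int (*start)(void);
	} example_ops_pass __maybe_unused = { 0 };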
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20250225200926.4057723-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 27 Feb 2025 17:32:42 +0000 (09:32 -0800)]
Merge tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth.
We didn't get netfilter or wireless PRs this week, so next week's PR
is probably going to be bigger. A healthy dose of fixes for bugs
introduced in the current release nonetheless.
Current release - regressions:
- Bluetooth: always allow SCO packets for user channel
- af_unix: fix memory leak in unix_dgram_sendmsg()
- rxrpc:
- remove redundant peer->mtu_lock causing lockdep splats
- fix spinlock flavor issues with the peer record hash
- eth: iavf: fix circular lock dependency with netdev_lock
- net: use rtnl_net_dev_lock() in
register_netdevice_notifier_dev_net(); the RDMA driver registers its
notifier after the device
Current release - new code bugs:
- ethtool: fix ioctl confusing drivers about desired HDS user config
- eth: ixgbe: fix media cage present detection for E610 device
Previous releases - regressions:
- loopback: avoid sending IP packets without an Ethernet header
- mptcp: reset connection when MPTCP opts are dropped after join
Previous releases - always broken:
- net: better track kernel sockets lifetime
- ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels
- phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT
- eth: enetc: a number of error handling fixes
- dsa: rtl8366rb: reshuffle the code to fix config / build issue with
LED support"
* tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
net: ti: icss-iep: Reject perout generation request
idpf: fix checksums set in idpf_rx_rsc()
selftests: drv-net: Check if combined-count exists
net: ipv6: fix dst ref loop on input in rpl lwt
net: ipv6: fix dst ref loop on input in seg6 lwt
usbnet: gl620a: fix endpoint checking in genelink_bind()
net/mlx5: IRQ, Fix null string in debug print
net/mlx5: Restore missing trace event when enabling vport QoS
net/mlx5: Fix vport QoS cleanup on error
net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
af_unix: Fix memory leak in unix_dgram_sendmsg()
net: Handle napi_schedule() calls from non-interrupt
net: Clear old fragment checksum value in napi_reuse_skb
gve: unlink old napi when stopping a queue using queue API
net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net().
tcp: Defer ts_recent changes until req is owned
net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
net: enetc: remove the mm_lock from the ENETC v4 driver
net: enetc: add missing enetc4_link_deinit()
net: enetc: update UDP checksum when updating originTimestamp field
...
Linus Torvalds [Thu, 27 Feb 2025 16:41:19 +0000 (08:41 -0800)]
Merge tag 'sound-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes. The only slightly large change is for ASoC
Cirrus codec, but that's still in a normal range. All the rest are
small device-specific fixes and should be fairly safe to take"
* tag 'sound-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix microphone regression on ASUS N705UD
ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15
ASoC: cs35l56: Prevent races when soft-resetting using SPI control
firmware: cs_dsp: Remove async regmap writes
ASoC: Intel: sof_sdw: warn both sdw and pch dmic are used
ASoC: SOF: Intel: don't check number of sdw links when set dmic_fixup
ASoC: dapm-graph: set fill colour of turned on nodes
ASoC: fsl: Rename stream name of SAI DAI driver
ASoC: es8328: fix route from DAC to output
ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2
ASoC: tas2764: Set the SDOUT polarity correctly
ASoC: tas2764: Fix power control mask
ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports
ASoC: tas2770: Fix volume scale
The IEP driver supports both perout and pps signal generation,
but the perout feature is faulty, with half-cooked support
due to some missing configuration. Remove perout
support from the driver and reject perout requests with
a "not supported" error code.
Change the test to check if 'combined-count' is a key in the dictionary
first and, if not, assume that this means the driver has separate RX
and TX queues.
With this change, the test now passes successfully on tg3 and mlx5
(which does have a 'combined-count').
Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API") Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250226181957.212189-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Woudstra [Tue, 25 Feb 2025 20:15:09 +0000 (21:15 +0100)]
net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only
mtk_foe_entry_set_vlan() in mtk_ppe.c already supports double vlan
tagging, but mtk_flow_offload_replace() in mtk_ppe_offload.c only allows
for 1 vlan tag, optionally in combination with pppoe and dsa tags.
However, mtk_foe_entry_set_vlan() only allows for setting the vlan id;
the protocol cannot be set and is always ETH_P_8021Q, for both inner
and outer tags. This patch adds QinQ support to
mtk_flow_offload_replace(), only in the case that both inner and outer
tags are ETH_P_8021Q.
Only PPPoE-in-Q (as before) and Q-in-Q are allowed. A combination
of PPPoE and Q-in-Q is not allowed.
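The resulting accept rule can be modeled as follows (a hedged
standalone sketch, not the actual driver code):

	#include <stdbool.h>
	#include <stdint.h>

	#define ETH_P_8021Q 0x8100	/* 802.1Q VLAN ethertype */

	/* One tag is allowed as before; two tags only when both are 802.1Q. */
	static bool vlan_tags_offloadable(int num_tags, const uint16_t *protos)
	{
		if (num_tags == 2)
			return protos[0] == ETH_P_8021Q &&
			       protos[1] == ETH_P_8021Q;
		return num_tags <= 1;
	}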
====================
fixes for seg6 and rpl lwtunnels on input
As a follow up to commit 92191dd10730 ("net: ipv6: fix dst ref loops in
rpl, seg6 and ioam6 lwtunnels"), we also need a conditional dst cache on
input for seg6_iptunnel and rpl_iptunnel to prevent dst ref loops (i.e.,
if the packet destination did not change, we may end up recording a
reference to the lwtunnel in its own cache, and the lwtunnel state will
never be freed). This series provides a fix to respectively prevent a
dst ref loop on input in seg6_iptunnel and rpl_iptunnel.
====================
bpf: cpumap: enable GRO for XDP_PASS frames
Several months ago, I had been looking through my old XDP hints tree[0]
to check whether some patches not directly related to hints can be sent
standalone. Roughly at the same time, Daniel appeared and asked[1] about
GRO for cpumap from that tree.
Currently, cpumap uses its own kthread which processes cpumap-redirected
frames in batches of 8, without any weighting (but with rescheduling
points). The resulting skbs get passed to the stack via
netif_receive_skb_list(), which means no GRO happens.
Even though we can't currently pass checksum status from the drivers,
in many cases GRO performs better than listified Rx without the
aggregation, as confirmed by tests.
In order to enable GRO in cpumap, we need to do the following:
* patches 1-2: decouple the GRO struct from the NAPI struct and allow
using it out of a NAPI entity within the kernel core code;
* patch 3: switch cpumap from netif_receive_skb_list() to
gro_receive_skb().
Additional improvements:
* patch 4: optimize XDP_PASS in cpumap by using arrays instead of linked
lists;
* patch 5-6: introduce and use a function to get skbs from the NAPI
percpu caches in bulk, not one at a time;
* patch 7-8: use that function in veth as well and remove the one that
was now superseded by it.
My trafficgen UDP GRO tests, small frame sizes:
            GRO off   GRO on
baseline      2.7       N/A    Mpps
patch 3       2.3       4      Mpps
patch 8       2.4       4.7    Mpps
1...3 diff    -17       +48    %
1...8 diff    -11       +74    %
Daniel reported throughput gains of +14%[2] to +18%[3] in neper's TCP RR
tests. On my system, however, the same test gave me up to +100%.
Note that there's a series from Lorenzo[4] which achieves the same, but
in a different way. During the discussions, the approach using a
standalone GRO instance was preferred over the threaded NAPI.
The only user was veth, which now uses napi_skb_cache_get_bulk().
It's now preferred over a direct allocation and is exported as
well, so remove this one.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()
Now that we can bulk-allocate skbs from the NAPI cache, use that
function to do that in veth as well instead of direct allocation from
the kmem caches. veth uses NAPI and GRO, so this is both context-safe
and beneficial.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now that cpumap uses GRO, which drops unused skb heads to the NAPI
cache, use napi_skb_cache_get_bulk() to try to reuse cached entries
and lower MM layer pressure. Always disable the BH before checking and
running the cpumap-pinned XDP prog and don't re-enable it in between
that and allocating an skb bulk, as we can access the NAPI caches only
from the BH context.
The better GRO aggregates packets, the fewer new skbs will be allocated.
If an aggregated skb contains 16 frags, this means 15 skbs were returned
to the cache, so the next 15 skbs will be built without allocating
anything.
The same trafficgen UDP GRO test now shows:
              GRO off   GRO on
threaded GRO    2.3       4      Mpps
thr bulk GRO    2.4       4.7    Mpps
Add a function to get an array of skbs from the NAPI percpu cache.
It's supposed to be a drop-in replacement for
kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
requirement to call it only from the BH) is that it tries to use
as many NAPI cache entries for skbs as possible, and allocate new
ones only if needed.
The logic is as follows:
* there is enough skbs in the cache: decache them and return to the
caller;
* not enough: try refilling the cache first. If there is now enough
skbs, return;
* still not enough: try allocating skbs directly to the output array
with %GFP_ZERO, maybe we'll be able to get some. If there's now
enough, return;
* still not enough: return as many as we were able to obtain.
Most of the time, if called from the NAPI polling loop, the first case
will hold, sometimes (rarely) the second one. The third and the fourth
happen only under heavy memory pressure.
It can save significant amounts of CPU cycles if there are GRO cycles
and/or Tx completion cycles (anything that descends to
napi_skb_cache_put()) happening on this CPU.
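A standalone model of that fallback ladder (toy cache and allocation
sizes, not the kernel implementation):

	#include <stddef.h>
	#include <stdlib.h>

	#define CACHE_CAP 64
	static void *cache[CACHE_CAP];	/* toy stand-in for the NAPI percpu cache */
	static size_t cache_len;

	static void cache_refill(void)
	{
		while (cache_len < CACHE_CAP / 2) {
			void *p = malloc(256);
			if (!p)
				break;
			cache[cache_len++] = p;
		}
	}

	static size_t skb_cache_get_bulk(void **out, size_t n)
	{
		size_t got = 0;

		while (got < n && cache_len)		/* 1: drain the cache */
			out[got++] = cache[--cache_len];
		if (got < n) {
			cache_refill();			/* 2: refill, drain again */
			while (got < n && cache_len)
				out[got++] = cache[--cache_len];
		}
		while (got < n) {			/* 3: direct zeroed alloc */
			void *p = calloc(1, 256);
			if (!p)
				break;			/* 4: return what we got */
			out[got++] = p;
		}
		return got;
	}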
Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
bpf: cpumap: reuse skb array instead of a linked list to chain skbs
cpumap still uses linked lists to store a list of skbs to pass to the
stack. Now that we don't use listified Rx in favor of
napi_gro_receive(), the linked list is now unneeded overhead.
Inside the polling loop, we already have an array of skbs. Let's reuse
it for skbs passed to cpumap (generic XDP) and keep them there in case
of XDP_PASS when a program is installed to the map itself. Don't list
regular xdp_frames after converting them to skbs as well; store them
in the mentioned array (but *before* generic skbs, as the latter have
lower priority) and call gro_receive_skb() for each array element after
they're done.
Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
bpf: cpumap: switch to GRO from netif_receive_skb_list()
cpumap has its own BH context based on a kthread. It has a sane batch
size of 8 frames per cycle.
GRO can be used here on its own. Adjust cpumap calls to the upper stack
to use the GRO API instead of netif_receive_skb_list(), which processes
skbs in batches but doesn't involve the GRO layer at all.
In plenty of tests, GRO performs better than listified receiving even
given that it has to calculate full frame checksums on the CPU.
As GRO passes the skbs to the upper stack in batches of
@gro_normal_batch, i.e. 8 by default, and skb->dev points to the
device where the frame comes from, it is enough to disable the GRO
netdev feature on it to completely restore the original behaviour:
untouched frames will be bulked and passed to the upper stack
in batches of 8, as they were with netif_receive_skb_list().
Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net: gro: expose GRO init/cleanup to use outside of NAPI
Make GRO init and cleanup functions global to be able to use GRO
without a NAPI instance. Taking into account the already-global
gro_flush(), it's now fully usable standalone.
New functions are not exported, since they're not supposed to be used
outside of the kernel core code.
Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In fact, GRO and NAPI are not tied closely to each other. The only
requirements for GRO are to use it in the BH context and to have some
sane limits on the packet batches, e.g. NAPI has a limit of its
budget (64/8/etc.).
Move purely GRO fields into a new structure, &gro_node. Embed it
into &napi_struct and adjust all the references.
gro_node::cached_napi_id is effectively the same as
napi_struct::napi_id, but is to be used on the GRO hotpath to mark skbs.
napi_struct::napi_id is now purely a control path field.
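A hedged sketch of the resulting split (illustrative member subset,
stubbed types):

	#include <stdint.h>

	struct list_head { struct list_head *next, *prev; };

	struct gro_node {
		struct list_head rx_list;	/* skbs pending the batch flush */
		uint32_t rx_count;
		uint32_t cached_napi_id;	/* hotpath copy used to mark skbs */
	};

	struct napi_struct_sketch {		/* abridged */
		struct gro_node gro;		/* embedded GRO state */
		unsigned int napi_id;		/* control path only now */
	};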
Three Ethernet drivers use napi_gro_flush(), which is not really meant
to be exported, so move it to <net/gro.h> and add that include there.
napi_gro_receive() is used in more than 100 drivers, keep it
in <linux/netdevice.h>.
This does not make GRO ready to use outside of the NAPI context
yet.
Tested-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arnd Bergmann [Tue, 25 Feb 2025 08:57:14 +0000 (09:57 +0100)]
pktgen: avoid unused-const-variable warning
When extra warnings are enabled, there are configurations that build
pktgen without CONFIG_XFRM, which leaves a static const variable unused:
net/core/pktgen.c:213:1: error: unused variable 'F_IPSEC' [-Werror,-Wunused-const-variable]
213 | PKT_FLAGS
| ^~~~~~~~~
net/core/pktgen.c:197:2: note: expanded from macro 'PKT_FLAGS'
197 | pf(IPSEC) /* ipsec on for flows */ \
| ^~~~~~~~~
This could be fixed by marking the variable as __maybe_unused, or by
making the one use visible to the compiler by slightly rearranging the
#ifdef blocks. The second
variant looks slightly nicer here, so use that.
usbnet: gl620a: fix endpoint checking in genelink_bind()
Syzbot reports [1] a warning in usb_submit_urb() triggered by
inconsistencies between expected and actually present endpoints
in the gl620a driver. Since genelink_bind() does not properly
verify whether the specified endpoints are in fact provided by the
device (in this case, an artificially manufactured one), one may get
a mismatch.
Fix the issue by resorting to the usbnet utility function
usbnet_get_endpoints(), usually reserved for this very problem.
Check for endpoints and return early before proceeding further if
any are missing.
====================
net: napi: add CPU affinity to napi->config
Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
after reset. However, since there can be only one IRQ affinity notifier
for each IRQ, registering IRQ notifiers conflicts with the ARFS rmap
management in the core (which also registers separate IRQ affinity
notifiers).
Move the IRQ affinity management to the napi struct. This way we can have
a unified IRQ notifier to re-apply the user-set affinity and also manage
the ARFS rmaps.
The first patch moves the aRFS rmap management to core. It also adds the
IRQ affinity mask to napi_config and re-applies the mask after reset.
Patches 2, 4 and 5 use the new API for ena, ice and idpf drivers.
ICE does not always delete the NAPIs before releasing the IRQs. The third
patch makes sure the driver removes the IRQ number along with the queue
when the NAPIs are disabled. Without this, the next patches in this series
would free the IRQ before releasing the IRQ notifier (which generates
warnings).
Jakub Kicinski [Mon, 24 Feb 2025 23:22:27 +0000 (16:22 -0700)]
selftests: drv-net: add tests for napi IRQ affinity notifiers
Add tests to check that the napi retained the IRQ after down/up,
multiple changes in the number of rx queues and after
attaching/releasing XDP program.
Tested on ice and idpf:
# NETIF=<iface> tools/testing/selftests/drivers/net/hw/irq.py
KTAP version 1
1..4
ok 1 irq.check_irqs_reported
ok 2 irq.check_reconfig_queues
ok 3 irq.check_reconfig_xdp
ok 4 irq.check_down
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Ahmed Zaki [Mon, 24 Feb 2025 23:22:22 +0000 (16:22 -0700)]
net: move aRFS rmap management and CPU affinity to core
A common task for most drivers is to remember the user-set CPU affinity
to its IRQs. On each netdev reset, the driver should re-assign the user's
settings to the IRQs. Unify this task across all drivers by moving the CPU
affinity to napi->config.
However, to move the CPU affinity to core, we also need to move aRFS
rmap management since aRFS uses its own IRQ notifiers.
For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting
aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate
and manage the aRFS rmaps. Freeing the rmap is also done by core when the
netdev is freed. For better IRQ affinity management, move the IRQ rmap
notifier inside the napi_struct and add new notify.notify and
notify.release functions: netif_irq_cpu_rmap_notify() and
netif_napi_affinity_release().
Now that we have the aRFS rmap management in core, add a CPU affinity
mask to napi_config. To delegate the CPU affinity management to the
core, drivers must:
1 - set the new netdev flag "irq_affinity_auto":
netif_enable_irq_affinity(netdev)
2 - create the napi with persistent config:
netif_napi_add_config()
3 - bind an IRQ to the napi instance: netif_napi_set_irq()
the core will then make sure to re-assign the user-set affinity to the
napi's IRQ.
The default IRQ mask is set to one CPU, starting from the closest NUMA
node.
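Put together, the driver-side sequence is roughly (a sketch using the
function names above; q, my_poll and qid are placeholders, and exact
signatures may differ):

	/* 1: opt in to core-managed IRQ affinity */
	netif_enable_irq_affinity(netdev);
	/* 2: create the napi with persistent config */
	netif_napi_add_config(netdev, &q->napi, my_poll, qid);
	/* 3: bind the IRQ to the napi instance */
	netif_napi_set_irq(&q->napi, irq);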
Fixes: 1274daede3ef ("net: mvpp2: cls: Add steering based on vlan Id and priority.") Signed-off-by: Harshal Chaudhari <hchaudhari@marvell.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250225042058.2643838-1-hchaudhari@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Tue, 25 Feb 2025 03:39:14 +0000 (03:39 +0000)]
bonding: report duplicate MAC address in all situations
Normally, a bond uses the MAC address of the first added slave as the
bond's MAC address. The bond will set the active slave's MAC address to
the bond's address if fail_over_mac is set to none (0) or follow (2).
When the first slave is removed, the bond will still use the removed
slave's MAC address, which can lead to a duplicate MAC address and
potentially cause issues with the switch. To avoid confusion, let's warn
the user in all situations, including when fail_over_mac is set to 2 or
not in active-backup mode.
Commit 689c398885cc ("af_unix: Defer sock_put() to clean up path in
unix_dgram_sendmsg().") defers sock_put() in the error handling path.
However, it fails to account for the condition 'msg->msg_namelen != 0',
resulting in a memory leak when the code jumps to the 'lookup' label.
Fix the issue by calling sock_put() if 'msg->msg_namelen != 0' is met.
Fixes: 689c398885cc ("af_unix: Defer sock_put() to clean up path in unix_dgram_sendmsg().") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Acked-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250225021457.1824-1-ahuang12@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Tue, 25 Feb 2025 02:33:55 +0000 (21:33 -0500)]
net: skb: free up one bit in tx_flags
The linked series wants to add skb tx completion timestamps.
That needs a bit in skb_shared_info.tx_flags, but all are in use.
A per-skb bit is only needed for features that are configured on a
per packet basis. Per socket features can be read from sk->sk_tsflags.
Per packet tsflags can be set in sendmsg using cmsg, but only those in
SOF_TIMESTAMPING_TX_RECORD_MASK.
Per packet tsflags can also be set without cmsg by sandwiching a
send in between two setsockopts:
val |= SOF_TIMESTAMPING_$FEATURE;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
write(fd, buf, sz);
val &= ~SOF_TIMESTAMPING_$FEATURE;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
Changing a datapath test from skb_shinfo(skb)->tx_flags to
skb->sk->sk_tsflags can change behavior in that case, as the tx_flags
is written before the second setsockopt updates sk_tsflags.
Therefore, only bits can be reclaimed that cannot be set by cmsg and
are also highly unlikely to be used to target individual packets
otherwise.
Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This
selects between clock and free running counter source for HW TX
timestamps. It is probable that all packets of the same socket will
always use the same source.
====================
expand cmsg_ipv6.sh with ipv4 support
Expand IPV6_TCLASS to also cover IP_TOS.
Expand IPV6_HOPLIMIT to also cover IP_TTL.
A series of two patches for basic readability (patch 1 is a noop),
and so that git does not interpret code changes + file rename as
a whole file del + add.
====================
Willem de Bruijn [Tue, 25 Feb 2025 02:23:59 +0000 (21:23 -0500)]
selftests/net: expand cmsg_ipv6.sh with ipv4
Expand IPV6_TCLASS to also cover IP_TOS.
Expand IPV6_HOPLIMIT to also cover IP_TTL.
Expand cmsg_sender.c to allow setting IPv4 setsockopts.
Also rename struct v6 to cmsg to match its expanded scope.
Don't bother updating all occurrences of tclass and hoplimit.
Rename cmsg_ipv6.sh to cmsg_ip.sh to match the expanded scope.
Be careful around the subtle API difference between TCLASS and TOS.
IP_TOS includes ECN bits. Add a test to verify that these are masked
when making routing decisions.
net: Handle napi_schedule() calls from non-interrupt
napi_schedule() is expected to be called either:
* From an interrupt, where raised softirqs are handled on IRQ exit
* From a softirq disabled section, where raised softirqs are handled on
the next call to local_bh_enable().
* From a softirq handler, where raised softirqs are handled on the next
round in do_softirq(), or further deferred to a dedicated kthread.
Other bare task contexts may end up ignoring the raised NET_RX vector
until the next random softirq handling opportunity, which may not
happen for a while if the CPU goes idle afterwards with the tick
stopped.
Such "misuses" have been detected on several places thanks to messages
of the kind:
"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"
019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
970be1dff26d ("mt76: disable BH around napi_schedule() calls")
30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt")
e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
8cf699ec849f ("mlx4: do not call napi_schedule() without care")
ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")
This shows that relying on the caller to arrange a proper context for
the softirqs to be handled while calling napi_schedule() is very fragile
and error prone. Fixing them can also prove challenging if the
caller may be called from different kinds of contexts.
Therefore fix this from napi_schedule() itself by waking up ksoftirqd
when softirqs are raised from task contexts.
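Conceptually, the raise path now distinguishes the calling context; a
hedged sketch (not the exact patch):

	/* In hardirq/softirq context the pending vector is handled on exit;
	 * from a bare task context, ksoftirqd must be woken instead. */
	if (in_interrupt())
		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
	else
		raise_softirq_irqoff(NET_RX_SOFTIRQ);	/* also wakes ksoftirqd */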
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Francois Romieu <romieu@fr.zoreil.com> Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/ Cc: Breno Leitao <leitao@debian.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250223221708.27130-1-frederic@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mohammad Heib [Tue, 25 Feb 2025 11:28:52 +0000 (13:28 +0200)]
net: Clear old fragment checksum value in napi_reuse_skb
In certain cases, napi_get_frags() returns an skb that points to an old
received fragment. This skb may have its skb->ip_summed, csum, and other
fields set from previous fragment handling.
Some network drivers set skb->ip_summed to either CHECKSUM_COMPLETE or
CHECKSUM_UNNECESSARY when getting an skb from napi_get_frags(), while
others only set skb->ip_summed when RX checksum offload is enabled on
the device, and do not set any value for skb->ip_summed when hardware
checksum offload is disabled, assuming that skb->ip_summed was
initialized to zero by napi_reuse_skb(). The ionic driver, for example,
will ignore/unset any value for the ip_summed field if HW checksum
offload is disabled. So if the user disables checksum offload while
traffic is running, that could lead to the following errors shown in
the kernel logs:
<IRQ>
dump_stack_lvl+0x34/0x48
__skb_gro_checksum_complete+0x7e/0x90
tcp6_gro_receive+0xc6/0x190
ipv6_gro_receive+0x1ec/0x430
dev_gro_receive+0x188/0x360
? ionic_rx_clean+0x25a/0x460 [ionic]
napi_gro_frags+0x13c/0x300
? __pfx_ionic_rx_service+0x10/0x10 [ionic]
ionic_rx_service+0x67/0x80 [ionic]
ionic_cq_service+0x58/0x90 [ionic]
ionic_txrx_napi+0x64/0x1b0 [ionic]
__napi_poll+0x27/0x170
net_rx_action+0x29c/0x370
handle_softirqs+0xce/0x270
__irq_exit_rcu+0xa3/0xc0
common_interrupt+0x80/0xa0
</IRQ>
This inconsistency sometimes leads to checksum validation issues in the
upper layers of the network stack.
To resolve this, this patch clears the skb->ip_summed value for each
skb reused by napi_reuse_skb(), ensuring that the caller is responsible
for setting the correct checksum status. This eliminates potential
checksum validation issues caused by improper handling of
skb->ip_summed.
Fixes: 76620aafd66f ("gro: New frags interface to avoid copying shinfo") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250225112852.2507709-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 25 Feb 2025 17:10:48 +0000 (17:10 +0000)]
tcp: be less liberal in TSEcr received while in SYN_RECV state
Yong-Hao Zou mentioned that Linux was not as strict as other OSes in
the 3WHS for flows using the TCP TS option (RFC 7323).
As hinted by an old comment in tcp_check_req(),
we can check that the TSEcr value in the incoming packet corresponds
to one of the SYNACK TSval values we have sent.
In this patch, I record the oldest and most recent values
that SYNACK packets have used.
Send a challenge ACK if we receive a TSEcr outside
of this range, and increase a new SNMP counter.
Due to the TCP fastopen implementation, do not yet apply these checks
to fastopen flows.
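The range test itself is a standard wraparound-safe comparison; a
standalone model (treq bookkeeping and the challenge-ACK plumbing are
omitted):

	#include <stdbool.h>
	#include <stdint.h>

	/* Accept tsecr only if first <= tsecr <= last in mod-2^32 arithmetic,
	 * where first/last are the oldest and most recent SYNACK TSvals. */
	static bool tsecr_in_synack_range(uint32_t tsecr, uint32_t first,
					  uint32_t last)
	{
		return tsecr - first <= last - first;
	}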
v2: No longer use req->num_timeout, but treq->snt_tsval_first
to detect when first SYNACK is prepared. This means
we make sure to not send an initial zero TSval.
Make sure MPTCP and TCP selftests are passing.
Change MIB name to TcpExtTSEcrRejected