git.ipfire.org Git - thirdparty/linux.git/log

netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock

As of commit 7e43e0a1141d
("netfilter: nft_set_rbtree: translate rbtree to array for binary search")
the lock is only taken from control plane, no need to disable BH anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-8-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvs: no_cport and dropentry counters can be per-net

Change the no_cport counters to be per-net and address family.
This should reduce the extra conn lookups done during present
NO_CPORT connections.

By changing from global to per-net dropentry counters, one net
will not affect the drop rate of another net.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-7-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvs: use more counters to avoid service lookups

When new connection is created we can lookup for services multiple
times to support fallback options. We already have some counters
to skip specific lookups because it costs CPU cycles for hash
calculation, etc.

Add more counters for fwmark/non-fwmark services (fwm_services and
nonfwm_services) and make all counters per address family.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-6-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvs: do not keep dest_dst after dest is removed

Before now dest->dest_dst is not released when server is moved into
dest_trash list after removal. As result, we can keep dst/dev
references for long time without actively using them.

It is better to avoid walking the dest_trash list when
ip_vs_dst_event() receives dev events. So, make sure we do not
hold dev references in dest_trash list. As packets can be flying
while server is being removed, check the IP_VS_DEST_F_AVAILABLE
flag in slow path to ensure we do not save new dev references to
removed servers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-5-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvs: use single svc table

fwmark based services and non-fwmark based services can be hashed
in same service table. This reduces the burden of working with two
tables.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-4-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvs: some service readers can use RCU

Some places walk the services under mutex but they can just use RCU:

* ip_vs_dst_event() uses ip_vs_forget_dev() which uses its own lock
to modify dest
* ip_vs_genl_dump_services(): ip_vs_genl_fill_service() just fills skb
* ip_vs_genl_parse_service(): move RCU lock to callers
ip_vs_genl_set_cmd(), ip_vs_genl_dump_dests() and ip_vs_genl_get_cmd()
* ip_vs_genl_dump_dests(): just fill skb

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns

Current ipvs uses one global mutex "__ip_vs_mutex" to keep the global
"ip_vs_svc_table" and "ip_vs_svc_fwm_table" safe. But when there are
tens of thousands of services from different netns in the table, it
takes a long time to look up the table, for example, using "ipvsadm
-ln" from different netns simultaneously.

We make "ip_vs_svc_table" and "ip_vs_svc_fwm_table" per netns, and we
add "service_mutex" per netns to keep these two tables safe instead of
the global "__ip_vs_mutex" in current version. To this end, looking up
services from different netns simultaneously will not get stuck,
shortening the time consumption in large-scale deployment. It can be
reproduced using the simple scripts below.

init.sh: #!/bin/bash
for((i=1;i<=4;i++));do
        ip netns add ns$i
        ip netns exec ns$i ip link set dev lo up
        ip netns exec ns$i sh add-services.sh
done

add-services.sh: #!/bin/bash
for((i=0;i<30000;i++)); do
        ipvsadm -A  -t 10.10.10.10:$((80+$i)) -s rr
done

runtest.sh: #!/bin/bash
for((i=1;i<4;i++));do
        ip netns exec ns$i ipvsadm -ln > /dev/null &
done
ip netns exec ns4 ipvsadm -ln > /dev/null

Run "sh init.sh" to initiate the network environment. Then run "time
./runtest.sh" to evaluate the time consumption. Our testbed is a 4-core
Intel Xeon ECS. The result of the original version is around 8 seconds,
while the result of the modified version is only 0.8 seconds.

Signed-off-by: Jiejian Wu <jiejian@linux.alibaba.com>
Co-developed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pppoe: avoid zero-length arrays in struct pppoe_hdr

Jakub Kicinski reported following issue in upcoming patches:

W=1 C=1 GCC build gives us:

net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through
../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h,
../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h:
153:29: warning: array of flexible structures

sparse doesn't like that hdr has a zero-length array which overlaps
proto. The kernel code doesn't currently need those arrays.

PPPoE connection is functional after applying this patch.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Link: https://patch.msgid.link/20260224155030.106918-1-ericwouds@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/ibmveth: fix comment typos in ibmveth.c

Correct spelling mistakes in comments:
- Fix misspelling of gro_receive
- Fix misspelling of Partition

Signed-off-by: Abhilekh Deka <abhindeka@gmail.com>
Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com>
Link: https://patch.msgid.link/20260224153601.17534-1-abhindeka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: cadence: macb: add ethtool nway_reset support

Wire phy_ethtool_nway_reset() as the .nway_reset ethtool operation,
allowing userspace to restart PHY autonegotiation via 'ethtool -r'.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260224145723.49450-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-fix-interrupt-coalescing'

Russell King says:

====================
net: stmmac: fix interrupt coalescing

While cleaning up the descriptor handling, I noticed that the accounting
of transmit "packets" for interrupt coalescing was buggy in that it
takes the difference of the two indexes into the circular list of
transmit discriptors and merely subtracts one from the other without
regard for the indexes wrapping.

This can result in a negative number or very large positive number
which would have the effect of either reducing tx_q->tx_count_frames
or making that very large.

Either way, the result is numerically incorrect, and could trigger
interrupts or not trigger interrupts when required.

This series converts stmmac to use the circ_buf helpers, and then fixes
this problem.
====================

Link: https://patch.msgid.link/aZ1o2dmfpeiubCik@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix transmit interrupt coalescing

The accounting for transmit frames does not count the descriptors
correctly. It uses:

tx_packets = (tx_q->cur_tx + 1) - first_tx;

however, these are indexes into a circular buffer, so cur_tx can be
less than first_tx, and when that happens, tx_packets becomes a very
large unsigned integer. When this is added to tx_q->tx_count_frames,
it has the effect of reducing the count of frames, possibly causing
it to also wrap to a very large unsigned integer.

Fix this by using CIRC_CNT() to calculate the number of descriptors
used.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1vuoIl-0000000Aouz-0ttb@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: use circ_buf helpers for descriptors

The stmmac descriptor queues are circular buffers, operated as far as
the hardware is concerned as either a ring, or a chain that loops back
on itself. From the software perspective, it forms a circular buffer.

We have a few places which calculate the number of in-use and free
entries in these circular buffers, for which we have macros for.
Use CIRC_CNT() and CIRC_SPACE() as appropriate to calculate these
values.

Validating, for stmmac_tx_avail(), which uses CIRC_SPACE():

  dirty_tx = 1, cur_tx = 0 -> 0
  dirty_tx = 0, cur_tx = 0 -> dma_tx_size - 1
  dirty_tx = 0, cur_tx = 1 -> dma_tx_size - 2

dirty_tx passed as end, reduced by one. cur_tx passed as start.
Output on sane computers is identical.

For stmmac_rx_dirty(), which uses CIRC_CNT():

  dirty_rx = 1, cur_rx = 0 -> dma_rx_size - 1
  dirty_rx = 0, cur_rx = 0 -> 0
  dirty_rx = 0, cur_rx = 1 -> 1

dirty_rx passed as start, cur_rx passed as end. Output is identical.

Same validation performed on the is_last_segment calculation, which
also gets converted to CIRC_CNT().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1vuoIg-0000000Aout-0LyS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: update outdated comment

The function rds_send_reset() was subsumed by rds_send_path_reset()
by commit d769ef81d5b5 ("RDS: Update rds_conn_shutdown to work with
rds_conn_path"). Update the comment accordingly.

Signed-off-by: kexinsun <kexinsun@smail.nju.edu.cn>
Link: https://patch.msgid.link/20260224020720.1174-1-kexinsun@smail.nju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: allow nvmem to override MAC address

NVMEM typically loads after the ethernet driver and
of_get_ethdev_address returns -EPROBE_DEFER. return in such a case to
allow NVMEM to work.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260224014607.353378-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: hw-net: tso: set a TCP window clamp to avoid spurious drops

The TSO test wants to make sure that there isn't a lot of retransmits,
because that could indicate that device has a buggy TSO implementation.
On debug kernels, however, we're likely to see significant packet loss
because we simply overwhelm the receiver.

In a QEMU loop with virtio devices we see ~10% false positive rate
with occasional run hitting the threshold of 25% packet loss.

Since we're only sending 4MB of data, set a TCP_WINDOW_CLAMP to 200k.
This seems to make virtio happy while having little impact since we're
primarily interested in testing the sender, and the test doesn't
currently enable BIG TCP.

Running socat over virtio loop for 2 sec on a debug kernel shows:

  TcpOutSegs                      27327              0.0
  TcpRetransSegs                  83                 0.0

  TcpOutSegs                      30012              0.0
  TcpRetransSegs                  80                 0.0

  TcpOutSegs                      28767              0.0
  TcpRetransSegs                  77                 0.0

But with the clamp the 3 attempts show no retransmit:

  TcpOutSegs                      31537              0.0
  TcpRetransSegs                  0                  0.0

  TcpOutSegs                      30323              0.0
  TcpRetransSegs                  0                  0.0

  TcpOutSegs                      28700              0.0
  TcpRetransSegs                  0                  0.0

Since we expect no receiver-related drops now we can significantly
increase test's sensitivity to drops.

All the testing we do in NIPA uses cubic.

Link: https://patch.msgid.link/20260223204030.4142884-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix EEE supportable interfaces

According to the dwmac v3.74a databook, only MII, GMII and RGMII dwmac
interface modes are supported for EEE. Restrict EEE to these modes, or
the modules supported by a PCS other than the GMAC's integrated PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuUsD-0000000Afci-0XxO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: freescale: ucc_geth: call of_node_put once

Move it up to avoid placing it in both the error and success paths.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224014141.352642-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-py-improve-bkg-error-reporting'

Jakub Kicinski says:

====================
selftests: net: py: improve bkg() error reporting

bkg() is a helper for running commands in the background.
When init or body of a with() block fails check if the bkg()
process already exited and report its status (including stdout/
/stderr). This significantly improves debugability.
====================

Link: https://patch.msgid.link/20260223202633.4126087-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: add cmd info for ksft_wait failure

Gal recently complained:

  When [ksft_wait failure] happens, the test fails with a cryptic
  message:
    # Exception| Exception: Did not receive ready message

Let's try to include the stdout/stderr of the command we tried
to start. E.g. for cmd("false", ksft_wait=True):

    # Exception| lib.py.utils.CmdInitFailure: Did not receive ready message
    # Exception| CMD: false
    # Exception|   EXIT: 1

We need to factor out _process_terminate() otherwise the exit
path may try to write to already disconnected self.ksft_term_fd.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260223202633.4126087-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: use repr(cmd) for failure exceptions

Reuse repr(cmd) instead of manually formatting a similar string.

Before:
  # Exception| lib.py.utils.CmdExitFailure: Command failed: false
  # Exception| STDOUT: b''
  # Exception| STDERR: b''

After:
  # Exception| lib.py.utils.CmdExitFailure: Command failed
  # Exception| CMD: false
  # Exception|   EXIT: 1

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260223202633.4126087-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: avoid masking exceptions in bkg() failures

bkg() failures are currently quite hard to debug and spot.
Often we have code along the lines of:

  with bkg("./cmd_rx_something -p PORT"):
       wait_port_listen(PORT)
       cmd("./cmd_tx_something", host=remote)

When wait_port_listen() fails we don't get to see the exit status
of bkg(). Even tho very often it's a failure in the bkg() command
that's actually to blame. Try not to interfere with the bkg()
command error checking.

With:

   with bkg("false", exit_wait=True):
        time.sleep(0.01)  # let the 'false' cmd run
        raise Exception("bla")

Before:

  .. stack trace ..
  # Exception| Exception: bla

After:

  .. stack trace ..
  # Exception| Exception: bla
  # Exception|
  # Exception| During handling of the above exception, another exception occurred:
  .. stack trace ..
  # Exception| lib.py.utils.CmdExitFailure: Command failed: false
  # Exception| STDOUT: b''
  # Exception| STDERR: b''

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260223202633.4126087-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: bnxt: rename ring_err_stats -> ring_drv_stats

We recently added GRO stats to bnxt, which are maintained
by the driver. Having "err" in the name of the struct for
ring stats no longer makes sense (as pointed out by Michael,
see Link).

Rename them to "drv" stats, as these are all maintained
by the driver (even if partially based on info from descriptors).
Michael suggested calling these misc, happy to go back to
that. IMHO "drv" is a bit more meaningful that "misc".

Pure rename using sed, no functional changes.

Link: https://lore.kernel.org/CACKFLimgibJ0qkM1AacZVh8MKKy-pE_AAc4KPKZ7GUqebmXW9A@mail.gmail.com
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260223203702.4137801-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: Optimise is_netpoll_tx_blocked().

bond_start_xmit() spends some cycles in is_netpoll_tx_blocked():

  if (unlikely(is_netpoll_tx_blocked(dev)))
      return NETDEV_TX_BUSY;

because of the "pushf;pop reg" sequence (aka irqs_disabled()).

Let's swap the conditions in is_netpoll_tx_blocked() and
convert netpoll_block_tx to a static key.

Before:

   1.23 │       mov    %gs:0x28,%rax
   1.24 │       mov    %rax,0x18(%rsp)
  29.45 │       pushfq
   0.50 │       pop    %rax
   0.47 │       test   $0x200,%eax
        │     ↓ je     1b4
   0.49 │ 32:   lea    0x980(%rsi),%rbx

After:

   0.72 │       mov    %gs:0x28,%rax
   0.81 │       mov    %rax,0x18(%rsp)
   0.82 │       nop
   2.77 │ 2a:   lea    0x980(%rsi),%rbx

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260223230749.2376145-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

icmp: increase net.ipv4.icmp_msgs_{per_sec,burst}

These sysctls were added in 4cdf507d5452 ("icmp: add a global rate
limitation") and their default values might be too small.

Some network tools send probes to closed UDP ports from many hosts
to estimate proportion of packet drops on a particular target.

This patch sets both sysctls to 10000.

Note the per-peer rate-limit (as described in RFC 4443 2.4 (f))
intent is still enforced.

This also increases security, see b38e7819cae9
("icmp: randomize the global rate limiter") for reference.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223161742.929830-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move inet6_csk_update_pmtu() to tcp_ipv6.c

This function is only called from tcp_v6_mtu_reduced() and can be
(auto)inlined by the compiler.

Note that inet6_csk_route_socket() is no longer (auto)inlined,
which is a good thing as it is slow path.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1

add/remove: 0/2 grow/shrink: 2/0 up/down: 93/-129 (-36)
Function                                     old     new   delta
tcp_v6_mtu_reduced                           139     228     +89
inet6_csk_route_socket                       486     490      +4
__pfx_inet6_csk_update_pmtu                   16       -     -16
inet6_csk_update_pmtu                        113       -    -113
Total: Before=25076512, After=25076476, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223153047.886683-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: reduce calls to tcp_schedule_loss_probe()

For RPC workloads, we alternate tcp_schedule_loss_probe() calls from
output path and from input path, with tp->packets_out value
oscillating between !zero and zero, leading to poor branch prediction.

Move tp->packets_out check from tcp_schedule_loss_probe() to
tcp_set_xmit_timer().

We avoid one call to tcp_schedule_loss_probe() from tcp_ack()
path for typical RPC workloads, while improving branch prediction.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223113501.4070245-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-qcom-ethqos-cleanups-and-re-organise-serdes-handling'

Russell King says:

====================
net: stmmac: qcom-ethqos: cleanups and re-organise SerDes handling

As the last series had issues with stability, I've changed the approach
in this series to concentrate on keeping much of the SerDes related
code within the qcom-ethqos driver rather than trying to move it out at
this stage. This means it should be possible to bisect these patches and
pinpoint exactly the code movement that causes any instability.

This series starts with various cleanups to qcom-ethqos (the first four
patches) before beginning to move code, passing phylink's phy interface
(which will change) to the fix_mac_speed() method, and then using that
to configure the serdes and inband setting before moving the SerDes
code.

This patch set has been tested.
====================

Link: https://patch.msgid.link/aZwfAFJQcp9f0niI@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: convert to set_clk_tx_rate() method

Set the RGMII link clock using the set_clk_tx_rate() method rather than
coding it into the .fix_mac_speed() method. This simplifies ethqos's
ethqos_fix_mac_speed().

Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSLF-0000000ASci-42kh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: move SerDes speed configuration

Move the SerDes speed configuration to phylink's .mac_finish() stage
so that the SerDes is appropriately configured for the interface mode
prior to the link coming up.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSLA-0000000AScc-3RFf@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: use phy interface mode for inband

qcom-ethqos currently forces inband to be enabled for the Cisco SGMII
speeds (1G, 100M and 10M) but not for 2500BASE-X (2.5G).

Rather than using the speed to determine the forced inband state, use
phylink's PHY interface mode which will switch between SGMII for the
10M, 100M and 1G speeds, and 2500BASE-X for 2.5G.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSL5-0000000AScX-2wuM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: pass phy interface mode to configs

Pass the current phylink phy interface mode to the RGMII and "SGMII"
configuration functions.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSL0-0000000AScM-2TN0@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: pass interface mode into fix_mac_speed() method

Pass the current interface mode reported by phylink into the
fix_mac_speed() method. This will be used by qcom-ethqos for its
"SGMII" configuration.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKv-0000000AScG-1zv6@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: move loopback disable to .mac_finish()

Loopback is enabled to allow the dwmac soft reset to succeed. This
is enabled when clocks are enabled in ethqos_clks_config(), which
happens at driver probe and runtime PM resume - e.g. when the
network device is administratively brought up.

Currently, the loopback is disabled when the link comes up (via
.mac_link_up() calling this driver's .fix_mac_speed().)

Move the qcom_ethqos_set_sgmii_loopback() call which disables
loopback from ethqos_fix_mac_speed() into ethqos' SerDes specific
.mac_finish() method so that loopback is disabled a little earlier
after reset has completed, and dwmac setup has completed.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKq-0000000AScA-1Wh3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: move qcom_ethqos_set_sgmii_loopback() up

ethqos_set_func_clk_en() configures both SGMII loopback and the RGMII
functional clock setting. qcom_ethqos_set_sgmii_loopback() is only
called from within ethqos_set_func_clk_en(), and checks for
PHY_INTERFACE_MODE_2500BASEX.

Move qcom_ethqos_set_sgmii_loopback() to the callers of
ethqos_set_func_clk_en() except for ethqos_configure_rgmii() where we
know that ethqos->phy_mode will not be PHY_INTERFACE_MODE_2500BASEX.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKl-0000000ASc1-18ka@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: change ethqos_configure*() to return void

The ethqos_configure*() family of functions always return zero, and the
return value is never checked. Change the int return type to void.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKg-0000000ASbv-0iWL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: remove register field value obfuscations

Convert the register field values to something more human readable.

For example, using (BIT(29) | BIT(27)) to update a register field that
consists of bits 29:27 is an obfuscated way of writing decimal 5 for
this field. The comment above needs to explain that this value is 5.

Worse still is BIT(12) | GENMASK(9, 8), which is used to hide the
decimal value 19 for the bitfield 16:8.

Fix these, and a few others by using FIELD_PREP(). While it means we
have bare numeric constants, this is more preferable than having the
obfuscation.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKa-0000000ASbo-2zQg@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: rename "por" members to "rgmii_por"

Rename the "por" and "num_por" members to indicate that they are for
RGMII mode only as ethqos_configure_rgmii() is the only place that the
values are programmed into the registers.

Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKV-0000000ASbg-28JK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ethernet-enic-add-vic-ids-and-link-modes'

Satish Kharat says:

====================
eth: enic: add VIC ids and link modes

Add VIC subsystem ids and their supported/advertised media types so ethtool
reflects the hardware capabilities for the VIC variants.
====================

Link: https://patch.msgid.link/20260223-enic-cscwi36355-v2-0-63488194a974@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net:ethernet:enic: map ethtool link modes by VIC type

Report supported media types based on the VIC subsystem ID so ethtool
reflects the hardware capabilities.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260223-enic-cscwi36355-v2-2-63488194a974@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net:ethernet:enic: add VIC subsystem ids

Add VIC subsystem id for 12xx, 13xx, 14xx and 15xxx series

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260223-enic-cscwi36355-v2-1-63488194a974@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: net: document neigh gc_stale_time sysctl

Add missing documentation for a neighbor table garbage collector sysctl
parameter in ip-sysctl.rst:

neigh/default/gc_stale_time: controls how long an unused neighbor entry
is kept before becoming eligible for garbage collection (default: 60
seconds)

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Link: https://patch.msgid.link/20260223101257.47563-1-g.goller@proxmox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-rework-tcp_v-4-6-_send_check'

Eric Dumazet says:

====================
tcp: rework tcp_v{4,6}_send_check()

tcp_v{4,6}_send_check() are only called from __tcp_transmit_skb()

They currently are in different files (tcp_ipv4.c and tcp_ipv6.c)
thus out of line.

This series move them close to their caller so that compiler
can inline them.

For all patches in the series:

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.3
add/remove: 0/2 grow/shrink: 1/3 up/down: 102/-178 (-76)
Function                                     old     new   delta
__tcp_transmit_skb                          3321    3423    +102
tcp_v4_send_check                            136     132      -4
__tcp_v4_send_check                          130     121      -9
mptcp_subflow_init                           777     763     -14
__pfx_tcp_v6_send_check                       16       -     -16
tcp_v6_send_check                            135       -    -135
Total: Before=25143100, After=25143024, chg -0.00%
====================

Link: https://patch.msgid.link/20260223100729.3761597-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: make tcp_v{4,6}_send_check() static

tcp_v{4,6}_send_check() are only called from tcp_output.c
and should be made static so that the compiler does not need
to put an out of line copy of them.

Remove (struct inet_connection_sock_af_ops) send_check field
and use instead @net_header_len.

Move @net_header_len close to @queue_xmit for data locality
as both are used in TCP tx fast path.

$ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3
add/remove: 0/2 grow/shrink: 0/3 up/down: 0/-172 (-172)
Function                                     old     new   delta
__tcp_transmit_skb                          3426    3423      -3
tcp_v4_send_check                            136     132      -4
mptcp_subflow_init                           777     763     -14
__pfx_tcp_v6_send_check                       16       -     -16
tcp_v6_send_check                            135       -    -135
Total: Before=25143196, After=25143024, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move tcp_v6_send_check() to tcp_output.c

Move tcp_v6_send_check() so that __tcp_transmit_skb() can inline it.

$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 0/0 grow/shrink: 1/0 up/down: 105/0 (105)
Function old new delta
__tcp_transmit_skb 3321 3426 +105
Total: Before=25143091, After=25143196, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: inline __tcp_v4_send_check()

Inline __tcp_v4_send_check(), like __tcp_v6_send_check().

Move tcp_v4_send_check() to tcp_output.c close to
its fast path caller (__tcp_transmit_skb()).

Note __tcp_v4_send_check() is still out-of-line for tcp4_gso_segment()
because it is called in an unlikely() section.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-9 (-9)
Function old new delta
__tcp_v4_send_check 130 121 -9
Total: Before=25143100, After=25143091, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: move udp6_csum_init() back to net/ipv6/udp.c

This function has a single caller in net/ipv6/udp.c.

Move it there so that the compiler can decide to (auto)inline
it if he prefers to. IBT glue is removed anyway.

With clang, we can see it was able to inline it and also
inlined one other helper at the same time.

UDPLITE removal will also help.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/0 up/down: 840/-785 (55)
Function                                     old     new   delta
__udp6_lib_rcv                              1247    2087    +840
__pfx_udp6_csum_init                          16       -     -16
udp6_csum_init                               769       -    -769
Total: Before=25074399, After=25074454, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223093445.3696368-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: __lock_sock() can be static

After commit 6511882cdd82 ("mptcp: allocate fwd memory separately
on the rx and tx path") __lock_sock() can be static again.

Make sure __lock_sock() is not inlined, so that lock_sock_nested()
no longer needs a stack canary.

Add a noinline attribute on lock_sock_nested() so that calls
to lock_sock() from net/core/sock.c are not inlined,
none of them are fast path to deserve that:

- sockopt_lock_sock()
- sock_set_reuseport()
- sock_set_reuseaddr()
- sock_set_mark()
- sock_set_keepalive()
- sock_no_linger()
- sock_bindtoindex()
- sk_wait_data()
- sock_set_rcvbuf()

$ scripts/bloat-o-meter -t vmlinux.old vmlinux
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-312 (-312)
Function                                     old     new   delta
__lock_sock                                  192     188      -4
__lock_sock_fast                             239      86    -153
lock_sock_nested                             227      72    -155
Total: Before=24888707, After=24888395, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223092716.3673939-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: microchip: lan743x: add ethtool nway_reset support

Wire phylink_ethtool_nway_reset() as the .nway_reset ethtool operation,
allowing userspace to restart PHY autonegotiation via 'ethtool -r'.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Russel King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260223085442.42852-1-nb@tipi-net.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: l2tp_eth: Replace deprecated strcpy with strscpy in l2tp_eth_create

strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. Replace
it with the safer strscpy(). Use the two-argument version of strscpy()
to copy 'cfg->ifname'.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260223074137.321862-1-thorsten.blum@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: tc-testing: preserve list order when removing duplicates

Using set() removes duplicates but breaks ordering.
Test output should be deterministic, so replace with dict.fromkeys().

Signed-off-by: Naveen Anandhan <mr.navi8680@gmail.com>
Link: https://patch.msgid.link/20260222095536.17371-1-mr.navi8680@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/net: add test for IP-in-IPv6 tunneling

commit 81c734dae203 ("ip6_tunnel: use skb_vlan_inet_prepare() in
__ip6_tnl_rcv()") was fine in and of itself, but its backport to 6.12
(and 6.6) broke IPv4-in-IPv6 tunneling, see [1]. This adds a self-test
for basic IPv4-in-IPv6 and IPv6-in-IPv6 functionality.

[1]: https://lore.kernel.org/all/CAA2RiuSnH_2xc+-W6EnFEG00XjS-dszMq61JEvRjcGS31CBw=g@mail.gmail.com/

Signed-off-by: Linus Heckemann <git@sphalerite.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Ricardo B. Marlière <rbm@suse.com>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260221114806.1231666-1-git@sphalerite.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.

  Current release - new code bugs:

   - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT

   - eth: mlx5e: XSK, Fix unintended ICOSQ change

   - phy_port: correctly recompute the port's linkmodes

   - vsock: prevent child netns mode switch from local to global

   - couple of kconfig fixes for new symbols

  Previous releases - regressions:

   - nfc: nci: fix false-positive parameter validation for packet data

   - net: do not delay zero-copy skbs in skb_attempt_defer_free()

  Previous releases - always broken:

   - mctp: ensure our nlmsg responses to user space are zero-initialised

   - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()

   - fixes for ICMP rate limiting

  Misc:

   - intel: fix PCI device ID conflict between i40e and ipw2200"

* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: nfc: nci: Fix parameter validation for packet data
  net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
  net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
  net/mlx5: Fix misidentification of write combining CQE during poll loop
  net/mlx5e: Fix misidentification of ASO CQE during poll loop
  net/mlx5: Fix multiport device check over light SFs
  bonding: alb: fix UAF in rlb_arp_recv during bond up/down
  bnge: fix reserving resources from FW
  eth: fbnic: Advertise supported XDP features.
  rds: tcp: fix uninit-value in __inet_bind
  net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
  octeontx2-af: Fix default entries mcam entry action
  net/mlx5e: XSK, Fix unintended ICOSQ change
  ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
  ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
  ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
  inet: move icmp_global_{credit,stamp} to a separate cache line
  icmp: prevent possible overflow in icmp_global_allow()
  selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
  ...

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
   Hung)

- Fix a potential use-after-free of BTF object (Anton Protopopov)

- Add feature detection to libbpf and avoid moving arena global
   variables on older kernels (Emil Tsalapatis)

- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
   (Ihor Solodrai)

- Fix truncated netlink dumps in bpftool (Jakub Kicinski)

- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
   Dwivedi)

- Remove hexdump dependency while building bpf selftests (Matthieu
   Baerts)

- Complete fsession support in BPF trampolines on riscv (Menglong Dong)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Remove hexdump dependency
  libbpf: Remove extern declaration of bpf_stream_vprintk()
  selftests/bpf: Use vmlinux.h in test_xdp_meta
  bpftool: Fix truncated netlink dumps
  libbpf: Delay feature gate check until object prepare time
  libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
  bpf: Add a map/btf from a fd array more consistently
  selftests/bpf: Fix map_kptr grace period wait
  selftests/bpf: enable fsession_test on riscv64
  selftests/bpf: Adjust selftest due to function rename
  bpf, riscv: add fsession support for trampolines
  bpf: Fix a potential use-after-free of BTF object
  bpf, riscv: introduce emit_store_stack_imm64() for trampoline
  libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
  libbpf: Add gating for arena globals relocation feature

net: nfc: nci: Fix parameter validation for packet data

Since commit 9c328f54741b ("net: nfc: nci: Add parameter validation for
packet data") communication with nci nfc chips is not working any more.

The mentioned commit tries to fix access of uninitialized data, but
failed to understand that in some cases the data packet is of variable
length and can therefore not be compared to the maximum packet length
given by the sizeof(struct).

Fixes: 9c328f54741b ("net: nfc: nci: Add parameter validation for packet data")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20260218083000.301354-1-michael.thalmeier@hale.at
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-misc-fixes-2026-02-18'

Tariq Toukan says:

====================
mlx5 misc fixes 2026-02-18

This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.
====================

Link: https://patch.msgid.link/20260218072904.1764634-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Use unsigned for mlx5e_get_max_num_channels

The max number of channels is always an unsigned int, use the correct
type to fix compilation errors done with strict type checking, e.g.:

error: call to ‘__compiletime_assert_1110’ declared with attribute
error: min(mlx5e_get_devlink_param_num_doorbells(mdev),
mlx5e_get_max_num_channels(mdev)) signedness error

Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix deadlocks between devlink and netdev instance locks

In the mentioned "Fixes" commit, various work tasks triggering devlink
health reporter recovery were switched to use netdev_trylock to protect
against concurrent tear down of the channels being recovered. But this
had the side effect of introducing potential deadlocks because of
incorrect lock ordering.

The correct lock order is described by the init flow:
probe_one -> mlx5_init_one (acquires devlink lock)
-> mlx5_init_one_devl_locked -> mlx5_register_device
-> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
-> register_netdev (acquires rtnl lock)
-> register_netdevice (acquires netdev lock)
=> devlink lock -> rtnl lock -> netdev lock.

But in the current recovery flow, the order is wrong:
mlx5e_tx_err_cqe_work (acquires netdev lock)
-> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
-> devlink_health_report (acquires devlink lock => boom!)
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
-> mlx5e_tx_reporter_err_cqe_recover

The same pattern exists in:
mlx5e_reporter_rx_timeout
mlx5e_reporter_tx_ptpsq_unhealthy
mlx5e_reporter_tx_timeout

Fix these by moving the netdev_trylock calls from the work handlers
lower in the call stack, in the respective recovery functions, where
they are actually necessary.

Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event

The macsec_aso_set_arm_event function calls mlx5_aso_poll_cq once
without a retry loop. If the CQE is not immediately available after
posting the WQE, the function fails unnecessarily.

Use read_poll_timeout() to poll 3-10 usecs for CQE, consistent with
other ASO polling code paths in the driver.

Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix misidentification of write combining CQE during poll loop

The write combining completion poll loop uses usleep_range() which can
sleep much longer than requested due to scheduler latency. Under load,
we witnessed a 20ms+ delay until the process was rescheduled, causing
the jiffies based timeout to expire while the thread is sleeping.

The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.

Instead of the open-coded while loop, use the kernel's poll_timeout_us()
which always performs an additional check after the sleep expiration,
and is less error-prone.

Note: poll_timeout_us() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.

Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix misidentification of ASO CQE during poll loop

The ASO completion poll loop uses usleep_range() which can sleep much
longer than requested due to scheduler latency. Under load, we witnessed
a 20ms+ delay until the process was rescheduled, causing the jiffies
based timeout to expire while the thread is sleeping.

The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.

Instead of the open-coded while loop, use the kernel's
read_poll_timeout() which always performs an additional check after the
sleep expiration, and is less error-prone.

Note: read_poll_timeout() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.

Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")
Fixes: 7e3fce82d945 ("net/mlx5e: Overcome slow response for first macsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix multiport device check over light SFs

Driver is using num_vhca_ports capability to distinguish between
multiport master device and multiport slave device. num_vhca_ports is a
capability the driver sets according to the MAX num_vhca_ports
capability reported by FW. On the other hand, light SFs doesn't set the
above capbility.

This leads to wrong results whenever light SFs is checking whether he is
a multiport master or slave.

Therefore, use the MAX capability to distinguish between master and
slave devices.

Fixes: e71383fb9cd1 ("net/mlx5: Light probe local SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: alb: fix UAF in rlb_arp_recv during bond up/down

The ALB RX path may access rx_hashtbl concurrently with bond
teardown. During rapid bond up/down cycles, rlb_deinitialize()
frees rx_hashtbl while RX handlers are still running, leading
to a null pointer dereference detected by KASAN.

However, the root cause is that rlb_arp_recv() can still be accessed
after setting recv_probe to NULL, which is actually a use-after-free
(UAF) issue. That is the reason for using the referenced commit in the
Fixes tag.

[  214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
[  214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
[  214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
[  214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
[  214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
[  214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
[  214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
[  214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
[  214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
[  214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
[  214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
[  214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
[  214.286943] FS:  00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
[  214.295966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
[  214.310347] Call Trace:
[  214.313070]  <IRQ>
[  214.315318]  ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
[  214.320975]  bond_handle_frame+0x166/0xb60 [bonding]
[  214.326537]  ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
[  214.332680]  __netif_receive_skb_core.constprop.0+0x576/0x2710
[  214.339199]  ? __pfx_arp_process+0x10/0x10
[  214.343775]  ? sched_balance_find_src_group+0x98/0x630
[  214.349513]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  214.356513]  ? arp_rcv+0x307/0x690
[  214.360311]  ? __pfx_arp_rcv+0x10/0x10
[  214.364499]  ? __lock_acquire+0x58c/0xbd0
[  214.368975]  __netif_receive_skb_one_core+0xae/0x1b0
[  214.374518]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  214.380743]  ? lock_acquire+0x10b/0x140
[  214.385026]  process_backlog+0x3f1/0x13a0
[  214.389502]  ? process_backlog+0x3aa/0x13a0
[  214.394174]  __napi_poll.constprop.0+0x9f/0x370
[  214.399233]  net_rx_action+0x8c1/0xe60
[  214.403423]  ? __pfx_net_rx_action+0x10/0x10
[  214.408193]  ? lock_acquire.part.0+0xbd/0x260
[  214.413058]  ? sched_clock_cpu+0x6c/0x540
[  214.417540]  ? mark_held_locks+0x40/0x70
[  214.421920]  handle_softirqs+0x1fd/0x860
[  214.426302]  ? __pfx_handle_softirqs+0x10/0x10
[  214.431264]  ? __neigh_event_send+0x2d6/0xf50
[  214.436131]  do_softirq+0xb1/0xf0
[  214.439830]  </IRQ>

The issue is reproducible by repeatedly running
ip link set bond0 up/down while receiving ARP messages, where
rlb_arp_recv() can race with rlb_deinitialize() and dereference
a freed rx_hashtbl entry.

Fix this by setting recv_probe to NULL and then calling
synchronize_net() to wait for any concurrent RX processing to finish.
This ensures that no RX handler can access rx_hashtbl after it is freed
in bond_alb_deinitialize().

Reported-by: Liang Li <liali@redhat.com>
Fixes: 3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260218060919.101574-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnge: fix reserving resources from FW

HWRM_FUNC_CFG is used to reserve resources, whereas HWRM_FUNC_QCFG is
intended for querying resource information from the firmware.
Since __bnge_hwrm_reserve_pf_rings() reserves resources for a specific
PF, the command type should be HWRM_FUNC_CFG.

Fixes: 627c67f038d2 ("bng_en: Add resource management support")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218052755.4097468-1-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Advertise supported XDP features.

Drivers are supposed to advertise the XDP features they support. This was
missed while adding XDP support.

Before:
$ ynl --family netdev --dump dev-get
...
{'ifindex': 3,
  'xdp-features': set(),
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
...

After:
$ ynl --family netdev --dump dev-get
...
{'ifindex': 3,
  'xdp-features': {'basic', 'rx-sg'},
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
...

Fixes: 168deb7b31b2 ("eth: fbnic: Add support for XDP_TX action")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218030620.3329608-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

rds: tcp: fix uninit-value in __inet_bind

KMSAN reported an uninit-value access in __inet_bind() when binding
an RDS TCP socket.

The uninitialized memory originates from rds_tcp_conn_alloc(),
which uses kmem_cache_alloc() to allocate the rds_tcp_connection structure.

Specifically, the field 't_client_port_group' is incremented in
rds_tcp_conn_path_connect() without being initialized first:

if (++tc->t_client_port_group >= port_groups)

Since kmem_cache_alloc() does not zero the memory, this field contains
garbage, leading to the KMSAN report.

Fix this by using kmem_cache_zalloc() to ensure the structure is
zero-initialized upon allocation.

Reported-by: syzbot+aae646f09192f72a68dc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aae646f09192f72a68dc
Tested-by: syzbot+aae646f09192f72a68dc@syzkaller.appspotmail.com
Fixes: a20a6992558f ("net/rds: Encode cp_index in TCP source port")
Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com>
Reviewed-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260217135350.33641-1-tabreztalks@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/rds: Fix NULL pointer dereference in rds_tcp_accept_one

Save a local pointer to new_sock->sk and hold a reference before
installing callbacks in rds_tcp_accept_one. After
rds_tcp_set_callbacks() or rds_tcp_reset_callbacks(), tc->t_sock is
set to new_sock which may race with the shutdown path. A concurrent
rds_tcp_conn_path_shutdown() may call sock_release(), which sets
new_sock->sk = NULL and may eventually free sk when the refcount
reaches zero.

Subsequent accesses to new_sock->sk->sk_state would dereference NULL,
causing the crash. The fix saves a local sk pointer before callbacks
are installed so that sk_state can be accessed safely even after
new_sock->sk is nulled, and uses sock_hold()/sock_put() to ensure
sk itself remains valid for the duration.

Fixes: 826c1004d4ae ("net/rds: rds_tcp_conn_path_shutdown must not discard messages")
Reported-by: syzbot+96046021045ffe6d7709@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96046021045ffe6d7709
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260216222643.2391390-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more non-MM updates from Andrew Morton:

- "two fixes in kho_populate()" fixes a couple of not-major issues in
   the kexec handover code (Ran Xiaokai)

- misc singletons

* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib/group_cpus: handle const qualifier from clusters allocation type
  kho: remove unnecessary WARN_ON(err) in kho_populate()
  kho: fix missing early_memunmap() call in kho_populate()
  scripts/gdb: implement x86_page_ops in mm.py
  objpool: fix the overestimation of object pooling metadata size
  selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
  delayacct: fix build regression on accounting tool

Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM  updates from Andrew Morton:

- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
   couple of issues in the demotion code - pages were failed demotion
   and were finding themselves demoted into disallowed nodes (Bing Jiao)

- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
   mapledtree race and performs a number of cleanups (Liam Howlett)

- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
   them" implements a lot of cleanups following on from the conversion
   of the VMA flags into a bitmap (Lorenzo Stoakes)

- "support batch checking of references and unmapping for large folios"
   implements batching to greatly improve the performance of reclaiming
   clean file-backed large folios (Baolin Wang)

- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
   Lin)

* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
  mm/page_alloc: clear page->private in free_pages_prepare()
  selftests/mm: add memory failure dirty pagecache test
  selftests/mm: add memory failure clean pagecache test
  selftests/mm: add memory failure anonymous page test
  mm: rmap: support batched unmapping for file large folios
  arm64: mm: implement the architecture-specific clear_flush_young_ptes()
  arm64: mm: support batch clearing of the young flag for large folios
  arm64: mm: factor out the address and ptep alignment into a new helper
  mm: rmap: support batched checks of the references for large folios
  tools/testing/vma: add VMA userland tests for VMA flag functions
  tools/testing/vma: separate out vma_internal.h into logical headers
  tools/testing/vma: separate VMA userland tests into separate files
  mm: make vm_area_desc utilise vma_flags_t only
  mm: update all remaining mmap_prepare users to use vma_flags_t
  mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
  mm: update secretmem to use VMA flags on mmap_prepare
  mm: update hugetlbfs to use VMA flags on mmap_prepare
  mm: add basic VMA flag operation helper functions
  tools: bitmap: add missing bitmap_[subset(), andnot()]
  mm: add mk_vma_flags() bitmap flag macro helper
  ...

octeontx2-af: Fix default entries mcam entry action

As per design, AF should update the default MCAM action only when
mcam_index is -1. A bug in the previous patch caused default entries
to be changed even when the request was not for them.

Fixes: 570ba37898ec ("octeontx2-af: Update RSS algorithm index")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260216090338.1318976-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-02-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*:

1) Add missing __rcu annotations to NAT helper hook pointers in Amanda,
   FTP, IRC, SNMP and TFTP helpers.  From Sun Jian.

2-4):
- Add global spinlock to serialize nft_counter fetch+reset operations.
- Use atomic64_xchg() for nft_quota reset instead of read+subtract pattern.
   Note AI review detects a race in this change but it isn't new. The
   'racing' bit only exists to prevent constant stream of 'quota expired'
   notifications.
- Revert commit_mutex usage in nf_tables reset path, it caused
   circular lock dependency.  All from Brian Witte.

5) Fix uninitialized l3num value in nf_conntrack_h323 helper.

6) Fix musl libc compatibility in netfilter_bridge.h UAPI header. This
   change isn't nice (UAPI headers should not include libc headers), but
   as-is musl builds may fail due to redefinition of struct ethhdr.

7) Fix protocol checksum validation in IPVS for IPv6 with extension headers,
   from Julian Anastasov.

8) Fix device reference leak in IPVS when netdev goes down. Also from
   Julian.

9) Remove WARN_ON_ONCE when accessing forward path array, this can
   trigger with sufficiently long forward paths.  From Pablo Neira Ayuso.

10) Fix use-after-free in nf_tables_addchain() error path, from Inseo An.

* tag 'nf-26-02-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix use-after-free in nf_tables_addchain()
  net: remove WARN_ON_ONCE when accessing forward path array
  ipvs: do not keep dest_dst if dev is going down
  ipvs: skip ipv6 extension headers for csum checks
  include: uapi: netfilter_bridge.h: Cover for musl libc
  netfilter: nf_conntrack_h323: don't pass uninitialised l3num value
  netfilter: nf_tables: revert commit_mutex usage in reset path
  netfilter: nft_quota: use atomic64_xchg for reset
  netfilter: nft_counter: serialize reset with spinlock
  netfilter: annotate NAT helper hook pointers with __rcu
====================

Link: https://patch.msgid.link/20260217163233.31455-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: XSK, Fix unintended ICOSQ change

XSK wakeup must use the async ICOSQ (with proper locking), as it is not
guaranteed to run on the same CPU as the channel.

The commit that converted the NAPI trigger path to use the sync ICOSQ
incorrectly applied the same change to XSK, causing XSK wakeups to use
the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.

XDP program attach/detach triggers channel reopen, while XSK pool
enable/disable can happen on-the-fly via NDOs without reopening
channels. As a result, xsk_pool state cannot be reliably used at
mlx5e_open_channel() time to decide whether an async ICOSQ is needed.

Update the async_icosq_needed logic to depend on the presence of an XDP
program rather than the xsk_pool, ensuring the async ICOSQ is available
when XSK wakeups are enabled.

This fixes multiple issues:

1. Illegal synchronize_rcu() in an RCU read- side critical section via
mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
synchronize_net(). The stack holds RCU read-lock in xsk_poll().

2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():

[] BUG: kernel NULL pointer dereference, address: 0000000000000240
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x0000) - not-present page
[] PGD 0 P4D 0
[] Oops: Oops: 0000 [#1] SMP
[] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
[] Hardware name: [...]
[] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
Fixes: 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Alice Mikityanska <alice.kernel@fastmail.im>
Link: https://patch.msgid.link/20260217074525.1761454-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'icmp-better-deal-with-ddos'

Eric Dumazet says:

====================
icmp: better deal with DDOS

When dealing with death of big UDP servers, admins might want to
increase net.ipv4.icmp_msgs_per_sec and net.ipv4.icmp_msgs_burst
to big values (2,000,000 or more).

They also might need to tune the per-host ratelimit to 1ms or 0ms
in favor of the global rate limit.

This series fixes bugs showing up in all these needs.
====================

Link: https://patch.msgid.link/20260216142832.3834174-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero

If net.ipv6.icmp.ratelimit is zero we do not have to call
inet_getpeer_v6() and inet_peer_xrlim_allow().

Both can be very expensive under DDOS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero

If net.ipv4.icmp_ratelimit is zero, we do not have to call
inet_getpeer_v4() and inet_peer_xrlim_allow().

Both can be very expensive under DDOS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()

Following part was needed before the blamed commit, because
inet_getpeer_v6() second argument was the prefix.

/* Give more bandwidth to wider prefixes. */
if (rt->rt6i_dst.plen < 128)
tmo >>= ((128 - rt->rt6i_dst.plen)>>5);

Now inet_getpeer_v6() retrieves hosts, we need to remove
@tmo adjustement or wider prefixes likes /24 allow 8x
more ICMP to be sent for a given ratelimit.

As we had this issue for a while, this patch changes net.ipv6.icmp.ratelimit
default value from 1000ms to 100ms to avoid potential regressions.

Also add a READ_ONCE() when reading net->ipv6.sysctl.icmpv6_time.

Fixes: fd0273d7939f ("ipv6: Remove external dependency on rt6i_dst and rt6i_src")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260216142832.3834174-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

inet: move icmp_global_{credit,stamp} to a separate cache line

icmp_global_credit was meant to be changed ~1000 times per second,
but if an admin sets net.ipv4.icmp_msgs_per_sec to a very high value,
icmp_global_credit changes can inflict false sharing to surrounding
fields that are read mostly.

Move icmp_global_credit and icmp_global_stamp to a separate
cacheline aligned group.

Fixes: b056b4cd9178 ("icmp: move icmp_global.credit and icmp_global.stamp to per netns storage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

icmp: prevent possible overflow in icmp_global_allow()

Following expression can overflow
if sysctl_icmp_msgs_per_sec is big enough.

sysctl_icmp_msgs_per_sec * delta / HZ;

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260216142832.3834174-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add ipv4-mapped-ipv6 tests

Add ipv4-mapped-ipv6 case to ksft_runner.sh before
an upcoming TCP fix in this area.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260217142924.1853498-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: Remove hexdump dependency

The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2b3
("selftests/bpf: Remove xxd util dependency").

hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:

/bin/sh: 2: hexdump: not found

Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.

Fixes: b640d556a2b3 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nathan Chancellor:

- Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'

- Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
   made fatal by -Werror

- Drop explicit LZMA parallel compression in scripts/make_fit.py

- Several fixes for commit 62089b804895 ("kbuild: rpm-pkg: Generate
   debuginfo package manually")

* tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: rpm-pkg: Disable automatic requires for manual debuginfo package
  kbuild: rpm-pkg: Fix manual debuginfo generation when using .src.rpm
  kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package
  kbuild: rpm-pkg: Restrict manual debug package creation
  scripts/make_fit.py: Drop explicit LZMA parallel compression
  kbuild: Fix CC_CAN_LINK detection
  kbuild: Add objtool to top-level clean target

Merge branch 'libbpf-remove-extern-declaration-of-bpf_stream_vprintk'

Ihor Solodrai says:

====================
libbpf: Remove extern declaration of bpf_stream_vprintk()

The first patch adjusts a selftest that has been using
bpf_stream_printk() macro. The second patch removes the declaration.
====================

Link: https://patch.msgid.link/20260218215651.2057673-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Remove extern declaration of bpf_stream_vprintk()

An issue was reported that building BPF program which includes both
vmlinux.h and bpf_helpers.h from libbpf fails due to conflicting
declarations of bpf_stream_vprintk().

Remove the extern declaration from bpf_helpers.h to address this.

In order to use bpf_stream_printk() macro, BPF programs are expected
to either include vmlinux.h of the kernel they are targeting, or add
their own extern declaration.

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Closes: https://github.com/libbpf/libbpf/issues/947
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Use vmlinux.h in test_xdp_meta

- Replace linux/* includes with vmlinux.h
- Include errno.h
- Include bpf_tracing_net.h for TC_ACT_* and ETH_*
- Use BPF_STDERR instead of BPF_STREAM_STDERR

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'thermal-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
"This fixes a sysfs group leak on DLVR registration failure in the
Intel int340x thermal driver (Kaushlendra Kumar)"

* tag 'thermal-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Fix sysfs group leak on DLVR registration failure

Merge tag 'acpi-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI support updates from Rafael J. Wysocki:
"These are mostly fixes and cleanups on top of the ACPI support updates
  merged recently, including two new quirks, an ACPI CPPC library fix,
  and fixes and cleanups of a few core ACPI device drivers:

   - Add an unused power resource handling quirk for THUNDEROBOT ZERO
     (Zhai Can)

   - Fix remaining for_each_possible_cpu() in the ACPI CPPC library to
     use online CPUs (Sean V Kelley)

   - Drop redundant checks from the ACPI notify handler and the driver
     remove callback in the ACPI battery driver (Rafael Wysocki)

   - Move the creation of the wakeup source during the ACPI button
     driver probe to an earlier point to avoid missing a wakeup event
     due to a race and clean up system wakeup handling and remove
     callback in that driver (Rafael Wysocki)

   - Drop unnecessary driver_data pointer clearing from the ACPI EC and
     SMBUS HC drivers and make the ACPI backlight (video) driver clear
     the device's driver_data pointer on remove (Rafael Wysocki)

   - Force enabling of PWM2 on the Yogabook YB1-X90 tablets (Yauhen
     Kharuzhy)"

* tag 'acpi-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO
  ACPI: driver: Drop driver_data pointer clearing from two drivers
  ACPI: video: Clear driver_data pointer on remove
  ACPI: button: Tweak acpi_button_remove()
  ACPI: button: Tweak system wakeup handling
  ACPI: battery: Drop redundant checks from acpi_battery_remove()
  ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs
  ACPI: x86: Force enabling of PWM2 on the Yogabook YB1-X90
  ACPI: button: Call device_init_wakeup() earlier during probe
  ACPI: battery: Drop redundant check from acpi_battery_notify()

Merge tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
"These are mostly fixes on top of the power management updates merged
  recently in cpuidle governors, in the Intel RAPL power capping driver
  and in the wake IRQ management code:

   - Fix the handling of package-scope MSRs in the intel_rapl power
     capping driver when called from the PMU subsystem and make it add
     all package CPUs to the PMU cpumask to allow tools to read RAPL
     events from any CPU in the package (Kuppuswamy Satharayananyan)

   - Rework the invalid version check in the intel_rapl_tpmi power
     capping driver to account for the fact that on partitioned systems,
     multiple TPMI instances may exist per package, but RAPL registers
     are only valid on one instance (Kuppuswamy Satharayananyan)

   - Describe the new intel_idle.table command line option in the
     admin-guide intel_idle documentation (Artem Bityutskiy)

   - Fix a crash in the ladder cpuidle governor on systems with only one
     (polling) idle state available by making the cpuidle core bypass
     the governor in those cases and adjust the other existing governors
     to that change (Aboorva Devarajan, Christian Loehle)

   - Update kerneldoc comments for wake IRQ management functions that
     have not been matching the code (Wang Jiayue)"

* tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: menu: Remove single state handling
  cpuidle: teo: Remove single state handling
  cpuidle: haltpoll: Remove single state handling
  cpuidle: Skip governor when only one idle state is available
  powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
  PM: sleep: wakeirq: Update outdated documentation comments
  Documentation: PM: Document intel_idle.table command line option
  powercap: intel_rapl: Expose all package CPUs in PMU cpumask
  powercap: intel_rapl: Remove incorrect CPU check in PMU context

Merge branches 'acpi-battery', 'acpi-button' and 'acpi-driver'

Merge additional updates of multiple core ACPI device drivers (battery,
button, video, EC, SMBUS HC) for 7.0-rc1:

- Drop redundant checks from the ACPI notify handler and the driver
   remove callback in the ACPI battery driver (Rafael Wysocki)

- Move the creation of the wakeup source during the ACPI button driver
   probe to an earlier point to avoid missing a wakeup event due to a
   race and clean up system wakeup handling and remove callback in that
   driver (Rafael Wysocki)

- Drop unnecessary driver_data pointer clearing from the ACPI EC and
   SMBUS HC drivers and make the ACPI backlight (video) driver clear the
   device's driver_data pointer on remove (Rafael Wysocki)

* acpi-battery:
  ACPI: battery: Drop redundant checks from acpi_battery_remove()
  ACPI: battery: Drop redundant check from acpi_battery_notify()

* acpi-button:
  ACPI: button: Tweak acpi_button_remove()
  ACPI: button: Tweak system wakeup handling
  ACPI: button: Call device_init_wakeup() earlier during probe

* acpi-driver:
  ACPI: driver: Drop driver_data pointer clearing from two drivers
  ACPI: video: Clear driver_data pointer on remove

Merge branches 'acpi-pm' and 'acpi-cppc'

Merge an ACPI power management update and an ACPI CPPC library update
for 7.0-rc1:

- Add an unused power resource handling quirk for THUNDEROBOT ZERO (Zhai
   Can)

- Fix remaining for_each_possible_cpu() in the ACPI CPPC library to use
   online CPUs (Sean V Kelley)

* acpi-pm:
  ACPI: PM: Add unused power resource quirk for THUNDEROBOT ZERO

* acpi-cppc:
  ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs

Merge branches 'pm-powercap' and 'pm-cpuidle'

Merge additional power capping and cpuidle updates for 7.0-rc1:

- Fix the handling of package-scope MSRs in the intel_rapl power
   capping driver when called from the PMU subsystem and make it add all
   package CPUs to the PMU cpumask to allow tools to read RAPL events
   from any CPU in the package (Kuppuswamy Sathyanarayanan)

- Rework the invalid version check in the intel_rapl_tpmi power capping
   driver to account for the fact that on partitioned systems, multiple
   TPMI instances may exist per package, but RAPL registers are only
   valid on one instance (Kuppuswamy Satharayananyan)

- Describe the new intel_idle.table command line option in the
   admin-guide intel_idle documentation (Artem Bityutskiy)

- Fix a crash in the ladder cpuidle governor on systems with only one
   (polling) idle state available by making the cpuidle core bypass the
   governor in those cases and adjust the other existing governors to
   that change (Aboorva Devarajan, Christian Loehle)

* pm-powercap:
  powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
  powercap: intel_rapl: Expose all package CPUs in PMU cpumask
  powercap: intel_rapl: Remove incorrect CPU check in PMU context

* pm-cpuidle:
  cpuidle: menu: Remove single state handling
  cpuidle: teo: Remove single state handling
  cpuidle: haltpoll: Remove single state handling
  cpuidle: Skip governor when only one idle state is available
  Documentation: PM: Document intel_idle.table command line option

Merge tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:

- Remove macros from proc handler converters

   Replace the proc converter macros with "regular" functions. Though it
   is more verbose than the macro version, it helps when debugging and
   better aligns with coding-style.rst.

- General cleanup

   Remove superfluous ctl_table forward declarations. Const qualify the
   memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
   Add missing kernel doc to proc_dointvec_conv.

- Testing

   This series was run through sysctl selftests/kunit test suite in
   x86_64. And went into linux-next after rc4, giving it a good 3 weeks
   of testing

* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
  sysctl: Replace unidirectional INT converter macros with functions
  sysctl: Add kernel doc to proc_douintvec_conv
  sysctl: Replace UINT converter macros with functions
  sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
  sysctl: clarify proc_douintvec_minmax doc
  sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
  sysctl: Remove unused ctl_table forward declarations
  loadpin: Implement custom proc_handler for enforce
  alloc_tag: move memory_allocation_profiling_sysctls into .rodata
  sysctl: Add missing kernel-doc for proc_dointvec_conv

Merge tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat fix from Len Brown:
"Fix a recent AMD regression due to errant code cleanup"

* tag 'turbostat-2026.02.14-AMD-RAPL-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: Fix AMD RAPL regression

eth: fbnic: Add validation for MTU changes

Increasing the MTU beyond the HDS threshold causes the hardware to
fragment packets across multiple buffers. If a single-buffer XDP program
is attached, the driver will drop all multi-frag frames. While we can't
prevent a remote sender from sending non-TCP packets larger than the MTU,
this will prevent users from inadvertently breaking new TCP streams.

Traditionally, drivers supported XDP with MTU less than 4Kb
(packet per page). Fbnic currently prevents attaching XDP when MTU is too high.
But it does not prevent increasing MTU after XDP is attached.

Fixes: 1b0a3950dbd4 ("eth: fbnic: Add XDP pass, drop, abort support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools/power turbostat: Fix AMD RAPL regression

turbostat.c:8688: rapl_perf_init: Assertion `next_domain < num_domains' failed.

Two recent cleanup patches that were not supposed to change anything
broke the core_id code needed for AMD RAPL initialization:

commit 070e92361eec ("tools/power turbostat: Enhance HT enumeration")
commit ddf60e38ca04 ("tools/power turbostat: Simplify global core_id calculation")

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

net/sched: act_skbedit: fix divide-by-zero in tcf_skbedit_hash()

Commit 38a6f0865796 ("net: sched: support hash selecting tx queue")
added SKBEDIT_F_TXQ_SKBHASH support. The inclusive range size is
computed as:

mapping_mod = queue_mapping_max - queue_mapping + 1;

The range size can be 65536 when the requested range covers all possible
u16 queue IDs (e.g. queue_mapping=0 and queue_mapping_max=U16_MAX).
That value cannot be represented in a u16 and previously wrapped to 0,
so tcf_skbedit_hash() could trigger a divide-by-zero:

queue_mapping += skb_get_hash(skb) % params->mapping_mod;

Compute mapping_mod in a wider type and reject ranges larger than U16_MAX
to prevent params->mapping_mod from becoming 0 and avoid the crash.

Fixes: 38a6f0865796 ("net: sched: support hash selecting tx queue")
Cc: stable@vger.kernel.org # 6.12+
Signed-off-by: Ruitong Liu <cnitlrt@gmail.com>
Link: https://patch.msgid.link/20260213175948.1505257-1-cnitlrt@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: document namespace mode sysctls

Add documentation for the vsock per-namespace sysctls (`ns_mode` and
`child_ns_mode`) to Documentation/admin-guide/sysctl/net.rst.
These sysctls were introduced by commit eafb64f40ca4 ("vsock: add
netns to vsock core").

Document the two namespace modes (`global` and `local`), the
inheritance behavior of `child_ns_mode`, and the restriction preventing
local namespaces from setting `child_ns_mode` to `global`.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260216163147.236844-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ec_bhf: Fix dma_free_coherent() dma handle

dma_free_coherent() in error path takes priv->rx_buf.alloc_len as
the dma handle. This would lead to improper unmapping of the buffer.

Change the dma handle to priv->rx_buf.alloc_phys.

Fixes: 6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://patch.msgid.link/20260213164340.77272-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macvlan: observe an RCU grace period in macvlan_common_newlink() error path

valis reported that a race condition still happens after my prior patch.

macvlan_common_newlink() might have made @dev visible before
detecting an error, and its caller will directly call free_netdev(dev).

We must respect an RCU period, either in macvlan or the core networking
stack.

After adding a temporary mdelay(1000) in macvlan_forward_source_one()
to open the race window, valis repro was:

ip link add p1 type veth peer p2
ip link set address 00:00:00:00:00:20 dev p1
ip link set up dev p1
ip link set up dev p2
ip link add mv0 link p2 type macvlan mode source

(ip link add invalid% link p2 type macvlan mode source macaddr add
00:00:00:00:00:20 &) ; sleep 0.5 ; ping -c1 -I p1 1.2.3.4
PING 1.2.3.4 (1.2.3.4): 56 data bytes
RTNETLINK answers: Invalid argument

BUG: KASAN: slab-use-after-free in macvlan_forward_source
(drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
Read of size 8 at addr ffff888016bb89c0 by task e/175

CPU: 1 UID: 1000 PID: 175 Comm: e Not tainted 6.19.0-rc8+ #33 NONE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
? macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
kasan_report (mm/kasan/report.c:597)
? macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
? tasklet_init (kernel/softirq.c:983)
macvlan_handle_frame (drivers/net/macvlan.c:501)

Allocated by task 169:
kasan_save_stack (mm/kasan/common.c:58)
kasan_save_track (./arch/x86/include/asm/current.h:25
mm/kasan/common.c:70 mm/kasan/common.c:79)
__kasan_kmalloc (mm/kasan/common.c:419)
__kvmalloc_node_noprof (./include/linux/kasan.h:263 mm/slub.c:5657
mm/slub.c:7140)
alloc_netdev_mqs (net/core/dev.c:12012)
rtnl_create_link (net/core/rtnetlink.c:3648)
rtnl_newlink (net/core/rtnetlink.c:3830 net/core/rtnetlink.c:3957
net/core/rtnetlink.c:4072)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)

Freed by task 169:
kasan_save_stack (mm/kasan/common.c:58)
kasan_save_track (./arch/x86/include/asm/current.h:25
mm/kasan/common.c:70 mm/kasan/common.c:79)
kasan_save_free_info (mm/kasan/generic.c:587)
__kasan_slab_free (mm/kasan/common.c:287)
kfree (mm/slub.c:6674 mm/slub.c:6882)
rtnl_newlink (net/core/rtnetlink.c:3845 net/core/rtnetlink.c:3957
net/core/rtnetlink.c:4072)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)

Fixes: f8db6475a836 ("macvlan: fix error recovery in macvlan_common_newlink()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: valis <sec@valis.email>
Link: https://patch.msgid.link/20260213142557.3059043-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc_actions: don't dump 2MB of \0 to stdout

Since we started running selftests in NIPA we have been seeing
tc_actions.sh generate a soft lockup warning on ~20% of the runs.
On the pre-netdev foundation setup it was actually a missed irq
splat from the console. Now it's either that or a lockup.

I initially suspected a socket locking issue since the test
is exercising local loopback with act_mirred.
After hours of staring at this I noticed in strace that ncat
when -o $file is specified _both_ saves the output to the file
and still prints it to stdout. Because the file being sent
is constructed with:

dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$mirred
^^^^^^^^^

the data printed is all \0. Most terminals don't display nul
characters (and neither does vng output capture save them).
But QEMU's serial console still has to poke them thru which
is very slow and causes the lockup (if the file is >600kB).

Replace the '-o $file' with '> $file'. This speeds the test up
from 2m20s to 18s on debug kernels, and prevents the warnings.

Fixes: ca22da2fbd69 ("act_mirred: use the backlog for nested calls to mirred ingress")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260214035159.2119699-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: addrconf: reduce default temp_valid_lft to 2 days

This is a recommendation from RFC 8981 and it was intended to be changed
by commit 969c54646af0 ("ipv6: Implement draft-ietf-6man-rfc4941bis")
but it only changed the sysctl documentation.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260214172543.5783-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>