git.ipfire.org Git - thirdparty/kernel/linux.git/log

]> git.ipfire.org Git - thirdparty/kernel/linux.git/log

projects / thirdparty / kernel / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Sanman Pradhan [Mon, 14 Oct 2024 15:27:09 +0000 (08:27 -0700)]

eth: fbnic: Add hardware monitoring support via HWMON interface

This patch adds support for hardware monitoring to the fbnic driver,
allowing for temperature and voltage sensor data to be exposed to
userspace via the HWMON interface. The driver registers a HWMON device
and provides callbacks for reading sensor data, enabling system
admins to monitor the health and operating conditions of fbnic.

Signed-off-by: Sanman Pradhan <sanmanpradhan@meta.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241014152709.2123811-1-sanman.p211993@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Thu, 17 Oct 2024 08:22:03 +0000 (10:22 +0200)]

Merge branch 'ethtool-rss-track-rss-ctx-busy-from-core'

Daniel Zahka says:

====================
ethtool: rss: track rss ctx busy from core

This series prevents deletion of rss contexts that are
in use by ntuple filters from ethtool core.
====================

Link: https://patch.msgid.link/20241011183549.1581021-1-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Zahka [Fri, 11 Oct 2024 18:35:48 +0000 (11:35 -0700)]

selftests: drv-net: rss_ctx: add rss ctx busy testcase

It should be invalid to delete an rss context while it is being
referenced from an ntuple filter. ethtool core should prevent this
from happening. This patch adds a testcase to verify this behavior.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Zahka [Fri, 11 Oct 2024 18:35:47 +0000 (11:35 -0700)]

ethtool: rss: prevent rss ctx deletion when in use

ntuple filters can specify an rss context to use for packet hashing
and queue selection. When a filter is referencing an rss context, it
should be invalid for that context to be deleted. A list of active
ntuple filters and their associated rss contexts can be compiled by
querying a device's ethtool_ops.get_rxnfc. This patch checks to see if
any ntuple filters are referencing an rss context during context
deletion, and prevents the deletion if the requested context is still
in use.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 13:07:39 +0000 (14:07 +0100)]

net: phy: realtek: clear 1000Base-T link partner advertisement

Clear 1000Base-T link partner advertisement bits in Clause-45
read_status() function in case auto-negotiation is disabled or has not
been completed.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/9dc9b47b2d675708afef3ad366bfd78eb584d958.1728565530.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 13:07:26 +0000 (14:07 +0100)]

net: phy: realtek: change order of calls in C22 read_status()

Always call rtlgen_read_status() first, so genphy_read_status() which
is called by it clears bits in case auto-negotiation has not completed.
Also clear 10GBT link-partner advertisement bits in case auto-negotiation
is disabled or has not completed.

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/b15929a41621d215c6b2b57393368086589569ec.1728565530.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 13:07:16 +0000 (14:07 +0100)]

net: phy: realtek: read duplex and gbit master from PHYSR register

The PHYSR MMD register is present and defined equally for all RTL82xx
Ethernet PHYs.
Read duplex and Gbit master bits from rtlgen_decode_speed() and rename
it to rtlgen_decode_physr().

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/b9a76341da851a18c985bc4774fa295babec79bb.1728565530.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jakub Kicinski [Wed, 16 Oct 2024 01:52:28 +0000 (18:52 -0700)]

Merge branch 'rtnetlink-use-rtnl_register_many'

Kuniyuki Iwashima says:

====================
rtnetlink: Use rtnl_register_many().

This series converts all rtnl_register() and rtnl_register_module()
to rtnl_register_many() and finally removes them.

Once this series is applied, I'll start converting doit() to per-netns
RTNL.

v1: https://lore.kernel.org/20241011220550.46040-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20241014201828.91221-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:28 +0000 (13:18 -0700)]

rtnetlink: Remove rtnl_register() and rtnl_register_module().

No one uses rtnl_register() and rtnl_register_module().

Let's remove them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:27 +0000 (13:18 -0700)]

can: gw: Use rtnl_register_many().

We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-11-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:26 +0000 (13:18 -0700)]

dcb: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-10-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:25 +0000 (13:18 -0700)]

ipmr: Use rtnl_register_many().

We will remove rtnl_register() and rtnl_register_module() in favour
of rtnl_register_many().

When it succeeds for built-in callers, rtnl_register_many() guarantees
all rtnetlink types in the passed array are supported, and there is no
chance that a part of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:24 +0000 (13:18 -0700)]

ipv6: Use rtnl_register_many().

We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:23 +0000 (13:18 -0700)]

ipv4: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:22 +0000 (13:18 -0700)]

net: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:21 +0000 (13:18 -0700)]

net: sched: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20241014201828.91221-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:20 +0000 (13:18 -0700)]

neighbour: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:19 +0000 (13:18 -0700)]

rtnetlink: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241014201828.91221-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:18 +0000 (13:18 -0700)]

rtnetlink: Panic when __rtnl_register_many() fails for builtin callers.

We will replace all rtnl_register() and rtnl_register_module() with
rtnl_register_many().

Currently, rtnl_register() returns nothing and prints an error message
when it fails to register a rtnetlink message type and handlers.

The failure happens only when rtnl_register_internal() fails to allocate
rtnl_msg_handlers[protocol][msgtype], but it's unlikely for built-in
callers on boot time.

rtnl_register_many() unwinds the previous successful registrations on
failure and returns an error, but it will be useless for built-in callers,
especially some subsystems that do not have the legacy ioctl() interface
and do not work without rtnetlink.

Instead of booting up without rtnetlink functionality, let's panic on
failure for built-in rtnl_register_many() callers.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Wed, 16 Oct 2024 01:50:14 +0000 (18:50 -0700)]

Merge branch 'gve-adopt-page-pool'

Harshitha Ramamurthy says:

====================
gve: adopt page pool

This patchset implements page pool support for gve.
The first patch deals with movement of code to make
page pool adoption easier in the next patch. The
second patch adopts the page pool API. The third patch
adds basic per queue stats which includes page pool
allocation failures as well.
====================

Link: https://patch.msgid.link/20241014202108.1051963-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:08 +0000 (13:21 -0700)]

gve: add support for basic queue stats

Implement netdev_stats_ops to export basic per-queue stats.

With page pool support for DQO added in the previous patches,
rx-alloc-fail captures failures in page pool allocations as
well since the rx_buf_alloc_fail stat tracked in the driver
is incremented when gve_alloc_buffer returns error.

Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241014202108.1051963-4-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:07 +0000 (13:21 -0700)]

gve: adopt page pool for DQ RDA mode

For DQ queue format in raw DMA addressing(RDA) mode,
implement page pool recycling of buffers by leveraging
a few helper functions.

DQ QPL mode will continue to use the exisiting recycling
logic. This is because in QPL mode, the pages come from a
constant set of pages that the driver pre-allocates and
registers with the device.

Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241014202108.1051963-3-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:06 +0000 (13:21 -0700)]

gve: move DQO rx buffer management related code to a new file

In preparation for the upcoming page pool adoption for DQO
raw addressing mode, move RX buffer management code to a new
file. In the follow on patches, page pool code will be added
to this file.

No functional change, just movement of code.

Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241014202108.1051963-2-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Wed, 16 Oct 2024 01:43:11 +0000 (18:43 -0700)]

Merge branch 'do-not-leave-dangling-sk-pointers-in-pf-create-functions'

Ignat Korchagin says:

====================
do not leave dangling sk pointers in pf->create functions

Some protocol family create() implementations have an error path after
allocating the sk object and calling sock_init_data(). sock_init_data()
attaches the allocated sk object to the sock object, provided by the
caller.

If the create() implementation errors out after calling sock_init_data(),
it releases the allocated sk object, but the caller ends up having a
dangling sk pointer in its sock object on return. Subsequent manipulations
on this sock object may try to access the sk pointer, because it is not
NULL thus creating a use-after-free scenario.

We have implemented a stable hotfix in commit 631083143315
("net: explicitly clear the sk pointer, when pf->create fails"), but this
series aims to fix it properly by going through each of the pf->create()
implementations and making sure they all don't return a sock object with
a dangling pointer on error.
====================

Link: https://patch.msgid.link/20241014153808.51894-1-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:08 +0000 (16:38 +0100)]

Revert "net: do not leave a dangling sk pointer, when socket creation fails"

This reverts commit 6cd4a78d962bebbaf8beb7d2ead3f34120e3f7b2.

inet/inet6->create() implementations have been fixed to explicitly NULL the
allocated sk object on error.

A warning was put in place to make sure any future changes will not leave
a dangling pointer in pf->create() implementations.

So this code is now redundant.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-10-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:07 +0000 (16:38 +0100)]

net: warn, if pf->create does not clear sock->sk on error

All pf->create implementations have been fixed now to clear sock->sk on
error, when they deallocate the allocated sk object.

Put a warning in place to make sure we don't break this promise in the
future.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-9-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:06 +0000 (16:38 +0100)]

net: inet6: do not leave a dangling sk pointer in inet6_create()

sock_init_data() attaches the allocated sk pointer to the provided sock
object. If inet6_create() fails later, the sk object is released, but the
sock object retains the dangling sk pointer, which may cause use-after-free
later.

Clear the sock sk pointer on error.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-8-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:05 +0000 (16:38 +0100)]

net: inet: do not leave a dangling sk pointer in inet_create()

sock_init_data() attaches the allocated sk object to the provided sock
object. If inet_create() fails later, the sk object is freed, but the
sock object retains the dangling pointer, which may create use-after-free
later.

Clear the sk pointer in the sock object on error.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-7-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:04 +0000 (16:38 +0100)]

net: ieee802154: do not leave a dangling sk pointer in ieee802154_create()

sock_init_data() attaches the allocated sk object to the provided sock
object. If ieee802154_create() fails later, the allocated sk object is
freed, but the dangling pointer remains in the provided sock object, which
may allow use-after-free.

Clear the sk pointer in the sock object on error.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-6-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:03 +0000 (16:38 +0100)]

net: af_can: do not leave a dangling sk pointer in can_create()

On error can_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object. This will leave a
dangling sk pointer in the sock object and may cause use-after-free later.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20241014153808.51894-5-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:02 +0000 (16:38 +0100)]

Bluetooth: RFCOMM: avoid leaving dangling sk pointer in rfcomm_sock_alloc()

bt_sock_alloc() attaches allocated sk object to the provided sock object.
If rfcomm_dlc_alloc() fails, we release the sk object, but leave the
dangling pointer in the sock object, which may cause use-after-free.

Fix this by swapping calls to bt_sock_alloc() and rfcomm_dlc_alloc().

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-4-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:01 +0000 (16:38 +0100)]

Bluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()

bt_sock_alloc() allocates the sk object and attaches it to the provided
sock object. On error l2cap_sock_alloc() frees the sk object, but the
dangling pointer is still attached to the sock object, which may create
use-after-free in other code.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-3-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Ignat Korchagin [Mon, 14 Oct 2024 15:38:00 +0000 (16:38 +0100)]

af_packet: avoid erroring out after sock_init_data() in packet_create()

After sock_init_data() the allocated sk object is attached to the provided
sock object. On error, packet_create() frees the sk object leaving the
dangling pointer in the sock object on return. Some other code may try
to use this pointer and cause use-after-free.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-2-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Elena Salomatkina [Sun, 13 Oct 2024 12:45:29 +0000 (15:45 +0300)]

net/sched: cbs: Fix integer overflow in cbs_set_port_rate()

The subsequent calculation of port_rate = speed * 1000 * BYTES_PER_KBIT,
where the BYTES_PER_KBIT is of type LL, may cause an overflow.
At least when speed = SPEED_20000, the expression to the left of port_rate
will be greater than INT_MAX.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Elena Salomatkina <esalomatkina@ispras.ru>
Link: https://patch.msgid.link/20241013124529.1043-1-esalomatkina@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Kuniyuki Iwashima [Mon, 14 Oct 2024 23:52:16 +0000 (16:52 -0700)]

neighbour: Remove NEIGH_DN_TABLE.

Since commit 1202cdd66531 ("Remove DECnet support from kernel"),
NEIGH_DN_TABLE is no longer used.

MPLS has implicit dependency on it in nla_put_via(), but nla_get_via()
does not support DECnet.

Let's remove NEIGH_DN_TABLE.

Now, neigh_tables[] has only 2 elements and no extra iteration
for DECnet in many places.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241014235216.10785-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 01:29:46 +0000 (02:29 +0100)]

net: cxgb3: Remove stid deadcode

cxgb3_alloc_stid() and cxgb3_free_stid() have been unused since
commit 30e0f6cf5acb ("RDMA/iw_cxgb3: Remove the iw_cxgb3 module
from kernel")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20241013012946.284721-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 23:47:45 +0000 (16:47 -0700)]

Merge branch 'cxgb4-deadcode-removal'

Dr. David Alan Gilbert says:

====================
cxgb4: Deadcode removal

This is a bunch of deadcode removal in cxgb4.

It's all complete function removal rather than any actual change to
logic.

Build and boot tested, but I don't have the hardware to test
the actual card.
====================

Link: https://patch.msgid.link/20241013203831.88051-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 20:38:31 +0000 (21:38 +0100)]

cxgb4: Remove unused t4_free_ofld_rxqs

t4_free_ofld_rxqs() has been unused since
commit 0fbc81b3ad51 ("chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources
dynamically for all cxgb4 ULD's")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241013203831.88051-7-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 20:38:30 +0000 (21:38 +0100)]

cxgb4: Remove unused cxgb4_l2t_alloc_switching

cxgb4_l2t_alloc_switching() has been unused since it was added in
commit f7502659cec8 ("cxgb4: Add API to alloc l2t entry; also update
existing ones")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241013203831.88051-6-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 20:38:29 +0000 (21:38 +0100)]

cxgb4: Remove unused cxgb4_scsi_init

cxgb4_iscsi_init() has been unused since 2016's commit
5999299f1ce9 ("cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241013203831.88051-5-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 20:38:28 +0000 (21:38 +0100)]

cxgb4: Remove unused cxgb4_get_srq_entry

cxgb4_get_srq_entry() has been unused since 2018's commit
e47094751ddc ("cxgb4: Add support to initialise/read SRQ entries")
which added it.

Remove it.

Note: I'm a bit suspicious whether any of the srq code in there
actually does anything useful; without this get I can't see anything
that reads the data, so perhaps the whole thing should go?
But that however would remove one of the opcode handlers, and I have
no way to test that.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241013203831.88051-4-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 20:38:27 +0000 (21:38 +0100)]

cxgb4: Remove unused cxgb4_alloc/free_raw_mac_filt

cxgb4_alloc_raw_mac_filt() and cxgb4_free_raw_mac_filt() have been
unused since they were added in 2019 commit
5fab51581f62 ("cxgb4: Add MPS TCAM refcounting for raw mac filters")

Remove them.

This was also the last use of cxgb4_mps_ref_dec().
Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241013203831.88051-3-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Dr. David Alan Gilbert [Sun, 13 Oct 2024 20:38:26 +0000 (21:38 +0100)]

cxgb4: Remove unused cxgb4_alloc/free_encap_mac_filt

cxgb4_alloc_encap_mac_filt() and cxgb4_free_encap_mac_filt() have been
unused since
commit 28b3870578ef ("cxgb4: Re-work the logic for mps refcounting")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241013203831.88051-2-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 17:55:55 +0000 (10:55 -0700)]

Merge branch 'net-ethernet-freescale-use-pa-to-format-resource_size_t'

Simon Horman says:

====================
net: ethernet: freescale: Use %pa to format resource_size_t

This short series addersses the formatting of variables of
type resource_size_t in freescale drivers.

The correct format string for resource_size_t is %pa which
acts on the address of the variable to be formatted [1].

[1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229

These problems were introduced by
commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string")

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-0-dcc9afb8858b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Simon Horman [Mon, 14 Oct 2024 10:48:08 +0000 (11:48 +0100)]

net: ethernet: fs_enet: Use %pa to format resource_size_t

The correct format string for resource_size_t is %pa which
acts on the address of the variable to be formatted [1].

[1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229

Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string")

Flagged by gcc-14 as:

drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c: In function 'fs_mii_bitbang_init':
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:126:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
  126 |         snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start);
      |                                             ~^   ~~~~~~~~~
      |                                              |      |
      |                                              |      resource_size_t {aka long long unsigned int}
      |                                              unsigned int
      |                                             %llx

No functional change intended.
Compile tested only.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-2-dcc9afb8858b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Simon Horman [Mon, 14 Oct 2024 10:48:07 +0000 (11:48 +0100)]

net: fec_mpc52xx_phy: Use %pa to format resource_size_t

The correct format string for resource_size_t is %pa which
acts on the address of the variable to be formatted [1].

[1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229

Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string")

Flagged by gcc-14 as:

drivers/net/ethernet/freescale/fec_mpc52xx_phy.c: In function 'mpc52xx_fec_mdio_probe':
drivers/net/ethernet/freescale/fec_mpc52xx_phy.c:97:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
   97 |         snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start);
      |                                             ~^   ~~~~~~~~~
      |                                              |      |
      |                                              |      resource_size_t {aka long long unsigned int}
      |                                              unsigned int
      |                                             %llx

No functional change intended.
Compile tested only.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-1-dcc9afb8858b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 17:55:16 +0000 (10:55 -0700)]

Merge branch 'net-string-format-safety-updates'

Simon Horman says:

====================
net: String format safety updates

This series addresses string format safety issues that are
flagged by tooling in files touched by recent patches.

I do not believe that any of these issues are bugs.
Rather, I am providing these updates as I think there is a value
in addressing such warnings so real problems stand out.

v1: https://lore.kernel.org/20241011-string-thing-v1-0-acc506568033@kernel.org
====================

Link: https://patch.msgid.link/20241014-string-thing-v2-0-b9b29625060a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Simon Horman [Mon, 14 Oct 2024 08:52:26 +0000 (09:52 +0100)]

net: txgbe: Pass string literal as format argument of alloc_workqueue()

Recently I noticed that both gcc-14 and clang-18 report that passing
a non-string literal as the format argument of clkdev_create()
is potentially insecure.

E.g. clang-18 says:

.../txgbe_phy.c:582:35: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  581 |         clock = clkdev_create(clk, NULL, clk_name);
      |                                          ^~~~~~~~
.../txgbe_phy.c:582:35: note: treat the string as an argument to avoid this
  581 |         clock = clkdev_create(clk, NULL, clk_name);
      |                                          ^
      |                                          "%s",

It is always the case where the contents of clk_name is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.

However, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.

Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241014-string-thing-v2-2-b9b29625060a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Simon Horman [Mon, 14 Oct 2024 08:52:25 +0000 (09:52 +0100)]

net: dsa: microchip: copy string using strscpy

Prior to this patch ksz_ptp_msg_irq_setup() uses snprintf() to copy
strings. It does so by passing strings as the format argument of
snprintf(). This appears to be safe, due to the absence of format
specifiers in the strings, which are declared within the same function.
But nonetheless GCC 14 warns about it:

.../ksz_ptp.c:1109:55: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
1109 |         snprintf(ptpmsg_irq->name, sizeof(ptpmsg_irq->name), name[n]);
      |                                                              ^~~~~~~
.../ksz_ptp.c:1109:55: note: treat the string as an argument to avoid this
1109 |         snprintf(ptpmsg_irq->name, sizeof(ptpmsg_irq->name), name[n]);
      |                                                              ^
      |                                                              "%s",

As what we are really dealing with here is a string copy, it seems make
sense to use a function designed for this purpose. In this case null
padding is not required, so strscpy is appropriate. And as the
destination is an array of fixed size, the 2-argument variant may be used.

Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241014-string-thing-v2-1-b9b29625060a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 17:50:21 +0000 (10:50 -0700)]

Merge branch 'replace-call_rcu-by-kfree_rcu-for-simple-kmem_cache_free-callback'

Julia Lawall says:

====================
replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed and since
commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were done using the following Coccinelle semantic patch.
This semantic patch is designed to ignore cases where the callback
function is used in another way.

// <smpl>

@r@
expression e;
local idexpression e2;
identifier cb,f,g;
position p;
@@

(
call_rcu(...,e2)
|
call_rcu(&e->f,cb@p)
|
call_rcu(&e->f.g,cb@p)
)

@r1@
type T,T1;
identifier x,r.cb;
@@

cb(...) {
(
   kmem_cache_free(...);
|
   T x = ...;
   kmem_cache_free(...,(T1)x);
|
   T x;
   x = ...;
   kmem_cache_free(...,(T1)x);
)
}

@s depends on r1@
position p != r.p;
identifier r.cb;
@@

cb@p

@script:ocaml@
cb << r.cb;
p << s.p;
@@

Printf.eprintf "Other use of %s at %s:%d\n" cb (List.hd p).file (List.hd p).line

@depends on r1 && !s@
expression e;
identifier r.cb,f,g;
position r.p;
@@

(
- call_rcu(&e->f,cb@p)
+ kfree_rcu(e,f)
|
- call_rcu(&e->f.g,cb@p)
+ kfree_rcu(e,f.g)
)

@r1a depends on !s@
type T,T1;
identifier x,r.cb;
@@

- cb(...) {
(
-  kmem_cache_free(...);
|
-  T x = ...;
-  kmem_cache_free(...,(T1)x);
|
-  T x;
-  x = ...;
-  kmem_cache_free(...,(T1)x);
)
- }

@r2 depends on !r1@
identifier r.cb;
@@

cb(...) {
...
}

@script:ocaml depends on !r1 && !r2@
cb << r.cb;
@@

Printf.eprintf "need definition for %s\n" cb
// </smpl>
====================

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-1-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Julia Lawall [Sun, 13 Oct 2024 20:17:01 +0000 (22:17 +0200)]

kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed and since
commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-15-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Julia Lawall [Sun, 13 Oct 2024 20:16:55 +0000 (22:16 +0200)]

net: bridge: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed and since
commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-9-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Julia Lawall [Sun, 13 Oct 2024 20:16:51 +0000 (22:16 +0200)]

ipv6: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed and since
commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-5-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Julia Lawall [Sun, 13 Oct 2024 20:16:50 +0000 (22:16 +0200)]

inetpeer: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed and since
commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-4-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Julia Lawall [Sun, 13 Oct 2024 20:16:49 +0000 (22:16 +0200)]

ipv4: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed and since
commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-3-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Lorenzo Bianconi [Sat, 12 Oct 2024 09:01:11 +0000 (11:01 +0200)]

net: airoha: Implement BQL support

Introduce BQL support in the airoha_eth driver reporting to the kernel
info about tx hw DMA queues in order to avoid bufferbloat and keep the
latency small.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20241012-en7581-bql-v2-1-4deb4efdb60b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Daniel Golle [Sun, 13 Oct 2024 13:16:44 +0000 (14:16 +0100)]

net: phy: aquantia: fix return value check in aqr107_config_mdi()

of_property_read_u32() returns -EINVAL in case the property cannot be
found rather than -ENOENT. Fix the check to not abort probing in case
of the property being missing, and also in case CONFIG_OF is not set
which will result in -ENOSYS.

Fixes: a2e1ba275eae ("net: phy: aquantia: allow forcing order of MDI pairs")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/all/114b4c03-5d16-42ed-945d-cf78eabea12b@nvidia.com/
Suggested-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/f8282e2fc6a5ac91fe91491edc7f1ca8f4a65a0d.1728825323.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 16:52:39 +0000 (09:52 -0700)]

Merge branch 'net-af_packet-allow-joining-a-fanout-when-link-is-down'

Gur Stavi says:

====================
net: af_packet: allow joining a fanout when link is down

PACKET socket can retain its fanout membership through link down and up
and leave a fanout while closed regardless of link state.
However, socket was forbidden from joining a fanout while it was not
RUNNING.

This scenario was identified while studying DPDK pmd_af_packet_drv.
Since sockets are only created during initialization, there is no reason
to fail the initialization if a single link is temporarily down.

This patch allows PACKET socket to join a fanout while not RUNNING.

Selftest psock_fanout is extended to test this "fanout while link down"
scenario.

Selftest psock_fanout is also extended to test fanout create/join by
socket that did not bind or specified a protocol, which carries an
implicit bind.

v3: https://lore.kernel.org/cover.1728555449.git.gur.stavi@huawei.com
v2: https://lore.kernel.org/cover.1728382839.git.gur.stavi@huawei.com
v1: https://lore.kernel.org/cover.1728303615.git.gur.stavi@huawei.com
====================

Link: https://patch.msgid.link/cover.1728802323.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Gur Stavi [Sun, 13 Oct 2024 07:15:27 +0000 (10:15 +0300)]

selftests: net/psock_fanout: unbound socket fanout

Add a test that validates that an unbound packet socket cannot create/join
a fanout group.

Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/7612fa90f613100e2b64c563cab3d7fdf36010db.1728802323.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Gur Stavi [Sun, 13 Oct 2024 07:15:26 +0000 (10:15 +0300)]

selftests: net/psock_fanout: socket joins fanout when link is down

Modify test_control_group to have toggle parameter.
When toggle is non-zero, loopback device will be set down for the
initialization of fd[1] which is still expected to successfully join
the fanout.

Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/6f4a506ed5f08f8fc00a966dec8febd1030c6e98.1728802323.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Gur Stavi [Sun, 13 Oct 2024 07:15:25 +0000 (10:15 +0300)]

af_packet: allow fanout_add when socket is not RUNNING

PACKET socket can retain its fanout membership through link down and up
and leave a fanout while closed regardless of link state.
However, socket was forbidden from joining a fanout while it was not
RUNNING.

This patch allows PACKET socket to join fanout while not RUNNING.

Socket can be RUNNING if it has a specified protocol. Either directly
from packet_create (being implicitly bound to any interface) or following
a successful bind. Socket RUNNING state is switched off if it is bound to
an interface that went down.

Instead of the test for RUNNING, this patch adds a test that socket can
become RUNNING.

Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/4f1a3c37dbef980ef044c4d2adf91c76e2eca14b.1728802323.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Heiner Kallweit [Sun, 13 Oct 2024 09:17:39 +0000 (11:17 +0200)]

r8169: implement additional ethtool stats ops

This adds support for ethtool standard statistics, and makes use of the
extended hardware statistics being available from RTl8125.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/58e0da73-a7dd-4be3-82ae-d5b3f9069bde@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Tue, 15 Oct 2024 13:28:16 +0000 (15:28 +0200)]

Merge tag 'batadv-next-pullrequest-20241015' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- Add flex array to struct batadv_tvlv_tt_data, by Erick Archer

- Use string choice helper to print booleans, by Sven Eckelmann

- replace call_rcu by kfree_rcu for simple kmem_cache_free callback,
   by Julia Lawall

* tag 'batadv-next-pullrequest-20241015' of git://git.open-mesh.org/linux-merge:
  batman-adv: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
  batman-adv: Use string choice helper to print booleans
  batman-adv: Add flex array to struct batadv_tvlv_tt_data
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20241015073946.46613-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 15 Oct 2024 13:19:48 +0000 (15:19 +0200)]

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-10-14

The following pull-request contains BPF updates for your *net-next* tree.

We've added 21 non-merge commits during the last 18 day(s) which contain
a total of 21 files changed, 1185 insertions(+), 127 deletions(-).

The main changes are:

1) Put xsk sockets on a struct diet and add various cleanups. Overall, this helps
   to bump performance by 12% for some workloads, from Maciej Fijalkowski.

2) Extend BPF selftests to increase coverage of XDP features in combination
   with BPF cpumap, from Alexis Lothoré (eBPF Foundation).

3) Extend netkit with an option to delegate skb->{mark,priority} scrubbing to
   its BPF program, from Daniel Borkmann.

4) Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs,
   from Mahe Tardy.

5) Extend BPF selftests covering a BPF program setting socket options per MPTCP
   subflow, from Geliang Tang and Nicolas Rybowski.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (21 commits)
  xsk: Use xsk_buff_pool directly for cq functions
  xsk: Wrap duplicated code to function
  xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool
  xsk: Get rid of xdp_buff_xsk::orig_addr
  xsk: s/free_list_node/list_node/
  xsk: Get rid of xdp_buff_xsk::xskb_list_node
  selftests/bpf: check program redirect in xdp_cpumap_attach
  selftests/bpf: make xdp_cpumap_attach keep redirect prog attached
  selftests/bpf: fix bpf_map_redirect call for cpu map test
  selftests/bpf: add tcx netns cookie tests
  bpf: add get_netns_cookie helper to tc programs
  selftests/bpf: add missing header include for htons
  selftests/bpf: Extend netkit tests to validate skb meta data
  tools: Sync if_link.h uapi tooling header
  netkit: Add add netkit scrub support to rt_link.yaml
  netkit: Simplify netkit mode over to use NLA_POLICY_MAX
  netkit: Add option for scrubbing skb meta data
  bpf: Remove unused macro
  selftests/bpf: Add mptcp subflow subtest
  selftests/bpf: Add getsockopt to inspect mptcp subflow
  ...
====================

Link: https://patch.msgid.link/20241014211110.16562-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Simon Horman [Fri, 11 Oct 2024 09:20:00 +0000 (10:20 +0100)]

net: gianfar: Use __be64 * to store pointers to big endian values

Timestamp values are read using pointers to 64-bit big endian values.
But the type of these pointers is u64 *, host byte order.
Use __be64 * instead.

Flagged by Sparse:

.../gianfar.c:2212:60: warning: cast to restricted __be64
.../gianfar.c:2475:53: warning: cast to restricted __be64

Introduced by
commit cc772ab7cdca ("gianfar: Add hardware RX timestamping support").

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20241011-gianfar-be64-v1-1-a77ebe972176@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Kuniyuki Iwashima [Thu, 10 Oct 2024 17:24:33 +0000 (10:24 -0700)]

rtnl_net_debug: Remove rtnl_net_debug_exit().

kernel test robot reported section mismatch in rtnl_net_debug_exit().

WARNING: modpost: vmlinux: section mismatch in reference: rtnl_net_debug_exit+0x20 (section: .exit.text) -> rtnl_net_debug_net_ops (section: .init.data)

rtnl_net_debug_exit() uses rtnl_net_debug_net_ops() that is annotated
as __net_initdata, but this file is always built-in.

Let's remove rtnl_net_debug_exit().

Fixes: 03fa53485659 ("rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410101854.i0vQCaDz-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241010172433.67694-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jakub Kicinski [Thu, 10 Oct 2024 15:12:48 +0000 (08:12 -0700)]

tools: ynl-gen: use names of constants in generated limits

YNL specs can use string expressions for limits, like s32-min
or u16-max. We convert all of those into their numeric values
when generating the code, which isn't always helpful. Try to
retain the string representations in the output. Any sort of
calculations still need the integers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241010151248.2049755-1-kuba@kernel.org
[pabeni@redhat.com: regenerated netdev-genl-gen.c]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Siddharth Vadapalli [Thu, 10 Oct 2024 15:05:43 +0000 (20:35 +0530)]

net: ethernet: ti: am65-cpsw: Enable USXGMII mode for J7200 CPSW5G

TI's J7200 SoC supports USXGMII mode. Add USXGMII mode to the
extra_modes member of the J7200 SoC data.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://patch.msgid.link/20241010150543.2620448-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 12:55:29 +0000 (13:55 +0100)]

net: phy: intel-xway: add support for PHY LEDs

The intel-xway PHY driver predates the PHY LED framework and currently
initializes all LED pins to equal default values.

Add PHY LED functions to the drivers and don't set default values if
LEDs are defined in device tree.

According the datasheets 3 LEDs are supported on all Intel XWAY PHYs.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/81f4717ab9acf38f3239727a4540ae96fd01109b.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 12:55:17 +0000 (13:55 +0100)]

net: phy: mxl-gpy: correctly describe LED polarity

According the datasheet covering the LED (0x1b) register:
0B Active High LEDx pin driven high when activated
1B Active Low LEDx pin driven low when activated

Make use of the now available 'active-high' property and correctly
reflect the polarity setting which was previously inverted.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/180ccafa837f09908b852a8a874a3808c5ecd2d0.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 12:55:00 +0000 (13:55 +0100)]

net: phy: aquantia: correctly describe LED polarity override

Use newly defined 'active-high' property to set the
VEND1_GLOBAL_LED_DRIVE_VDD bit and let 'active-low' clear that bit. This
reflects the technical reality which was inverted in the previous
description in which the 'active-low' property was used to actually set
the VEND1_GLOBAL_LED_DRIVE_VDD bit, which means that VDD (ie. supply
voltage) of the LED is driven rather than GND.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/86a413b4387c42dcb54f587cc2433a06f16aae83.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Golle [Thu, 10 Oct 2024 12:54:19 +0000 (13:54 +0100)]

net: phy: support 'active-high' property for PHY LEDs

In addition to 'active-low' and 'inactive-high-impedance' also
support 'active-high' property for PHY LED pin configuration.
As only either 'active-high' or 'active-low' can be set at the
same time, WARN and return an error in case both are set.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/91598487773d768f254d5faf06cf65b13e972f0e.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Paolo Abeni [Tue, 15 Oct 2024 08:44:54 +0000 (10:44 +0200)]

Merge branch 'make-phy-output-rmii-reference-clock'

Wei Fang says:

====================
make PHY output RMII reference clock

The TJA11xx PHYs have the capability to provide 50MHz reference clock
in RMII mode and output on REF_CLK pin. Therefore, add the new property
"nxp,rmii-refclk-output" to support this feature. This property is only
available for PHYs which use nxp-c45-tja11xx driver, such as TJA1103,
TJA1104, TJA1120 and TJA1121.
====================

Link: https://patch.msgid.link/20241010061944.266966-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Wei Fang [Thu, 10 Oct 2024 06:19:44 +0000 (14:19 +0800)]

net: phy: c45-tja11xx: add support for outputting RMII reference clock

For TJA11xx PHYs, they have the capability to output 50MHz reference
clock on REF_CLK pin in RMII mode, which is called "revRMII" mode in
the PHY data sheet.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Wei Fang [Thu, 10 Oct 2024 06:19:43 +0000 (14:19 +0800)]

dt-bindings: net: tja11xx: add "nxp,rmii-refclk-out" property

Per the RMII specification, the REF_CLK is sourced from MAC to PHY
or from an external source. But for TJA11xx PHYs, they support to
output a 50MHz RMII reference clock on REF_CLK pin. Previously the
"nxp,rmii-refclk-in" was added to indicate that in RMII mode, if
this property present, REF_CLK is input to the PHY, otherwise it
is output. This seems inappropriate now. Because according to the
RMII specification, the REF_CLK is originally input, so there is
no need to add an additional "nxp,rmii-refclk-in" property to
declare that REF_CLK is input.
Unfortunately, because the "nxp,rmii-refclk-in" property has been
added for a while, and we cannot confirm which DTS use the TJA1100
and TJA1101 PHYs, changing it to switch polarity will cause an ABI
break. But fortunately, this property is only valid for TJA1100 and
TJA1101. For TJA1103/TJA1104/TJA1120/TJA1121 PHYs, this property is
invalid because they use the nxp-c45-tja11xx driver, which is a
different driver from TJA1100/TJA1101. Therefore, for PHYs using
nxp-c45-tja11xx driver, add "nxp,rmii-refclk-out" property to
support outputting RMII reference clock on REF_CLK pin.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Jakub Kicinski [Fri, 11 Oct 2024 23:03:11 +0000 (16:03 -0700)]

selftests: net: move EXTRA_CLEAN of libynl.a into ynl.mk

Commit 1fd9e4f25782 ("selftests: make kselftest-clean remove libynl outputs")
added EXTRA_CLEAN of YNL generated files to ynl.mk. We already had
a EXTRA_CLEAN in the file including the snippet. Consolidate them.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241011230311.2529760-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Fri, 11 Oct 2024 23:03:10 +0000 (16:03 -0700)]

selftests: net: rebuild YNL if dependencies changed

Try to rebuild YNL if either user added a new family or the specs
of the families have changed. Stanislav's ncdevmem cause a false
positive build failure in NIPA because libynl.a isn't rebuilt
after ethtool is added to YNL_GENS.

Note that sha1sum is already used in other parts of the build system.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241011230311.2529760-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Rosen Penev [Fri, 11 Oct 2024 20:02:25 +0000 (13:02 -0700)]

net: mtk_eth_soc: use ethtool_puts

Allows simplifying get_strings and avoids manual pointer manipulation.

Tested on Belkin RT1800.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20241011200225.7403-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Rosen Penev [Fri, 11 Oct 2024 19:59:55 +0000 (12:59 -0700)]

net: mvneta: use ethtool_puts

Allows simplifying get_strings and avoids manual pointer manipulation.

Tested on Turris Omnia.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20241011195955.7065-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 00:55:08 +0000 (17:55 -0700)]

Merge branch 'add-support-for-per-napi-config-via-netlink'

Joe Damato says:

====================
Add support for per-NAPI config via netlink

Greetings:

Welcome to v6. Minor changes from v5 [1], please see changelog below.

There were no explicit comments from reviewers on the call outs in my
v5, so I'm retaining them from my previous cover letter just in case :)

A few important call outs for reviewers:

  1. This revision seems to work (see below for a full walk through). I
     think this is the behavior we talked about, but please let me know if
     a use case is missing.

  2. Re a previous point made by Stanislav regarding "taking over a NAPI
     ID" when the channel count changes: mlx5 seems to call napi_disable
     followed by netif_napi_del for the old queues and then calls
     napi_enable for the new ones. In this RFC, the NAPI ID generation is
     deferred to napi_enable. This means we won't end up with two of the
     same NAPI IDs added to the hash at the same time.

     Can we assume all drivers will napi_disable the old queues before
     napi_enable the new ones?
       - If yes: we might not need to worry about a NAPI ID takeover
        function.
       - If no: I'll need to make a change so that the NAPI ID generation
        is deferred only for drivers which have opted into the config
        space via calls to netif_napi_add_config

  3. I made the decision to remove the WARN_ON_ONCE that (I think?)
     Jakub previously suggested in alloc_netdev_mqs (WARN_ON_ONCE(txqs
     != rxqs);) because this was triggering on every kernel boot with my
     mlx5 NIC.

  4. I left the "maxqs = max(txqs, rxqs);" in alloc_netdev_mqs despite
     thinking this is a bit strange. I think it's strange that we might
     be short some number of NAPI configs, but it seems like most people
     are in favor of this approach, so I've left it.

I'd appreciate thoughts from reviewers on the above items, if at all
possible.

Now, on to the implementation.

Firstly, this implementation moves certain settings to napi_struct so that
they are "per-NAPI", while taking care to respect existing sysfs
parameters which are interface wide and affect all NAPIs:
  - NAPI ID
  - gro_flush_timeout
  - defer_hard_irqs

Furthermore:
   - NAPI ID generation and addition to the hash is now deferred to
     napi_enable, instead of during netif_napi_add
   - NAPIs are removed from the hash during napi_disable, instead of
     netif_napi_del.
   - An array of "struct napi_config" is allocated in net_device.

IMPORTANT: The above changes affect all network drivers.

Optionally, drivers may opt-in to using their config space by calling
netif_napi_add_config instead of netif_napi_add.

If a driver does this, the NAPI being added is linked with an allocated
"struct napi_config" and the per-NAPI settings (including NAPI ID) are
persisted even as hardware queues are destroyed and recreated.

To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
  - mlx5 which is the basis of the examples below
  - mlx4 which has TX only NAPIs, just to highlight that case. I have
    only compile tested this patch; I don't have this hardware.
  - bnxt which I have only compiled tested. I don't have this
    hardware.

NOTE: I only tested this on mlx5; I have no access to the other hardware
for which I provided patches. Hopefully other folks can help test :)

Here's how it works when I test it on my mlx5 system:

$ ethtool -l eth4 | grep Combined | tail -1
Combined:       2

First, output the current NAPI settings:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now, set the global sysfs parameters:

$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'

Output current NAPI settings again:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now set NAPI ID 345, via its NAPI ID to specific values:

$ sudo ./tools/net/ynl/cli.py \
          --spec Documentation/netlink/specs/netdev.yaml \
          --do napi-set \
          --json='{"id": 345,
                   "defer-hard-irqs": 111,
                   "gro-flush-timeout": 11111}'
None

Now output current NAPI settings again to ensure only NAPI ID 345
changed:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 11111,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now, increase gro-flush-timeout only:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"id": 345,
                              "gro-flush-timeout": 44444}'
None

Now output the current NAPI settings once more:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 44444,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now set NAPI ID 345 to have gro_flush_timeout of 0:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"id": 345,
                              "gro-flush-timeout": 0}'
None

Check that NAPI ID 345 has a value of 0:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Change the queue count, ensuring that NAPI ID 345 retains its settings:

$ sudo ethtool -L eth4 combined 4

Check that the new queues have the system wide settings but that NAPI ID
345 remains unchanged:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 347,
  'ifindex': 7,
  'irq': 529},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now reduce the queue count below where NAPI ID 345 is indexed:

$ sudo ethtool -L eth4 combined 1

Check the output:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Re-increase the queue count to ensure NAPI ID 345 is re-assigned the same
values:

$ sudo ethtool -L eth4 combined 2

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Create new queues to ensure the sysfs globals are used for the new NAPIs
but that NAPI ID 345 is unchanged:

$ sudo ethtool -L eth4 combined 8

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:

$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'

Check that worked:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[...]
{'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
{'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
{'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

[1]: https://lore.kernel.org/20241009005525.13651-1-jdamato@fastly.com

v5: https://lore.kernel.org/20241009005525.13651-1-jdamato@fastly.com
rfcv4: https://lore.kernel.org/lkml/20241001235302.57609-1-jdamato@fastly.com
rfcv3: https://lore.kernel.org/20240912100738.16567-8-jdamato@fastly.com
rfcv2: https://lore.kernel.org/20240908160702.56618-1-jdamato@fastly.com
====================

Link: https://patch.msgid.link/20241011184527.16393-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:45:04 +0000 (18:45 +0000)]

mlx4: Add support for persistent NAPI config to RX CQs

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing RX CQ NAPIs.

Presently, struct napi_config only has support for two fields used for
RX, so there is no need to support them with TX CQs, yet.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-10-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:45:03 +0000 (18:45 +0000)]

mlx5: Add support for persistent NAPI config

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-9-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:45:02 +0000 (18:45 +0000)]

bnxt: Add support for persistent NAPI config

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:45:01 +0000 (18:45 +0000)]

netdev-genl: Support setting per-NAPI config values

Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-7-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:45:00 +0000 (18:45 +0000)]

net: napi: Add napi_config

Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.

napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).

Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.

Per-NAPI settings are saved in napi_disable and restored in napi_enable.

Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:44:59 +0000 (18:44 +0000)]

netdev-genl: Dump gro_flush_timeout

Support dumping gro_flush_timeout for a NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-5-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:44:58 +0000 (18:44 +0000)]

net: napi: Make gro_flush_timeout per-NAPI

Allow per-NAPI gro_flush_timeout setting.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.

The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:44:57 +0000 (18:44 +0000)]

netdev-genl: Dump napi_defer_hard_irqs

Support dumping defer_hard_irqs for a NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Joe Damato [Fri, 11 Oct 2024 18:44:56 +0000 (18:44 +0000)]

net: napi: Make napi_defer_hard_irqs per-NAPI

Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.

The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Daniel Golle [Fri, 11 Oct 2024 03:40:39 +0000 (04:40 +0100)]

net: phylink: allow half-duplex modes with RATE_MATCH_PAUSE

PHYs performing rate-matching using MAC-side flow-control always
perform duplex-matching as well in case they are supporting
half-duplex modes at all.
No longer remove half-duplex modes from their capabilities.

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/b157c0c289cfba024039a96e635d037f9d946745.1728617993.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Oct 2024 00:39:38 +0000 (17:39 -0700)]

Merge branch 'tcp-add-skb-sk-to-more-control-packets'

Eric Dumazet says:

====================
tcp: add skb->sk to more control packets

Currently, TCP can set skb->sk for a variety of transmit packets.

However, packets sent on behalf of a TIME_WAIT sockets do not
have an attached socket.

Same issue for RST packets.

We want to change this, in order to increase eBPF program
capabilities.

This is slightly risky, because various layers could
be confused by TIME_WAIT sockets showing up in skb->sk.

v2: audited all sk_to_full_sk() users and addressed Martin feedback.
====================

Link: https://patch.msgid.link/20241010174817.1543642-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Thu, 10 Oct 2024 17:48:17 +0000 (17:48 +0000)]

ipv4: tcp: give socket pointer to control skbs

ip_send_unicast_reply() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Thu, 10 Oct 2024 17:48:16 +0000 (17:48 +0000)]

ipv6: tcp: give socket pointer to control skbs

tcp_v6_send_response() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Thu, 10 Oct 2024 17:48:15 +0000 (17:48 +0000)]

net: add skb_set_owner_edemux() helper

This can be used to attach a socket to an skb,
taking a reference on sk->sk_refcnt.

This helper might be a NOP if sk->sk_refcnt is zero.

Use it from tcp_make_synack().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Thu, 10 Oct 2024 17:48:14 +0000 (17:48 +0000)]

net_sched: sch_fq: prepare for TIME_WAIT sockets

TCP stack is not attaching skb to TIME_WAIT sockets yet,
but we would like to allow this in the future.

Add sk_listener_or_tw() helper to detect the three states
that FQ needs to take care.

Like NEW_SYN_RECV, TIME_WAIT are not full sockets and
do not contain sk->sk_pacing_status, sk->sk_pacing_rate.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Thu, 10 Oct 2024 17:48:13 +0000 (17:48 +0000)]

net: add TIME_WAIT logic to sk_to_full_sk()

TCP will soon attach TIME_WAIT sockets to some ACK and RST.

Make sure sk_to_full_sk() detects this and does not return
a non full socket.

v3: also changed sk_const_to_full_sk()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Simon Horman [Wed, 9 Oct 2024 09:40:10 +0000 (10:40 +0100)]

tg3: Address byte-order miss-matches

Address byte-order miss-matches flagged by Sparse.

In tg3_load_firmware_cpu() and tg3_get_device_address()
this is done using appropriate types to store big endian values.

In the cases of tg3_test_nvram(), where buf is an array which
contains values of several different types, cast to __le32
before converting values to host byte order.

Reported by Sparse as:
.../tg3.c:3745:34: warning: cast to restricted __be32
.../tg3.c:13096:21: warning: cast to restricted __le32
.../tg3.c:13096:21: warning: cast from restricted __be32
.../tg3.c:13101:21: warning: cast to restricted __le32
.../tg3.c:13101:21: warning: cast from restricted __be32
.../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17070:63:    expected restricted __be32 [usertype] *val
.../tg3.c:17070:63:    got unsigned int *
dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17071:63:    expected restricted __be32 [usertype] *val
.../tg3.c:17071:63:    got unsigned int *

Also, address white-space issues on lines modified for the above.
And, for consistency, lines adjacent to them.

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Maciej Fijalkowski [Mon, 7 Oct 2024 12:24:58 +0000 (14:24 +0200)]

xsk: Use xsk_buff_pool directly for cq functions

Currently xsk_cq_{reserve_addr,submit,cancel}_locked() take xdp_sock as
an input argument but it is only used for pulling out xsk_buff_pool
pointer from it.

Change mentioned functions to take pool pointer as an input argument to
avoid unnecessary dereferences.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-7-maciej.fijalkowski@intel.com

commit | commitdiff | tree

Maciej Fijalkowski [Mon, 7 Oct 2024 12:24:57 +0000 (14:24 +0200)]

xsk: Wrap duplicated code to function

Both allocation paths have exactly the same code responsible for getting
and initializing xskb. Pull it out to common function.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-6-maciej.fijalkowski@intel.com

commit | commitdiff | tree

Maciej Fijalkowski [Mon, 7 Oct 2024 12:24:56 +0000 (14:24 +0200)]

xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool

This so we avoid dereferencing struct net_device within hot path.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-5-maciej.fijalkowski@intel.com

A mirror of Linus' kernel repository