====================
Converge on using secs_to_jiffies() part two
This is the second series (part 1*) that converts users of msecs_to_jiffies() that
either use the multiply pattern of either of:
- msecs_to_jiffies(N*1000) or
- msecs_to_jiffies(N*MSEC_PER_SEC)
where N is a constant or an expression, to avoid the multiplication.
The conversion is made with Coccinelle with the secs_to_jiffies() script
in scripts/coccinelle/misc. Attention is paid to what the best change
can be rather than restricting to what the tool provides.
net: ipconfig: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
gve: make IRQ handlers and page allocation NUMA aware
All memory in GVE is currently allocated without regard for the NUMA
node of the device. Because access to NUMA-local memory access is
significantly cheaper than access to a remote node, this change attempts
to ensure that page frags used in the RX path, including page pool
frags, are allocated on the NUMA node local to the gVNIC device. Note
that this attempt is best-effort. If necessary, the driver will still
allocate non-local memory, as __GFP_THISNODE is not passed. Descriptor
ring allocations are not updated, as dma_alloc_coherent handles that.
This change also modifies the IRQ affinity setting to only select CPUs
from the node local to the device, preserving the behavior that TX and
RX queues of the same index share CPU affinity.
Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707210107.2742029-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Add Microchip ZL3073x support (part 1)
Add support for Microchip Azurite DPLL/PTP/SyncE chip family that
provides DPLL and PTP functionality. This series bring first part
that adds the core functionality and basic DPLL support.
The next part of the series will bring additional DPLL functionality
like eSync support, phase offset and frequency offset reporting and
phase adjustments.
Testing was done by myself and by Prathosh Satish on Microchip EDS2
development board with ZL30732 DPLL chip connected over I2C bus.
====================
where the base_freq comes from the list of supported base frequencies
and other parameters are arbitrary numbers. All these parameters are
16-bit unsigned integers.
The frequency for output pin is determined by the frequency of
synthesizer the output pin is connected to and divisor of the output
to which is the given pin belongs. The resulting frequency of the
P-pin and the N-pin from this output pair depends on the signal
format of this output pair.
The device supports so-called N-divided signal formats where for the
N-pin there is an additional divisor. The frequencies for both pins
from such output pair are computed:
For other signal-format types both P and N pin have the same frequency
based only synth frequency and output divisor.
Implement output pin callbacks to get and set frequency. The frequency
setting for the output non-N-divided signal format is simple as we have
to compute just new output divisor. For N-divided formats it is more
complex because by changing of output divisor we change frequency for
both P and N pins. In this case if we are changing frequency for P-pin
we have to compute also new N-divisor for N-pin to keep its current
frequency. From this and the above it follows that the frequency of
the N-pin cannot be higher than the frequency of the P-pin and the
callback must take this limitation into account.
Ivan Vecera [Fri, 4 Jul 2025 18:22:01 +0000 (20:22 +0200)]
dpll: zl3073x: Implement input pin state setting in automatic mode
Implement input pin state setting when the DPLL is running in automatic
mode. Unlike manual mode, the DPLL mode switching is not used here and
the implementation uses special priority value (15) to make the given
pin non-selectable.
When the user sets state of the pin as disconnected the driver
internally sets its priority in HW to 15 that prevents the DPLL to
choose this input pin. Conversely, if the pin status is set to
selectable, the driver sets the pin priority in HW to the original saved
value.
Ivan Vecera [Fri, 4 Jul 2025 18:22:00 +0000 (20:22 +0200)]
dpll: zl3073x: Add support to get/set priority on input pins
Add support for getting and setting input pin priority. Implement
required callbacks and set appropriate capability for input pins.
Although the pin priority make sense only if the DPLL is running in
automatic mode we have to expose this capability unconditionally because
input pins (references) are shared between all DPLLs where one of them
can run in automatic mode while the other one not.
Ivan Vecera [Fri, 4 Jul 2025 18:21:59 +0000 (20:21 +0200)]
dpll: zl3073x: Implement input pin selection in manual mode
Implement input pin state setting if the DPLL is running in manual mode.
The driver indicates manual mode if the DPLL mode is one of ref-lock,
forced-holdover, freerun.
Use these modes to implement input pin state change between connected
and disconnected states. When the user set the particular pin as
connected the driver marks this input pin as forced reference and
switches the DPLL mode to ref-lock. When the use set the pin as
disconnected the driver switches the DPLL to freerun or forced holdover
mode. The switch to holdover mode is done if the DPLL has holdover
capability (e.g is currently locked with holdover acquired).
Ivan Vecera [Fri, 4 Jul 2025 18:21:58 +0000 (20:21 +0200)]
dpll: zl3073x: Register DPLL devices and pins
Enumerate all available DPLL channels and registers a DPLL device for
each of them. Check all input references and outputs and register
DPLL pins for them.
Number of registered DPLL pins depends on configuration of references
and outputs. If the reference or output is configured as differential
one then only one DPLL pin is registered. Both references and outputs
can be also disabled from firmware configuration and in this case
no DPLL pins are registered.
All registrable references are registered to all available DPLL devices
with exception of DPLLs that are configured in NCO (numerically
controlled oscillator) mode. In this mode DPLL channel acts as PHC and
cannot be locked to any reference.
Device outputs are connected to one of synthesizers and each synthesizer
is driven by some DPLL channel. So output pins belonging to given output
are registered to DPLL device that drives associated synthesizer.
Finally add kworker task to monitor async changes on all DPLL channels
and input pins and to notify about them DPLL core. Output pins are not
monitored as their parameters are not changed asynchronously by the
device.
Ivan Vecera [Fri, 4 Jul 2025 18:21:57 +0000 (20:21 +0200)]
dpll: zl3073x: Read DPLL types and pin properties from system firmware
Add support for reading of DPLL types and optional pin properties from
the system firmware (DT, ACPI...).
The DPLL types are stored in property 'dpll-types' as string array and
possible values 'pps' and 'eec' are mapped to DPLL enums DPLL_TYPE_PPS
and DPLL_TYPE_EEC.
The pin properties are stored under 'input-pins' and 'output-pins'
sub-nodes and the following ones are supported:
* reg
integer that specifies pin index
* label
string that is used by driver as board label
* connection-type
string that indicates pin connection type
* supported-frequencies-hz
array of u64 values what frequencies are supported / allowed for
given pin with respect to hardware wiring
Do not blindly trust system firmware and filter out frequencies that
cannot be configured/represented in device (input frequencies have to
be factorized by one of the base frequencies and output frequencies have
to divide configured synthesizer frequency).
Ivan Vecera [Fri, 4 Jul 2025 18:21:56 +0000 (20:21 +0200)]
dpll: zl3073x: Fetch invariants during probe
Several configuration parameters will remain constant at runtime,
so we can load them during probe to avoid excessive reads from
the hardware.
Read the following parameters from the device during probe and store
them for later use:
* enablement status and frequencies of the synthesizers and their
associated DPLL channels
* enablement status and type (single-ended or differential) of input pins
* associated synthesizers, signal format, and enablement status of
outputs
Ivan Vecera [Fri, 4 Jul 2025 18:21:55 +0000 (20:21 +0200)]
dpll: Add basic Microchip ZL3073x support
Microchip Azurite ZL3073x represents chip family providing DPLL
and optionally PHC (PTP) functionality. The chips can be connected
be connected over I2C or SPI bus.
They have the following characteristics:
* up to 5 separate DPLL units (channels)
* 5 synthesizers
* 10 input pins (references)
* 10 outputs
* 20 output pins (output pin pair shares one output)
* Each reference and output can operate in either differential or
single-ended mode (differential mode uses 2 pins)
* Each output is connected to one of the synthesizers
* Each synthesizer is driven by one of the DPLL unit
The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-,
32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit
addressing, the range is organized into pages of 128 bytes, with each
page containing a page selector register at address 0x7F.
For reading/writing multi-byte registers, the device supports bulk
transfers.
Add basic functionality to access device registers, probe functionality
both I2C and SPI cases and add devlink support to provide info and
to set clock ID parameter.
Ivan Vecera [Fri, 4 Jul 2025 18:21:53 +0000 (20:21 +0200)]
devlink: Add support for u64 parameters
Only 8, 16 and 32-bit integers are supported for numeric devlink
parameters. The subsequent patch adds support for DPLL clock ID
that is defined as 64-bit number. Add support for u64 parameter
type.
Ivan Vecera [Fri, 4 Jul 2025 18:21:52 +0000 (20:21 +0200)]
dt-bindings: dpll: Add support for Microchip Azurite chip family
Add DT bindings for Microchip Azurite DPLL chip family. These chips
provide up to 5 independent DPLL channels, 10 differential or
single-ended inputs and 10 differential or 20 single-ended outputs.
They can be connected via I2C or SPI busses.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Fri, 4 Jul 2025 18:21:51 +0000 (20:21 +0200)]
dt-bindings: dpll: Add DPLL device and pin
Add a common DT schema for DPLL device and its associated pins.
The DPLL (device phase-locked loop) is a device used for precise clock
synchronization in networking and telecom hardware.
The device includes one or more DPLLs (channels) and one or more
physical input/output pins.
Each DPLL channel is used either to provide a pulse-per-clock signal or
to drive an Ethernet equipment clock.
The input and output pins have the following properties:
* label: specifies board label
* connection type: specifies its usage depending on wiring
* list of supported or allowed frequencies: depending on how the pin
is connected and where)
* embedded sync capability: indicates whether the pin supports this
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
virtio-net: xsk: rx: move the xdp->data adjustment to buf_to_xdp()
This commit does not do any functional changes. It moves xdp->data
adjustment for buffer other than first buffer to buf_to_xdp() helper so
that the xdp_buff adjustment does not scatter over different functions.
====================
Add vf drivers for wangxun virtual functions
Introduces basic support for Wangxun’s virtual function (VF) network
drivers, specifically txgbevf and ngbevf. These drivers provide SR-IOV
VF functionality for Wangxun 10/25/40G network devices.
The first three patches add common APIs for Wangxun VF drivers, including
mailbox communication and shared initialization logic.These abstractions
are placed in libwx to reduce duplication across VF drivers.
Patches 4–8 introduce the txgbevf driver, including:
PCI device initialization, Hardware reset, Interrupt setup, Rx/Tx datapath
implementation and link status changeing flow.
Patches 9–12 implement the ngbevf driver, mirroring the functionality
added in txgbevf.
Add link update flow to wangxun 1G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.
Add specific parameters for irq alloc, then use
wx_init_interrupt_scheme to initialize interrupt
allocation in probe.
Add .ndo_start_xmit support and start all queues.
Add doc build infrastructure for ngbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 1G virtual
functions for Wangxun.
Add link update flow to wangxun 10/25/40G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.
Improve the configuration of Rx and Tx ring.
Setup and alloc resources.
Configure Rx and Tx unit on hardware.
Add .ndo_start_xmit support and start all queues.
Add irq alloc flow functions for vf.
Alloc pcie msix irqs for drivers and request_irq for tx/rx rings
and misc other events.
If the application is successful, config vertors for interrupts.
Enable interrupts mask in wxvf_irq_enable.
Add doc build infrastructure for txgbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 10/25/40G virtual
functions for Wangxun.
Ioremap the space of bar0 and bar4 which will be used.
net: dt-bindings: ixp4xx-ethernet: Support fixed links
This ethernet controller is using fixed links for DSA switches
in two already existing device trees, so make sure the checker
does not complain like this:
intel-ixp42x-linksys-wrv54g.dtb: ethernet@c8009000 (intel,ixp4xx-ethernet):
'fixed-link' does not match any of the regexes: '^pinctrl-[0-9]+$'
from schema $id: http://devicetree.org/schemas/net/intel,ixp4xx-ethernet.yaml#
intel-ixp42x-usrobotics-usr8200.dtb: ethernet@c800a000 (intel,ixp4xx-ethernet):
'fixed-link' does not match any of the regexes: '^pinctrl-[0-9]+$'
from schema $id: http://devicetree.org/schemas/net/intel,ixp4xx-ethernet.yaml#
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507040609.K9KytWBA-lkp@intel.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250704-ixp4xx-ethernet-binding-fix-v1-1-8ac360d5bc9b@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ipv6: Drop RTNL from mcast.c and anycast.c
This is a prep series for RCU conversion of RTM_NEWNEIGH, which needs
RTNL during neigh_table.{pconstructor,pdestructor}() touching IPv6
multicast code.
Currently, IPv6 multicast code is protected by lock_sock() and
inet6_dev->mc_lock, and RTNL is not actually needed.
In addition, anycast code is also in the same situation and does not
need RTNL at all.
This series removes RTNL from net/ipv6/{mcast.c,anycast.c} and finally
removes setsockopt_needs_rtnl() from do_ipv6_setsockopt().
inet6_dev->ac_list is protected by inet6_dev->lock, so rtnl_dereference()
is a bit rough annotation.
As done in mcast.c, we can use ac_dereference() that checks if
inet6_dev->lock is held.
Let's replace rtnl_dereference() with a new helper ac_dereference().
Note that now addrconf_join_solict() / addrconf_leave_solict() in
__ipv6_dev_ac_inc() / __ipv6_dev_ac_dec() does not need RTNL, so we
can remove ASSERT_RTNL() there.
ipv6: mcast: Remove unnecessary ASSERT_RTNL and comment.
Now, RTNL is not needed for mcast code, and what's commented in
ip6_mc_msfget() is apparent by for_each_pmc_socklock(), which has
lockdep annotation for lock_sock().
Let's remove the comment and ASSERT_RTNL() in ipv6_mc_rejoin_groups().
ipv6: mcast: Don't hold RTNL for MCAST_ socket options.
In ip6_mc_source() and ip6_mc_msfilter(), per-socket mld data is
protected by lock_sock() and inet6_dev->mc_lock is also held for
some per-interface functions.
ip6_mc_find_dev_rtnl() only depends on RTNL. If we want to remove
it, we need to check inet6_dev->dead under mc_lock to close the race
with addrconf_ifdown(), as mentioned earlier.
Let's do that and drop RTNL for the rest of MCAST_ socket options.
Note that ip6_mc_msfilter() has unnecessary lock dances and they
are integrated into one to avoid the last-minute error and simplify
the error handling.
ipv6: mcast: Don't hold RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.
In __ipv6_sock_mc_join(), per-socket mld data is protected by lock_sock(),
and only __dev_get_by_index() requires RTNL.
Let's use dev_get_by_index() and drop RTNL for IPV6_ADD_MEMBERSHIP and
MCAST_JOIN_GROUP.
Note that we must call rt6_lookup() and dev_hold() under RCU.
If rt6_lookup() returns an entry from the exception table, dst_dev_put()
could change rt->dev.dst to loopback concurrently, and the original device
could lose the refcount before dev_hold() and unblock device registration.
dst_dev_put() is called from NETDEV_UNREGISTER and synchronize_net() follows
it, so as long as rt6_lookup() and dev_hold() are called within the same
RCU critical section, the dev is alive.
Even if the race happens, they are synchronised by idev->dead and mcast
addresses are cleaned up.
For the racy access to rt->dst.dev, we use dst_dev().
Since commit 63ed8de4be81 ("mld: add mc_lock for protecting per-interface
mld data"), the newly allocated struct ifmcaddr6 cannot be removed until
inet6_dev->mc_lock is released, so mca_get() and mc_put() are unnecessary.
Let's remove the extra refcounting.
Note that mca_get() was only used in __ipv6_dev_mc_inc().
The race window can be easily closed by checking inet6_dev->dead
under inet6_dev->mc_lock in __ipv6_dev_mc_inc() as addrconf_ifdown()
will acquire it after marking inet6_dev dead.
Let's check inet6_dev->dead under mc_lock in __ipv6_dev_mc_inc().
Note that now __ipv6_dev_mc_inc() no longer depends on RTNL and
we can remove ASSERT_RTNL() there and the RTNL comment above
addrconf_join_solict().
Jason Xing [Thu, 3 Jul 2025 14:17:12 +0000 (22:17 +0800)]
selftests/bpf: add a new test to check the consumer update case
The subtest sends 33 packets at one time on purpose to see if xsk
exitting __xsk_generic_xmit() updates the global consumer of tx queue
when reaching the max loop (max_tx_budget, 32 by default). The number 33
can avoid xskq_cons_peek_desc() updates the consumer when it's about to
quit sending, to accurately check if the issue that the first patch
resolves remains. The new case will not check this issue in zero copy
mode.
Jason Xing [Thu, 3 Jul 2025 14:17:11 +0000 (22:17 +0800)]
net: xsk: update tx queue consumer immediately after transmission
For afxdp, the return value of sendto() syscall doesn't reflect how many
descs handled in the kernel. One of use cases is that when user-space
application tries to know the number of transmitted skbs and then decides
if it continues to send, say, is it stopped due to max tx budget?
The following formular can be used after sending to learn how many
skbs/descs the kernel takes care of:
Prior to the current patch, in non-zc mode, the consumer of tx queue is
not immediately updated at the end of each sendto syscall when error
occurs, which leads to the consumer value out-of-dated from the perspective
of user space. So this patch requires store operation to pass the cached
value to the shared value to handle the problem.
More than those explicit errors appearing in the while() loop in
__xsk_generic_xmit(), there are a few possible error cases that might
be neglected in the following call trace:
__xsk_generic_xmit()
xskq_cons_peek_desc()
xskq_cons_read_desc()
xskq_cons_is_valid_desc()
It will also cause the premature exit in the while() loop even if not
all the descs are consumed.
Based on the above analysis, using @sent_frame could cover all the possible
cases where it might lead to out-of-dated global state of consumer after
finishing __xsk_generic_xmit().
The patch also adds a common helper __xsk_tx_release() to keep align
with the zc mode usage in xsk_tx_release().
We have an application that uses almost the same code for TCP and
AF_UNIX (SOCK_STREAM).
The application uses TCP_INQ for TCP, but AF_UNIX doesn't have it
and requires an extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO)
as an alternative.
Also, ioctl(SIOCINQ) for AF_UNIX SOCK_STREAM is more expensive because
it needs to iterate all skb in the receive queue.
This series adds a cached field for SIOCINQ to speed it up and introduce
SO_INQ, the generic version of TCP_INQ to get the queue length as cmsg in
each recvmsg().
====================
Let's add a simple test to check the basic functionality of SO_INQ.
The test does the following:
1. Create socketpair in self->fd[]
2. Enable SO_INQ
3. Send data via self->fd[0]
4. Receive data from self->fd[1]
5. Compare the SCM_INQ cmsg with ioctl(SIOCINQ)
We have an application that uses almost the same code for TCP and
AF_UNIX (SOCK_STREAM).
TCP can use TCP_INQ, but AF_UNIX doesn't have it and requires an
extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO) as an
alternative.
Let's introduce the generic version of TCP_INQ.
If SO_INQ is enabled, recvmsg() will put a cmsg of SCM_INQ that
contains the exact value of ioctl(SIOCINQ). The cmsg is also
included when msg->msg_get_inq is non-zero to make sockets
io_uring-friendly.
Note that SOCK_CUSTOM_SOCKOPT is flagged only for SOCK_STREAM to
override setsockopt() for SOL_SOCKET.
By having the flag in struct unix_sock, instead of struct sock, we
can later add SO_INQ support for TCP and reuse tcp_sk(sk)->recvmsg_inq.
Note also that supporting custom getsockopt() for SOL_SOCKET will need
preparation for other SOCK_CUSTOM_SOCKOPT users (UDP, vsock, MPTCP).
af_unix: Use cached value for SOCK_STREAM in unix_inq_len().
Compared to TCP, ioctl(SIOCINQ) for AF_UNIX SOCK_STREAM socket is more
expensive, as unix_inq_len() requires iterating through the receive queue
and accumulating skb->len.
Let's cache the value for SOCK_STREAM to a new field during sendmsg()
and recvmsg().
The field is protected by the receive queue lock.
Note that ioctl(SIOCINQ) for SOCK_DGRAM returns the length of the first
skb in the queue.
SOCK_SEQPACKET still requires iterating through the queue because we do
not touch functions shared with unix_dgram_ops. But, if really needed,
we can support it by switching __skb_try_recv_datagram() to a custom
version.
af_unix: Don't use skb_recv_datagram() in unix_stream_read_skb().
unix_stream_read_skb() calls skb_recv_datagram() with MSG_DONTWAIT,
which is mostly equivalent to sock_error(sk) + skb_dequeue().
In the following patch, we will add a new field to cache the number
of bytes in the receive queue. Then, we want to avoid introducing
atomic ops in the fast path, so we will reuse the receive queue lock.
As a preparation for the change, let's not use skb_recv_datagram()
in unix_stream_read_skb().
Note that sock_error() is now moved out of the u->iolock mutex as
the mutex does not synchronise the peer's close() at all.
====================
eth: fbnic: Add firmware logging support
Firmware running on fbnic generates device logs. These logs contain useful
information about the device which may or may not be related to the host.
Logs are stored in a ring buffer and accessible through DebugFS.
====================
Lee Trager [Wed, 2 Jul 2025 19:12:11 +0000 (12:12 -0700)]
eth: fbnic: Enable firmware logging
The firmware log buffer is enabled during probe and freed during remove.
Early versions of firmware do not support sending logs. Once the mailbox is
up driver will enable logging when supported firmware versions are detected.
Logging is disabled before the mailbox is freed.
Lee Trager [Wed, 2 Jul 2025 19:12:10 +0000 (12:12 -0700)]
eth: fbnic: Add mailbox support for firmware logs
By default firmware will not send logs to the host. This must be explicitly
enabled by the driver. The mailbox has the concept of a flag which is a u32
used as a boolean. Lack of flag defaults to a value of false. When enabling
logging historical logs may be optionally requested. These are log messages
generated by the NIC before the driver was loaded. The driver also sends a
log version to support changing the logging format in the future.
[SEND_LOGS_REQ] = {
[SEND_LOGS] /* flag to request log reporting */
[SEND_LOGS_HISTORY] /* flag to request historical logs */
[SEND_LOGS_VERSION] /* u32 indicating the log format version */
}
Logs may be sent to the user either one at a time, or when historical logs
are requested in bulk. Firmware may not send more than 14 messages in bulk
to prevent flooding the mailbox.
[LOG_MSG] = {
[LOG_INDEX] /* entry 0 - u64 index of log */
[LOG_MSEC] /* entry 0 - u32 timestamp of log */
[LOG_MSG] /* entry 0 - char log message up to 256 */
[LOG_LENGTH] /* u32 of remaining log items in arrays */
[LOG_INDEX_ARRAY] = {
[LOG_INDEX] /* entry 1 - u64 index of log */
[LOG_INDEX] /* entry 2 - u64 index of log */
...
}
[LOG_MSEC_ARRAY] = {
[LOG_MSEC] /* entry 1 - u32 timestamp of log */
[LOG_MSEC] /* entry 2 - u32 timestamp of log */
...
}
[LOG_MSG_ARRAY] = {
[LOG_MSG] /* entry 1 - char log message up to 256 */
[LOG_MSG] /* entry 2 - char log message up to 256 */
...
}
}
Lee Trager [Wed, 2 Jul 2025 19:12:09 +0000 (12:12 -0700)]
eth: fbnic: Create ring buffer for firmware logs
When enabled, firmware may send logs messages which are specific to the
device and not the host. Create a ring buffer to store these messages
which are read by a user through DebugFS. Buffer access is protected by
a spinlock.
Lee Trager [Wed, 2 Jul 2025 19:12:08 +0000 (12:12 -0700)]
eth: fbnic: Use FIELD_PREP to generate minimum firmware version
Create a new macro based on FIELD_PREP to generate easily readable minimum
firmware version ints. This macro will prevent the mistake from the
previous patch from happening again.
Lee Trager [Wed, 2 Jul 2025 19:12:07 +0000 (12:12 -0700)]
eth: fbnic: Fix incorrect minimum firmware version
The full minimum version is 0.10.6-0. The six is now correctly defined as
patch and shifted appropriately. 0.10.6-0 is a preproduction version of
firmware which was released over a year and a half ago. All production
devices meet this requirement.
====================
net: migrate remaining drivers to dedicated _rxfh_context ops
Around a year ago Ed added dedicated ops for managing RSS contexts.
This significantly improved the clarity of the driver facing API.
Migrate the remaining 3 drivers and remove the old way of muxing
the RSS context operations via .set_rxfh().
Jakub Kicinski [Mon, 7 Jul 2025 18:41:13 +0000 (11:41 -0700)]
eth: mlx5: migrate to the *_rxfh_context ops
Convert mlx5 to dedicated RXFH ops. This is a fairly shallow
conversion, TBH, most of the driver code stays as is, but we
let the core allocate the context ID for the driver.
mlx5e_rx_res_rss_get_rxfh() and friends are made void, since
core only calls the driver for context 0. The second call
is right after context creation so it must exist (tm).
Jakub Kicinski [Mon, 7 Jul 2025 18:41:12 +0000 (11:41 -0700)]
eth: ice: drop the dead code related to rss_contexts
ICE appears to have some odd form of rss_context use plumbed
in for .get_rxfh. The .set_rxfh side does not support creating
contexts, however, so this must be dead code. For at least a year
now (since commit 7964e7884643 ("net: ethtool: use the tracking
array for get_rxfh on custom RSS contexts")) we have not been
calling .get_rxfh with a non-zero rss_context. We just get
the info from the RSS XArray under dev->ethtool.
Remove what must be dead code in the driver, clear the support flags.
Jakub Kicinski [Mon, 7 Jul 2025 18:41:11 +0000 (11:41 -0700)]
eth: otx2: migrate to the *_rxfh_context ops
otx2 only supports additional indirection tables (no separate keys
etc.) so the conversion to dedicated callbacks and core-allocated
context is mostly removing the code which stores the extra tables
in the driver. Core already stores the indirection tables for
additional contexts, and doesn't call .get for them.
One subtle change here is that we'll now start with the table
covering all queues, not directing all traffic to queue 0.
This is what core expects if the user doesn't pass the initial
indir table explicitly (there's a WARN_ON() in the core trying
to make sure driver authors don't forget to populate ctx to
defaults).
Drivers implementing .create_rxfh_context don't have to set
cap_rss_ctx_supported, so remove it.
Fengyuan Gong [Wed, 2 Jul 2025 16:07:41 +0000 (16:07 +0000)]
net: account for encap headers in qdisc pkt len
Refine qdisc_pkt_len_init to include headers up through
the inner transport header when computing header size
for encapsulations. Also refine net/sched/sch_cake.c
borrowed from qdisc_pkt_len_init().
Signed-off-by: Fengyuan Gong <gfengyuan@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250702160741.1204919-1-gfengyuan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the driver uses phylib to operate PHY by default.
On some boards, the PHY device is separated from the MAC device.
As a result, the hibmcge driver cannot operate the PHY device.
In this patch, the driver determines whether a PHY is available
based on register configuration. If no PHY is available,
the driver will use fixed_phy to register fake phydev.
====================
net: Remove unused function parameters in skbuff.c
Couple of cleanup patches to get rid of unused function parameters around
skbuff.c, plus little things spotted along the way.
Offshoot of my question in [1], but way more contained. Found by adding
"-Wunused-parameter -Wno-error" to KBUILD_CFLAGS and grepping for specific
skbuff.c warnings.
Michal Luczaj [Wed, 2 Jul 2025 13:38:12 +0000 (15:38 +0200)]
net: skbuff: Drop unused @skb
Since its introduction in commit 6fa01ccd8830 ("skbuff: Add pskb_extract()
helper function"), pskb_carve_frag_list() never used the argument @skb.
Drop it and adapt the only caller.
No functional change intended.
Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Luczaj [Wed, 2 Jul 2025 13:38:11 +0000 (15:38 +0200)]
net: skbuff: Drop unused @skb
Since its introduction in commit ce098da1497c ("skbuff: Introduce
slab_build_skb()"), __slab_build_skb() never used the @skb argument. Remove
it and adapt both callers.
No functional change intended.
Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Luczaj [Wed, 2 Jul 2025 13:38:08 +0000 (15:38 +0200)]
net: splice: Drop unused @gfp
Since its introduction in commit 2e910b95329c ("net: Add a function to
splice pages into an skbuff for MSG_SPLICE_PAGES"), skb_splice_from_iter()
never used the @gfp argument. Remove it and adapt callers.
net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"
Use the newly added of_reserved_mem_region_to_resource{_byname}()
functions to handle "memory-region" properties.
The error handling is a bit different for mtk_wed_mcu_load_firmware().
A failed match of the "memory-region-names" would skip the entry, but
then other errors in the lookup and retrieval of the address would not
skip the entry. However, that distinction is not really important.
Either the region is available and usable or it is not. So now, errors
from of_reserved_mem_region_to_resource() are ignored so the region is
simply skipped.
Hannes Reinecke [Tue, 1 Jul 2025 14:46:57 +0000 (16:46 +0200)]
net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING'
Add a new netlink parameter 'HANDSHAKE_A_ACCEPT_KEYRING' to provide
the serial number of the keyring to use.
Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250701144657.104401-1-hare@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Wed, 2 Jul 2025 07:12:30 +0000 (07:12 +0000)]
net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()
tc_action_net_exit() got an rtnl exclusion in commit a159d3c4b829 ("net_sched: acquire RTNL in tc_action_net_exit()")
Since then, commit 16af6067392c ("net: sched: implement reference
counted action release") made this RTNL exclusion obsolete for
most cases.
Only tcf_action_offload_del() might still require it.
Move the rtnl locking into tcf_idrinfo_destroy() when
an offload action is found.
Most netns do not have actions, yet deleting them is adding a lot
of pressure on RTNL, which is for many the most contended mutex
in the kernel.
We are moving to a per-netns 'rtnl', so tc_action_net_exit()
will not be able to grab 'rtnl' a single time for a batch of netns.
Before the patch:
perf probe -a rtnl_lock
perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.305 MB perf.data (25 samples) ]
After the patch:
perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ]
====================
net: mctp: Add support for gateway routing
This series adds a gateway route type for the MCTP core, allowing
non-local EIDs as the match for a route.
Example setup using the mctp tools:
mctp route add 9 via mctpi2c0
mctp neigh add 9 dev mctpi2c0 lladdr 0x1d
mctp route add 10 gw 9
- will route packets to eid 10 through mctpi2c0, using a dest lladdr
of 0x1d (ie, that of the directly-attached eid 9).
The core change to support this is the introduction of a struct
mctp_dst, which represents the result of a route lookup. Since this
involves a bit of surgery through the routing code, we add a few tests
along the way.
We're introducing an ABI change in the new RTM_{NEW,GET,DEL}ROUTE
netlink formats, with the support for a RTA_GATEWAY attribute. Because
we need a network ID specified to fully-qualify a gateway EID, the
RTA_GATEWAY attribute carries the (net, eid) tuple in full:
struct mctp_fq_addr {
unsigned int net;
mctp_eid_t eid;
}
Of course, any questions, comments etc are most welcome.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
====================
Jeremy Kerr [Wed, 2 Jul 2025 06:20:14 +0000 (14:20 +0800)]
net: mctp: test: Add tests for gateway routes
Add a few kunit tests for the gateway routing. Because we have multiple
route types now (direct and gateway), rename mctp_test_create_route to
mctp_test_create_route_direct, and add a _gateway variant too.
Jeremy Kerr [Wed, 2 Jul 2025 06:20:13 +0000 (14:20 +0800)]
net: mctp: add gateway routing support
This change allows for gateway routing, where a route table entry
may reference a routable endpoint (by network and EID), instead of
routing directly to a netdevice.
We add support for a RTM_GATEWAY attribute for netlink route updates,
with an attribute format of:
struct mctp_fq_addr {
unsigned int net;
mctp_eid_t eid;
}
- we need the net here to uniquely identify the target EID, as we no
longer have the device reference directly (which would provide the net
id in the case of direct routes).
This makes route lookups recursive, as a route lookup that returns a
gateway route must be resolved into a direct route (ie, to a device)
eventually. We provide a limit to the route lookups, to prevent infinite
loop routing.
The route lookup populates a new 'nexthop' field in the dst structure,
which now specifies the key for the neighbour table lookup on device
output, rather than using the packet destination address directly.
Jeremy Kerr [Wed, 2 Jul 2025 06:20:12 +0000 (14:20 +0800)]
net: mctp: allow NL parsing directly into a struct mctp_route
The netlink route parsing functions end up setting a bunch of output
variables from the rt attributes. This will get messy when the routes
become more complex.
So, split the rt parsing into two types: a lookup (returning route
target data suitable for a route lookup, like when deleting a route) and
a populate (setting fields of a struct mctp_route).
In doing this, we need to separate the route allocation from
mctp_route_add, so add some comments on the lifetime semantics for the
latter.