This series addresses several string truncation issues that are flagged
by gcc-14. I do not have any reason to believe these are bugs, so I am
targeting this at net-next and have not provided Fixes tags.
Simon Horman [Tue, 13 Aug 2024 14:32:56 +0000 (15:32 +0100)]
bnxt_en: avoid truncation of per rx run debugfs filename
Although it seems unlikely in practice - there would need to be
rx ring indexes greater than 10^10 - it is theoretically possible
for the filename of per rx ring debugfs files to be truncated.
This is because although a 16 byte buffer is provided, the length
of the filename is restricted to 10 bytes. Remove this restriction
and allow the entire buffer to be used.
Also reduce the buffer to 12 bytes, which is sufficient.
Given that the range of rx ring indexes likely much smaller than the
maximum range of a 32-bit signed integer, a smaller buffer could be
used, with some further changes. But this change seems simple, robust,
and has minimal stack overhead.
Flagged by gcc-14:
.../bnxt_debugfs.c: In function 'bnxt_debug_dev_init':
drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c:69:30: warning: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size 10 [-Wformat-truncation=]
69 | snprintf(qname, 10, "%d", ring_idx);
| ^~
In function 'debugfs_dim_ring_init',
inlined from 'bnxt_debug_dev_init' at .../bnxt_debugfs.c:87:4:
.../bnxt_debugfs.c:69:29: note: directive argument in the range [-2147483643, 2147483646]
69 | snprintf(qname, 10, "%d", ring_idx);
| ^~~~
.../bnxt_debugfs.c:69:9: note: 'snprintf' output between 2 and 12 bytes into a destination of size 10
69 | snprintf(qname, 10, "%d", ring_idx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Simon Horman [Tue, 13 Aug 2024 14:32:55 +0000 (15:32 +0100)]
bnxt_en: Extend maximum length of version string by 1 byte
This corrects an out-by-one error in the maximum length of the package
version string. The size argument of snprintf includes space for the
trailing '\0' byte, so there is no need to allow extra space for it by
reducing the value of the size argument by 1.
Simon Horman [Mon, 12 Aug 2024 11:24:13 +0000 (12:24 +0100)]
net: mvneta: Use __be16 for l3_proto parameter of mvneta_txq_desc_csum()
The value passed as the l3_proto argument of mvneta_txq_desc_csum()
is __be16. And mvneta_txq_desc_csum uses this parameter as a __be16
value. So use __be16 as the type for the parameter, rather than
type with host byte order.
Flagged by Sparse as:
.../mvneta.c:1796:25: warning: restricted __be16 degrades to integer
.../mvneta.c:1979:45: warning: incorrect type in argument 2 (different base types)
.../mvneta.c:1979:45: expected int l3_proto
.../mvneta.c:1979:45: got restricted __be16 [usertype] l3_proto
'struct config_item_type' is not modified in this driver.
This structure is only used with config_group_init_type_name() which takes
a const struct config_item_type* as a 3rd argument.
This also makes things consistent with 'netconsole_target_type' witch is
already const.
Constifying this structure moves some data to a read-only section, so
increase overall security, especially when the structure holds some
function pointers.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
33007 3952 1312 38271 957f drivers/net/netconsole.o
After:
=====
text data bss dec hex filename
33071 3888 1312 38271 957f drivers/net/netconsole.o
Ziwei Xiao [Mon, 12 Aug 2024 22:20:12 +0000 (15:20 -0700)]
gve: Add RSS device option
Add a device option to inform the driver about the hash key size and
hash table size used by the device. This information will be stored and
made available for RSS ethtool operations.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240812222013.1503584-2-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Frank Li [Sun, 11 Aug 2024 18:40:49 +0000 (14:40 -0400)]
dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties
Replace additionalProperties with unevaluatedProperties because it have
allOf: $ref: ethernet-controller.yaml#.
Remove all properties, which already defined in ethernet-controller.yaml.
Fixed below CHECK_DTBS warnings:
arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dtb:
fsl-mc@80c000000: dpmacs:ethernet@11: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/misc/fsl,qoriq-mc.yaml#
The current locking mechanism in netconsole is unsafe and suboptimal due
to the following issues:
1) Lock Release and Reacquisition Mid-Loop:
In netconsole_netdev_event(), the target_list_lock is released and
reacquired within a loop, potentially causing collisions and cleaning up
targets that are being enabled.
These issues stem from the following limitations in netconsole's locking
design:
1) write_{ext_}msg() functions:
a) Cannot sleep
b) Must iterate through targets and send messages to all enabled entries.
c) List iteration is protected by target_list_lock spinlock.
2) Network event handling in netconsole_netdev_event():
a) Needs to sleep
b) Requires iteration over the target list (holding
target_list_lock spinlock).
c) Some events necessitate netpoll struct cleanup, which *needs*
to sleep.
The target_list_lock needs to be used by non-sleepable functions while
also protecting operations that may sleep, leading to the current unsafe
design.
Solution:
========
1) Dual Locking Mechanism:
- Retain current target_list_lock for non-sleepable use cases.
- Introduce target_cleanup_list_lock (mutex) for sleepable
operations.
2) Deferred Cleanup:
- Implement atomic, deferred cleanup of structures using the new
mutex (target_cleanup_list_lock).
- Avoid the `goto` in the middle of the list_for_each_entry
3) Separate Cleanup List:
- Create target_cleanup_list for deferred cleanup, protected by
target_cleanup_list_lock.
- This allows cleanup() to sleep without affecting message
transmission.
- When iterating over targets, move devices needing cleanup to
target_cleanup_list.
- Handle cleanup under the target_cleanup_list_lock mutex.
4) Make a clear locking hierarchy
- The target_cleanup_list_lock takes precedence over target_list_lock.
- Major Workflow Locking Sequences:
a) Network Event Affecting Netpoll (netconsole_netdev_event):
rtnl -> target_cleanup_list_lock -> target_list_lock
b) Message Writing (write_msg()):
console_lock -> target_list_lock
c) Configfs Target Enable/Disable (enabled_store()):
dynamic_netconsole_mutex -> target_cleanup_list_lock -> target_list_lock
This hierarchy ensures consistent lock acquisition order across
different operations, preventing deadlocks and maintaining proper
synchronization. The target_cleanup_list_lock's higher priority allows
for safe deferred cleanup operations without interfering with regular
message transmission protected by target_list_lock. Each workflow
follows a specific locking sequence, ensuring that operations like
network event handling, message writing, and target management are
properly synchronized and do not conflict with each other.
Changelog:
v3:
* Move netconsole_process_cleanups() function to inside
CONFIG_NETCONSOLE_DYNAMIC block, avoiding Werror=unused-function
(Jakub)
v2:
* The selftest has been removed from the patchset because veth is now
IFF_DISABLE_NETPOLL. A new test will be sent separately.
* https://lore.kernel.org/all/20240807091657.4191542-1-leitao@debian.org/
Breno Leitao [Thu, 8 Aug 2024 12:25:11 +0000 (05:25 -0700)]
net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal
Current issue:
- The `target_list_lock` spinlock is held while iterating over
target_list() entries.
- Mid-loop, the lock is released to call __netpoll_cleanup(), then
reacquired.
- This practice compromises the protection provided by
`target_list_lock`.
Reason for current design:
1. __netpoll_cleanup() may sleep, incompatible with holding a spinlock.
2. target_list_lock must be a spinlock because write_msg() cannot sleep.
(See commit b5427c27173e ("[NET] netconsole: Support multiple logging
targets"))
Defer the cleanup of the netpoll structure to outside the
target_list_lock() protected area. Create another list
(target_cleanup_list) to hold the entries that need to be cleaned up,
and clean them using a mutex (target_cleanup_list_lock).
Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Thu, 8 Aug 2024 12:25:10 +0000 (05:25 -0700)]
net: netconsole: Unify Function Return Paths
The return flow in netconsole's dynamic functions is currently
inconsistent. This patch aims to streamline and standardize the process
by ensuring that the mutex is unlocked before returning the ret value.
Additionally, this update includes a minor functional change where
certain strnlen() operations are performed with the
dynamic_netconsole_mutex locked. This adjustment is not anticipated to
cause any issues, however, it is crucial to document this change for
clarity.
Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Thu, 8 Aug 2024 12:25:09 +0000 (05:25 -0700)]
net: netconsole: Standardize variable naming
Update variable names from err to ret in cases where the variable may
return non-error values.
This change facilitates a forthcoming patch that relies on ret being
used consistently to handle return values, regardless of whether they
indicate an error or not.
Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
stmmac: Add Loongson platform support
v17:
* As Serge's comments:
Add return 0 for _dt_config().
Get back the conditional MSI-clear method execution.
v16:
* As Serge's comments:
Move the of_node_put(plat->mdio_node) call to the DT-config/clear methods.
Drop 'else if'.
* Modify the commit message of 7/14. (LS2K CPU -> LS2K SOC)
V15:
* Drop return that will not be executed.
* Move pdev from patch 12 to patch 13 to pass W=1 builds.
RFC v15:
* As Serge's comments:
Extend the commit message.(patch 7 and patch 11)
Add fixes tag for patch 8.
Add loongson_dwmac_dt_clear() patch.
Modify loongson_dwmac_msi_config().
...
* Pick Huacai's Acked-by tag.
* Pick Serge's Reviewed-by tag.
* I have already contacted the author(ZhangQing) of the module,
so I copied her valid email: diasyzhang@tencent.com.
Note:
I replied to the comments on v14 last Sunday, but all of Loongson's
email servers failed to deliver. The network administrator told me
today that he has fixed the problem and re-delivered all the failed
emails, but I did not see them on the mailing list. I hope they will
not suddenly appear in everyone's mailbox one day. I apologize for
this. (The email content mainly agrees with Serge's suggestion.)
v14:
Because Loongson GMAC can be also found with the 8-channels AV feature
enabled, we'll need to reconsider the patches logic and thus the
commit logs too. As Serge's comments and Russell's comments:
[PATCH net-next v14 01/15] net: stmmac: Move the atds flag to the stmmac_dma_cfg structure
[PATCH net-next v14 02/15] net: stmmac: Add multi-channel support
[PATCH net-next v14 03/15] net: stmmac: Export dwmac1000_dma_ops
[PATCH net-next v14 04/15] net: stmmac: dwmac-loongson: Drop duplicated hash-based filter size init
[PATCH net-next v14 05/15] net: stmmac: dwmac-loongson: Drop pci_enable/disable_msi calls
[PATCH net-next v14 06/15] net: stmmac: dwmac-loongson: Use PCI_DEVICE_DATA() macro for device identification
[PATCH net-next v14 07/15] net: stmmac: dwmac-loongson: Detach GMAC-specific platform data init
+-> Init the plat_stmmacenet_data::{tx_queues_to_use,rx_queues_to_use}
in the loongson_gmac_data() method.
[PATCH net-next v14 08/15] net: stmmac: dwmac-loongson: Init ref and PTP clocks rate
[PATCH net-next v14 09/15] net: stmmac: dwmac-loongson: Add phy_interface for Loongson GMAC
[PATCH net-next v14 10/15] net: stmmac: dwmac-loongson: Introduce PCI device info data
+-> Make sure the setup() method is called after the pci_enable_device()
invocation.
[PATCH net-next v14 11/15] net: stmmac: dwmac-loongson: Add DT-less GMAC PCI-device support
+-> Introduce the loongson_dwmac_dt_config() method here instead of
doing that in a separate patch.
+-> Add loongson_dwmac_acpi_config() which would just get the IRQ from
the pdev->irq field and make sure it is valid.
[PATCH net-next v14 12/15] net: stmmac: Fixed failure to set network speed to 1000.
+-> Drop the patch as Russell's comments, At the same time, he provided another
better repair suggestion, and I decided to send it separately after the
patch set was merged. See:
<https://lore.kernel.org/netdev/ZoW1fNqV3PxEobFx@shell.armlinux.org.uk/>
[PATCH net-next v14 13/15] net: stmmac: dwmac-loongson: Add Loongson Multi-channels GMAC support
+-> This is former "net: stmmac: dwmac-loongson: Add Loongson GNET
support" patch, but which adds the support of the Loongson GMAC with the
8-channels AV-feature available.
+-> loongson_dwmac_intx_config() shall be dropped due to the
loongson_dwmac_acpi_config() method added in the PATCH 11/15.
+-> Make sure loongson_data::loongson_id is initialized before the
stmmac_pci_info::setup() is called.
+-> Move the rx_queues_to_use/tx_queues_to_use and coe_unsupported
fields initialization to the loongson_gmac_data() method.
+-> As before, call the loongson_dwmac_msi_config() method if the multi-channels
Loongson MAC has been detected.
+-> Move everything GNET-specific to the next patch.
[PATCH net-next v14 14/15] net: stmmac: dwmac-loongson: Add Loongson GNET support
+-> Everything Loonsgson GNET-specific is supposed to be added in the
framework of this patch:
+ PCI_DEVICE_ID_LOONGSON_GNET macro
+ loongson_gnet_fix_speed() method
+ loongson_gnet_data() method
+ loongson_gnet_pci_info data
+ The GNET-specific part of the loongson_dwmac_setup() method.
+ ...
[PATCH net-next v14 15/15] net: stmmac: dwmac-loongson: Add loongson module author
Other's:
Pick Serge's Reviewed-by tag.
v13:
* Sorry, we have clarified some things in the past 10 days. I did not
give you a clear reply to the following questions in v12, so I need
to reply again:
1. The current LS2K2000 also have a GMAC(and two GNET) that supports 8
channels, so we have to reconsider the initialization of
tx/rx_queues_to_use into probe();
2. In v12, we disagreed on the loongson_dwmac_msi_config method, but I changed
it based on Serge's comments(If I understand correctly):
if (dev_of_node(&pdev->dev)) {
ret = loongson_dwmac_dt_config(pdev, plat, &res);
}
if (ld->loongson_id == DWMAC_CORE_LS2K2000) {
ret = loongson_dwmac_msi_config(pdev, plat, &res);
} else {
ret = loongson_dwmac_intx_config(pdev, plat, &res);
}
3. Our priv->dma_cap.pcs is false, so let's use PHY_INTERFACE_MODE_NA;
4. Our GMAC does not support Delay, so let's use PHY_INTERFACE_MODE_RGMII_ID,
the current dts is wrong, a fix patch will be sent to the LoongArch list
later.
Others:
* Re-split a part of the patch (it seems we do this with every version);
* Copied Serge's comments into the commit message of patch;
* Fixed the stmmac_dma_operation_mode() method;
* Changed some code comments.
v12:
* The biggest change is the re-splitting of patches.
* Add a "gmac_version" in loongson_data, then we only
read it once in the _probe().
* Drop Serge's patch.
* Rebase to the latest code state.
* Fixed the gnet commit message.
v11:
* Break loongson_phylink_get_caps(), fix bad logic.
* Remove a unnecessary ";".
* Remove some unnecessary "{}".
* add a blank.
* Move the code of fix _force_1000 to patch 6/6.
The main changes occur in these two functions:
loongson_dwmac_probe();
loongson_dwmac_setup();
v10:
As Andrew's comment:
* Add a #define for the 0x37.
* Add a #define for Port Select.
others:
* Pick Serge's patch, This patch resulted from the process
of reviewing our patch set.
* Based on Serge's patch, modify our loongson_phylink_get_caps().
* Drop patch 3/6, we need mac_interface.
* Adjusted the code layout of gnet patch.
* Corrected several errata in commit message.
* Move DISABLE_FORCE flag to loongson_gnet_data().
v9:
We have not provided a detailed list of equipment for a long time,
and I apologize for this. During this period, I have collected some
information and now present it to you, hoping to alleviate the pressure
of review.
1. IP core
We now have two types of IP cores, one is 0x37, similar to dwmac1000;
The other is 0x10. Compared to 0x37, we split several DMA registers
from one to two, and it is not worth adding a new entry for this.
According to Serge's comment, we made these devices work by overwriting
priv->synopsys_id = 0x37 and mac->dma = <LS_dma_ops>.
1.1. Some more detailed information
The number of DMA channels for 0x37 is 1; The number of DMA channels
for 0x10 is 8. Except for channel 0, otherchannels do not support
sending hardware checksums. Supported AV features are Qav, Qat, and Qas,
and the rest are consistent with 3.73.
2. DEVICE
We have two types of devices,
one is GMAC, which only has a MAC chip inside and needs an external PHY
chip;
the other is GNET, which integrates both MAC and PHY chips inside.
2.1. Some more detailed information
GMAC device: LS7A1000, LS2K1000, these devices do not support any pause
mode.
gnet device: LS7A2000, LS2K2000, the chip connection between the mac and
phy of these devices is not normal and requires two rounds of
negotiation; LS7A2000 does not support half-duplex and
multi-channel;
to enable multi-channel on LS2K2000, you need to turn off
hardware checksum.
**Note**: Only the LS2K2000's IP core is 0x10, while the IP cores of other
devices are 0x37.
* passed the CI
<https://github.com/linux-netdev/nipa/blob/main/tests/patch/checkpatch
/checkpatch.sh>
* reverse xmas tree order.
* Silence build warning.
* Re-split the patch.
* Add more detailed commit message.
* Add more code comment.
* Reduce modification of generic code.
* using the GNET-specific prefix.
* define a new macro for the GNET MAC.
* Use an easier way to overwrite mac.
* Removed some useless printk.
v8:
* The biggest change is according to Serge's comment in the previous
edition:
Seeing the patch in the current state would overcomplicate the generic
code and the only functions you need to update are
dwmac_dma_interrupt()
dwmac1000_dma_init_channel()
you can have these methods re-defined with all the Loongson GNET
specifics in the low-level platform driver (dwmac-loongson.c). After
that you can just override the mac_device_info.dma pointer with a
fixed stmmac_dma_ops descriptor. Here is what should be done for that:
1. Keep the Patch 4/9 with my comments fixed. First it will be partly
useful for your GNET device. Second in general it's a correct
implementation of the normal DW GMAC v3.x multi-channels feature and
will be useful for the DW GMACs with that feature enabled.
2. Create the Loongson GNET-specific
stmmac_dma_ops.dma_interrupt()
stmmac_dma_ops.init_chan()
methods in the dwmac-loongson.c driver. Don't forget to move all the
Loongson-specific macros from dwmac_dma.h to dwmac-loongson.c.
3. Create a Loongson GNET-specific platform setup method with the next
semantics:
+ allocate stmmac_dma_ops instance and initialize it with
dwmac1000_dma_ops.
+ override the stmmac_dma_ops.{dma_interrupt, init_chan} with
the pointers to the methods defined in 2.
+ allocate mac_device_info instance and initialize the
mac_device_info.dma field with a pointer to the new
stmmac_dma_ops instance.
+ call dwmac1000_setup() or initialize mac_device_info in a way
it's done in dwmac1000_setup() (the later might be better so you
wouldn't need to export the dwmac1000_setup() function).
+ override stmmac_priv.synopsys_id with a correct value.
4. Initialize plat_stmmacenet_data.setup() with the pointer to the
method created in 3.
* Others:
Re-split the patch.
Passed checkpatch.pl test.
v7:
* Refer to andrew's suggestion:
- Add DMA_INTR_ENA_NIE_RX and DMA_INTR_ENA_NIE_TX #define's, etc.
* Others:
- Using --subject-prefix="PATCH net-next vN" to indicate that the
patches are for the networking tree.
- Rebase to the latest networking tree:
<git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git>
v6:
* Refer to Serge's suggestion:
- Add new platform feature flag:
include/linux/stmmac.h:
+#define STMMAC_FLAG_HAS_LGMAC BIT(13)
- Add the IRQs macros specific to the Loongson Multi-channels GMAC:
drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h:
+#define DMA_INTR_ENA_NIE_LOONGSON 0x00060000 /* ...*/
#define DMA_INTR_ENA_NIE 0x00010000 /* Normal Summary */
...
- Drop all of redundant changes that don't require the
prototypes being converted to accepting the stmmac_priv
pointer.
* Refer to andrew's suggestion:
- Drop white space changes.
- break patch up into lots of smaller parts.
Some small patches have been put into another series as a preparation
see <https://lore.kernel.org/loongarch/cover.1702289232.git.siyanteng@loongson.cn/T/#t>
*note* : This series of patches relies on the three small patches above.
* others
- Drop irq_flags changes.
- Changed patch order.
v4 -> v5:
* Remove an ugly and useless patch (fix channel number).
* Remove the non-standard dma64 driver code, and also remove
the HWIF entries, since the associated custom callbacks no
longer exist.
* Refer to Serge's suggestion: Update the dwmac1000_dma.c to
support the multi-DMA-channels controller setup.
Yanteng Si [Wed, 7 Aug 2024 13:48:55 +0000 (21:48 +0800)]
net: stmmac: dwmac-loongson: Add Loongson GNET support
The new generation Loongson LS2K2000 SoC and LS7A2000 chipset are
equipped with the network controllers called Loongson GNET. It's the
single and multi DMA-channels Loongson GMAC but with a PHY attached.
Here is the summary of the DW GMAC features the controller has:
DW GMAC IP-core: v3.73a
Speeds: 10/100/1000Mbps
Duplex: Full (both versions), Half (LS2K2000 GNET only)
DMA-descriptors type: enhanced
L3/L4 filters availability: Y
VLAN hash table filter: Y
PHY-interface: GMII (PHY is integrated into the chips)
Remote Wake-up support: Y
Mac Management Counters (MMC): Y
Number of additional MAC addresses: 5
MAC Hash-based filter: Y
Hash Table Size: 256
AV feature: Y (LS2K2000 GNET only)
DMA channels: 8 (LS2K2000 GNET), 1 (LS7A2000 GNET)
Let's update the Loongson DWMAC driver to supporting the new Loongson
GNET controller. The change is mainly trivial: the driver shall be
bound to the PCIe device with DID 0x7a13, and the device-specific
setup() method shall be called for it. The only peculiarity concerns
the integrated PHY speed change procedure. The PHY has a weird problem
with switching from the low speeds to 1000Mbps mode. The speedup
procedure requires the PHY-link re-negotiation. So the suggested
change provide the device-specific fix_mac_speed() method to overcome
the problem.
Yanteng Si [Wed, 7 Aug 2024 13:48:54 +0000 (21:48 +0800)]
net: stmmac: dwmac-loongson: Add Loongson Multi-channels GMAC support
The Loongson DWMAC driver currently supports the Loongson GMAC
devices (based on the DW GMAC v3.50a/v3.73a IP-core) installed to the
LS2K1000 SoC and LS7A1000 chipset. But recently a new generation
LS2K2000 SoC was released with the new version of the Loongson GMAC
synthesized in. The new controller is based on the DW GMAC v3.73a
IP-core with the AV-feature enabled, which implies the multi
DMA-channels support. The multi DMA-channels feature has the next
vendor-specific peculiarities:
1. Split up Tx and Rx DMA IRQ status/mask bits:
Name Tx Rx
DMA_INTR_ENA_NIE = 0x00040000 | 0x00020000;
DMA_INTR_ENA_AIE = 0x00010000 | 0x00008000;
DMA_STATUS_NIS = 0x00040000 | 0x00020000;
DMA_STATUS_AIS = 0x00010000 | 0x00008000;
DMA_STATUS_FBI = 0x00002000 | 0x00001000;
2. Custom Synopsys ID hardwired into the GMAC_VERSION.SNPSVER register
field. It's 0x10 while it should have been 0x37 in accordance with
the actual DW GMAC IP-core version.
3. There are eight DMA-channels available meanwhile the Synopsys DW
GMAC IP-core supports up to three DMA-channels.
4. It's possible to have each DMA-channel IRQ independently delivered.
The MSI IRQs must be utilized for that.
Thus in order to have the multi-channels Loongson GMAC controllers
supported let's modify the Loongson DWMAC driver in accordance with
all the peculiarities described above:
1. Create the multi-channels Loongson GMAC-specific
stmmac_dma_ops::dma_interrupt()
stmmac_dma_ops::init_chan()
callbacks due to the non-standard DMA IRQ CSR flags layout.
2. Create the Loongson DWMAC-specific platform setup() method
which gets to initialize the DMA-ops with the dwmac1000_dma_ops
instance and overrides the callbacks described in 1. The method also
overrides the custom Synopsys ID with the real one in order to have
the rest of the HW-specific callbacks correctly detected by the driver
core.
3. Make sure the platform setup() method enables the flow control and
duplex modes supported by the controller.
Yanteng Si [Wed, 7 Aug 2024 13:48:05 +0000 (21:48 +0800)]
net: stmmac: dwmac-loongson: Add DT-less GMAC PCI-device support
The Loongson GMAC driver currently supports the network controllers
installed on the LS2K1000 SoC and LS7A1000 chipset, for which the GMAC
devices are required to be defined in the platform device tree source.
But Loongson machines may have UEFI (implies ACPI) or PMON/UBOOT
(implies FDT) as the system bootloaders. In order to have both system
configurations support let's extend the driver functionality with the
case of having the Loongson GMAC probed on the PCI bus with no device
tree node defined for it. That requires to make the device DT-node
optional, to rely on the IRQ line detected by the PCI core and to
have the MDIO bus ID calculated using the PCIe Domain+BDF numbers.
In order to have the device probe() and remove() methods less
complicated let's move the DT- and ACPI-specific code to the
respective sub-functions.
Yanteng Si [Wed, 7 Aug 2024 13:48:04 +0000 (21:48 +0800)]
net: stmmac: dwmac-loongson: Introduce PCI device info data
The Loongson GNET device support is about to be added in one of the
next commits. As another preparation for that introduce the PCI device
info data with a setup() callback performing the device-specific
platform data initializations. Currently it is utilized for the
already supported Loongson GMAC device only.
Yanteng Si [Wed, 7 Aug 2024 13:48:03 +0000 (21:48 +0800)]
net: stmmac: dwmac-loongson: Add phy_interface for Loongson GMAC
PHY-interface of the Loongson GMAC device is RGMII with no internal
delays added to the data lines signal. So to comply with that let's
pre-initialize the platform-data field with the respective enum
constant.
Yanteng Si [Wed, 7 Aug 2024 13:48:02 +0000 (21:48 +0800)]
net: stmmac: dwmac-loongson: Init ref and PTP clocks rate
Reference and PTP clocks rate of the Loongson GMAC devices is 125MHz.
(So is in the GNET devices which support is about to be added.) Set
the respective plat_stmmacenet_data field up in accordance with that
so to have the coalesce command and timestamping work correctly.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The driver currently supports the chips with the Loongson GMAC network
device synthesized with a single DMA-channel available. As a
preparation before adding the Loongson GNET support detach the
Loongson GMAC-specific platform data initializations to the
loongson_gmac_data() method and preserve the common settings in the
loongson_default_data().
While at it drop the return value statement from the
loongson_default_data() method as redundant.
Note there is no intermediate vendor-specific PCS in between the MAC
and PHY on Loongson GMAC and GNET. So the plat->mac_interface field
can be freely initialized with the PHY_INTERFACE_MODE_NA value.
Yanteng Si [Wed, 7 Aug 2024 13:47:09 +0000 (21:47 +0800)]
net: stmmac: dwmac-loongson: Use PCI_DEVICE_DATA() macro for device identification
For the readability sake convert the hard-coded Loongson GMAC PCI ID to
the respective macro and use the PCI_DEVICE_DATA() macro-function to
create the pci_device_id array entry. The later change will be
specifically useful in order to assign the device-specific data for the
currently supported device and for about to be added Loongson GNET
controller.
Yanteng Si [Wed, 7 Aug 2024 13:47:08 +0000 (21:47 +0800)]
net: stmmac: dwmac-loongson: Drop pci_enable/disable_msi calls
The Loongson GMAC driver currently doesn't utilize the MSI IRQs, but
retrieves the IRQs specified in the device DT-node. Let's drop the
direct pci_enable_msi()/pci_disable_msi() calls then as redundant
Yanteng Si [Wed, 7 Aug 2024 13:47:07 +0000 (21:47 +0800)]
net: stmmac: dwmac-loongson: Drop duplicated hash-based filter size init
The plat_stmmacenet_data::multicast_filter_bins field is twice
initialized in the loongson_default_data() method. Drop the redundant
initialization, but for the readability sake keep the filters init
statements defined in the same place of the method.
Yanteng Si [Wed, 7 Aug 2024 13:45:30 +0000 (21:45 +0800)]
net: stmmac: Export dwmac1000_dma_ops
Export the DW GMAC DMA-ops descriptor so one could be available in
the low-level platform drivers. It will be utilized to override some
callbacks in order to handle the LS2K2000 GNET device specifics. The
GNET controller support is being added in one of the following up
commits.
Yanteng Si [Wed, 7 Aug 2024 13:45:29 +0000 (21:45 +0800)]
net: stmmac: Add multi-channel support
DW GMAC v3.73 can be equipped with the Audio Video (AV) feature which
enables transmission of time-sensitive traffic over bridged local area
networks (DWC Ethernet QoS Product). In that case there can be up to two
additional DMA-channels available with no Tx COE support (unless there is
vendor-specific IP-core alterations). Each channel is implemented as a
separate Control and Status register (CSR) for managing the transmit and
receive functions, descriptor handling, and interrupt handling.
Add the multi-channels DW GMAC controllers support just by making sure the
already implemented DMA-configs are performed on the per-channel basis.
Note the only currently known instance of the multi-channel DW GMAC
IP-core is the LS2K2000 GNET controller, which has been released with the
vendor-specific feature extension of having eight DMA-channels. The device
support will be added in one of the following up commits.
Yanteng Si [Wed, 7 Aug 2024 13:45:28 +0000 (21:45 +0800)]
net: stmmac: Move the atds flag to the stmmac_dma_cfg structure
ATDS (Alternate Descriptor Size) is a part of the DMA Bus Mode configs
(together with PBL, ALL, EME, etc) of the DW GMAC controllers. Seeing
it's not changed at runtime but is activated as long as the IP-core
has it supported (at least due to the Type 2 Full Checksum Offload
Engine feature), move the respective parameter from the
stmmac_dma_ops::init() callback argument to the stmmac_dma_cfg
structure, which already have the rest of the DMA-related configs
defined.
Besides the being added in the next commit DW GMAC multi-channels
support will require to add the stmmac_dma_ops::init_chan() callback
and have the ATDS flag set/cleared for each channel in there. Having
the atds-flag in the stmmac_dma_cfg structure will make the parameter
accessible from stmmac_dma_ops::init_chan() callback too.
net/smc: Use static_assert() to check struct sizes
Commit 9748dbc9f265 ("net/smc: Avoid -Wflex-array-member-not-at-end
warnings") introduced tagged `struct smc_clc_v2_extension_fixed` and
`struct smc_clc_smcd_v2_extension_fixed`. We want to ensure that when
new members need to be added to the flexible structures, they are
always included within these tagged structs.
So, we use `static_assert()` to ensure that the memory layout for
both the flexible structure and the tagged struct is the same after
any changes.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Link: https://patch.msgid.link/ZrVBuiqFHAORpFxE@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d88cabfd9abc ("nfp: Avoid -Wflex-array-member-not-at-end
warnings") introduced tagged `struct nfp_dump_tl_hdr`. We want
to ensure that when new members need to be added to the flexible
structure, they are always included within this tagged struct.
So, we use `static_assert()` to ensure that the memory layout for
both the flexible structure and the tagged struct is the same after
any changes.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ZrVB43Hen0H5WQFP@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Remove unnecessary flex-array member `pad[]` and refactor the related
code a bit.
Fix the following warning:
net/sched/act_ct.c:57:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Jakub Kicinski [Tue, 13 Aug 2024 00:50:36 +0000 (17:50 -0700)]
Merge branch 'net-nexthop-increase-weight-to-u16'
Petr Machata says:
====================
net: nexthop: Increase weight to u16
In CLOS networks, as link failures occur at various points in the network,
ECMP weights of the involved nodes are adjusted to compensate. With high
fan-out of the involved nodes, and overall high number of nodes,
a (non-)ECMP weight ratio that we would like to configure does not fit into
8 bits. Instead of, say, 255:254, we might like to configure something like
1000:999. For these deployments, the 8-bit weight may not be enough.
To that end, in this patchset increase the next hop weight from u8 to u16.
Patch #1 adds a flag that indicates whether the reserved fields are zeroed.
This is a follow-up to a new fix merged in commit 6d745cd0e972 ("net:
nexthop: Initialize all fields in dumped nexthops"). The theory behind this
patch is that there is a strict ordering between the fields actually being
zeroed, the kernel declaring that they are, and the kernel repurposing the
fields. Thus clients can use the flag to tell if it is safe to interpret
the reserved fields in any way.
Patch #2 contains the substantial code and the commit message covers the
details of the changes.
Patches #3 to #6 add selftests.
====================
Petr Machata [Wed, 7 Aug 2024 14:13:49 +0000 (16:13 +0200)]
selftests: router_mpath_nh: Test 16-bit next hop weights
Add tests that exercise full 16 bits of NH weight.
To test the 255:65535, it is necessary to run more packets than for the
other tests. On a debug kernel, the test can take up to a minute, therefore
avoid the test when KSFT_MACHINE_SLOW.
Petr Machata [Wed, 7 Aug 2024 14:13:48 +0000 (16:13 +0200)]
selftests: router_mpath: Sleep after MZ
In the context of an offloaded datapath, it may take a while for the ip
link stats to be updated. This causes the test to fail when MZ_DELAY is too
low. Sleep after the packets are sent for the link stats to get up to date.
Petr Machata [Wed, 7 Aug 2024 14:13:47 +0000 (16:13 +0200)]
net: nexthop: Increase weight to u16
In CLOS networks, as link failures occur at various points in the network,
ECMP weights of the involved nodes are adjusted to compensate. With high
fan-out of the involved nodes, and overall high number of nodes,
a (non-)ECMP weight ratio that we would like to configure does not fit into
8 bits. Instead of, say, 255:254, we might like to configure something like
1000:999. For these deployments, the 8-bit weight may not be enough.
To that end, in this patch increase the next hop weight from u8 to u16.
Increasing the width of an integral type can be tricky, because while the
code still compiles, the types may not check out anymore, and numerical
errors come up. To prevent this, the conversion was done in two steps.
First the type was changed from u8 to a single-member structure, which
invalidated all uses of the field. This allowed going through them one by
one and audit for type correctness. Then the structure was replaced with a
vanilla u16 again. This should ensure that no place was missed.
The UAPI for configuring nexthop group members is that an attribute
NHA_GROUP carries an array of struct nexthop_grp entries:
struct nexthop_grp {
__u32 id; /* nexthop id - must exist */
__u8 weight; /* weight of this nexthop */
__u8 resvd1;
__u16 resvd2;
};
The field resvd1 is currently validated and required to be zero. We can
lift this requirement and carry high-order bits of the weight in the
reserved field:
struct nexthop_grp {
__u32 id; /* nexthop id - must exist */
__u8 weight; /* weight of this nexthop */
__u8 weight_high;
__u16 resvd2;
};
Keeping the fields split this way was chosen in case an existing userspace
makes assumptions about the width of the weight field, and to sidestep any
endianness issues.
The weight field is currently encoded as the weight value minus one,
because weight of 0 is invalid. This same trick is impossible for the new
weight_high field, because zero must mean actual zero. With this in place:
- Old userspace is guaranteed to carry weight_high of 0, therefore
configuring 8-bit weights as appropriate. When dumping nexthops with
16-bit weight, it would only show the lower 8 bits. But configuring such
nexthops implies existence of userspace aware of the extension in the
first place.
- New userspace talking to an old kernel will work as long as it only
attempts to configure 8-bit weights, where the high-order bits are zero.
Old kernel will bounce attempts at configuring >8-bit weights.
Renaming reserved fields as they are allocated for some purpose is commonly
done in Linux. Whoever touches a reserved field is doing so at their own
risk. nexthop_grp::resvd1 in particular is currently used by at least
strace, however they carry an own copy of UAPI headers, and the conversion
should be trivial. A helper is provided for decoding the weight out of the
two fields. Forcing a conversion seems preferable to bending backwards and
introducing anonymous unions or whatever.
Petr Machata [Wed, 7 Aug 2024 14:13:46 +0000 (16:13 +0200)]
net: nexthop: Add flag to assert that NHGRP reserved fields are zero
There are many unpatched kernel versions out there that do not initialize
the reserved fields of struct nexthop_grp. The issue with that is that if
those fields were to be used for some end (i.e. stop being reserved), old
kernels would still keep sending random data through the field, and a new
userspace could not rely on the value.
In this patch, use the existing NHA_OP_FLAGS, which is currently inbound
only, to carry flags back to the userspace. Add a flag to indicate that the
reserved fields in struct nexthop_grp are zeroed before dumping. This is
reliant on the actual fix from commit 6d745cd0e972 ("net: nexthop:
Initialize all fields in dumped nexthops").
Implement netdev_stat_ops and export the basic per-queue stats.
This interface expect users to set the values that are used
either to zero or to some other preserved value (they are 0xff by
default). So here we export bytes/packets/drops from tx and rx_stats
plus set some of the values that are exposed by queue stats
to zero.
Jakub Kicinski [Sat, 10 Aug 2024 05:43:21 +0000 (22:43 -0700)]
eth: fbnic: add basic rtnl stats
Count packets, bytes and drop on the datapath, and report
to the user. Since queues are completely freed when the
device is down - accumulate the stats in the main netdev struct.
This means that per-queue stats will only report values since
last reset (per qstat recommendation).
David S. Miller [Mon, 12 Aug 2024 13:16:25 +0000 (14:16 +0100)]
Merge branch 'ethtool-rss-driver-tweaks'
Jakub Kicinski says:
====================
ethtool: rss: driver tweaks and netlink context dumps
This series is a semi-related collection of RSS patches.
Main point is supporting dumping RSS contexts via ethtool netlink.
At present additional RSS contexts can be queried one by one, and
assuming user know the right IDs. This series uses the XArray
added by Ed to provide netlink dump support for ETHTOOL_GET_RSS.
Patch 1 is a trivial selftest debug patch.
Patch 2 coverts mvpp2 for no real reason other than that I had
a grand plan of converting all drivers at some stage.
Patch 3 removes a now moot check from mlx5 so that all tests
can pass.
Patch 4 and 5 make a bit used for context support optional,
for easier grepping of drivers which need converting
if nothing else.
Patch 6 OTOH adds a new cap bit; some devices don't support
using a different key per context and currently act
in surprising ways.
Patch 7 and 8 update the RSS netlink code to use XArray.
Patch 9 and 10 add support for dumping contexts.
Patch 11 and 12 are small adjustments to spec and a new test.
I'm getting distracted with other work, so probably won't have
the time soon to complete next steps, but things which are missing
are (and some of these may be bad ideas):
- better discovery
Some sort of API to tell the user who many contexts the device
can create. Upper bound, devices often share contexts between
ports etc. so it's hard to tell exactly and upfront number of
contexts for a netdev. But order of magnitude (4 vs 10s) may
be enough for container management system to know whether to bother.
- create/modify/delete via netlink
The only question here is how to handle all the tricky IOCTL
legacy. "No change" maps trivially to attribute not present.
"reset" (indir_size = 0) probably needs to be a new NLA_FLAG?
- better table size handling
The current API assumes the LUT has fixed size, which isn't
true for modern devices. We should have better APIs for the
drivers to resize the tables, and in user facing API -
the ability to specify pattern and min size rather than
exact table expected (sort of like ethtool CLI already does).
- recounted / socket-bound contexts
Support for contexts which get "cleaned up" when their parent
netlink socket gets closed. The major catch is that ntuple
filters (which we don't currently track) depend on the context,
so we need auto-removal for both.
v5:
- fix build
v4: https://lore.kernel.org/20240809031827.2373341-1-kuba@kernel.org
- adjust to the meaning of max context from net
v3: https://lore.kernel.org/20240806193317.1491822-1-kuba@kernel.org
- quite a few code comments and commit message changes
- mvpp2: fix interpretation of max_context_id (I'll take care of
the net -> net-next merge as needed)
- filter by ifindex in the selftest
v2: https://lore.kernel.org/20240803042624.970352-1-kuba@kernel.org
- fix bugs and build in mvpp2
v1: https://lore.kernel.org/20240802001801.565176-1-kuba@kernel.org
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:28 +0000 (22:37 -0700)]
selftests: drv-net: rss_ctx: test dumping RSS contexts
Add a test for dumping RSS contexts. Make sure indir table
and key are sane when contexts are created with various
combination of inputs. Test the dump filtering by ifname
and start-context.
Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:27 +0000 (22:37 -0700)]
netlink: specs: decode indirection table as u32 array
Indirection table is dumped as a raw u32 array, decode it.
It's tempting to decode hash key, too, but it is an actual
bitstream, so leave it be for now.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:26 +0000 (22:37 -0700)]
ethtool: rss: support skipping contexts during dump
Applications may want to deal with dynamic RSS contexts only.
So dumping context 0 will be counter-productive for them.
Support starting the dump from a given context ID.
Alternative would be to implement a dump flag to skip just
context 0, not sure which is better...
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:25 +0000 (22:37 -0700)]
ethtool: rss: support dumping RSS contexts
Now that we track RSS contexts in the core we can easily dump
them. This is a major introspection improvement, as previously
the only way to find all contexts would be to try all ids
(of which there may be 2^32 - 1).
Don't use the XArray iterators (like xa_for_each_start()) as they
do not move the index past the end of the array once done, which
caused multiple bugs in Netlink dumps in the past.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:24 +0000 (22:37 -0700)]
ethtool: rss: report info about additional contexts from XArray
IOCTL already uses the XArray when reporting info about additional
contexts. Do the same thing in netlink code.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:22 +0000 (22:37 -0700)]
ethtool: rss: don't report key if device doesn't support it
marvell/otx2 and mvpp2 do not support setting different
keys for different RSS contexts. Contexts have separate
indirection tables but key is shared with all other contexts.
This is likely fine, indirection table is the most important
piece.
Don't report the key-related parameters from such drivers.
This prevents driver-errors, e.g. otx2 always writes
the main key, even when user asks to change per-context key.
The second reason is that without this change tracking
the keys by the core gets complicated. Even if the driver
correctly reject setting key with rss_context != 0,
change of the main key would have to be reflected in
the XArray for all additional contexts.
Since the additional contexts don't have their own keys
not including the attributes (in Netlink speak) seems
intuitive. ethtool CLI seems to deal with it just fine.
Having to set the flag in majority of the drivers is
a bit tedious but not reporting the key is a safer
default.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:21 +0000 (22:37 -0700)]
eth: remove .cap_rss_ctx_supported from updated drivers
Remove .cap_rss_ctx_supported from drivers which moved to the new API.
This makes it easy to grep for drivers which still need to be converted.
Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:20 +0000 (22:37 -0700)]
ethtool: make ethtool_ops::cap_rss_ctx_supported optional
cap_rss_ctx_supported was created because the API for creating
and configuring additional contexts is mux'ed with the normal
RSS API. Presence of ops does not imply driver can actually
support rss_context != 0 (in fact drivers mostly ignore that
field). cap_rss_ctx_supported lets core check that the driver
is context-aware before calling it.
Now that we have .create_rxfh_context, there is no such
ambiguity. We can depend on presence of the op.
Make setting the bit optional.
Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:19 +0000 (22:37 -0700)]
eth: mlx5: allow disabling queues when RSS contexts exist
Since commit 24ac7e544081 ("ethtool: use the rss context XArray
in ring deactivation safety-check") core will prevent queues from
being disabled while being used by additional RSS contexts.
The safety check is no longer necessary, and core will do a more
accurate job of only rejecting changes which can actually break
things.
Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:18 +0000 (22:37 -0700)]
eth: mvpp2: implement new RSS context API
Implement the separate create/modify/delete ops for RSS.
No problems with IDs - even tho RSS tables are per device
the driver already seems to allocate IDs linearly per port.
There's a translation table from per-port context ID
to device context ID.
mvpp2 doesn't have a key for the hash, it defaults to
an empty/previous indir table.
Note that there is no key at all, so we don't have to be
concerned with reporting the wrong one (which is addressed
by a patch later in the series).
Compile-tested only.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Aug 2024 05:37:17 +0000 (22:37 -0700)]
selftests: drv-net: rss_ctx: add identifier to traffic comments
Include the "name" of the context in the comment for traffic
checks. Makes it easier to reason about which context failed
when we loop over 32 contexts (it may matter if we failed in
first vs last, for example).
Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Sat, 10 Aug 2024 02:06:32 +0000 (10:06 +0800)]
net: vxlan: remove duplicated initialization in vxlan_xmit
The variable "did_rsc" is initialized twice, which is unnecessary. Just
remove one of them.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: ksz9477: split half-duplex monitoring function
In order to respect the 80 columns limit, split the half-duplex
monitoring function in two.
This is just a styling change, no functional change.
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is v2 of the patch (now patches) adding support for ethtool
!autoneg while respecting the requirements of IEEE 802.3.
v2 fixes the build errors in the previous patch by first constifying
the "advertisement" argument to the linkmode functions that only
read from this pointer. It also fixes the incorrectly named
linkmode_set function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: phylib: do not disable autoneg for fixed speeds >= 1G
We have an increasing number of drivers that are forcing
auto-negotiation to be enabled for speeds of 1G or faster.
It would appear that auto-negotiation is mandatory for speeds above
100M. In 802.3, Annex 40C's state diagrams seems to imply that
mr_autoneg_enable (BMCR AN ENABLE) doesn't affect whether or not the
AN state machines work for 1000base-T, and some PHY datasheets (e.g.
Marvell Alaska) state that disabling mr_autoneg_enable leaves AN
enabled but forced to 1G full duplex.
Other PHY datasheets imply that BMCR AN ENABLE should not be cleared
for >= 1G.
Thus, this should be handled in phylib rather than in each driver.
Rather than erroring out, arrange to implement the Marvell Alaska
solution but in software for all PHYs: generate an appropriate
single-speed advertisement for the requested speed, and keep AN
enabled to the PHY driver. However, to avoid userspace API breakage,
continue to report to userspace that we have AN disabled.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
These two patches used to be part of another series [1] that did not
apply to the networking tree without conflicts. This is therefore just a
partial resend with no code modifications, just rebased onto net/main.
Javier Carrasco [Thu, 8 Aug 2024 09:47:33 +0000 (11:47 +0200)]
net: mvpp2: use device_for_each_child_node() to access device child nodes
The iterated nodes are direct children of the device node, and the
`device_for_each_child_node()` macro accounts for child node
availability.
`fwnode_for_each_available_child_node()` is meant to access the child
nodes of an fwnode, and therefore not direct child nodes of the device
node.
The child nodes within mvpp2_probe are not accessed outside the loops,
and the scoped version of the macro can be used to automatically
decrement the refcount on early exits.
Use `device_for_each_child_node()` and its scoped variant to indicate
device's direct child nodes.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Javier Carrasco [Thu, 8 Aug 2024 09:47:32 +0000 (11:47 +0200)]
net: mvpp2: use port_count to remove ports
As discussed in [1], there is no need to iterate over child nodes to
remove the list of ports. Instead, a loop up to `port_count` ports can
be used, and is in fact more reliable in case the child node
availability changes.
The suggested approach removes the need for the `fwnode` and
`port_fwnode` variables in mvpp2_remove() as well.
====================
fix bnxt_en queue reset when queue is active
The current bnxt_en queue API implementation is buggy when resetting a
queue that has active traffic. The problem is that there is no FW
involved to stop the flow of packets and relying on napi_disable() isn't
enough.
To fix this, call bnxt_hwrm_vnic_update() with MRU set to 0 for both the
default and the ntuple vnic to stop the flow of packets. This works for
any Rx queue and not only those that have ntuple rules since every Rx
queue is either in the default or the ntuple vnic.
For bnxt_hwrm_vnic_update() to work, proper flushing must be done by the
FW. A FW flag is there to indicate support and queue_mgmt_ops is keyed
behind this.
The first three patches are from Michael Chan and adds the prerequisite
vnic functions and FW flags indicating that it will properly flush
during vnic update.
Tested on BCM957504 while iperf3 is active:
1. Reset a queue that has an ntuple rule steering flow into it
2. Reset all queues in order, one at a time
In both cases the flow is not interrupted.
Sending this to net-next as there is no in-tree kernel consumer of queue
API just yet, and there is a patch that changes when the queue_mgmt_ops
is registered.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
---
v3:
- include patches from Michael Chan that adds a FW flag for vnic flush
capability
- key support for queue_mgmt_ops behind this new flag
v2:
- split setting vnic->mru into a separate patch (Wojciech)
- clarify why napi_enable()/disable() is removed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Wei [Thu, 8 Aug 2024 05:15:17 +0000 (22:15 -0700)]
bnxt_en: stop packet flow during bnxt_queue_stop/start
The current implementation when resetting a queue while packets are
flowing puts the queue into an inconsistent state.
There needs to be some synchronisation with the FW. Add calls to
bnxt_hwrm_vnic_update() to set the MRU for both the default and ntuple
vnic during queue start/stop. When the MRU is set to 0, flow is stopped.
Each Rx queue belongs to either the default or the ntuple vnic.
With calling bnxt_hwrm_vnic_update() the calls to napi_enable() and
napi_disable() must be removed for reset to work on a queue that has
active traffic flowing e.g. iperf3.
Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 8 Aug 2024 05:15:15 +0000 (22:15 -0700)]
bnxt_en: Check the FW's VNIC flush capability
Check the HWRM_VNIC_QCAPS FW response for the receive engine flush
capability. This capability indicates that we can reliably support
RX ring restart when calling HWRM_VNIC_UPDATE with MRU set to 0.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 8 Aug 2024 05:15:14 +0000 (22:15 -0700)]
bnxt_en: Add support to call FW to update a VNIC
Add the function bnxt_hwrm_vnic_update() to call FW to update
a VNIC. This call can be used when disabling and enabling a
receive ring within a VNIC. The mru which is the maximum receive
size of packets received by the VNIC can be updated.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Aug 2024 03:38:50 +0000 (04:38 +0100)]
Merge branch 'l2tp-misc-improvements'
James Chapman says:
====================
l2tp: misc improvements
This series makes several improvements to l2tp:
* update documentation to be consistent with recent l2tp changes.
* move l2tp_ip socket tables to per-net data.
* fix handling of hash key collisions in l2tp_v3_session_get
* implement and use get-next APIs for management and procfs/debugfs.
* improve l2tp refcount helpers.
* use per-cpu dev->tstats in l2tpeth devices.
* fix a lockdep splat.
* fix a race between l2tp_pre_exit_net and pppol2tp_release.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:52 +0000 (07:54 +0100)]
l2tp: flush workqueue before draining it
syzbot exposes a race where a net used by l2tp is removed while an
existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
TUNNEL_DELETE work items to close each tunnel in the net. When these
are run, new SESSION_DELETE work items are queued to delete each
session in the tunnel. This all happens in drain_workqueue. However,
drain_workqueue allows only new work items if they are queued by other
work items which are already in the queue. If pppol2tp_release runs
after drain_workqueue has started, it may queue a SESSION_DELETE work
item, which results in the warning below in drain_workqueue.
Address this by flushing the workqueue before drain_workqueue such
that all queued TUNNEL_DELETE work items run before drain_workqueue is
started. This will queue SESSION_DELETE work items for each session in
the tunnel, hence pppol2tp_release or other API requests won't queue
SESSION_DELETE requests once drain_workqueue is started.
Fixes: fc7ec7f554d7 ("l2tp: delete sessions using work queue") Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053 Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:51 +0000 (07:54 +0100)]
l2tp: l2tp_eth: use per-cpu counters from dev->tstats
l2tp_eth uses old-style dev->stats for fastpath packet/byte
counters. Convert it to use dev->tstats per-cpu counters.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:50 +0000 (07:54 +0100)]
l2tp: improve tunnel/session refcount helpers
l2tp_tunnel_inc_refcount and l2tp_session_inc_refcount wrap
refcount_inc. They add no value so just use the refcount APIs directly
and drop l2tp's helpers. l2tp already uses refcount_inc_not_zero
anyway.
Rename l2tp_tunnel_dec_refcount and l2tp_session_dec_refcount to
l2tp_tunnel_put and l2tp_session_put to better match their use pairing
various _get getters.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:49 +0000 (07:54 +0100)]
l2tp: use get_next APIs for management requests and procfs/debugfs
l2tp netlink and procfs/debugfs iterate over tunnel and session lists
to obtain data. They currently use very inefficient get_nth functions
to do so. Replace these with get_next.
For netlink, use nl cb->ctx[] for passing state instead of the
obsolete cb->args[].
l2tp_tunnel_get_nth and l2tp_session_get_nth are no longer used so
they can be removed.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:48 +0000 (07:54 +0100)]
l2tp: add tunnel/session get_next helpers
l2tp management APIs and procfs/debugfs iterate over l2tp tunnel and
session lists. Since these lists are now implemented using IDR, we can
use IDR get_next APIs to iterate them. Add tunnel/session get_next
functions to do so.
The session get_next functions get the next session in a given tunnel
and need to account for l2tpv2 and l2tpv3 differences:
* l2tpv2 sessions are keyed by tunnel ID / session ID. Iteration for
a given tunnel ID, TID, can therefore start with a key given by
TID/0 and finish when the next entry's tunnel ID is not TID. This
is possible only because the tunnel ID part of the key is the upper
16 bits and the session ID part the lower 16 bits; when idr_next
increments the key value, it therefore finds the next sessions of
the current tunnel before those of the next tunnel. Entries with
session ID 0 are always skipped because they are used internally by
pppol2tp.
* l2tpv3 sessions are keyed by session ID. Iteration starts at the
first IDR entry and skips entries where the tunnel does not
match. Iteration must also consider session ID collisions and walk
the list of colliding sessions (if any) for one which matches the
supplied tunnel.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:47 +0000 (07:54 +0100)]
l2tp: handle hash key collisions in l2tp_v3_session_get
To handle colliding l2tpv3 session IDs, l2tp_v3_session_get searches a
hashed list keyed by ID and sk. Although unlikely, if hash keys
collide, it is possible that hash_for_each_possible loops over a
session which doesn't have the ID that we are searching for. So check
for session ID match when looping over possible hash key matches.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:46 +0000 (07:54 +0100)]
l2tp: move l2tp_ip and l2tp_ip6 data to pernet
l2tp_ip[6] have always used global socket tables. It is therefore not
possible to create l2tpip sockets in different namespaces with the
same socket address.
To support this, move l2tpip socket tables to pernet data.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:45 +0000 (07:54 +0100)]
l2tp: remove inline from functions in c sources
Update l2tp to remove the inline keyword from several functions in C
sources, since this is now discouraged.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 7 Aug 2024 06:54:44 +0000 (07:54 +0100)]
documentation/networking: update l2tp docs
l2tp no longer uses sk_user_data in tunnel sockets and now manages
tunnel/session lifetimes slightly differently. Update docs to cover
this.
CC: linux-doc@vger.kernel.org CC: corbet@lwn.net Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, replacing a connection tracking steering entry was done by
adding a new rule (with the same tag but possibly different mod hdr
actions/labels) then removing the old rule.
This approach doesn't work in hardware steering because two steering
entries with the same tag cannot coexist in a hardware steering table.
This commit prepares for that by adding a new ct_rule_update operation on
the ct_fs_ops struct which is used instead of add+delete.
Implementations for both dmfs (firmware steering) and smfs (software
steering) are provided, which simply add the new rule and delete the old
one.
Cosmin Ratiu [Thu, 8 Aug 2024 05:59:26 +0000 (08:59 +0300)]
net/mlx5e: CT: 'update' rules instead of 'replace'
Offloaded rules can be updated with a new modify header action
containing a changed restore cookie. This was done using the verb
'replace', while in some configurations 'update' is a better fit.
This commit renames the functions used to reflect that.
Gal Pressman [Thu, 8 Aug 2024 05:59:25 +0000 (08:59 +0300)]
net/mlx5e: Use extack in get module eeprom by page callback
In case of errors in get module eeprom by page, reflect it through
extack instead of a dmesg print.
While at it, make the messages more human friendly.
Gal Pressman [Thu, 8 Aug 2024 05:59:22 +0000 (08:59 +0300)]
net/mlx5e: Use extack in set ringparams callback
In case of errors in set ringparams, reflect it through extack instead
of a dmesg print.
While at it, make the messages more human friendly and remove two
redundant checks that are already validated by the core.
Gal Pressman [Thu, 8 Aug 2024 05:59:21 +0000 (08:59 +0300)]
net/mlx5e: Be consistent with bitmap handling of link modes
Use the bitmap operations when accessing the advertised/supported link
modes and remove places that access them as arrays of unsigned longs
(underlying implementation of the bitmap), this makes the code much more
readable and clear.
Jianbo Liu [Thu, 8 Aug 2024 05:59:20 +0000 (08:59 +0300)]
net/mlx5e: TC, Offload rewrite and mirror to both internal and external dests
Firmware has the limitation that it cannot offload a rule with rewrite
and mirror to internal and external destinations simultaneously.
This patch adds a workaround to this issue. Here the destination array
is split again, just like what's done in previous commit, but after
the action indexed by split_count - 1. An extra rule is added for the
leftover destinations. Such rule can be offloaded, even there are
destinations to both internal and external destinations, because the
header rewrite is left in the original FTE.
Jianbo Liu [Thu, 8 Aug 2024 05:59:19 +0000 (08:59 +0300)]
net/mlx5e: TC, Offload rewrite and mirror on tunnel over ovs internal port
To offload the encap rule when the tunnel IP is configured on an
openvswitch internal port, driver need to overwrite vport metadata in
reg_c0 to the value assigned to the internal port, then forward
packets to root table to be processed again by the rules matching on
the metadata for such internal port.
When such rule is combined with header rewrite and mirror, openvswitch
generates the rule like the following, because it resets mirror after
packets are modified.
in_port(enp8s0f0npf0sf1),..,
actions:enp8s0f0npf0sf2,set(tunnel(...)),set(ipv4(...)),vxlan_sys_4789,enp8s0f0npf0sf2
The split_count was introduced before to support rewrite and mirror.
Driver splits the rule into two different hardware rules in order to
offload it. But it's not enough to offload the above complicated rule
because of the limitations, in both driver and firmware.
To resolve this issue, the destination array is split again after the
destination indexed by split_count. An extra rule is added for the
leftover destinations (in the above example, it is enp8s0f0npf0sf2),
and is inserted to post_act table. And the extra destination is added
in the original rule to forward to post_act table, so the extra mirror
is done there.
Jianbo Liu [Thu, 8 Aug 2024 05:59:18 +0000 (08:59 +0300)]
net/mlx5e: Enable remove flow for hard packet limit
In the commit a2a73ea14b1a ("net/mlx5e: Don't listen to remove flows
event"), remove_flow_enable event is removed, and the hard limit
usually relies on software mechanism added in commit b2f7b01d36a9
("net/mlx5e: Simulate missing IPsec TX limits hardware
functionality"). But the delayed work is rescheduled every one second,
which is slow for fast traffic. As a result, traffic can't be blocked
even reaches the hard limit, which usually happens when soft and hard
limits are very close.
In reality it won't happen because soft limit is much lower than hard
limit. But, as an optimization for RX to block traffic when reaching
hard limit, need to set remove_flow_enable. When remove flow is
enabled, IPSEC HARD_LIFETIME ASO syndrome will be set in the metadata
defined in the ASO return register if packets reach hard lifetime
threshold. And those packets are dropped immediately by the steering
table.