Jason Xing [Fri, 2 Aug 2024 10:21:09 +0000 (18:21 +0800)]
tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset
Introducing a new type TCP_STATE to handle some reset conditions
appearing in RFC 793 due to its socket state. Actually, we can look
into RFC 9293 which has no discrepancy about this part.
Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Xing [Fri, 2 Aug 2024 10:21:08 +0000 (18:21 +0800)]
tcp: rstreason: introduce SK_RST_REASON_TCP_ABORT_ON_MEMORY for active reset
Introducing a new type TCP_ABORT_ON_MEMORY for tcp reset reason to handle
out of memory case.
Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Xing [Fri, 2 Aug 2024 10:21:07 +0000 (18:21 +0800)]
tcp: rstreason: introduce SK_RST_REASON_TCP_ABORT_ON_LINGER for active reset
Introducing a new type TCP_ABORT_ON_LINGER for tcp reset reason to handle
negative linger value case.
Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Xing [Fri, 2 Aug 2024 10:21:06 +0000 (18:21 +0800)]
tcp: rstreason: introduce SK_RST_REASON_TCP_ABORT_ON_CLOSE for active reset
Introducing a new type TCP_ABORT_ON_CLOSE for tcp reset reason to handle
the case where more data is unread in closing phase.
Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Youwan Wang [Wed, 31 Jul 2024 09:15:37 +0000 (17:15 +0800)]
net: phy: phy_device: fix PHY WOL enabled, PM failed to suspend
If the PHY of the mido bus is enabled with Wake-on-LAN (WOL),
we cannot suspend the PHY. Although the WOL status has been
checked in phy_suspend(), returning -EBUSY(-16) would cause
the Power Management (PM) to fail to suspend. Since
phy_suspend() is an exported symbol (EXPORT_SYMBOL),
timely error reporting is needed. Therefore, an additional
check is performed here. If the PHY of the mido bus is enabled
with WOL, we skip calling phy_suspend() to avoid PM failure.
From the following logs, it has been observed that the phydev->attached_dev
is NULL, phydev is "stmmac-0:01", it not attached, but it will affect suspend
and resume.The actually attached "stmmac-0:00" will not dpm_run_callback():
mdio_bus_phy_suspend().
Add a function that phylib can inquire of the driver whether WoL
has been enabled at the PHY.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Youwan Wang <youwan@nfschina.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240731091537.771391-1-youwan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Mon, 5 Aug 2024 09:40:11 +0000 (02:40 -0700)]
net: veth: Disable netpoll support
The current implementation of netpoll in veth devices leads to
suboptimal behavior, as it triggers warnings due to the invocation of
__netif_rx() within a softirq context. This is not compliant with
expected practices, as __netif_rx() has the following statement:
====================
mlx5 PTM cross timestamping support
This patchset by Rahul and Carolina adds PTM (Precision Time Measurement)
support to the mlx5 driver.
PTM is a PCI extended capability introduced by PCI-SIG for providing an
accurate read of the device clock offset without being impacted by
asymmetric bus transfer rates.
The performance of PTM on ConnectX-7 was evaluated using both real-time
(RTC) and free-running (FRC) clocks under traffic and no traffic
conditions. Tests with phc2sys measured the maximum offset values at a 50Hz
rate, with and without PTM.
net/mlx5: Implement PTM cross timestamping support
Expose Precision Time Measurement support through related PTP ioctl.
The performance of PTM on ConnectX-7 was evaluated using both real-time
(RTC) and free-running (FRC) clocks under traffic and no traffic
conditions. Tests with phc2sys measured the maximum offset values at a 50Hz
rate, with and without PTM.
Merge patch series "can: kvaser_usb: Add hardware timestamp support to all devices"
Jimmy Assarsson <extja@kvaser.com> says:
This patch series add hardware timestamp support to all devices supported
by the kvaser_usb driver.
The first patches resolves a known issue; "Hardware timestamps are not set
for CAN Tx frames". I can't remember why this wasn't implemented in the
first version of the hydra driver.
Followed by, hardware timestamp support for leaf and usbcan based devices.
The final patches are removing code used for selecting the correct ethtool
and netdev ops.
Note: This patch series depends on patch
"can: kvaser_usb: Explicitly initialize family in leafimx..." [1].
Changes in v2:
- Replaced patch 3/15
can: kvaser_usb: Add function kvaser_usb_ticks_to_ktime()
with a new patch
can: kvaser_usb: Add helper functions to convert device timestamp into ktime
and put it first in this series
- Resolved Vincent MAILHOL's review comments regarding duplicated code when converting timestamps [2] [3]
- As pointed out by Vincent MAILHOL [4], the clock overflow commands is not
dispatched in this patch
moved code from 10/15
can: kvaser_usb: leaf: Add structs for Tx ACK and clock overflow commands
to 11/15
can: kvaser_usb: leaf: Store MSB of timestamp
where it's actually used
Jimmy Assarsson [Mon, 1 Jul 2024 15:49:33 +0000 (17:49 +0200)]
can: kvaser_usb: leaf: Add hardware timestamp support to usbcan devices
Add hardware timestamp support for all usbcan based devices (M16C).
The usbcan firmware is slightly different compared to the other Kvaser USB
interfaces:
- The timestamp is provided by a 32-bit counter, with 10us resolution.
Hence, the hardware timestamp will wrap after less than 12 hours.
- Each Rx CAN or Tx ACK command only contains the 16-bits LSB of the
timestamp counter.
- The 16-bits MSB are sent in an asynchronous event (command), if any
change occurred in the MSB since the last event.
Jimmy Assarsson [Mon, 1 Jul 2024 15:49:31 +0000 (17:49 +0200)]
can: kvaser_usb: leaf: Add structs for Tx ACK and clock overflow commands
For usbcan devices (M16C), add struct usbcan_cmd_tx_acknowledge for Tx ACK
commands and struct usbcan_cmd_clk_overflow_event for clock overflow event
commands.
Jimmy Assarsson [Mon, 1 Jul 2024 15:49:24 +0000 (17:49 +0200)]
can: kvaser_usb: hydra: Add struct for Tx ACK commands
Add, struct kvaser_cmd_tx_ack, for standard Tx ACK commands.
Expand kvaser_usb_hydra_ktime_from_cmd() to extract timestamps from both
standard and extended Tx ACK commands. Unsupported commands are silently
ignored, and 0 is returned.
Jimmy Assarsson [Mon, 1 Jul 2024 15:49:22 +0000 (17:49 +0200)]
can: kvaser_usb: Add helper functions to convert device timestamp into ktime
Add helper function kvaser_usb_ticks_to_ktime() that converts from
device ticks to ktime.
And kvaser_usb_timestamp{48,64}_to_ktime() that converts from device
48-bit or 64-bit timestamp, to ktime.
Merge patch series "can: esd_402_pci: Do cleanup; Add one-shot mode"
Stefan Mätje <stefan.maetje@esd.eu> says:
The goal of this patch series is to do some cleanup
and also add the support for the one-shot mode before
the next patch introduces CAN-FD support for this
driver.
Stefan Mätje [Wed, 17 Jul 2024 21:44:09 +0000 (23:44 +0200)]
can: esd_402_pci: Add support for one-shot mode
This patch adds support for one-shot mode. In this mode there happens no
automatic retransmission in the case of an arbitration lost error or on
any bus error.
Rename macros to use for esdACC CTRL register access to match the
internal documentation and to make the macro prefix consistent.
- ACC_CORE_OF_CTRL_MODE -> ACC_CORE_OF_CTRL
Makes the name match the documentation.
- ACC_REG_CONTROL_MASK_MODE_ -> ACC_REG_CTRL_MASK_
ACC_REG_CONTROL_MASK_ -> ACC_REG_CTRL_MASK_
Makes the prefix consistent for macros describing masks in the same
register (CTRL).
Haibo Chen [Thu, 1 Aug 2024 00:00:26 +0000 (20:00 -0400)]
can: flexcan: add wakeup support for imx95
iMX95 defines a bit in GPR that sets/unsets the IPG_STOP signal to the
FlexCAN module, controlling its entry into STOP mode. Wakeup should work
even if FlexCAN is in STOP mode.
Due to iMX95 architecture design, the A-Core cannot access GPR; only the
system manager (SM) can configure GPR. To support the wakeup feature,
follow these steps:
- For suspend:
1) During Linux suspend, when CAN suspends, do nothing for GPR and keep
CAN-related clocks on.
2) In ATF, check whether CAN needs to support wakeup; if yes, send a
request to SM through the SCMI protocol.
3) In SM, configure the GPR and unset IPG_STOP.
4) A-Core suspends.
- For wakeup and resume:
1) A-Core wakeup event arrives.
2) In SM, deassert IPG_STOP.
3) Linux resumes.
Add a new fsl_imx95_devtype_data and FLEXCAN_QUIRK_SETUP_STOP_MODE_SCMI to
reflect this.
Reviewed-by: Han Xu <han.xu@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/all/20240731-flexcan-v4-2-82ece66e5a76@nxp.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The flexcan in iMX95 is not compatible with imx93 because wakeup method is
difference. Make fsl,imx95-flexcan not fallback to fsl,imx93-flexcan.
Reviewed-by: Han Xu <han.xu@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Frank Li <Frank.Li@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/all/20240731-flexcan-v4-1-82ece66e5a76@nxp.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Breno Leitao [Fri, 2 Aug 2024 08:07:23 +0000 (01:07 -0700)]
net: netconsole: Fix MODULE_AUTHOR format
Update the MODULE_AUTHOR for netconsole, according to the format, as
stated in module.h:
use "Name <email>" or just "Name"
Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pawel Dembicki [Fri, 2 Aug 2024 05:16:09 +0000 (07:16 +0200)]
net: phy: vitesse: implement downshift in vsc73xx phys
This commit implements downshift feature in vsc73xx family phys.
By default downshift was enabled with maximum tries.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 2 Aug 2024 00:19:56 +0000 (17:19 -0700)]
net: skbuff: sprinkle more __GFP_NOWARN on ingress allocs
build_skb() and frag allocations done with GFP_ATOMIC will
fail in real life, when system is under memory pressure,
and there's nothing we can do about that. So no point
printing warnings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Li [Sat, 29 Jun 2024 02:17:54 +0000 (22:17 -0400)]
dt-bindings: can: fsl,flexcan: add common 'can-transceiver' for fsl,flexcan
Add common 'can-transceiver' children node for fsl,flexcan.
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dtb: can@2180000: 'can-transceiver' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/net/can/fsl,flexcan.yaml#
David S. Miller [Sun, 4 Aug 2024 14:22:31 +0000 (15:22 +0100)]
Merge branch 'dsa-en7581' into main
Lorenzo Bianconi says:
====================
Add second QDMA support for EN7581 eth controller
EN7581 SoC supports two independent QDMA controllers to connect the
Ethernet Frame Engine (FE) to the CPU. Introduce support for the second
QDMA controller. This is a preliminary series to support multiple FE ports
(e.g. connected to a second PHY controller).
Changes since v1:
- squash patch 6/9 and 7/9
- move some duplicated code from patch 2/9 in 1/9
- cosmetics
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce support for the DSA built-in switch available on the EN7581
development board. EN7581 support is similar to MT7988 one except
it requires to set MT7530_FORCE_MODE bit in MT753X_PMCR_P register
for on cpu port.
Tested-by: Benjamin Larsson <benjamin.larsson@genexis.eu> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation for the built-in switch which can be found in the
Airoha EN7581 SoC.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Initialise net.core sysctl defaults in preinit_net().
Commit 7c3f1875c66f ("net: move somaxconn init from sysctl code")
introduced net_defaults_ops to make sure that net.core sysctl knobs
are always initialised even if CONFIG_SYSCTL is disabled.
Such operations better fit preinit_net() added for a similar purpose
by commit 6e77a5a4af05 ("net: initialize net->notrefcnt_tracker earlier").
Let's initialise the sysctl defaults in preinit_net().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp: Don't assign net->gen->ptr[] for pppol2tp_net_ops.
Commit fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and
ppp parts") converted net->gen->ptr[pppol2tp_net_id] in l2tp_ppp.c to
net->gen->ptr[l2tp_net_id] in l2tp_core.c.
Now the leftover wastes one entry of net->gen->ptr[] in each netns.
Let's avoid the unwanted allocation.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The DEVID register contains two pieces of information: the device ID in
the upper nibble, and the silicon revision number in the lower nibble.
The driver should work fine with any silicon revision, so let's mask
that out in the device ID check.
Fixes: 20e6d190ffe1 ("net: pse-pd: Add TI TPS23881 PSE controller driver") Signed-off-by: Kyle Swenson <kyle.swenson@est.tech> Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 3cec055c5695 ("rxrpc: Don't hold a ref for connection workqueue")
left behind rxrpc_put_client_conn().
And commit 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
removed rxrpc_accept_incoming_calls() but left declaration.
Dmitry Antipov [Thu, 1 Aug 2024 14:23:11 +0000 (17:23 +0300)]
net: core: annotate socks of struct sock_reuseport with __counted_by
According to '__reuseport_alloc()', annotate flexible array member
'sock' of 'struct sock_reuseport' with '__counted_by()' and use
convenient 'struct_size()' to simplify the math used in 'kzalloc()'.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240801142311.42837-1-dmantipov@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Thu, 1 Aug 2024 18:35:37 +0000 (19:35 +0100)]
tipc: guard against string buffer overrun
Smatch reports that copying media_name and if_name to name_parts may
overwrite the destination.
.../bearer.c:166 bearer_name_validate() error: strcpy() 'media_name' too large for 'name_parts->media_name' (32 vs 16)
.../bearer.c:167 bearer_name_validate() error: strcpy() 'if_name' too large for 'name_parts->if_name' (1010102 vs 16)
This does seem to be the case so guard against this possibility by using
strscpy() and failing if truncation occurs.
Introduced by commit b97bf3fd8f6a ("[TIPC] Initial merge")
Jakub Kicinski [Fri, 2 Aug 2024 23:39:45 +0000 (16:39 -0700)]
Merge branch 'ibmveth-rr-performance'
Nick Child says:
====================
ibmveth RR performance
This patchset aims to increase the ibmveth drivers small packet
request response rate.
These 2 patches address:
1. NAPI rescheduling technique
2. Driver-side processing of small packets
Testing over several netperf tcp_rr connections, we saw a
30% increase in transactions per second. No regressions
were observed in other workloads.
====================
Nick Child [Thu, 1 Aug 2024 21:12:15 +0000 (16:12 -0500)]
ibmveth: Recycle buffers during replenish phase
When the length of a packet is under the rx_copybreak threshold, the
buffer is copied into a new skb and sent up the stack. This allows the
dma mapped memory to be recycled back to FW.
Previously, the reuse of the DMA space was handled immediately.
This means that further packet processing has to wait until
h_add_logical_lan finishes for this packet.
Therefore, when reusing a packet, offload the hcall to the replenish
function. As a result, much of the shared logic between the recycle and
replenish functions can be removed.
This change increases TCP_RR packet rate by another 15% (370k to 430k
txns). We can see the ftrace data supports this:
PREV: ibmveth_poll = 8078553.0 us / 190999.0 hits = AVG 42.3 us
NEW: ibmveth_poll = 7632787.0 us / 224060.0 hits = AVG 34.07 us
Nick Child [Thu, 1 Aug 2024 21:12:14 +0000 (16:12 -0500)]
ibmveth: Optimize poll rescheduling process
When the ibmveth driver processes less than the budget, it must call
napi_complete_done() to release the instance. This function will
return false if the driver should avoid rearming interrupts.
Previously, the driver was ignoring the return code of
napi_complete_done(). As a result, there were unnecessary calls to
enable the veth irq.
Therefore, use the return code napi_complete_done() to determine if
irq rearm is necessary.
Additionally, in the event that new data is received immediately after
rearming interrupts, rather than just rescheduling napi, also jump
back to the poll processing loop since we are already in the poll
function (and know that we did not expense all of budget).
This slight tweak results in a 15% increase in TCP_RR transaction rate
(320k to 370k txns). We can see the ftrace data supports this:
PREV: ibmveth_poll = 8818014.0 us / 182802.0 hits = AVG 48.24
NEW: ibmveth_poll = 8082398.0 us / 191413.0 hits = AVG 42.22
Simon Horman [Thu, 1 Aug 2024 20:00:03 +0000 (21:00 +0100)]
linkmode: Change return type of linkmode_andnot to bool
linkmode_andnot() simply returns the result of bitmap_andnot().
And the return type of bitmap_andnot() is bool.
So it makes sense for the return type of linkmode_andnot()
to also be bool.
I checked all call-sites and they either ignore the return
value or treat it as a bool.
====================
Add second QDMA support for EN7581 eth controller
EN7581 SoC supports two independent QDMA controllers to connect the
Ethernet Frame Engine (FE) to the CPU. Introduce support for the second
QDMA controller. This is a preliminary series to support multiple FE ports
(e.g. connected to a second PHY controller).
====================
net: airoha: Allow mapping IO region for multiple qdma controllers
Map MMIO regions of both qdma controllers available on EN7581 SoC.
Run airoha_hw_cleanup routine for both QDMA controllers available on
EN7581 SoC removing airoha_eth module or in airoha_probe error path.
This is a preliminary patch to support multi-QDMA controllers.
net: airoha: Add airoha_qdma pointer in airoha_tx_irq_queue/airoha_queue structures
Move airoha_eth pointer in airoha_qdma structure from
airoha_tx_irq_queue/airoha_queue ones. This is a preliminary patch to
introduce support for multi-QDMA controllers available on EN7581.
net: airoha: Move irq_mask in airoha_qdma structure
QDMA controllers have independent irq lines, so move irqmask in
airoha_qdma structure. This is a preliminary patch to support multiple
QDMA controllers.
Introduce airoha_qdma struct and move qdma IO register mapping in
airoha_qdma. This is a preliminary patch to enable both QDMA controllers
available on EN7581 SoC.
Jakub Kicinski [Thu, 1 Aug 2024 23:23:17 +0000 (16:23 -0700)]
selftests: net: ksft: print more of the stack for checks
Print more stack frames and the failing line when check fails.
This helps when tests use helpers to do the checks.
Before:
# At ./ksft/drivers/net/hw/rss_ctx.py line 92:
# Check failed 1037698 >= 396893.0 traffic on other queues:[344612, 462380, 233020, 449174, 342298]
not ok 8 rss_ctx.test_rss_context_queue_reconfigure
After:
# Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 387, in test_rss_context_queue_reconfigure:
# Check| test_rss_queue_reconfigure(cfg, main_ctx=False)
# Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 230, in test_rss_queue_reconfigure:
# Check| _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3),
# Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 92, in _send_traffic_check:
# Check| ksft_lt(sum(cnts[i] for i in params['noise']), directed / 2,
# Check failed 1045235 >= 405823.5 traffic on other queues (context 1)':[460068, 351995, 565970, 351579, 127270]
not ok 8 rss_ctx.test_rss_context_queue_reconfigure
selftests: net: ksft: support marking tests as disruptive
Add new @ksft_disruptive decorator to mark the tests that might
be disruptive to the system. Depending on how well the previous
test works in the CI we might want to disable disruptive tests
by default and only let the developers run them manually.
KSFT framework runs disruptive tests by default. DISRUPTIVE=False
environment (or config file) can be used to disable these tests.
ksft_setup should be called by the test cases that want to use
new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).
In the future we can add similar decorators to, for example, avoid
running slow tests all the time. And/or have some option to run
only 'fast' tests for some sort of smoke test scenario.
$ DISRUPTIVE=False ./stats.py
KTAP version 1
1..5
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down # SKIP marked as disruptive
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:1 error:0
v3:
- parse yes and properly treat non-zero nums as true (Petr)
v2:
- convert from cli argument to env variable (Jakub)
selftests: net-drv: exercise queue stats when the device is down
Verify that total device stats don't decrease after it has been turned down.
Also make sure the device doesn't crash when we access per-queue stats
when it's down (in case it tries to access some pointers that are NULL).
KTAP version 1
1..5
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
v3:
- use errno.EOPNOTSUPP (Petr)
- move qstat[0] under try (Petr)
v2:
- KTAP output formatting (Jakub)
- defer instead of try/finally (Jakub)
- disappearing stats is an error (Jakub)
- ksft_ge instead of open coding (Jakub)
Jakub Kicinski [Thu, 1 Aug 2024 16:34:01 +0000 (09:34 -0700)]
net: remove IFF_* re-definition
We re-define values of enum netdev_priv_flags as preprocessor
macros with the same name. I guess this was done to avoid breaking
out of tree modules which may use #ifdef X for kernel compatibility?
Commit 7aa98047df95 ("net: move net_device priv_flags out from UAPI")
which added the enum doesn't say. In any case, the flags with defines
are quite old now, and defines for new flags don't get added.
OOT drivers have to resort to code greps for compat detection, anyway.
Let's delete these defines, save LoC, help LXR link to the right place.
This patchset replace all occurences of (1<<x) by BIT(x) to get rid
of checkpatch.pl "CHECK" output "Prefer using the BIT macro".
It also removes unnecessary ftrace-like logging, add missing blank line
after declaration and remove unnecessary parentheses around 'ndev->mtu
<= XAE_JUMBO_MTU' and 'ndev->mtu > XAE_MTU'.
Changes for v2:
- Split each coding style change into separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary parentheses around 'ndev->mtu
<= XAE_JUMBO_MTU' and 'ndev->mtu > XAE_MTU'. Reported
by checkpatch.
CHECK: Unnecessary parentheses around 'ndev->mtu > XAE_MTU'
+ if ((ndev->mtu > XAE_MTU) &&
+ (ndev->mtu <= XAE_JUMBO_MTU)) {
CHECK: Unnecessary parentheses around 'ndev->mtu <= XAE_JUMBO_MTU'
+ if ((ndev->mtu > XAE_MTU) &&
+ (ndev->mtu <= XAE_JUMBO_MTU)) {
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: axienet: add missing blank line after declaration
Add missing blank line after declaration. Fixes below
checkpatch warnings.
WARNING: Missing a blank line after declarations
+ struct sockaddr *addr = p;
+ axienet_set_mac_address(ndev, addr->sa_data);
WARNING: Missing a blank line after declarations
+ struct axienet_local *lp = netdev_priv(ndev);
+ disable_irq(lp->tx_irq);
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: axienet: Replace the occurrences of (1<<x) by BIT(x)
Replace all occurences of (1<<x) by BIT(x) to get rid of checkpatch.pl
"CHECK" output "Prefer using the BIT macro".
Signed-off-by: Appana Durga Kedareswara Rao <appana.durga.rao@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Aug 2024 08:20:29 +0000 (09:20 +0100)]
Merge branch 'vsock-virtio' into main
Luigi Leonardi says:
====================
vsock: avoid queuing on intermediate queue if possible
This series introduces an optimization for vsock/virtio to reduce latency
and increase the throughput: When the guest sends a packet to the host,
and the intermediate queue (send_pkt_queue) is empty, if there is enough
space, the packet is put directly in the virtqueue.
v3->v4
While running experiments on fio with 64B payload, I realized that there
was a mistake in my fio configuration, so I re-ran all the experiments
and now the latency numbers are indeed lower with the patch applied.
I also noticed that I was kicking the host without the lock.
- Fixed a configuration mistake on fio and re-ran all experiments.
- Fio latency measurement using 64B payload.
- virtio_transport_send_skb_fast_path sends kick with the tx_lock acquired
- Addressed all minor style changes requested by maintainer.
- Rebased on latest net-next
- Link to v3: https://lore.kernel.org/r/20240711-pinna-v3-0-697d4164fe80@outlook.com
v2->v3
- Performed more experiments using iperf3 using multiple streams
- Handling of reply packets removed from virtio_transport_send_skb,
as is needed just by the worker.
- Removed atomic_inc/atomic_sub when queuing directly to the vq.
- Introduced virtio_transport_send_skb_fast_path that handles the
steps for sending on the vq.
- Fixed a missing mutex_unlock in error path.
- Changed authorship of the second commit
- Rebased on latest net-next
v1->v2
In this v2 I replaced a mutex_lock with a mutex_trylock because it was
insidea RCU critical section. I also added a check on tx_run, so if the
module is being removed the packet is not queued. I'd like to thank Stefano
for reporting the tx_run issue.
Applied all Stefano's suggestions:
- Minor code style changes
- Minor commit text rewrite
Performed more experiments:
- Check if all the packets go directly to the vq (Matias' suggestion)
- Used iperf3 to see if there is any improvement in overall throughput
from guest to host
- Pinned the vhost process to a pCPU.
- Run fio using 512B payload
Rebased on latest net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Luigi Leonardi [Tue, 30 Jul 2024 19:43:08 +0000 (21:43 +0200)]
test/vsock: add ioctl unsent bytes test
Introduce two tests, one for SOCK_STREAM and one for SOCK_SEQPACKET,
which use SIOCOUTQ ioctl to check that the number of unsent bytes is
zero after delivering a packet.
vsock_connect and vsock_accept are no longer static: this is to
create more generic tests, allowing code to be reused for SEQPACKET
and STREAM.
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Luigi Leonardi [Tue, 30 Jul 2024 19:43:07 +0000 (21:43 +0200)]
vsock/virtio: add SIOCOUTQ support for all virtio based transports
Introduce support for virtio_transport_unsent_bytes
ioctl for virtio_transport, vhost_vsock and vsock_loopback.
For all transports the unsent bytes counter is incremented
in virtio_transport_get_credit.
In virtio_transport (G2H) and in vhost-vsock (H2G) the counter
is decremented when the skbuff is consumed. In vsock_loopback the
same skbuff is passed from the transmitter to the receiver, so
the counter is decremented before queuing the skbuff to the
receiver.
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Luigi Leonardi [Tue, 30 Jul 2024 19:43:06 +0000 (21:43 +0200)]
vsock: add support for SIOCOUTQ ioctl
Add support for ioctl(s) in AF_VSOCK.
The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number
of unsent bytes in the socket. This information is transport-specific
and is delegated to them using a callback.
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use of_property_read_bool() to read boolean properties rather than
of_find_property(). This is part of a larger effort to remove callers
of of_find_property() and similar functions. of_find_property() leaks
the DT struct property and data pointers which is a problem for
dynamically allocated nodes which may be freed.
net: mdio: Use of_property_count_u32_elems() to get property length
Replace of_get_property() with the type specific
of_property_count_u32_elems() to get the property length.
This is part of a larger effort to remove callers of of_get_property()
and similar functions. of_get_property() leaks the DT property data
pointer which is a problem for dynamically allocated nodes which may
be freed.
net: phy: qca807x: Drop unnecessary and broken DT validation
The check for "leds" and "gpio-controller" both being present is never
true because "leds" is a node, not a property. This could be fixed
with a check for child node, but there's really no need for the kernel
to validate a DT. Just continue ignoring the LEDs if GPIOs are present.
John Wang [Tue, 30 Jul 2024 08:46:35 +0000 (16:46 +0800)]
net: mctp: Consistent peer address handling in ioctl tag allocation
When executing ioctl to allocate tags, if the peer address is 0,
mctp_alloc_local_tag now replaces it with 0xff. However, during tag
dropping, this replacement is not performed, potentially causing the key
not to be dropped as expected.
Linus Torvalds [Thu, 1 Aug 2024 16:42:09 +0000 (09:42 -0700)]
Merge tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless, bleutooth, BPF and netfilter.
Current release - regressions:
- core: drop bad gso csum_start and offset in virtio_net_hdr
- wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove
- eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic()
- phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and
aqr115c
Current release - new code bugs:
- smc: prevent UAF in inet_create()
- bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend
- eth: bnxt: reject unsupported hash functions
Previous releases - regressions:
- sched: act_ct: take care of padding in struct zones_ht_key
- netfilter: fix null-ptr-deref in iptable_nat_table_init().
- tcp: adjust clamping window for applications specifying SO_RCVBUF
Previous releases - always broken:
- ethtool: rss: small fixes to spec and GET
- mptcp:
- fix signal endpoint re-add
- pm: fix backup support in signal endpoints
- wifi: ath12k: fix soft lockup on suspend
- eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings()
- eth: ice: fix AF_XDP ZC timeout and concurrency issues
- eth: mlx5:
- fix missing lock on sync reset reload
- fix error handling in irq_pool_request_irq"
* tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
mptcp: fix duplicate data handling
mptcp: fix bad RCVPRUNED mib accounting
ipv6: fix ndisc_is_useropt() handling for PIO
igc: Fix double reset adapter triggered from a single taprio cmd
net: MAINTAINERS: Demote Qualcomm IPA to "maintained"
net: wan: fsl_qmc_hdlc: Discard received CRC
net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex
net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys
net/mlx5e: Fix CT entry update leaks of modify header context
net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability
net/mlx5: Fix missing lock on sync reset reload
net/mlx5: Lag, don't use the hardcoded value of the first port
net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule
net/mlx5: Fix error handling in irq_pool_request_irq
net/mlx5: Always drain health in shutdown callback
net: Add skbuff.h to MAINTAINERS
r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY
netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
net: drop bad gso csum_start and offset in virtio_net_hdr
...
Simon Horman [Wed, 31 Jul 2024 09:15:28 +0000 (10:15 +0100)]
ethtool: Don't check for NULL info in prepare_data callbacks
Since commit f946270d05c2 ("ethtool: netlink: always pass genl_info to
.prepare_data") the info argument of prepare_data callbacks is never
NULL. Remove checks present in callback implementations.
Commit f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance")
removed rds_ib_recv_tasklet_fn() implementation but not the declaration.
And commit ec16227e1414 ("RDS/IB: Infiniband transport") declared but never implemented
other functions.