git.ipfire.org Git - thirdparty/kernel/stable.git/log
Huazhong Tan [Thu, 28 May 2020 13:48:09 +0000 (21:48 +0800)] 
net: hns3: add a missing mutex destroy in hclge_init_ae_dev()

Add a missing mutex_destroy() call to the failure path of hclge_init_ae_dev().
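The fix follows the usual kernel pattern of centralized, goto-based error unwinding: each failure label undoes the setup steps that already succeeded, in reverse order. Below is a minimal userspace sketch, assuming a hypothetical fake_mutex type in place of struct mutex and a hypothetical init_dev() in place of hclge_init_ae_dev(). It also shows the direct -ENOMEM return on the very first allocation, which is what the separate "remove an unnecessary 'goto'" commit does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mutex that tracks init/destroy balance. */
struct fake_mutex {
	int initialized;
};

static int live_mutexes;	/* mutexes initialized but not yet destroyed */

static void fake_mutex_init(struct fake_mutex *m)
{
	m->initialized = 1;
	live_mutexes++;
}

static void fake_mutex_destroy(struct fake_mutex *m)
{
	assert(m->initialized);
	m->initialized = 0;
	live_mutexes--;
}

/* Hypothetical device context; not the real struct hclge_dev. */
struct dev_ctx {
	struct fake_mutex lock;
	int *table;
};

/* 'fail_late' simulates a later initialization step failing. */
static int init_dev(struct dev_ctx **out, int fail_late)
{
	struct dev_ctx *hdev;
	int ret;

	hdev = calloc(1, sizeof(*hdev));
	if (!hdev)
		return -ENOMEM;	/* nothing to undo yet: return directly */

	fake_mutex_init(&hdev->lock);

	hdev->table = calloc(16, sizeof(*hdev->table));
	if (!hdev->table) {
		ret = -ENOMEM;
		goto err_mutex;
	}

	if (fail_late) {
		ret = -EIO;
		goto err_table;
	}

	*out = hdev;
	return 0;

	/* Unwind in reverse order of setup. */
err_table:
	free(hdev->table);
err_mutex:
	fake_mutex_destroy(&hdev->lock);	/* the kind of call the fix adds */
	free(hdev);
	return ret;
}
```

Without the fake_mutex_destroy() on the error path, live_mutexes would stay at 1 after a failed init, which is the imbalance the hns3 commit closes.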

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 28 May 2020 13:48:08 +0000 (21:48 +0800)] 
net: hns3: remove an unnecessary 'goto' in hclge_init_ae_dev()

Remove the redundant 'goto' and return -ENOMEM directly when
allocating memory for 'hdev' fails in hclge_init_ae_dev().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 May 2020 23:30:04 +0000 (16:30 -0700)] 
Merge branch 'net-ks8851-Unify-KS8851-SPI-and-MLL-drivers'

Marek Vasut says:

====================
net: ks8851: Unify KS8851 SPI and MLL drivers

The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same piece
of silicon, except that the former has an SPI interface, while the latter has
a parallel bus interface. Thus far, Linux has had a separate driver for each,
and the two have diverged considerably.

This series unifies them into a single driver with small SPI and parallel
bus specific parts. The approach is to first separate the SPI specific
parts into their own file, then add parallel bus accessors in another
file, and finally remove the old parallel bus driver. The old parallel bus
driver is replaced rather than kept because the SPI bus driver is of much
higher quality.

Note that I dropped "net: ks8851: Drop define debug and pr_fmt()" for now
and will send it separately later.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:46 +0000 (00:21 +0200)] 
net: ks8851: Remove ks8851_mll.c

ks8851_mll.c is replaced by ks8851_par.c, which uses the common code
from ks8851.c, just like ks8851_spi.c. Remove this old ad-hoc driver.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:45 +0000 (00:21 +0200)] 
net: ks8851: Implement Parallel bus operations

Implement accessors for the KS8851-16MLL/MLLI/MLLU parallel bus variant of
the KS8851. This is based on ks8851_mll.c, which is a driver for exactly
the same hardware; however, the ks8851.c code is of much higher quality.
Hence, this patch pulls the relevant information on how to access the bus
out of ks8851_mll.c, but uses the common ks8851.c code. To keep this patch
reviewable, ks8851_mll.c is removed in a separate, subsequent patch instead
of being rewritten in place.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:44 +0000 (00:21 +0200)] 
net: ks8851: Separate SPI operations into separate file

Pull all the SPI bus specific code into a separate file, so that it is
not mixed with the common code, and rename ks8851.c to ks8851_common.c.
ks8851_common.c is now linked with ks8851_spi.c, so it can call the
accessors in ks8851_spi.c without any pointer indirection.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:43 +0000 (00:21 +0200)] 
net: ks8851: Implement register, FIFO, lock accessor callbacks

The register and FIFO accessors are bus specific, and so is locking.
Implement callbacks so that each variant of the KS8851 can implement
matching accessors and locking, and use the rest of the common code.
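A minimal userspace sketch of the callback approach, assuming a hypothetical bus_ops table (rdreg16, wrreg16, lock, unlock) rather than the driver's actual struct ks8851_net members, and a fake in-memory register file standing in for SPI or the parallel bus. The common helper only ever calls through the table, so each variant can supply its own accessors and locking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-bus operations; names and signatures are illustrative. */
struct bus_ops {
	uint16_t (*rdreg16)(void *priv, unsigned int reg);
	void (*wrreg16)(void *priv, unsigned int reg, uint16_t val);
	void (*lock)(void *priv);
	void (*unlock)(void *priv);
};

/* Common driver state: knows nothing about the bus beyond the ops table. */
struct common_dev {
	const struct bus_ops *ops;
	void *priv;
};

/* Bus-agnostic helper: read-modify-write under the bus lock. */
static void common_set_bits(struct common_dev *dev, unsigned int reg,
			    uint16_t bits)
{
	uint16_t val;

	dev->ops->lock(dev->priv);
	val = dev->ops->rdreg16(dev->priv, reg);
	dev->ops->wrreg16(dev->priv, reg, val | bits);
	dev->ops->unlock(dev->priv);
}

/* A fake backend: an in-memory register file with a trivial lock. */
struct fake_bus {
	uint16_t regs[128];
	int locked;
	int lock_count;
};

static uint16_t fake_rdreg16(void *priv, unsigned int reg)
{
	struct fake_bus *bus = priv;

	assert(bus->locked && reg / 2 < 128);	/* accessors run under the lock */
	return bus->regs[reg / 2];
}

static void fake_wrreg16(void *priv, unsigned int reg, uint16_t val)
{
	struct fake_bus *bus = priv;

	assert(bus->locked && reg / 2 < 128);
	bus->regs[reg / 2] = val;
}

static void fake_lock(void *priv)
{
	struct fake_bus *bus = priv;

	assert(!bus->locked);
	bus->locked = 1;
	bus->lock_count++;
}

static void fake_unlock(void *priv)
{
	struct fake_bus *bus = priv;

	bus->locked = 0;
}

static const struct bus_ops fake_bus_ops = {
	.rdreg16 = fake_rdreg16,
	.wrreg16 = fake_wrreg16,
	.lock = fake_lock,
	.unlock = fake_unlock,
};
```

A real SPI backend would sleep inside its accessors and lock with a mutex, while a parallel bus backend could use a spinlock; common_set_bits() does not need to know which.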

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:42 +0000 (00:21 +0200)] 
net: ks8851: Permit overriding interrupt enable register

The parallel bus variant does not need to use the TX interrupt at all,
as it writes the TX FIFO directly within .ndo_start_xmit. Permit the
drivers to configure the interrupt enable bits.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:41 +0000 (00:21 +0200)] 
net: ks8851: Factor out TX work flush function

While the SPI version of the KS8851 requires a TX worker thread to pump
data via SPI, the parallel bus version can write data into the TX FIFO
directly in .ndo_start_xmit, as the parallel bus access is much faster
and does not sleep. Factor out this TX work flush part, so it can be
overridden by the parallel bus driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:40 +0000 (00:21 +0200)] 
net: ks8851: Split out SPI specific code from probe() and remove()

Factor out common code into ks8851_probe_common() and
ks8851_remove_common() to permit both SPI and parallel
bus driver variants to use the common code path for
both probing and removal.

There should be no functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:39 +0000 (00:21 +0200)] 
net: ks8851: Split out SPI specific entries in struct ks8851_net

Add a new struct ks8851_net_spi, which embeds the original
struct ks8851_net and contains the entries specific only to
the SPI variant of KS8851.
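Embedding the common structure and recovering the wrapper with container_of() is the standard kernel idiom for this kind of split. A userspace sketch, with a local container_of() definition built on offsetof() and hypothetical struct names (ks_net, ks_net_spi) rather than the driver's own:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical common state shared by all bus variants. */
struct ks_net {
	int rx_packets;
};

/* Hypothetical SPI-only wrapper that embeds the common state. */
struct ks_net_spi {
	struct ks_net ks;	/* common part, handed to the common code */
	int spi_xfer_count;	/* SPI-specific bookkeeping */
};

static struct ks_net_spi *to_ks_spi(struct ks_net *ks)
{
	return container_of(ks, struct ks_net_spi, ks);
}

/* Common code only ever sees struct ks_net ... */
static void common_rx(struct ks_net *ks)
{
	ks->rx_packets++;
}

/* ... while SPI-specific code recovers its private wrapper from it. */
static void spi_xfer(struct ks_net *ks)
{
	to_ks_spi(ks)->spi_xfer_count++;
}
```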

There should be no functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:38 +0000 (00:21 +0200)] 
net: ks8851: Factor out SKB receive function

Factor out the netif_rx_ni() call into a separate function, so that it can
be overridden by the parallel bus variant of the KS8851 driver.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:37 +0000 (00:21 +0200)] 
net: ks8851: Factor out bus lock handling

Pull the bus access locking code out into separate functions. This is
done in preparation for unifying the driver with the parallel bus one:
the parallel bus driver does not need heavy mutex locking of the bus and
works better with spinlocks, so these locking functions are prepared to
be overridden.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:36 +0000 (00:21 +0200)] 
net: ks8851: Use 16-bit read of RXFC register

The RXFC register is the only one being read using 8-bit accessors.
To make it easier to support the 16-bit accesses used by the parallel
bus variant of KS8851, use a 16-bit accessor to read the RXFC register
together with the neighboring RXFCTR register.
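A sketch of why a single 16-bit read is enough, assuming (this is not stated in the commit text) that RXFCTR sits at offset 0x9C and RXFC at 0x9D in a little-endian register file, so that a 16-bit read at 0x9C returns RXFC in its upper byte. The register file and accessors are simulated.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register offsets, for illustration only. */
#define REG_RXFCTR 0x9C	/* low byte: RX frame count threshold */
#define REG_RXFC   0x9D	/* high byte: RX frame count */

static uint8_t regs[256];	/* simulated little-endian register file */

static uint8_t rdreg8(unsigned int reg)
{
	assert(reg < sizeof(regs));
	return regs[reg];
}

static uint16_t rdreg16(unsigned int reg)
{
	assert(reg % 2 == 0 && reg + 1 < sizeof(regs));
	return (uint16_t)(regs[reg] | regs[reg + 1] << 8);
}

/* Old style: an 8-bit read of RXFC. */
static unsigned int rx_frame_count_8bit(void)
{
	return rdreg8(REG_RXFC);
}

/* New style: one 16-bit read of RXFCTR; RXFC is the upper byte. */
static unsigned int rx_frame_count_16bit(void)
{
	return (rdreg16(REG_RXFCTR) >> 8) & 0xff;
}
```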

Remove ks8851_rdreg8() as it is not used anywhere anymore.

There should be no functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:35 +0000 (00:21 +0200)] 
net: ks8851: Use 16-bit writes to program MAC address

On the SPI variant of KS8851, the MAC address can be programmed with
8-, 16- or 32-bit writes. To make it easier to also support the 16-bit
parallel bus option of KS8851, switch both the MAC address programming
and readout to 16-bit operations.
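A sketch of programming and reading back a 6-byte MAC address as three 16-bit words. The register offsets (MARH 0x14, MARM 0x12, MARL 0x10) and the packing of the earlier byte into the high half of each word are assumptions for illustration; the register file is simulated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
/* Assumed layout: MARH at 0x14, MARM at 0x12, MARL at 0x10. */
#define REG_MAR(i) (0x14 - (i))

static uint16_t regs16[128];	/* simulated 16-bit register file */

static void wrreg16(unsigned int reg, uint16_t val)
{
	assert(reg % 2 == 0 && reg / 2 < 128);
	regs16[reg / 2] = val;
}

static uint16_t rdreg16(unsigned int reg)
{
	assert(reg % 2 == 0 && reg / 2 < 128);
	return regs16[reg / 2];
}

/* Program the address as three 16-bit writes, earlier byte in the high half. */
static void write_mac(const uint8_t mac[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i += 2)
		wrreg16(REG_MAR(i), (uint16_t)(mac[i] << 8 | mac[i + 1]));
}

/* Read it back with the matching 16-bit reads. */
static void read_mac(uint8_t mac[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i += 2) {
		uint16_t val = rdreg16(REG_MAR(i));

		mac[i] = val >> 8;
		mac[i + 1] = val & 0xff;
	}
}
```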

Remove ks8851_wrreg8() as it is not used anywhere anymore.

There should be no functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:34 +0000 (00:21 +0200)] 
net: ks8851: Remove ks8851_rdreg32()

ks8851_rdreg32() is used in only one place, to read two registers
using a single read. To make it easier to support 16-bit accesses via
the parallel bus later on, replace this single read with two 16-bit
reads, one from each register, and drop ks8851_rdreg32() altogether.

If this has a noticeable performance impact on the SPI variant of KS8851,
then we should consider using regmap to abstract the SPI and parallel
bus options and, in the case of SPI, permit regmap to merge reads of
neighboring registers into a single, longer read.
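A sketch comparing the two approaches on a simulated little-endian register file, with hypothetical offsets REG_LO and REG_HI for the two neighboring registers. It also counts bus transactions, which is the SPI cost the regmap paragraph is concerned about: the 16-bit path needs two where the 32-bit path needed one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets of two neighboring 16-bit registers. */
#define REG_LO 0x7C
#define REG_HI 0x7E

static uint8_t regs[256];	/* simulated little-endian register file */
static int bus_transactions;	/* one per accessor call, as on SPI */

static uint16_t rdreg16(unsigned int reg)
{
	assert(reg % 2 == 0 && reg + 1 < sizeof(regs));
	bus_transactions++;
	return (uint16_t)(regs[reg] | regs[reg + 1] << 8);
}

static uint32_t rdreg32(unsigned int reg)
{
	assert(reg % 4 == 0 && reg + 3 < sizeof(regs));
	bus_transactions++;
	return (uint32_t)regs[reg] | (uint32_t)regs[reg + 1] << 8 |
	       (uint32_t)regs[reg + 2] << 16 | (uint32_t)regs[reg + 3] << 24;
}

/* Old style: one 32-bit read, split into the two register values. */
static void read_pair_32(uint16_t *lo, uint16_t *hi)
{
	uint32_t v = rdreg32(REG_LO);

	*lo = v & 0xffff;
	*hi = v >> 16;
}

/* New style: two 16-bit reads, which a 16-bit parallel bus can also do. */
static void read_pair_16(uint16_t *lo, uint16_t *hi)
{
	*lo = rdreg16(REG_LO);
	*hi = rdreg16(REG_HI);
}
```

Both paths return the same values; only the number of bus transactions differs.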

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:33 +0000 (00:21 +0200)] 
net: ks8851: Use dev_{get,set}_drvdata()

Replace spi_{get,set}_drvdata() with dev_{get,set}_drvdata(), which
works for both SPI and platform drivers. This is done in preparation
for unifying the KS8851 SPI and parallel bus drivers.

There should be no functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:32 +0000 (00:21 +0200)] 
net: ks8851: Use devm_alloc_etherdev()

Use the device-managed version of alloc_etherdev() to simplify the code.
No functional change intended.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:31 +0000 (00:21 +0200)] 
net: ks8851: Pass device node into ks8851_init_mac()

Since the driver probe function already has a struct device *dev pointer
and can easily derive the of_node pointer from it, pass the of_node pointer
as a parameter to ks8851_init_mac() instead of fishing it out of ks->spidev.
This is the only reference to spidev in the function, so this gets rid of it.
This is done in preparation for unifying the KS8851 SPI and parallel bus
drivers.

No functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:30 +0000 (00:21 +0200)] 
net: ks8851: Replace dev_err() with netdev_err() in IRQ handler

Use netdev_err() instead of dev_err() to avoid accessing spidev->dev
in the interrupt handler. This is the only place in this function that
uses spidev, so switching to netdev_err() gets rid of it. This is done
in preparation for unifying the KS8851 SPI and parallel bus drivers.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:29 +0000 (00:21 +0200)] 
net: ks8851: Rename ndev to netdev in probe

Rename the ndev variable to netdev for the sake of consistency.

No functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 28 May 2020 22:21:28 +0000 (00:21 +0200)] 
net: ks8851: Factor out spi->dev in probe()/remove()

Pull spi->dev out into one common place in each function instead of
repeating it over and over again. This is done in preparation for
unifying the ks8851 and ks8851-mll drivers. No functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 May 2020 23:26:49 +0000 (16:26 -0700)] 
Merge branch 'vmxnet3-upgrade-to-version-4'

Ronak Doshi says:

====================
vmxnet3: upgrade to version 4

vmxnet3 emulation has recently added several new features, which include
offload support for tunnel packets, support for new commands the driver
can issue to the emulation, changes in descriptor fields, etc. This patch
series extends the vmxnet3 driver to leverage these new features.

Compatibility is maintained using the existing vmxnet3 versioning mechanism
as follows:
 - new features added to vmxnet3 emulation are associated with a new vmxnet3
   version, viz. vmxnet3 version 4.
 - emulation advertises all the versions it supports to the driver.
 - during initialization, the vmxnet3 driver picks the highest version number
   supported by both the emulation and the driver and configures the emulation
   to run at that version.
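The negotiation step can be sketched as picking the highest bit common to two support masks. The encoding (bit i set means version i + 1 is supported) and the function name pick_version() are assumptions for illustration, not the vmxnet3 register interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed encoding: bit i set in a mask means "version i + 1 supported".
 * Returns the highest version both sides support, or 0 if there is none.
 */
static int pick_version(uint32_t emulation_mask, uint32_t driver_mask)
{
	uint32_t common = emulation_mask & driver_mask;

	for (int bit = 31; bit >= 0; bit--)
		if (common & (UINT32_C(1) << bit))
			return bit + 1;
	return 0;	/* no common version: the driver must refuse to load */
}
```

For example, an emulation advertising versions 1 to 4 (mask 0xF) paired with a driver that only knows versions 1 to 3 (mask 0x7) settles on version 3.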

In particular, the following changes are introduced:

Patch 1:
  This patch introduces utility macros for vmxnet3 version 4 comparison
  and updates Copyright information.

Patch 2:
  This patch implements get_rss_hash_opts and set_rss_hash_opts methods
  to allow querying and configuring different Rx flow hash configurations
  which can be used to support UDP/ESP RSS.

Patch 3:
  This patch introduces segmentation and checksum offload support for
  encapsulated packets. This avoids segmenting and calculating the checksum
  for each segment in the guest and hence boosts performance.

Patch 4:
  With all vmxnet3 version 4 changes incorporated in the vmxnet3 driver,
  this patch lets the driver configure emulation to run at vmxnet3
  version 4.

Changes in v3 -> v4:
   - Replaced BUG_ON() with WARN_ON_ONCE()

Changes in v2 -> v3:
   - fixed get_rss_hash_opts to return correct values for udp rss

Changes in v2:
   - Fixed compilation issue due to missing closing brace
   - added fallthrough comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Ronak Doshi [Thu, 28 May 2020 21:53:22 +0000 (14:53 -0700)] 
vmxnet3: update to version 4

With all vmxnet3 version 4 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 4, provided
the emulation advertises support for version 4.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ronak Doshi [Thu, 28 May 2020 21:53:21 +0000 (14:53 -0700)] 
vmxnet3: add geneve and vxlan tunnel offload support

Vmxnet3 version 3 device supports checksum/TSO offload. Thus, vNIC to
pNIC traffic can leverage hardware checksum/TSO offloads. However,
vmxnet3 does not support checksum/TSO offload for Geneve/VXLAN
encapsulated packets. Thus, for a vNIC configured with an overlay, the
guest stack must first segment the inner packet, compute the inner
checksum for each segment and encapsulate each segment before
transmitting the packet via the vNIC. This results in a significant
performance penalty.

This patch enhances vmxnet3 to support Geneve/VXLAN TSO as well as
checksum offload.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ronak Doshi [Thu, 28 May 2020 21:53:20 +0000 (14:53 -0700)] 
vmxnet3: add support to get/set rx flow hash

With vmxnet3 version 4, the emulation supports multiqueue (RSS) for
UDP and ESP traffic. A guest can enable/disable RSS for UDP/ESP over
IPv4/IPv6 by issuing commands introduced in this patch. ESP over IPv6 is
not yet supported in this patch.

This patch implements get_rss_hash_opts and set_rss_hash_opts
methods to allow querying and configuring different Rx flow hash
configurations.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ronak Doshi [Thu, 28 May 2020 21:53:19 +0000 (14:53 -0700)] 
vmxnet3: prepare for version 4 changes

vmxnet3 is currently at version 3, and this patch starts the
preparation to accommodate changes for version 4. Introduce utility
macros for vmxnet3 version 4 comparison and update the copyright
information.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 27 May 2020 13:41:06 +0000 (15:41 +0200)] 
sfc: avoid an unused-variable warning

'nic_data' is no longer used outside of the #ifdef block
in efx_ef10_set_mac_address:

drivers/net/ethernet/sfc/ef10.c:3231:28: error: unused variable 'nic_data' [-Werror,-Wunused-variable]
        struct efx_ef10_nic_data *nic_data = efx->nic_data;

Move the variable into a local scope.
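A userspace sketch of the fix, with a hypothetical CONFIG_EXAMPLE_SRIOV guard and stand-in structures (not the sfc ones). The point is only that the variable is declared inside the conditional block that uses it, so a build with the guard undefined has no unused variable to warn about.

```c
#include <assert.h>

/* Comment this out to mimic a build without the optional feature. */
#define CONFIG_EXAMPLE_SRIOV

/* Hypothetical stand-ins for the driver structures. */
struct example_nic_data {
	int vport_id;
};

struct example_nic {
	struct example_nic_data *nic_data;
	int is_vf;
};

/* Returns the vport ID to reprogram, or -1 when there is nothing to do. */
static int vport_to_update(const struct example_nic *efx)
{
#ifdef CONFIG_EXAMPLE_SRIOV
	if (efx->is_vf) {
		/* Scoped to the only block that uses it. */
		struct example_nic_data *nic_data = efx->nic_data;

		return nic_data->vport_id;
	}
#endif
	return -1;
}
```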

Fixes: dfcabb078847 ("sfc: move vport_id to struct efx_nic")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 May 2020 18:17:20 +0000 (11:17 -0700)] 
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2020-05-27

This series contains updates to the ice driver only.

Jesse fixes a number of issues, starting with the remaining signed
versus unsigned comparison issues.  Jesse also cleans up an unused code
define and fixes the implementation of the manage MAC write command,
simplifying it by using a simple array to represent the MAC address when
writing it.

Paul fixes the setting of the VF default LAN address, by removing a
check that assumed that the address had been deleted and zeroed.

Surabhi prevents a memory leak on filter management initialization
failures and during queue initialization and buffer allocation failures.

Brett adds additional receive error counters that are reported by
ethtool.  Brett also fixes the enabling and disabling of VLAN stripping
when the PVID has been set.

Evan fixes a race condition between the firmware and software, which can
occur between the admin queue setup and the first command sent.

Marta fixes the driver so that when the XDP transmit rings are destroyed,
the XDP transmit queues are destroyed as well.  Marta also updates the
statistics when XDP transmit programs are loaded and packets are sent, and
changes the number of XDP transmit queues to match the number of receive
queues instead of the number of transmit queues.

Bruce avoids undefined behavior by not writing the associated 122-bit
internal-to-hardware field into the 8-bit element init_q_state.

Anirudh (Ani) refactors the receive checksum checks.

Krzysztof makes the driver notify the user if the fill queue is not long
enough to prepare all buffers before packet processing starts; in that
case, the buffers are allocated during the NAPI poll.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 May 2020 18:11:46 +0000 (11:11 -0700)] 
Merge branch 'remove-most-callers-of-kernel_setsockopt-v3'

Christoph Hellwig says:

====================
remove most callers of kernel_setsockopt v3

this series removes most callers of the kernel_setsockopt functions, and
instead switches their users to small functions that implement setting a
sockopt directly using a normal kernel function call with type safety and
all the other benefits of not having a function call.
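The kernel helpers themselves only run inside the kernel, so the following is a userspace analogy of the call-site shape only: one small, typed function per option instead of an open-coded (void *, socklen_t) pair at every caller. The example_* names are hypothetical, and unlike the helpers in this series they still go through setsockopt(2) rather than avoiding it. TCP_KEEPIDLE is Linux specific.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical typed helper: enable TCP_NODELAY, no option buffer at call sites. */
static int example_set_nodelay(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical typed helper: idle time in seconds before keepalive probes. */
static int example_set_keepidle(int fd, int secs)
{
	assert(secs > 0);	/* assumed precondition for this sketch */
	return setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &secs, sizeof(secs));
}
```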

In some cases these functions seem pretty heavy-handed, as they do
a lock_sock even for just setting a single variable, but this mirrors
the real setsockopt implementation, unlike a few drivers that just
set the fields directly.

Changes since v2:
 - drop the separately merged kernel_getopt_removal
 - drop the sctp patches, as there is conflicting cleanup going on
 - add an additional ACK for the rxrpc changes

Changes since v1:
 - use ->getname for sctp sockets in dlm
 - add a new ->bind_add struct proto method for dlm/sctp
 - switch the ipv6 and remaining sctp helpers to inline function so that
   the ipv6 and sctp modules are not pulled in by any module that could
   potentially use ipv6 or sctp connections
 - remove arguments to various sock_* helpers that are always used with
   the same constant arguments
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:36 +0000 (07:12 +0200)] 
tipc: call tsk_set_importance from tipc_topsrv_create_listener

Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can
just use the internal helper.  The only change needed is to pass a struct
sock instead of a tipc_sock, which is private to socket.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:35 +0000 (07:12 +0200)] 
rxrpc: add rxrpc_sock_set_min_security_level

Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from
kernel space without going through a fake uaccess.

Thanks to David Howells for the documentation updates.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:34 +0000 (07:12 +0200)] 
ipv6: add ip6_sock_set_recvpktinfo

Add a helper to directly set the IPV6_RECVPKTINFO sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:33 +0000 (07:12 +0200)] 
ipv6: add ip6_sock_set_addr_preferences

Add a helper to directly set the IPV6_ADDR_PREFERENCES sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:32 +0000 (07:12 +0200)] 
ipv6: add ip6_sock_set_recverr

Add a helper to directly set the IPV6_RECVERR sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:31 +0000 (07:12 +0200)] 
ipv6: add ip6_sock_set_v6only

Add a helper to directly set the IPV6_V6ONLY sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:30 +0000 (07:12 +0200)] 
ipv4: add ip_sock_set_pktinfo

Add a helper to directly set the IP_PKTINFO sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:29 +0000 (07:12 +0200)] 
ipv4: add ip_sock_set_mtu_discover

Add a helper to directly set the IP_MTU_DISCOVER sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com> [rxrpc bits]
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:28 +0000 (07:12 +0200)] 
ipv4: add ip_sock_set_recverr

Add a helper to directly set the IP_RECVERR sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:27 +0000 (07:12 +0200)] 
ipv4: add ip_sock_set_freebind

Add a helper to directly set the IP_FREEBIND sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:26 +0000 (07:12 +0200)] 
ipv4: add ip_sock_set_tos

Add a helper to directly set the IP_TOS sockopt from kernel space without
going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:25 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_keepcnt

Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:24 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_keepintvl

Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:23 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_keepidle

Add a helper to directly set the TCP_KEEPIDLE sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:22 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_user_timeout

Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:21 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_syncnt

Add a helper to directly set the TCP_SYNCNT sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:20 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_quickack

Add a helper to directly set the TCP_QUICKACK sockopt from kernel space
without going through a fake uaccess.  Clean up the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:19 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_nodelay

Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess.  Clean up the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:18 +0000 (07:12 +0200)] 
tcp: add tcp_sock_set_cork

Add a helper to directly set the TCP_CORK sockopt from kernel space
without going through a fake uaccess.  Clean up the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Thu, 28 May 2020 05:12:17 +0000 (07:12 +0200)] 
net: add sock_set_reuseport

Add a helper to directly set the SO_REUSEPORT sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_set_rcvbuf
Christoph Hellwig [Thu, 28 May 2020 05:12:16 +0000 (07:12 +0200)] 
net: add sock_set_rcvbuf

Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_set_keepalive
Christoph Hellwig [Thu, 28 May 2020 05:12:15 +0000 (07:12 +0200)] 
net: add sock_set_keepalive

Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_enable_timestamps
Christoph Hellwig [Thu, 28 May 2020 05:12:14 +0000 (07:12 +0200)] 
net: add sock_enable_timestamps

Add a helper to directly enable timestamps instead of setting the
SO_TIMESTAMP* sockopts from kernel space and going through a fake
uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_bindtoindex
Christoph Hellwig [Thu, 28 May 2020 05:12:13 +0000 (07:12 +0200)] 
net: add sock_bindtoindex

Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_set_sndtimeo
Christoph Hellwig [Thu, 28 May 2020 05:12:12 +0000 (07:12 +0200)] 
net: add sock_set_sndtimeo

Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel
space without going through a fake uaccess.  The interface is
simplified to only pass the seconds value, as that is the only
thing needed at the moment.
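As an illustration of the seconds-only interface, here is a userspace
sketch of the same setting. Userspace uses the generic SO_SNDTIMEO name
(the C library maps it to the old or new variant); the function names are
hypothetical:

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Userspace analogue of sock_set_sndtimeo(sk, secs): only whole
 * seconds are configured, mirroring the simplified kernel helper. */
int set_sndtimeo_secs(int fd, long secs)
{
	struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Returns the configured send timeout in whole seconds, -1 on error. */
long get_sndtimeo_secs(int fd)
{
	struct timeval tv = { 0 };
	socklen_t len = sizeof(tv);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) < 0)
		return -1;
	return (long)tv.tv_sec;
}
```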

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_set_priority
Christoph Hellwig [Thu, 28 May 2020 05:12:11 +0000 (07:12 +0200)] 
net: add sock_set_priority

Add a helper to directly set the SO_PRIORITY sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_no_linger
Christoph Hellwig [Thu, 28 May 2020 05:12:10 +0000 (07:12 +0200)] 
net: add sock_no_linger

Add a helper to directly set the SO_LINGER sockopt from kernel space
with onoff set to true and a linger time of 0 without going through a
fake uaccess.
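In userspace terms, "onoff set to true and a linger time of 0" is the
struct linger below. A sketch with hypothetical function names:

```c
#include <sys/socket.h>

/* Userspace analogue of sock_no_linger(sk): with l_onoff = 1 and
 * l_linger = 0, close() discards unsent data and, for TCP, resets the
 * connection instead of performing a graceful shutdown. */
int set_no_linger(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* 1 if lingering is on with a zero timeout, 0 otherwise, -1 on error. */
int no_linger_enabled(int fd)
{
	struct linger lg = { 0 };
	socklen_t len = sizeof(lg);

	if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0)
		return -1;
	return lg.l_onoff != 0 && lg.l_linger == 0;
}
```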

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add sock_set_reuseaddr
Christoph Hellwig [Thu, 28 May 2020 05:12:09 +0000 (07:12 +0200)] 
net: add sock_set_reuseaddr

Add a helper to directly set the SO_REUSEADDR sockopt from kernel space
without going through a fake uaccess.

For this the iscsi target now has to formally depend on inet to avoid
a mostly theoretical compile failure.  For actual operation it already
did depend on having ipv4 or ipv6 support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-updates-2020-05-26' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 28 May 2020 18:04:12 +0000 (11:04 -0700)] 
Merge tag 'mlx5-updates-2020-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-05-26

Updates highlights:

1) From Vu Pham (8): Support VM traffic failover with bonded VF
representors and e-switch egress/ingress ACLs

This series introduces support for a Virtual Machine running I/O
traffic over the direct/fast VF path and failing over to the slower
paravirtualized path, using the following features:

     __________________________________
    |  VM      _________________        |
    |          |FAILOVER device |       |
    |          |________________|       |
    |                  |                |
    |              ____|_____           |
    |              |         |          |
    |       ______ |___  ____|_______   |
    |       |  VF PT  |  |VIRTIO-NET |  |
    |       | device  |  | device    |  |
    |       |_________|  |___________|  |
    |___________|______________|________|
                |              |
                | HYPERVISOR   |
                |          ____|______
                |         |  macvtap  |
                |         |virtio BE  |
                |         |___________|
                |               |
                |           ____|_____
                |           |host VF  |
                |           |_________|
                |               |
           _____|______    _____|_____
           |  PT VF    |  |  host VF  |
           |representor|  |representor|
           |___________|  |___________|
                \               /
                 \             /
                  \           /
                   \         /                     _________________
                    \_______/                     |                |
                 _______|________                 |    V-SWITCH    |
                |VF representors |________________|      (OVS)     |
                |      bond      |                |________________|
                |________________|                        |
                                                  ________|________
                                                 |    Uplink       |
                                                 |  representor    |
                                                 |_________________|

Summary:
--------
Problem statement:
------------------
Currently, in the above topology, when a netfailover device is configured
using VFs and eswitch VF representors, and traffic fails over to the
stand-by VF, which is exposed to the guest VM through a macvtap device,
the eswitch fails to switch the traffic to the stand-by VF representor.
This happens because the eswitch has no knowledge of the stand-by
representor device.

Solution:
---------
Using the standard bonding driver, a bond netdevice is created over the VF
representor devices and is used for offloading tc rules.
Two VF representors are bonded together: one for the passthrough VF
device and another for the stand-by VF device.
With this solution, the mlx5 driver listens to the failover events
occurring at the bond device level to fail traffic over to whichever VF
representor of the bond is active.

a. VM with a netfailover device consisting of a VF pass-thru (PT) device
   and a virtio-net paravirtualized device with the same MAC address to
   handle failover traffic at the VM level.

b. The host bond is in active-standby mode, with the lower devices being
   the VM VF PT representor and the representor of the 2nd VF, to handle
   failover traffic at the Hypervisor/V-Switch OVS level.
   - During the steady state (fast datapath): set the bond active
     device to be the VM PT VF representor.
   - During failover: apply bond failover to the second VF representor
     device which connects to the VM non-accelerated path.

c. E-Switch ingress/egress ACL tables to support failover traffic at the
   E-Switch level
   I. E-Switch egress ACL with forward-to-vport rule:
     - By default, the eswitch vport egress acl forwards packets to its
       counterpart NIC vport.
     - During port failover, the egress acl forward-to-vport rule will
       be added to the e-switch vport of the passive/inactive slave VF
       representor to forward packets to the other e-switch vport, i.e.
       the active slave representor's e-switch vport, to handle egress
       "failover" traffic.
     - Use the lower change netdev event to detect that a representor is
       a lower dev (slave) of the bond and has become active, and add the
       egress acl forward-to-vport rule to all other slave netdevs to
       forward to this representor's vport.
     - Use the upper change netdev event to detect a representor
       unslaving from the bond device and delete its vport's egress acl
       forward-to-vport rule.

   II. E-Switch ingress ACL metadata reg_c for match
     - Bonded representors' vports sharing a tc block have the same
       root ingress acl table and a unique metadata for match.
     - Traffic from both representors' vports will be tagged with the
       same unique metadata reg_c.
     - Use the upper change netdev event to detect a representor
       enslaving/unslaving from the bond device to set up the shared root
       ingress acl and unique metadata.

2) From Alex Vesker (2): Split RX and TX lock for parallel rule insertion in
software steering

3) From Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype by
using the HW ip_version register rather than parsing eth frames for ethertype.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: ipv6: support RFC 6069 (TCP-LD)
Eric Dumazet [Thu, 28 May 2020 00:34:58 +0000 (17:34 -0700)] 
tcp: ipv6: support RFC 6069 (TCP-LD)

Make the tcp_ld_RTO_revert() helper available to IPv6, and
implement RFC 6069.

Quoting this RFC:

3. Connectivity Disruption Indication

   For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
   the ICMP destination unreachable message of code 0 (net unreachable)
   and of code 1 (host unreachable) is the ICMPv6 destination
   unreachable message of code 0 (no route to destination) [RFC4443].
   As with IPv4, a router should generate an ICMPv6 destination
   unreachable message of code 0 in response to a packet that cannot be
   delivered to its destination address because it lacks a matching
   entry in its routing table.
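The classification described in the quoted text can be sketched as a
small predicate. This is an illustration only, not the kernel's code; the
function name is hypothetical, and the type/code values come from RFC 792
and RFC 4443:

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* ICMP/ICMPv6 type and code values (RFC 792, RFC 4443). */
#define ICMP_DEST_UNREACH	3
#define ICMP_NET_UNREACH	0
#define ICMP_HOST_UNREACH	1
#define ICMPV6_DEST_UNREACH	1
#define ICMPV6_NOROUTE		0

/* Hypothetical helper: does this ICMP error count as a connectivity
 * disruption indication in the sense of RFC 6069, section 3? */
bool is_tcp_ld_indication(int family, uint8_t type, uint8_t code)
{
	if (family == AF_INET)
		return type == ICMP_DEST_UNREACH &&
		       (code == ICMP_NET_UNREACH || code == ICMP_HOST_UNREACH);
	if (family == AF_INET6)
		return type == ICMPV6_DEST_UNREACH && code == ICMPV6_NOROUTE;
	return false;
}
```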

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: offload the Credit-Based Shaper qdisc
Vladimir Oltean [Thu, 28 May 2020 00:27:58 +0000 (03:27 +0300)] 
net: dsa: sja1105: offload the Credit-Based Shaper qdisc

The SJA1105 switches, being AVB/TSN devices, provide hardware assist for
the Credit-Based Shaper as described in the IEEE 802.1Q-2018 document.

The first generation has 10 shapers, freely assignable to any of the 4
external ports and 8 traffic classes, and the second generation has 16
shapers.

The Credit-Based Shaper tables are accessed through the dynamic
reconfiguration interface, so we have to restore them manually after a
switch reset. The tables are backed up by the static config only on
P/Q/R/S, and we don't want to add custom code only for that family,
since the procedure that is in place now works for both.

Tested with the following commands:

data_rate_kbps=67000
port_transmit_rate_kbps=1000000
idleslope=$data_rate_kbps
sendslope=$(($idleslope - $port_transmit_rate_kbps))
locredit=$((-0x80000000))
hicredit=$((0x7fffffff))
tc qdisc add dev swp2 root handle 1: mqprio hw 0 num_tc 8 \
        map 0 1 2 3 4 5 6 7 \
        queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
tc qdisc replace dev swp2 parent 1:1 cbs \
        idleslope $idleslope \
        sendslope $sendslope \
        hicredit $hicredit \
        locredit $locredit \
        offload 1
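With the values above, sendslope = 67000 - 1000000 = -933000 kbit/s, and
the credits span the full signed 32-bit range. The same derivation as a
small C sketch (struct and function names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical container for the tc-cbs parameters used above.
 * Slopes are in kbit/s, credits in bytes. */
struct cbs_params {
	int64_t idleslope_kbps;
	int64_t sendslope_kbps;
	int64_t hicredit;
	int64_t locredit;
};

/* Mirror the shell arithmetic: the idle slope is the reserved data
 * rate, and the send slope is the idle slope minus the port rate, so
 * it is negative whenever the reservation is below line rate. */
struct cbs_params cbs_from_rates(int64_t data_rate_kbps,
				 int64_t port_transmit_rate_kbps)
{
	struct cbs_params p;

	p.idleslope_kbps = data_rate_kbps;
	p.sendslope_kbps = data_rate_kbps - port_transmit_rate_kbps;
	p.hicredit = INT32_MAX;		/*  0x7fffffff */
	p.locredit = INT32_MIN;		/* -0x80000000 */
	return p;
}
```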

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Add torture tests to nexthop tests
David Ahern [Thu, 28 May 2020 00:03:44 +0000 (18:03 -0600)] 
selftests: Add torture tests to nexthop tests

Add Nik's torture tests as a new set to stress the replace and cleanup
paths.

The torture test was created by Nikolay Aleksandrov; I then adapted it
to the selftest framework and added an IPv6 version.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: Check UMEM FQ size when allocating bufs
Krzysztof Kazimierczak [Sat, 16 May 2020 00:42:20 +0000 (17:42 -0700)] 
ice: Check UMEM FQ size when allocating bufs

If a UMEM is present on a queue when an interface/queue pair is being
enabled, the driver will try to prepare the Rx buffers in advance to
improve performance. However, if the fill queue is shorter than the HW Rx
ring, the driver will report failure after getting the last address from
the fill queue.

This still lets the driver process the packets correctly during the NAPI
poll, but leads to constant NAPI rescheduling. Not allocating the
buffers in advance would result in a potential performance decrease.

Commit d57d76428ae9 ("xsk: Add API to check for available entries in FQ")
provides an API that lets drivers check the number of addresses that the
fill queue holds.

Notify the user if the fill queue is not long enough to prepare all
buffers before packet processing starts, and allocate the buffers during
the NAPI poll. If the fill queue size is sufficient, prepare Rx buffers
in advance.
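The decision described above can be sketched as a pure function. This is
a hypothetical helper, not the driver's code, and the exact threshold
(fill queue entries versus ring size) is an assumption:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: pre-fill the Rx ring only if the fill queue
 * holds an address for every descriptor; otherwise warn and leave the
 * allocation to the first NAPI poll. */
bool should_prealloc_rx_bufs(uint32_t fq_entries, uint32_t ring_count,
			     unsigned int queue_id)
{
	if (fq_entries < ring_count) {
		fprintf(stderr,
			"queue %u: fill queue has %" PRIu32 " entries, ring needs %"
			PRIu32 "; allocating buffers during NAPI poll\n",
			queue_id, fq_entries, ring_count);
		return false;
	}
	return true;
}
```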

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agonet/mlx5: DR, Split RX and TX lock for parallel insertion
Alex Vesker [Wed, 20 May 2020 15:09:35 +0000 (18:09 +0300)] 
net/mlx5: DR, Split RX and TX lock for parallel insertion

Change the locking flow to support separate RX and TX locks. Splitting
the single lock into two allows inserting rules in parallel
for the RX and TX parts of the FDB.

Locking the dr_domain is done by locking both the RX and the TX domain
locks; this is mostly used for control operations on the dr_domain.
When inserting rules for RX or TX, only the single nic_domain RX or TX
lock is used. Splitting the lock is safe since the RX and TX domains are
logically separated from each other, and shared objects such as the
send ring and memory pool are protected by locks.
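The locking scheme above can be modeled in userspace with pthread
mutexes. This is a hypothetical sketch of the pattern (all names are
invented for illustration), not the mlx5 code:

```c
#include <pthread.h>

/* Hypothetical model: each direction has its own lock. */
struct nic_domain {
	pthread_mutex_t lock;
	unsigned int rules;
};

struct domain {
	struct nic_domain rx;
	struct nic_domain tx;
};

void domain_init(struct domain *d)
{
	pthread_mutex_init(&d->rx.lock, NULL);
	pthread_mutex_init(&d->tx.lock, NULL);
	d->rx.rules = 0;
	d->tx.rules = 0;
}

/* Control path: take RX then TX, release in reverse order. A fixed
 * order keeps two domain-wide lockers from deadlocking each other. */
void domain_lock(struct domain *d)
{
	pthread_mutex_lock(&d->rx.lock);
	pthread_mutex_lock(&d->tx.lock);
}

void domain_unlock(struct domain *d)
{
	pthread_mutex_unlock(&d->tx.lock);
	pthread_mutex_unlock(&d->rx.lock);
}

/* Rule insertion: only the affected direction's lock is taken, so RX
 * and TX insertions can run in parallel. */
void nic_domain_insert_rule(struct nic_domain *nd)
{
	pthread_mutex_lock(&nd->lock);
	nd->rules++;
	pthread_mutex_unlock(&nd->lock);
}

/* Balance every pthread_mutex_init() with a destroy. */
void domain_destroy(struct domain *d)
{
	pthread_mutex_destroy(&d->tx.lock);
	pthread_mutex_destroy(&d->rx.lock);
}
```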

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: DR, Add a spinlock to protect the send ring
Alex Vesker [Wed, 20 May 2020 15:09:14 +0000 (18:09 +0300)] 
net/mlx5: DR, Add a spinlock to protect the send ring

Adding this lock will allow writing steering entries without
locking the dr_domain and allow parallel insertion.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Optimize performance for IPv4/IPv6 ethertype
Eli Britstein [Tue, 19 May 2020 05:55:59 +0000 (05:55 +0000)] 
net/mlx5e: Optimize performance for IPv4/IPv6 ethertype

The HW is optimized for IPv4/IPv6. For such cases, if the device has the
capability, avoid matching on ethertype and match on the ip_version
field instead.
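The choice can be sketched as follows. The match_spec struct and the
function name are hypothetical and only model the decision, not the mlx5
match parameter layout:

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP	0x0800
#define ETH_P_IPV6	0x86DD

/* Hypothetical match spec: either ethertype or ip_version is matched. */
struct match_spec {
	bool match_ethertype;
	uint16_t ethertype;
	bool match_ip_version;
	uint8_t ip_version;
};

/* If the device supports matching on ip_version and the ethertype is
 * IPv4 or IPv6, match on ip_version; otherwise match on ethertype. */
void set_ethertype_match(struct match_spec *m, uint16_t ethertype,
			 bool has_ip_version_cap)
{
	*m = (struct match_spec){ 0 };
	if (has_ip_version_cap && ethertype == ETH_P_IP) {
		m->match_ip_version = true;
		m->ip_version = 4;
	} else if (has_ip_version_cap && ethertype == ETH_P_IPV6) {
		m->match_ip_version = true;
		m->ip_version = 6;
	} else {
		m->match_ethertype = true;
		m->ethertype = ethertype;
	}
}
```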

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Helper function to set ethertype
Eli Britstein [Mon, 11 May 2020 19:20:29 +0000 (19:20 +0000)] 
net/mlx5e: Helper function to set ethertype

Set ethertype match in a helper function as a pre-step towards
optimizing it.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Add missing mutex destroy
Parav Pandit [Fri, 15 May 2020 04:42:45 +0000 (23:42 -0500)] 
net/mlx5: Add missing mutex destroy

Add mutex destroy calls to balance with mutex_init() done in the init
path.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Use change upper event to setup representors' bond_metadata
Vu Pham [Thu, 12 Mar 2020 17:26:25 +0000 (10:26 -0700)] 
net/mlx5e: Use change upper event to setup representors' bond_metadata

Use the change upper event to detect a slave representor being enslaved
to or unslaved from a lag device.

On an enslaving event, call the mlx5_enslave_rep() API to create and add
a shadow entry for this slave representor to the slaves list of the
bond_metadata structure representing the master lag device, and use
its metadata to set up the ingress acl metadata header.

On an unslaving event, reset the vport of the unslaved representor
to use its default ingress/egress acls and rx rules with its
default_metadata.

The last slave will free the shared bond_metadata and its
unique metadata.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Slave representors sharing unique metadata for match
Vu Pham [Mon, 2 Mar 2020 18:33:49 +0000 (10:33 -0800)] 
net/mlx5e: Slave representors sharing unique metadata for match

Bonded slave representors' vports must share a unique metadata
for match.

On an enslaving event of a slave representor to a lag device, allocate a
new unique "bond_metadata" for match if this is the first slave.
The subsequent enslaved representors will share the same unique
"bond_metadata".

On an unslaving event of a slave representor, reset the slave
representor's vport to use its own default metadata.

Replace the ingress acl and rx rules of the slave representors' vports
to use the new vport->bond_metadata.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Alloc and free unique metadata for match
Vu Pham [Sat, 29 Feb 2020 00:10:34 +0000 (16:10 -0800)] 
net/mlx5: E-Switch, Alloc and free unique metadata for match

Introduce infrastructure to create unique metadata for match
for a vport without depending on vport_num. A vport uses its
default metadata for match in standalone configuration but
shares a different unique "bond_metadata" for match with
other vports in bond configuration.

Use ida to generate unique metadata for match for vports
in default and bond configurations.

Introduce APIs to generate and free metadata for match.
Introduce APIs to set a vport's bond_metadata and replace its
ingress acl rules with bond_metadata.
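The lifecycle described in this and the surrounding commits (the first
slave allocates, later slaves share, the last one frees) can be sketched
as below. A tiny bitmap allocator stands in for the kernel's ida, and all
names and the id-space size are hypothetical:

```c
#include <stdint.h>

#define METADATA_MAX 64	/* hypothetical size of the id space */

/* Bitmap allocator standing in for the kernel's ida. */
struct id_pool {
	uint64_t used;
};

/* Lowest free id in [1, METADATA_MAX), or 0 if exhausted;
 * 0 is reserved to mean "no metadata". */
uint32_t id_alloc(struct id_pool *pool)
{
	for (uint32_t id = 1; id < METADATA_MAX; id++) {
		if (!(pool->used & (UINT64_C(1) << id))) {
			pool->used |= UINT64_C(1) << id;
			return id;
		}
	}
	return 0;
}

void id_free(struct id_pool *pool, uint32_t id)
{
	pool->used &= ~(UINT64_C(1) << id);
}

/* Hypothetical shared bond metadata with a slave count. */
struct bond_metadata {
	uint32_t metadata;
	unsigned int slaves;
};

/* First slave allocates a unique id; later slaves share it. */
uint32_t bond_enslave(struct id_pool *pool, struct bond_metadata *bm)
{
	if (bm->slaves == 0)
		bm->metadata = id_alloc(pool);
	bm->slaves++;
	return bm->metadata;
}

/* The last slave to leave releases the shared id. */
void bond_unslave(struct id_pool *pool, struct bond_metadata *bm)
{
	if (bm->slaves == 0)
		return;
	if (--bm->slaves == 0) {
		id_free(pool, bm->metadata);
		bm->metadata = 0;
	}
}
```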

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Add bond_metadata and its slave entries
Vu Pham [Fri, 28 Feb 2020 22:28:27 +0000 (14:28 -0800)] 
net/mlx5e: Add bond_metadata and its slave entries

Add bond_metadata and its slave entries to represent a lag device
and its slave VF representors. The bond_metadata structure includes a
unique metadata shared by the slave VF representors, and a list of
slave representor entries.

On an enslaving event, create a bond_metadata structure representing
the upper lag device of this slave representor if it has not been
created yet. Create and add an entry for the slave representor to the
slaves list.

On an unslaving event, free the slave entry of the slave representor.
On the last unslave event, free the bond_metadata structure and its
resources.

Introduce APIs to create and remove bond_metadata and its resources,
and to enslave and unslave VF representor slave entries.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Offload flow rules to active lower representor
Or Gerlitz [Tue, 5 Mar 2019 19:11:14 +0000 (21:11 +0200)] 
net/mlx5e: Offload flow rules to active lower representor

When a bond device is created over one or more non-uplink representors,
and a flow rule is offloaded to such a bond device, offload the rule
to the active lower device.

Assuming that this is an active-backup lag, the rules should be offloaded
to the active lower device, which is the representor of the direct
path (not the failover).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Support tc block sharing for representors
Vu Pham [Fri, 2 Aug 2019 23:13:10 +0000 (16:13 -0700)] 
net/mlx5e: Support tc block sharing for representors

Currently, offloading a rule over a tc block shared by multiple
representors fails because an e-switch global hashtable is used to keep
the mapping from tc cookies to mlx5e flow instances, and tc block
sharing offloads the same rule/cookie multiple times, once for each
representor sharing the tc block.

Change the implementation and behavior to acknowledge and return
success if the same rule/cookie is offloaded again to another slave
representor sharing the tc block, by setting, checking and comparing
the netdev that added the rule first.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule
Or Gerlitz [Fri, 21 Jun 2019 20:23:44 +0000 (13:23 -0700)] 
net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule

Register a notifier block to handle netdev events for the bond device
of non-uplink representors to support eswitch vports bonding.

When a non-uplink representor is a lower dev (slave) of the bond and
becomes active, add an egress acl forward-to-vport rule for all slave
netdevs (active + standby) to forward to this representor's vport. Use
the change lower netdev event to do this.

Use the change upper event to detect a slave representor unslaved from
the lag device and delete its vport egress acl forward rule, if any.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule
Vu Pham [Tue, 17 Mar 2020 00:32:50 +0000 (17:32 -0700)] 
net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule

By default, an e-switch vport's egress acl just forwards packets to its
counterpart NIC vport using the existing egress acl table.

During port failover in a bonding scenario where two VF representors
are bonded, the egress acl forward-to-vport rule will be added to
the existing egress acl table of the e-switch vport of the
passive/inactive slave representor to forward packets to the other NIC
vport, i.e. the active slave representor's NIC vport, to handle egress
"failover" traffic.

Enable the egress acl and add APIs to create and destroy the egress acl
forward-to-vport rule and group.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Refactor eswitch ingress acl codes
Vu Pham [Sat, 28 Mar 2020 06:12:22 +0000 (23:12 -0700)] 
net/mlx5: E-Switch, Refactor eswitch ingress acl codes

Restructure the eswitch ingress acl code into the eswitch directory
and different files:
. Acl ingress helper functions to acl_helper.c/h
. Acl ingress functions used in offloads mode to acl_ingress_ofld.c
. Acl ingress functions used in legacy mode to acl_ingress_lgy.c

This patch does not change any functionality.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
5 years agonet/mlx5: E-Switch, Refactor eswitch egress acl codes
Vu Pham [Wed, 6 Nov 2019 17:57:12 +0000 (09:57 -0800)] 
net/mlx5: E-Switch, Refactor eswitch egress acl codes

Refactor the egress acl code so that offloads and legacy modes
can each configure the egress acl table, groups and rules they need.
While at it, restructure the eswitch egress acl code into the eswitch
directory and different files:
. Acl egress helper functions to acl_helper.c/h
. Acl egress functions used in offloads mode to acl_egress_ofld.c
. Acl egress functions used in legacy mode to acl_egress_lgy.c

This patch does not change any functionality.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoice: Refactor Rx checksum checks
Anirudh Venkataramanan [Sat, 16 May 2020 00:42:19 +0000 (17:42 -0700)] 
ice: Refactor Rx checksum checks

We don't need both rx_status and rx_error parameters, as the latter is
a subset of the former. Remove rx_error completely and check the right bit
in rx_status.

Rename rx_status to rx_status0, and rx_status_err1 to
rx_status1. This naming more closely reflects the specification.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: avoid undefined behavior
Bruce Allan [Sat, 16 May 2020 00:42:18 +0000 (17:42 -0700)] 
ice: avoid undefined behavior

When writing the driver's struct ice_tlan_ctx structure, do not write the
8-bit element int_q_state to the associated internal-to-hardware field,
which is 122 bits wide; otherwise the helper function ice_write_byte()
invokes undefined behavior when building the mask used for that write.
This should not cause any functional change and avoids the undefined
behavior. Also, update a comment to highlight that this structure element
is not written.
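The undefined behavior comes from building a mask by shifting 1 left by
the field width: a shift count of 122 exceeds the width of any standard
integer type. The commit avoids the write altogether; the sketch below
(a hypothetical helper, not the driver's code) instead shows a guarded
mask computation for a byte-sized field:

```c
#include <stdint.h>

/* Mask of the low `width` bits of a byte-sized field. (1u << width)
 * is only defined for width less than the bit width of unsigned int,
 * so clamp instead of shifting by an out-of-range count. */
uint8_t byte_field_mask(unsigned int width)
{
	if (width >= 8)
		return 0xFF;
	return (uint8_t)((1u << width) - 1);
}
```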

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Change number of XDP Tx queues to match number of Rx queues
Marta Plantykow [Sat, 16 May 2020 00:42:17 +0000 (17:42 -0700)] 
ice: Change number of XDP Tx queues to match number of Rx queues

In the current implementation, the number of XDP Tx queues is the same as
the number of transmit queues, which does not always match the number of
receive queues. This patch changes this number to match the number of
receive queues. XDP programs run on Rx rings, so what we actually need
to provide is one XDP Tx ring per Rx ring so that the whole XDP
ecosystem is functional, e.g. if the XDP program returns XDP_TX,
it needs access to an XDP Tx ring.

Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add XDP Tx to VSI ring stats
Marta Plantykow [Sat, 16 May 2020 00:42:16 +0000 (17:42 -0700)] 
ice: Add XDP Tx to VSI ring stats

When an XDP Tx program is loaded and packets are sent from the
interface, VSI statistics are not updated. This patch adds
packets sent on the XDP Tx ring to the VSI ring stats.

Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Change number of XDP TxQ to 0 when destroying rings
Marta Plantykow [Sat, 16 May 2020 00:42:15 +0000 (17:42 -0700)] 
ice: Change number of XDP TxQ to 0 when destroying rings

When XDP Tx rings are destroyed, the number of XDP Tx queues
does not change. This patch sets this number to 0.

Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Handle critical FW error during admin queue initialization
Evan Swanson [Sat, 16 May 2020 00:42:14 +0000 (17:42 -0700)] 
ice: Handle critical FW error during admin queue initialization

A race condition between FW and SW can occur between admin queue setup and
the first command sent. A link event may occur and FW attempts to notify a
non-existent queue. FW will set the critical error bit and disable the
queue. When this happens, retry queue setup.

Signed-off-by: Evan Swanson <evan.swanson@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Don't allow VLAN stripping change when pvid set
Brett Creeley [Sat, 16 May 2020 00:42:13 +0000 (17:42 -0700)] 
ice: Don't allow VLAN stripping change when pvid set

Currently, if the PVID is set in the VLAN handling section of the VSI
context the driver still allows VLAN stripping to be enabled/disabled.
VLAN stripping should only be modifiable when the PVID is not set. Fix
this by preventing VLAN stripping modification when PVID is set.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add more Rx errors to netdev's rx_error counter
Brett Creeley [Sat, 16 May 2020 00:36:44 +0000 (17:36 -0700)] 
ice: Add more Rx errors to netdev's rx_error counter

Currently we are only including illegal_bytes and rx_crc_errors in the
PF netdev's rx_error counter. There are many more causes of Rx errors
that the device supports and reports via Ethtool. Accumulate all Rx
errors in the PF netdev's rx_error counter.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix for memory leaks and modify ICE_FREE_CQ_BUFS
Surabhi Boob [Sat, 16 May 2020 00:36:43 +0000 (17:36 -0700)] 
ice: Fix for memory leaks and modify ICE_FREE_CQ_BUFS

Handle memory leaks during control queue initialization and
buffer allocation failures. The ICE_FREE_CQ_BUFS macro is modified so
that it can be reused for this fix.

Signed-off-by: Surabhi Boob <surabhi.boob@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix memory leak
Surabhi Boob [Sat, 16 May 2020 00:36:42 +0000 (17:36 -0700)] 
ice: Fix memory leak

Handle memory leak on filter management initialization failure.

Signed-off-by: Surabhi Boob <surabhi.boob@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: fix MAC write command
Jesse Brandeburg [Sat, 16 May 2020 00:36:41 +0000 (17:36 -0700)] 
ice: fix MAC write command

The manage MAC write command was implemented in an overly complex way
that actually didn't work, as it wasn't symmetric to the manage MAC
read command, and was feeding bytes out of order to the firmware. Fix
the implementation by just using a simple array to represent the MAC
address when it is being written via firmware command.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: set VF default LAN address
Paul Greenwalt [Sat, 16 May 2020 00:36:40 +0000 (17:36 -0700)] 
ice: set VF default LAN address

Remove the is_zero_ether_addr() check when setting the VF default LAN
address. This check assumed that the address had been deleted and zeroed
before calling ice_vc_add_mac_addr(). Now the default LAN address will be
set to the last unicast MAC address added by the VF.

The default LAN address is reported by the PF via ndo_get_vf_config.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 16 May 2020 00:36:39 +0000 (17:36 -0700)] 
ice: remove unused macro

The driver had an unused define that can be removed. It was found by
the compiler's -Werror=unused-macros check.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 16 May 2020 00:36:38 +0000 (17:36 -0700)] 
ice: fix signed vs unsigned comparisons

Fix the remaining signed vs unsigned issues, which appear when
compiling with -Werror=sign-compare.

Many of these arise because an external interface (which we can't
change) passes us an int, while we (rightfully) store it, and compare
against it, as an unsigned value in our data structures.
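The underlying C rule: in `a < b` with `int a` and `unsigned int b`, the usual arithmetic conversions turn `a` into an unsigned value, so a negative `a` wraps to a very large number. The following minimal sketch shows that conversion and one sign-aware way to compare. The helper names are hypothetical, and this is not necessarily how the ice patch resolves each individual warning.

```c
#include <stdbool.h>

/* What the compiler does for `a < b` with int a, unsigned b: a is
 * converted to unsigned, so a negative a compares as a huge value. */
static bool lt_implicit(int a, unsigned int b)
{
	return (unsigned int)a < b; /* same result as the plain a < b */
}

/* Sign-aware comparison: a negative int is less than any unsigned. */
static bool lt_int_uint(int a, unsigned int b)
{
	return a < 0 || (unsigned int)a < b;
}
```

With these definitions, `lt_implicit(-1, 10u)` is false while `lt_int_uint(-1, 10u)` is true. Making the conversion explicit, or checking for negatives first, is what silences -Wsign-compare without changing the intended meaning.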

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 27 May 2020 22:11:33 +0000 (15:11 -0700)] 
Merge branch 'remove-kernel_getsockopt'

Christoph Hellwig says:

====================
remove kernel_getsockopt

This series reduces the scope from the last round and only removes
kernel_getsockopt, to avoid conflicting with the sctp cleanup series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Wed, 27 May 2020 18:22:29 +0000 (20:22 +0200)] 
net: remove kernel_getsockopt

No users left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Hellwig [Wed, 27 May 2020 18:22:28 +0000 (20:22 +0200)] 
dlm: use the tcp version of accept_from_sock for sctp as well

Apart from a few fixes that are missing from the SCTP version, the only
difference between the two is that TCP uses ->getpeername to get the
remote address, while SCTP uses kernel_getsockopt(.. SCTP_PRIMARY_ADDR).
But given that getpeername is defined to return the primary address for
sctp, there doesn't seem to be any reason for querying the peer name
differently, or for all the code duplication.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Falkevik [Wed, 27 May 2020 09:59:43 +0000 (11:59 +0200)] 
sctp: fix typo sctp_ulpevent_nofity_peer_addr_change

Fix a typo in the function name, "nofity" -> "notify":
sctp_ulpevent_nofity_peer_addr_change ->
sctp_ulpevent_notify_peer_addr_change

Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 27 May 2020 09:25:26 +0000 (12:25 +0300)] 
net/tls: Add force_resync for driver resync

This patch adds a field to the TLS RX offload context which enables
drivers to force a send_resync call.

Drivers can use this field to request a resync at the next possible
TLS record. This is beneficial for hardware that provides the resync
sequence number asynchronously. In such cases, the packet that
triggered the resync does not contain the information required for a
resync. Instead, the driver requests a resync for every following
TLS record until the asynchronous notification carrying the TCP
sequence number of the resync request arrives.
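A simplified model of the mechanism described above, assuming a boolean flag that is checked once per TLS record. `struct rx_ctx_sketch` and the three helpers are hypothetical names, not the fields or callbacks of the kernel's TLS offload context.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RX offload context. */
struct rx_ctx_sketch {
	bool force_resync;   /* request a resync at every record while set */
	bool have_seq;       /* async notification has been received */
	uint32_t resync_seq; /* TCP sequence number from that notification */
};

/* Checked for each TLS record: should a resync be requested here? */
static bool record_needs_resync(const struct rx_ctx_sketch *ctx)
{
	return ctx->force_resync;
}

/* Driver lost sync but does not yet know the resync sequence number. */
static void driver_request_resync(struct rx_ctx_sketch *ctx)
{
	ctx->force_resync = true;
	ctx->have_seq = false;
}

/* The asynchronous notification arrives: stop forcing resync requests. */
static void driver_seq_notify(struct rx_ctx_sketch *ctx, uint32_t seq)
{
	ctx->resync_seq = seq;
	ctx->have_seq = true;
	ctx->force_resync = false;
}
```

Between `driver_request_resync()` and `driver_seq_notify()`, every record reports that a resync is needed. Once the sequence number arrives, the requests stop.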

A following series for mlx5e ConnectX-6DX TLS RX offload support will
use this mechanism.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 May 2020 22:05:50 +0000 (15:05 -0700)] 
Merge branch 'net_sched-reduce-the-number-of-qdisc-resets'

Cong Wang says:

====================
net_sched: reduce the number of qdisc resets

This patchset aims to reduce the number of qdisc resets during
qdisc tear down. Patches 1-3 prepare for the patches that follow; in
particular, patches 2 and 3 add a few tracepoints so that we can
observe the whole lifetime of qdiscs. Patches 4 and 5 do the actual
work. Please find more details in each patch description.

Vaclav Zindulka tested this patchset: for his large ruleset, with
over 13k qdiscs defined, the time went from 22s to 520ms.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:27 +0000 (21:35 -0700)] 
net_sched: get rid of unnecessary dev_qdisc_reset()

Resetting the old qdisc on dev_queue->qdisc_sleeping in
dev_qdisc_reset() is redundant, because this qdisc, even if it is not
the same as dev_queue->qdisc, is reset via qdisc_put() right after
dev_graft_qdisc() is called, when its refcnt hits 0.

This is very easy to observe with the qdisc_reset() tracepoint
and stack traces.
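A simplified sketch of why the extra reset is redundant. If the final put already resets the qdisc when the refcount drops to zero, a separate reset beforehand only repeats the work. The struct and helper names are hypothetical userspace stand-ins for the kernel's qdisc_put() and qdisc_reset(). Real teardown also frees the qdisc, which is omitted here.

```c
/* Hypothetical stand-in for a refcounted qdisc. */
struct qdisc_sketch {
	int refcnt;
	int resets; /* how many times the queue was reset */
};

static void qdisc_reset_sketch(struct qdisc_sketch *q)
{
	q->resets++;
}

/* Drop one reference; the last reference resets the qdisc. */
static void qdisc_put_sketch(struct qdisc_sketch *q)
{
	if (--q->refcnt == 0)
		qdisc_reset_sketch(q);
}
```

Calling `qdisc_reset_sketch()` explicitly before the last `qdisc_put_sketch()` would bring `resets` to 2 for a single teardown. That duplicated work is what the patch removes.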

Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 27 May 2020 04:35:26 +0000 (21:35 -0700)] 
net_sched: avoid resetting active qdisc for multiple times

Except for sch_mq and sch_mqprio, every dev queue points to the
same root qdisc, so when we reset the dev queues with
netdev_for_each_tx_queue() we end up resetting the same root qdisc
instance multiple times.

Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in
each iteration: for sch_mq/sch_mqprio we still reset all of them
as before, while for the rest we reset the root qdisc only once.
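A userspace sketch of the reset-once idea. It assumes a per-qdisc `deactivated` flag in place of the __QDISC_STATE_DEACTIVATED state bit, and the types and `deactivate_all()` are hypothetical. A root qdisc shared by all TX queues is reset once. Per-queue qdiscs, as with sch_mq/sch_mqprio, are each reset once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types. */
struct qdisc_s {
	bool deactivated; /* stands in for __QDISC_STATE_DEACTIVATED */
	int resets;
};

struct txq_s {
	struct qdisc_s *qdisc;
};

/* Walk all TX queues; a qdisc shared by several queues is only
 * deactivated and reset on the first visit. */
static void deactivate_all(struct txq_s *txq, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct qdisc_s *q = txq[i].qdisc;

		if (!q || q->deactivated)
			continue; /* already handled via an earlier queue */
		q->deactivated = true;
		q->resets++;
	}
}
```

With four queues sharing one root, the root's `resets` ends at 1 rather than 4. With distinct per-queue qdiscs, each one is still reset once, as before.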

Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>