Jakub Kicinski [Fri, 12 Jan 2018 04:29:11 +0000 (20:29 -0800)]
nfp: bpf: add basic control channel communication
For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.
Control messages are tagged with a 16 bit ID. Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:09 +0000 (20:29 -0800)]
bpf: offload: add map offload infrastructure
BPF map offload follow similar path to program offload. At creation
time users may specify ifindex of the device on which they want to
create the map. Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation. Map will have an empty set of operations
associated with it (save for alloc and free callbacks). The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures. Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.
Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held. Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns). Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.
All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:08 +0000 (20:29 -0800)]
bpf: offload: factor out netdev checking at allocation time
Add a helper to check if netdev could be found and whether it
has .ndo_bpf callback. There is no need to check the callback
every time it's invoked, ndos can't reasonably be swapped for
a set without .ndp_bpf while program is loaded.
bpf_dev_offload_check() will also be used by map offload.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:04 +0000 (20:29 -0800)]
bpf: hashtab: move attribute validation before allocation
Number of attribute checks are currently performed after hashtab
is already allocated. Move them to be able to split them out to
the check function later on. Checks have to now be performed on
the attr union directly instead of the members of bpf_map, since
bpf_map will be allocated later. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:03 +0000 (20:29 -0800)]
bpf: add map_alloc_check callback
.map_alloc callbacks contain a number of checks validating user-
-provided map attributes against constraints of a particular map
type. For offloaded maps we will need to check map attributes
without actually allocating any memory on the host. Add a new
callback for validating attributes before any memory is allocated.
This callback can be selectively implemented by map types for
sharing code with offloads, or simply to separate the logical
steps of validation and allocation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
====================
Here are the 5th version of patches to moving error injection
table from kprobes. This version fixes a bug and update
fail-function to support multiple function error injection.
Here is the previous version:
https://patchwork.ozlabs.org/cover/858663/
Changes in v5:
- [3/5] Fix a bug that within_error_injection returns false always.
- [5/5] Update to support multiple function error injection.
Masami Hiramatsu [Fri, 12 Jan 2018 17:56:03 +0000 (02:56 +0900)]
error-injection: Support fault injection framework
Support in-kernel fault-injection framework via debugfs.
This allows you to inject a conditional error to specified
function using debugfs interfaces.
Here is the result of test script described in
Documentation/fault-injection/fault-injection.txt
===========
# ./test_fail_function.sh
1+0 records in
1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0227404 s, 46.1 MB/s
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: bfa96010-12e9-4360-aed0-42eec7af5798
Node size: 16384
Sector size: 4096
Filesystem size: 1001.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 58.00MiB
System: DUP 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 1001.00MiB /dev/loop2
mount: mount /dev/loop2 on /opt/tmpmnt failed: Cannot allocate memory
SUCCESS!
===========
Masami Hiramatsu [Fri, 12 Jan 2018 17:55:33 +0000 (02:55 +0900)]
error-injection: Add injectable error types
Add injectable error types for each error-injectable function.
One motivation of error injection test is to find software flaws,
mistakes or mis-handlings of expectable errors. If we find such
flaws by the test, that is a program bug, so we need to fix it.
But if the tester miss input the error (e.g. just return success
code without processing anything), it causes unexpected behavior
even if the caller is correctly programmed to handle any errors.
That is not what we want to test by error injection.
To clarify what type of errors the caller must expect for each
injectable function, this introduces injectable error types:
- EI_ETYPE_NULL : means the function will return NULL if it
fails. No ERR_PTR, just a NULL.
- EI_ETYPE_ERRNO : means the function will return -ERRNO
if it fails.
- EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
(ERR_PTR) or NULL.
ALLOW_ERROR_INJECTION() macro is expanded to get one of
NULL, ERRNO, ERRNO_NULL to record the error type for
each function. e.g.
Masami Hiramatsu [Fri, 12 Jan 2018 17:55:03 +0000 (02:55 +0900)]
error-injection: Separate error-injection from kprobe
Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.
Some differences has been made:
- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
feature. It is automatically enabled if the arch supports
error injection feature for kprobe or ftrace etc.
Masami Hiramatsu [Fri, 12 Jan 2018 17:54:04 +0000 (02:54 +0900)]
tracing/kprobe: bpf: Check error injectable event is on function entry
Check whether error injectable event is on function entry or not.
Currently it checks the event is ftrace-based kprobes or not,
but that is wrong. It should check if the event is on the entry
of target function. Since error injection will override a function
to just return with modified return value, that operation must
be done before the target function starts making stackframe.
As a side effect, bpf error injection is no need to depend on
function-tracer. It can work with sw-breakpoint based kprobe
events too.
bpf: simplify xdp_convert_ctx_access for xdp_rxq_info
As pointed out by Daniel Borkmann, using bpf_target_off() is not
necessary for xdp_rxq_info when extracting queue_index and
ifindex, as these members are u32 like BPF_W.
Also fix trivial spelling mistake introduced in same commit.
Fixes: 02dd3291b2f0 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
====================
hns3: add some new features and fix some bugs
This patchset adds 3 ethtool features: get_channels,
get_coalesce and get_coalesce, and fix some bugs.
[patch 1/11] adds ethtool_ops.get_channels (ethtool -l) support
for VF.
[patch 2/11] removes TSO config command from VF driver,
as only main PF can config TSO MSS length according to
hardware.
[patch 3/11 - 4/11] add ethtool_ops {get|set}_coalesce
(ethtool -c/-C) support to PF.
[patch 5/11 - 9/11] fix some bugs related to {get|set}_coalesce.
[patch 10/11 - 11/11] fix the features handling in
hns3_nic_set_features(). Local variable "changed" was defined
to indicates features changed, but was used only for feature
NETIF_F_HW_VLAN_CTAG_RX. Add checking to improve the reliability.
---
Change log:
V1 -> V2:
1, Rewrite the cover letter requested by David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 12 Jan 2018 08:23:16 +0000 (16:23 +0800)]
net: hns3: add feature check when feature changed
Local variable "changed" was defined to indicates features changed,
but was used only for feature NETIF_F_HW_VLAN_CTAG_RX. Add checking
for other features.
Fixes: 052ece6dc19c ("net: hns3: add ethtool related offload command") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 12 Jan 2018 08:23:15 +0000 (16:23 +0800)]
net: hns3: add int_gl_idx setup for TX and RX queues
If the int_gl_idx does not be set, the default interrupt coalesce index
is 0. The TX queues and the RX queues will both use the GL0 as the
interrupt coalesce GL switch. But it should be GL1 for TX queues and GL0
for RX queues.
This patch adds the int_gl_idx setup for TX queues and RX queues.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 12 Jan 2018 08:23:13 +0000 (16:23 +0800)]
net: hns3: remove unused GL setup function
Since the TX GL and the RX GL need to be set separately,
hns3_set_vector_coalesc_gl() has been replaced with
hns3_set_vector_coalesce_rx_gl() and hns3_set_vector_coalesce_tx_gl().
This patch removes hns3_set_vector_coalesc_gl().
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 12 Jan 2018 08:23:12 +0000 (16:23 +0800)]
net: hns3: refactor GL update function
The GL update function uses the max GL value between tx_int_gl and
rx_int_gl to set both new tx_int_gl and new rx_int_gl. Therefore, User
can not enable TX GL self-adaptive or RX GL self-adaptive individually.
This patch refactors the code to update the TX GL and the RX GL
separately, making user can enable TX GL self-adaptive or RX GL
self-adaptive individually.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 12 Jan 2018 08:23:11 +0000 (16:23 +0800)]
net: hns3: refactor interrupt coalescing init function
In the hardware, the coalesce configurable registers include GL0, GL1,
GL2. In the driver, the TX queues use the register GL1 and the RX queues
use the register GL0. This function initializes the configuration of the
interrupt coalescing, but does not distinguish between the TX direction
and the RX direction. It will cause some confusion.
This patch refactors the function to initialize the TX GL and the RX GL
separately. And the initialization of related variables also is added to
this patch.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
BPF alignment tests got a conflict because the registers
are output as Rn_w instead of just Rn in net-next, and
in net a fixup for a testcase prohibits logical operations
on pointers before using them.
Also, we should attempt to patch BPF call args if JIT always on is
enabled. Instead, if we fail to JIT the subprogs we should pass
an error back up and fail immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 12 Jan 2018 00:57:32 +0000 (16:57 -0800)]
Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
stable"
* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
rbd: set max_segments to USHRT_MAX
rbd: reacquire lock should update lock owner client id
Linus Torvalds [Fri, 12 Jan 2018 00:54:35 +0000 (16:54 -0800)]
Merge tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"Fix a raw vs elaborate GPIO descriptor bug introduced by yours truly"
* tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various BPF related improvements and fixes to nfp driver: i) do
not register XDP RXQ structure to control queues, ii) round up
program stack size to word size for nfp, iii) restrict MTU changes
when BPF offload is active, iv) add more fully featured relocation
support to JIT, v) add support for signed compare instructions to
the nfp JIT, vi) export and reuse verfier log routine for nfp, and
many more, from Jakub, Quentin and Nic.
2) Fix a syzkaller reported GPF in BPF's copy_verifier_state() when
we hit kmalloc failure path, from Alexei.
3) Add two follow-up fixes for the recent XDP RXQ series: i) kvzalloc()
allocated memory was only kfree()'ed, and ii) fix a memory leak where
RX queue was not freed in netif_free_rx_queues(), from Jakub.
4) Add a sample for transferring XDP meta data into the skb, here it
is used for setting skb->mark with the buffer from XDP, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here's likely the last bluetooth-next pull request for the 4.16 kernel.
- Added support for Bluetooth on 2015+ MacBook (Pro)
- Fix to QCA Rome suspend/resume handling
- Two new QCA_ROME USB IDs in btusb
- A few other minor fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 11 Jan 2018 11:21:51 +0000 (11:21 +0000)]
net: phy: mdio-bcm-unimac: fix potential NULL dereference in unimac_mdio_probe()
platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference
a bit later in the code.
This is detected by Coccinelle semantic patch.
@@
expression pdev, res, n, t, e, e1, e2;
@@
res = platform_get_resource(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 11 Jan 2018 10:36:24 +0000 (11:36 +0100)]
net: socionext: include linux/io.h to fix build
I ran into a randconfig build failure:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_probe':
drivers/net/ethernet/socionext/netsec.c:1583:17: error: implicit declaration of function 'devm_ioremap'; did you mean 'ioremap'? [-Werror=implicit-function-declaration]
Including linux/io.h directly fixes this.
Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 11 Jan 2018 16:59:44 +0000 (11:59 -0500)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-01-10
This series contains updates to i40e and i40evf only.
Alice adds the displaying of priority xon/xoff packet stats, since we
were already keeping track of them. Based on the recent changes, bump
the driver versions.
Jake changes how the driver determines whether or not the device is
currently up to resolve the possible issue of freeing data structures
and other memory before they have been fully allocated. Refactored
the driver to simplify the locking behavior and to consistently use
spinlocks instead of an overloaded bit lock to protect MAC and filter
lists. Created a helper function which can convert the AdminQ link
speed definition into a virtchnl definition.
Colin Ian King cleans up a redundant variable initialization.
Alex cleans up the driver to stop clearing the pending bit array for
each vector manually, since it is prone to dropping an interrupt and
based on the hardware specs, the pending bit array will be cleared
automatically in MSI-X mode. Cleaned up flags for promiscuous mode to
resolve an issue where enabling & disabling promiscuous mode on a VF
would leave us in a high polling rate for the adminq task. Cleaned up
code that was prone to race issues.
Jingjing renames pipeline personalization profile (ppp) to dynamic
device personalization (ddp) because it was being confused with the
well known point to point protocol. Also removed checks for "track_id"
being zero, since it is valid for it to be zero for profiles that do
not have any 'write' commands.
v2: cleaned up commit message for patch 12 based on feedback from Sergei
Shtylyov and Alex Duyck
v3: dropped patch 15 from the original series while Mariusz Stachura
works on the changes that Jakub Kicinski has suggested
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Fontenot [Wed, 10 Jan 2018 16:40:09 +0000 (10:40 -0600)]
ibmvnic: Don't handle RX interrupts when not up.
Initiating a kdump via the command line can cause a pending interrupt
to be handled by the ibmvnic driver when initializing the sub-CRQ
irqs during driver initialization.
Ganesh Goudar [Wed, 10 Jan 2018 12:45:26 +0000 (18:15 +0530)]
cxgb4: add support for vxlan segmentation offload
add changes to t4_eth_xmit to enable vxlan segmentation
offload support.
Original work by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Wed, 10 Jan 2018 12:45:08 +0000 (18:15 +0530)]
cxgb4: implement udp tunnel callbacks
Implement ndo_udp_tunnel_add and ndo_udp_tunnel_del
to support vxlan tunnelling.
Original work by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Wed, 10 Jan 2018 12:44:49 +0000 (18:14 +0530)]
cxgb4: add data structures to support vxlan
Add data structures and macros to be used in vxlan
offload.
Original work by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) BPF speculation prevention and BPF_JIT_ALWAYS_ON, from Alexei
Starovoitov.
2) Revert dev_get_random_name() changes as adjust the error code
returns seen by userspace definitely breaks stuff.
3) Fix TX DMA map/unmap on older iwlwifi devices, from Emmanuel
Grumbach.
4) From wrong AF family when requesting sock diag modules, from Andrii
Vladyka.
5) Don't add new ipv6 routes attached to the null_entry, from Wei Wang.
6) Some SCTP sockopt length fixes from Marcelo Ricardo Leitner.
7) Don't leak when removing VLAN ID 0, from Cong Wang.
8) Hey there's a potential leak in ipv6_make_skb() too, from Eric
Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: sr: fix TLVs not being copied using setsockopt
ipv6: fix possible mem leaks in ipv6_make_skb()
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
mlxsw: pci: Wait after reset before accessing HW
nfp: always unmask aux interrupts at init
8021q: fix a memory leak for VLAN 0 device
of_mdio: avoid MDIO bus removal when a PHY is missing
caif_usb: use strlcpy() instead of strncpy()
doc: clarification about setting SO_ZEROCOPY
net: gianfar_ptp: move set_fipers() to spinlock protecting area
sctp: make use of pre-calculated len
sctp: add a ceiling to optlen in some sockopts
sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events
bpf: introduce BPF_JIT_ALWAYS_ON config
bpf: avoid false sharing of map refcount with max_entries
ipv6: remove null_entry before adding default route
SolutionEngine771x: add Ether TSU resource
SolutionEngine771x: fix Ether platform data
docs-rst: networking: wire up msg_zerocopy
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
...
samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB
Creating a bpf sample that shows howto use the XDP 'data_meta'
infrastructure, created by Daniel Borkmann. Very few drivers support
this feature, but I wanted a functional sample to begin with, when
working on adding driver support.
XDP data_meta is about creating a communication channel between BPF
programs. This can be XDP tail-progs, but also other SKB based BPF
hooks, like in this case the TC clsact hook. In this sample I show
that XDP can store info named "mark", and TC/clsact chooses to use
this info and store it into the skb->mark.
It is a bit annoying that XDP and TC samples uses different tools/libs
when attaching their BPF hooks. As the XDP and TC programs need to
cooperate and agree on a struct-layout, it is best/easiest if the two
programs can be contained within the same BPF restricted-C file.
As the bpf-loader, I choose to not use bpf_load.c (or libbpf), but
instead wrote a bash shell scripted named xdp2skb_meta.sh, which
demonstrate howto use the iproute cmdline tools 'tc' and 'ip' for
loading BPF programs. To make it easy for first time users, the shell
script have command line parsing, and support --verbose and --dry-run
mode, if you just want to see/learn the tc+ip command syntax:
# ./xdp2skb_meta.sh --dev ixgbe2 --dry-run
# Dry-run mode: enable VERBOSE and don't call TC+IP
tc qdisc del dev ixgbe2 clsact
tc qdisc add dev ixgbe2 clsact
tc filter add dev ixgbe2 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark
# Flush XDP on device: ixgbe2
ip link set dev ixgbe2 xdp off
ip link set dev ixgbe2 xdp obj ./xdp2skb_meta_kern.o sec xdp_mark
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Al Viro [Wed, 10 Jan 2018 23:47:05 +0000 (18:47 -0500)]
Fix a leak in socket(2) when we fail to allocate a file descriptor.
Got broken by "make sock_alloc_file() do sock_release() on failures" -
cleanup after sock_map_fd() failure got pulled all the way into
sock_alloc_file(), but it used to serve the case when sock_map_fd()
failed *before* getting to sock_alloc_file() as well, and that got
lost. Trivial to fix, fortunately.
Fixes: 8e1611e23579 (make sock_alloc_file() do sock_release() on failures) Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
====================
sfc: support 25G configuration with ethtool
Adds support for advertise bits beyond the 32-bit legacy masks, and plumbs in
translation of the new 25/50/100G bits to/from MCDI.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Wed, 10 Jan 2018 18:00:14 +0000 (18:00 +0000)]
sfc: support the ethtool ksettings API properly so that 25/50/100G works
Store and handle ethtool link mode masks within the driver instead of
just a single u32. However, quite a significant amount of existing code
wants to manipulate the masks directly, and thus now uses the first
unsigned long (i.e. mask[0]) as though it were a legacy u32 mask. This
is ok because all the bits that code is interested in are in the first
32 bits of the mask; but it might be a good idea to change them in
future to use the proper bitmap API.
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Wed, 10 Jan 2018 17:59:59 +0000 (17:59 +0000)]
sfc: basic MCDI mapping of 25/50/100G link speeds
Only handles direct speed setting, not autoneg, because the driver is
still trying to pretend it uses the legacy ethtool API which doesn't
have advertised/supported bits for 25/50/100G.
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Jan 2018 21:07:42 +0000 (16:07 -0500)]
Merge branch 'mlxsw-qdisc-refactoring'
Jiri Pirko says:
====================
mlxsw qdisc refactoring
This patchset refactors the qdisc handling in mlxsw driver in order to make
it more object oriented like.
It helps readability, laying the groundwork for the offloading of
additional qdiscs by the driver
This patchset also makes the qdiscs statistics more generic.
Patch 1 moves the qdiscs declaration to the spectrum_qdisc.c
Patches 2-3 clean the offloaded stats requests. Patch 2 changes the RED
generic stats struct to be sharable by other offloaded qdiscs. Patch 3
changes the xstats request to be like the stats. Note that these patches
are outside the driver scope.
Patches 4-5 clean the statistics related functions and structs within the
driver.
Patches 6-7 decrease the need for the same parameters to be sent to many
functions.
Patches 8-11 create a functions pointers struct, to make the qdiscs
handling more object oriented like.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:07 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Remove qdisc before setting a new one
If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:06 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Create a generic replace function
Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: clean the qdisc stats for the offloaded qdisc.
integrate RED offloading into using the new internal replace API.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:05 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Create a generic destroy function
Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function, that clears the qdisc metadata as
well as calling the specific qdisc destroy function.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:04 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Add an ops struct
Qdisc struct have the Qdisc_class_ops struct.
This patch introduces the similar ops struct for the mlxsw_sp_qdisc_ops
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:03 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Unite all handle checks
Every qdisc op gets the qdisc handle ID as well as its location. Each one
of them, beside replace, checks if the handle doesn't match the qdisc in
the given location, and if so, it returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:02 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Add tclass number to the mlxsw_sp_qdisc
Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 14:00:01 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Make the clean stats function to be for RED only
Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 10 Jan 2018 13:59:59 +0000 (14:59 +0100)]
net: sch: red: Change offloaded xstats to be incremental
Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mathieu Xhonneux [Wed, 10 Jan 2018 13:35:49 +0000 (13:35 +0000)]
ipv6: sr: fix TLVs not being copied using setsockopt
Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header
to a socket through setsockopt, but the current implementation doesn't
copy possible TLVs at the end of the SRH received from userspace.
Therefore, the execution of the following branch if (sr_has_hmac(sr_phdr))
{ ... } will never complete since the len and type fields of a possible
HMAC TLV are not copied, hence seg6_get_tlv_hmac will return an error,
and the HMAC will not be computed.
This commit adds a memcpy in case TLVs have been appended to the SRH.
Fixes: a149e7c7ce81 ("ipv6: sr: add support for SRH injection through setsockopt") Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 10 Jan 2018 11:45:49 +0000 (03:45 -0800)]
ipv6: fix possible mem leaks in ipv6_make_skb()
ip6_setup_cork() might return an error, while memory allocations have
been done and must be rolled back.
Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Reported-by: Mike Maloney <maloney@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 10 Jan 2018 10:42:44 +0000 (11:42 +0100)]
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.
Fixes: 96f17e0776c2 ("mlxsw: spectrum: Support RED qdisc offload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 10 Jan 2018 10:42:43 +0000 (11:42 +0100)]
mlxsw: pci: Wait after reset before accessing HW
After performing reset driver polls on HW indication until learning
that the reset is done, but immediately after reset the device becomes
unresponsive which might lead to completion timeout on the first read.
Wait for 100ms before starting the polling.
Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Wed, 10 Jan 2018 06:32:13 +0000 (12:02 +0530)]
cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages
We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and
the extant logic would flag that as an error. This was already fixed in
cxgb4 driver with "92ddcc7 cxgb4: Fix some small bugs in
t4_sge_init_soft() when our Page Size is 64KB".
Original Work by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Wed, 10 Jan 2018 04:06:14 +0000 (15:06 +1100)]
tuntap: fix for "tuntap: XDP transmission"
Fixes: fc72d1d54dd9 ("tuntap: XDP transmission") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 10 Jan 2018 02:14:28 +0000 (18:14 -0800)]
nfp: always unmask aux interrupts at init
The link state and exception interrupts may be masked when we probe.
The firmware should in theory prevent sending (and automasking) those
interrupts if the device is disabled, but if my reading of the FW code
is correct there are firmwares out there with race conditions in this
area. The interrupt may also be masked if previous driver which used
the device was malfunctioning and we didn't load the FW (there is no
other good way to comprehensively reset the PF).
Note that FW unmasks the data interrupts by itself when vNIC is
enabled, such helpful operation is not performed for LSC/EXN interrupts.
Always unmask the auxiliary interrupts after request_irq(). On the
remove path add missing PCI write flush before free_irq().
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingjing Wu [Tue, 14 Nov 2017 12:00:47 +0000 (07:00 -0500)]
i40e: change ppp name to ddp
PPP name was going to be confusing since PPP already means point
to point protocol. It is decided to change pipeline personalization
profile(ppp) to dynamic device personalization(ddp).
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 14 Nov 2017 12:00:46 +0000 (07:00 -0500)]
i40evf: Drop i40evf_fire_sw_int as it is prone to races
Having the interrupts firing while we are polling causes extra overhead and
isn't needed for most systems out there. If an interrupt is lost us
experiencing a 2s latency spike before recovering is still not acceptable
and masks the issue. We are better off just identifying systems that lose
interrupts and instead enable workarounds for those systems.
To that end I am dropping the code that was strobing the interrupts as
there is a narrow window where having them enabled can actually cause
race issues anyway where a few stray packets might get misses if the
interrupt is re-enabled and fires before we call napi_complete.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 14 Nov 2017 12:00:45 +0000 (07:00 -0500)]
i40evf: Clean-up flags for promisc mode to avoid high polling rate
If you enabled and disabled promiscuous mode on a VF you could easily put
it into a state where it would start firing interrupts on all queues at a
rate of 50+ interrupts per second even though there was no traffic present.
The issue seems to have been a stray admin queue feature flag set that was
leaving us in a high polling rate for the adminq task.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 14 Nov 2017 12:00:44 +0000 (07:00 -0500)]
i40evf: Do not clear MSI-X PBA manually
We should not be clearing the pending bit array for each vector manually.
The documentation for the hardware states that when in MSI-X mode the
pending bit array will be cleared automatically. Us clearing it ourselves
just results in multiple opportunities for us to drop an interrupt.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Colin Ian King [Sun, 5 Nov 2017 13:04:29 +0000 (13:04 +0000)]
i40e: remove redundant initialization of read_size
Variable read_size is initialized and this value is never read, it is
instead set inside the do-loop, hence the initialization is redundant
and can be removed. Cleans up clang warning:
drivers/net/ethernet/intel/i40e/i40e_nvm.c:390:6: warning: Value stored
to 'read_size' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alice Michael [Fri, 27 Oct 2017 15:06:56 +0000 (11:06 -0400)]
i40e/i40evf: Bump driver versions
Bump the i40e driver from 2.1.14 to 2.3.2.
Bump the i40evf driver from 3.0.1 to 3.2.2
Signed-off-by: Alice Michael <alice.michael@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 27 Oct 2017 15:06:54 +0000 (11:06 -0400)]
i40e: add helper conversion function for link_speed
We introduced the virtchnl interface in order to have an interface for
talking to a virtual device driver which was host-driver agnostic. This
interface has its own definitions, including one for link speed.
The host driver has to talk to the virtchnl interface using these new
definitions in order to remain compatible. Today, the i40e link_speed
enumerations are value-exact matches for the virtchnl interface, so it
was originally decided to simply use a typecast.
However, this is unsafe, and makes it easier for future drivers to
continue this unsafe practice. There is nothing guaranteeing these
values are exact, and the type-cast would hide any compiler warning
which indicates the problem.
Rather than rely on this type cast, introduce a helper function which
can convert the AdminQ link speed definition into a virtchnl
definition. This can then be used by host driver implementations in
order to safely convert to the interface recognized by the virtual
functions.
If the link speed is not able to be represented by the virtchnl
definitions we'll report UNKNOWN which is the safest result.
This will ensure that should the driver specific link_speeds actual bit
definitions change, we do not report them incorrectly according to the
VF.
Additionally, this provides a better pattern for future drivers to copy,
as it is more likely a future device may not use the exact same bit-wise
definition as the current virtchnl interface.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 27 Oct 2017 15:06:53 +0000 (11:06 -0400)]
i40e: update VFs of link state after GET_VF_RESOURCES
We currently notify a VF of the link state after ENABLE_QUEUES, which is
the last thing a VF does after being configured. Guests may not actually
ENABLE_QUEUES until they get configured, and thus between driver load
and device configuration the VF may show inaccurate link status.
Fix this by also sending the link state after GET_VF_RESOURCES. Although
we could remove the message following ENABLE_QUEUES, it's not that
significant of a loss, so this patch just keeps both to ensure maximum
compatibility with guests on various OSes.
Specifically, without this patch guests running FreeBSD will display
inaccurate link state until the device is brought up. This is mostly
a cosmetic issue but can be confusing to system administrators.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 27 Oct 2017 15:06:52 +0000 (11:06 -0400)]
i40evf: hold the critical task bit lock while opening
If i40evf_open() is called quickly at the same time as a reset occurs
(such as via ethtool) it is possible for the device to attempt to open
while a reset is in progress. This occurs because the driver was not
holding the critical task bit lock during i40evf_open, nor was it
holding it around the call to i40evf_up_complete() in
i40evf_reset_task().
We didn't hold the lock previously because calls to i40evf_down() would
take the bit lock directly, and this would have caused a deadlock.
To avoid this, we'll move the bit lock handling out of i40evf_down() and
into the callers of this function. Additionally, we'll now hold the bit
lock over the entire set of steps when going up or down, to ensure that
we remain consistent.
Ultimately this causes us to serialize the transitions between down and
up properly, and avoid changing status while we're resetting.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 27 Oct 2017 15:06:51 +0000 (11:06 -0400)]
i40evf: release bit locks in reverse order
Although not strictly necessary, it is customary to reverse the order in
which we release locks that we acquire. This helps preserve lock
ordering during future refactors, which can help avoid potential
deadlock situations.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 27 Oct 2017 15:06:50 +0000 (11:06 -0400)]
i40evf: use spinlock to protect (mac|vlan)_filter_list
Stop overloading the __I40EVF_IN_CRITICAL_TASK bit lock to protect the
mac_filter_list and vlan_filter_list. Instead, implement a spinlock to
protect these two lists, similar to how we protect the hash in the i40e
PF code.
Ensure that every place where we access the list uses the spinlock to
ensure consistency, and stop holding the critical section around blocks
of code which only need access to the macvlan filter lists.
This refactor helps simplify the locking behavior, and is necessary as
a future refactor to the __I40EVF_IN_CRITICAL_TASK would cause
a deadlock otherwise.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 27 Oct 2017 15:06:49 +0000 (11:06 -0400)]
i40evf: don't rely on netif_running() outside rtnl_lock()
In i40evf_reset_task we use netif_running() to determine whether or not
the device is currently up. This allows us to properly free queue memory
and shut down things before we request the hardware reset.
It turns out that we cannot be guaranteed of netif_running() returning
false until the device is fully up, as the kernel core code sets
__LINK_STATE_START prior to calling .ndo_open. Since we're not holding
the rtnl_lock(), it's possible that the driver's i40evf_open handler
function is currently being called while we're resetting.
We can't simply hold the rtnl_lock() while checking netif_running() as
this could cause a deadlock with the i40evf_open() function.
Additionally, we can't avoid the deadlock by holding the rtnl_lock()
over the whole reset path, as this essentially serializes all resets,
and can cause massive delays if we have multiple VFs on a system.
Instead, lets just check our own internal state __I40EVF_RUNNING state
field. This allows us to ensure that the state is correct and is only
set after we've finished bringing the device up.
Without this change we might free data structures about device queues
and other memory before they've been fully allocated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alice Michael [Fri, 27 Oct 2017 15:06:48 +0000 (11:06 -0400)]
i40e: display priority_xon and priority_xoff stats
Display some more stats that were already being counted, to help users
understand when priority xon/xoff packets are being sent/received
Signed-off-by: Alice Michael <alice.michael@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
net: fix xdp_rxq_info build issue when CONFIG_SYSFS is not set
The commit e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info")
removed some ifdef CONFIG_SYSFS in net/core/dev.c, but forgot to
remove the corresponding ifdef's in include/linux/netdevice.h.
Fixes: e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 9 Jan 2018 21:42:09 +0000 (22:42 +0100)]
net: phy: marvell: mv88e6390 temperature sensor reading
The internal PHYs in the mv88e6390 switch have a temperature sensor.
It uses a different register layout to other PHY currently supported.
It also has an errata, in that some reads of the sensor result in bad
values. So a number of reads need to be made, and the average taken.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 9 Jan 2018 21:40:41 +0000 (13:40 -0800)]
8021q: fix a memory leak for VLAN 0 device
A vlan device with vid 0 is allow to creat by not able to be fully
cleaned up by unregister_vlan_dev() which checks for vlan_id!=0.
Also, VLAN 0 is probably not a valid number and it is kinda
"reserved" for HW accelerating devices, but it is probably too
late to reject it from creation even if makes sense. Instead,
just remove the check in unregister_vlan_dev().
Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the dynamic interrupt moderation library from the mlx5e
driver into a library so it can be used by any driver. The penultimate
patch in this set adds support for this new dynamic interrupt moderation
library in the bnxt_en driver and the last patch creates an entry in the
MAINTAINERS file for this library.
The main purpose of this code is to allow an administrator to make sure
that default coalesce settings are optimized for low latency, but
quickly adapt to handle high throughput/bulk traffic by altering how
much time passes before popping an interrupt.
For any new driver the following changes would be needed to use this
library:
- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
for the driver
Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for
their comments, etc on this set.
v4: Fix build breakage for VF representers noticed by kbuild test robot.
Thanks for being so courteous, kbuild test robot!
v3: bnxt_en fix from Michael Chan, comment suggestion from Vasundhara
Volam, and small mlx5e header file fix from Tal Gilboa.
v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
Michael Chan, spelling and formatting fixes from Or Gerlitz, and
spelling and mlx5e changes suggested by Tal Gilboa.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:21 +0000 (16:06 -0500)]
MAINTAINERS: add entry for Dynamic Interrupt Moderation
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:20 +0000 (16:06 -0500)]
bnxt_en: add support for software dynamic interrupt moderation
This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.
This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:19 +0000 (16:06 -0500)]
net/dim: use struct net_dim_sample as arg to net_dim
Simplify the arguments net_dim() by formatting them into a struct
net_dim_sample before calling the function.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Suggested-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:18 +0000 (16:06 -0500)]
net/mlx5e: Move dynamic interrupt coalescing code to include/linux
This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring. A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:17 +0000 (16:06 -0500)]
net/mlx5e: Change Mellanox references in DIM code
Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation. Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:16 +0000 (16:06 -0500)]
net/mlx5e: Move generic functions to new file
These functions were identified as ones that could be made generic and
used by multiple drivers. Most of the contents of en_rx_am.c are moved
to net_dim.c.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:15 +0000 (16:06 -0500)]
net/mlx5e: Move AM logic enums
More movement to help make this code more generic.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 9 Jan 2018 21:06:14 +0000 (16:06 -0500)]
net/mlx5e: Remove rq references in mlx5e_rx_am
This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Move these to newly created file to prepare to move these functions to a
library.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Acked-by: Tal Gilboa <talgi@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>