This patch fixes a critical reset issue that resulting to the server
reboot when an Admin changes VF configuration on the host, for example
changing VF to Trusted/non_Trusted mode, the PF driver send reset
notification to AVF driver while also continue with reset flow. However,
AVF driver schedule another reset due to notification, which causes two
concurrent reset going on, and trigger lock up in the FW, with AQ call to
delete VSI.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: Alloc queue management bitmaps and arrays dynamically
The total number of queues available on the device is divided between
multiple physical functions (PF) in the firmware and provided to the
driver when it gets function capabilities from the firmware. Thus
each PF knows how many Tx/Rx queues it has. These queues are then
doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.)
To track usage of these queues at the PF level, the driver uses two
bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi
instances) the driver uses two arrays txq_map and rxq_map, to track
ownership of VSIs' queues in avail_txqs and avail_rxqs respectively.
The aforementioned bitmaps and arrays should be allocated dynamically,
because the number of queues supported by a PF is only available once
function capabilities have been queried. The current static allocation
consumes way more memory than required.
This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs
and instead uses bitmap_zalloc to allocate the bitmaps during init.
Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays.
As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed.
Also as txq_map and rxq_map are now allocated and freed, some code
reordering was required in ice_vsi_rebuild for correct functioning.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Paul Greenwalt [Fri, 2 Aug 2019 08:25:20 +0000 (01:25 -0700)]
ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap
The VF driver can call VIRTCHNL_OP_[ENABLE|DISABLE]_QUEUES separately
for each queue. Add support for virtchnl_queue_select.[tx|rx]_queues
bitmap which is used to indicate which queues to enable and disable.
Add tracing of VF Tx/Rx per queue enable state to avoid enabling enabled
queues and disabling disabled queues. Add total queues enabled count and
clear ICE_VF_STATE_QS_ENA when count is zero.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Peng Huang <peng.huang@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: add support for enabling/disabling single queues
Refactor the queue handling functions that are going through queue
arrays in a way that the logic done for a single queue is pulled out and
it will be called for each ring when traversing ring array. This implies
that when disabling Tx rings we won't fill up q_ids, q_teids and
q_handles arrays. Drop also 'offset' parameter; the value from vsi's
txq_map is stored in ring->reg_idx and that drops the need for mentioned
parameter. Introduce the ice_vsi_cfg_txq, ice_vsi_stop_tx_ring and
ice_vsi_ctrl_rx_ring that are the functions with pulled out logic.
There's several Tx queue meta data (q_id, q_handle, q_teid and other)
that need to be set up during Tx queue disablement, so let's as well add
a helper structure that wraps it up and a function that will be filling
it up.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Colin Ian King [Fri, 2 Aug 2019 15:52:17 +0000 (16:52 +0100)]
ice: fix potential infinite loop
The loop counter of a for-loop is a u8 however this is being compared
to an int upper bound and this can lead to an infinite loop if the
upper bound is greater than 255 since the loop counter will wrap back
to zero. Fix this potential issue by making the loop counter an int.
Addresses-Coverity: ("Infinite loop") Fixes: c7aeb4d1b9bf ("ice: Disable VFs until reset is completed") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 25 Jul 2019 09:54:01 +0000 (02:54 -0700)]
ice: fix ice_is_tc_ena
ice_is_tc_ena is used to check whether a given traffic class is
enabled. Because there are only 8 traffic classes, the function took
a u8 bitmap. This causes problems because it is cast to an unsigned
long causing a static analysis warning regarding Out-of-bounds read.
Fix this by simply updating ice_is_tc_ena to take an unsigned long.
Passing a u8 to this function should implicitly convert the value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: add validation in OP_CONFIG_VSI_QUEUES VF message
Check num_queue_pairs to avoid access to unallocated field of
vsi->tx_rings/vsi->rx_rings. Without this validation we can set
vsi->alloc_txq/vsi->alloc_rxq to value smaller than ICE_MAX_BASE_QS_PER_VF
and send this command with num_queue_pairs greater than
vsi->alloc_txq/vsi->alloc_rxq. This lead to access to unallocated memory.
In VF vsi alloc_txq and alloc_rxq should be the same. Get minimum
because looks more readable.
Also add validation for ring_len param. It should be greater than 32 and
be multiple of 32. Incorrect value leads to hang traffic on PF.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In case of MDD events on VF, don't clog kernel log with unlimited VF MDD
events message "VF 0 has had 1018 MDD events since last boot" - limit
events log message to 30, based on the observation in some experimentation
with sending malicious packet once, and number of events reported before
device stopped observing MDD events.
Also removed defunct macro "ICE_DFLT_NUM_MDD_EVENTS_ALLOWED" for tracking
number of MDD events allowed before disabling the interface...
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: Introduce a local variable for a VSI in the rebuild path
When a VSI is accessed inside the ice_for_each_vsi macro in the rebuild
path (ice_vsi_rebuild_all() and ice_vsi_replay_all()), it is referred to
as pf->vsi[i]. Introduce local variables to improve readability.
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1. ndo_open and ndo_stop are implemented by ice_open and ice_stop
respectively. When enabling/disabling VSIs, just call
ice_open/ice_stop instead of ndo_open/ndo_stop.
2. Rework logic around rtnl_lock/rtnl_unlock
3. In ice_ena_vsi, remove an unnecessary stack variable and return
0 instead of err when __ICE_NEEDS_RESTART is not set.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Victor Raj [Thu, 25 Jul 2019 09:53:54 +0000 (02:53 -0700)]
ice: added sibling head to parse nodes
There was a bug in the previous code which never traverses all the
children to get the first node of the requested layer. Add a sibling
head pointer to point the first node of each layer per TC. This helps
traverse easier and quicker and also removes the recursion.
Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: Fix ethtool port and PFC stats for 4x25G cards
This patch fixes the issue where port and PFC statistics counters are
incrementing at the wrong port with 4x25G cards.
Read the GLPRT port registers using lport parameter instead of pf_id to
update the statistics otherwise the pf_ids are flipped for ports 2 and 3
when read from the HW register PF_FUNC_RID and this is expected as per
hardware specification.
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jakub Kicinski [Mon, 26 Aug 2019 22:30:41 +0000 (15:30 -0700)]
nfp: add AMDA0058 boards to firmware list
Add MODULE_FIRMWARE entries for AMDA0058 boards.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 26 Aug 2019 20:52:36 +0000 (22:52 +0200)]
r8169: improve DMA handling in rtl_rx
Move the call to dma_sync_single_for_cpu after calling napi_alloc_skb.
This avoids calling dma_sync_single_for_cpu w/o handing control back
to device if the memory allocation should fail.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 26 Aug 2019 21:17:44 +0000 (14:17 -0700)]
Merge branch 'cls-hw-offload-rtnl'
Vlad Buslov says:
====================
Refactor cls hardware offload API to support rtnl-independent drivers
Currently, all cls API hardware offloads driver callbacks require caller
to hold rtnl lock when calling them. This patch set introduces new API
that allows drivers to register callbacks that are not dependent on rtnl
lock and unlocked classifiers to offload filters without obtaining rtnl
lock first, which is intended to allow offloading tc rules in parallel.
Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take rtnl lock when qdisc or
classifier requires it. Classifiers can indicate that their ops
callbacks don't require caller to hold rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. Unlocked implementation of flower
classifier is now upstreamed. However, this implementation still obtains
rtnl lock before calling hardware offloads API.
Implement following cls API changes:
- Introduce new "unlocked_driver_cb" flag to struct flow_block_offload
to allow registering and unregistering block hardware offload
callbacks that do not require caller to hold rtnl lock. Drivers that
doesn't require users of its tc offload callbacks to hold rtnl lock
sets the flag to true on block bind/unbind. Internally tcf_block is
extended with additional lockeddevcnt counter that is used to count
number of devices that require rtnl lock that block is bound to. When
this counter is zero, tc_setup_cb_*() functions execute callbacks
without obtaining rtnl lock.
- Extend cls API single hardware rule update tc_setup_cb_call() function
with tc_setup_cb_add(), tc_setup_cb_replace(), tc_setup_cb_destroy()
and tc_setup_cb_reoffload() functions. These new APIs are needed to
move management of block offload counter, filter in hardware counter
and flag from classifier implementations to cls API, which is now
responsible for managing them in concurrency-safe manner. Access to
cb_list from callback execution code is synchronized by obtaining new
'cb_lock' rw_semaphore in read mode, which allows executing callbacks
in parallel, but excludes any modifications of data from
register/unregister code. tcf_block offloads counter type is changed
to atomic integer to allow updating the counter concurrently.
- Extend classifier ops with new ops->hw_add() and ops->hw_del()
callbacks which are used to notify unlocked classifiers when filter is
successfully added or deleted to hardware without releasing cb_lock.
This is necessary to update classifier state atomically with callback
list traversal and updating of all relevant counters and allows
unlocked classifiers to synchronize with concurrent reoffload without
requiring any changes to driver callback API implementations.
New tc flow_action infrastructure is also modified to allow its user to
execute without rtnl lock protection. Function tc_setup_flow_action() is
modified to conditionally obtain rtnl lock before accessing action
state. Action data that is accessed by reference is either copied or
reference counted to prevent concurrent action overwrite from
deallocating it. New function tc_cleanup_flow_action() is introduced to
cleanup/release all such data obtained by tc_setup_flow_action().
Flower classifier (only unlocked classifier at the moment) is modified
to use new cls hardware offloads API and no longer obtains rtnl lock
before calling it.
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:45:05 +0000 (16:45 +0300)]
net: sched: copy tunnel info when setting flow_action entry->tunnel
In order to remove dependency on rtnl lock, modify tc_setup_flow_action()
to copy tunnel info, instead of just saving pointer to tunnel_key action
tunnel info. This is necessary to prevent concurrent action overwrite from
releasing tunnel info while it is being used by rtnl-unlocked driver.
Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info
with all its options to dynamically allocated memory block. Modify
tc_cleanup_flow_action() to free dynamically allocated tunnel info.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:45:04 +0000 (16:45 +0300)]
net: sched: take reference to action dev before calling offloads
In order to remove dependency on rtnl lock when calling hardware offload
API, take reference to action mirred dev when initializing flow_action
structure in tc_setup_flow_action(). Implement function
tc_cleanup_flow_action(), use it to release the device after hardware
offload API is done using it.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:45:03 +0000 (16:45 +0300)]
net: sched: take rtnl lock in tc_setup_flow_action()
In order to allow using new flow_action infrastructure from unlocked
classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
argument. Take rtnl lock before accessing tc_action data. This is necessary
to protect from concurrent action replace.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:45:02 +0000 (16:45 +0300)]
net: sched: conditionally obtain rtnl lock in cls hw offloads API
In order to remove dependency on rtnl lock from offloads code of
classifiers, take rtnl lock conditionally before executing driver
callbacks. Only obtain rtnl lock if block is bound to devices that require
it.
Block bind/unbind code is rtnl-locked and obtains block->cb_lock while
holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions
to prevent deadlock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:45:01 +0000 (16:45 +0300)]
net: sched: add API for registering unlocked offload block callbacks
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require caller to hold rtnl lock. Extend tcf_block with additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to device. This counter is necessary to conditionally
obtain rtnl lock before calling hardware callbacks in following patches.
Register mlx5 tc block offload callbacks as "unlocked".
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:45:00 +0000 (16:45 +0300)]
net: sched: notify classifier on successful offload add/delete
To remove dependency on rtnl lock, extend classifier ops with new
ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
holding cb_lock every time filter if successfully added to or deleted from
hardware.
Implement the new API in flower classifier. Use it to manage hw_filters
list under cb_lock protection, instead of relying on rtnl lock to
synchronize with concurrent fl_reoffload() call.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:44:59 +0000 (16:44 +0300)]
net: sched: refactor block offloads counter usage
Without rtnl lock protection filters can no longer safely manage block
offloads counter themselves. Refactor cls API to protect block offloadcnt
with tcf_block->cb_lock that is already used to protect driver callback
list and nooffloaddevcnt counter. The counter can be modified by concurrent
tasks by new functions that execute block callbacks (which is safe with
previous patch that changed its type to atomic_t), however, block
bind/unbind code that checks the counter value takes cb_lock in write mode
to exclude any concurrent modifications. This approach prevents race
conditions between bind/unbind and callback execution code but allows for
concurrency for tc rule update path.
Move block offload counter, filter in hardware counter and filter flags
management from classifiers into cls hardware offloads API. Make functions
tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
private. Implement following new cls API to be used instead:
tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
already in hardware is successfully offloaded, increment block offloads
counter, set filter in hardware counter and flag. On failure, previously
offloaded filter is considered to be intact and offloads counter is not
decremented.
tc_setup_cb_replace() - destructive filter replace. Release existing
filter block offload counter and reset its in hardware counter and flag.
Set new filter in hardware counter and flag. On failure, previously
offloaded filter is considered to be destroyed and offload counter is
decremented.
tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
call tc_cls_offload_cnt_update() if cb() didn't return an error.
Refactor all offload-capable classifiers to atomically offload filters to
hardware, change block offload counter, and set filter in hardware counter
and flag by means of the new cls API functions.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:44:58 +0000 (16:44 +0300)]
net: sched: change tcf block offload counter type to atomic_t
As a preparation for running proto ops functions without rtnl lock, change
offload counter type to atomic. This is necessary to allow updating the
counter by multiple concurrent users when offloading filters to hardware
from unlocked classifiers.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 26 Aug 2019 13:44:57 +0000 (16:44 +0300)]
net: sched: protect block offload-related fields with rw_semaphore
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock'
rwsem and use it to protect flow_block->cb_list and related counters from
concurrent modification. The lock is taken in read mode for read-only
traversal of cb_list in tc_setup_cb_call() and write mode in all other
cases. This approach ensures that:
- cb_list is not changed concurrently while filters is being offloaded on
block.
- block->nooffloaddevcnt is checked while holding the lock in read mode,
but is only changed by bind/unbind code when holding the cb_lock in write
mode to prevent concurrent modification.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Mon, 26 Aug 2019 02:49:15 +0000 (02:49 +0000)]
cirrus: cs89x0: remove set but not used variable 'lp'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/cirrus/cs89x0.c: In function 'cs89x0_platform_probe':
drivers/net/ethernet/cirrus/cs89x0.c:1847:20: warning:
variable 'lp' set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 6751edeb8700 ("cirrus: cs89x0: Use managed interfaces") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Mon, 26 Aug 2019 01:31:18 +0000 (09:31 +0800)]
net: mediatek: remove set but not used variable 'status'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function mtk_handle_irq:
drivers/net/ethernet/mediatek/mtk_eth_soc.c:1951:6: warning: variable status set but not used [-Wunused-but-set-variable]
Fixes: 296c9120752b ("net: ethernet: mediatek: Add MT7628/88 SoC support") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 24 Aug 2019 23:04:17 +0000 (01:04 +0200)]
net: phy: sfp: Add labels to hwmon sensors
SFPs can report two different power values, the transmit power and the
receive power. Add labels to make it clear which is which. Also add
labels to the other sensors, VCC power supply, bias and module
temperature.
sensors(1) now shows:
sff2-isa-0000
Adapter: ISA adapter
VCC: +3.23 V
temperature: +33.4 C
TX_power: 276.00 uW
RX_power: 20.00 uW
bias: +0.01 A
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 24 Aug 2019 23:46:45 +0000 (16:46 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-08-23
This series contains updates to ice driver only.
Dave adds logic for the necessary bits to be set in the VSI context for
the PF_VSI and the TX_descriptors for control packets egressing the
PF_VSI. Updated the logic to detect both DCBx and LLDP states in the
firmware engine to account for situations where DCBx is enabled and LLDP
is disabled. Fixed the driver to treat the DCBx state of "NOT_STARTED"
as a valid state and should not assume "is_fw_lldp" true automatically.
Since "enable-fw-lldp" flag was confusing and cumbersome, change the
flag to "fw-lldp-agent" with a value of on or off to help clarify
whether the LLDP agent is running or not.
Brett fixes an issue where synchronize_irq() was being called from the
host of VF's, which should not be done.
Michal fixed an issue when rebuilding the DCBx configuration while in
IEEE mode versus CEE mode, so add a check before copying the
configuration value to ensure we are only in CEE mode.
Jake fixes the PF to reject any VF request to setup head writeback since
the support has been deprecated.
Mitch adds an additional check to ensure the VF is active before sending
out an error message that a message was unable to be sent to a
particular VF.
Chinh updates the driver to use "topology" mode when checking the PHY
for status, since this mode provides us the current module type that is
available. Fixes the driver from clearing the auto_fec_enable bit which
was blocking a user from forcing non-spec compliant FEC configurations.
Amruth does a refactor on the code to first check, then assign in the
virtual channel space.
Bruce updates the driver to actually update the stats when a user runs
the ethtool command 'ethtool -S <iface>' instead of providing a snapshot
of the stats that maybe from a second ago.
Akeem fixes up the adding/removing of VSI MAC filters for VFs, so that
VFs cannot add/remove a filter from another VSI. We now track the
number of filters added right from when the VF resources get allocated
and won't get into MAC filter mis-match issue in the switch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 23 Aug 2019 09:48:53 +0000 (11:48 +0200)]
s390/qeth: add xmit_more support for IQD devices
IQD devices offer limited support for bulking: all frames in a TX buffer
need to have the same target. qeth_iqd_may_bulk() implements this
constraint, and allows us to defer the TX doorbell until
(a) the buffer is full (since each buffer needs its own doorbell), or
(b) the entire TX queue is full, or
(b) we reached the BQL limit.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 23 Aug 2019 09:48:52 +0000 (11:48 +0200)]
s390/qeth: add BQL support for IQD devices
Each TX buffer may contain multiple skbs. So just accumulate the sent
byte count in the buffer struct, and later use the same count when
completing the buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 23 Aug 2019 09:48:50 +0000 (11:48 +0200)]
s390/qeth: add TX NAPI support for IQD devices
Due to their large MTU and potentially low utilization of TX buffers,
IQD devices in particular require fast TX recycling. This makes them
a prime candidate for a TX NAPI path in qeth.
qeth_tx_poll() uses the recently introduced qdio_inspect_queue() helper
to poll the TX queue for completed buffers. To avoid hogging the CPU for
too long, we yield to the stack after completing an entire queue's worth
of buffers.
While IQD is expected to transfer its buffers synchronously (and thus
doesn't support TX interrupts), a timer covers for the odd case where a
TX buffer doesn't complete synchronously. Currently this timer should
only ever fire for
(1) the mcast queue,
(2) the occasional race, where the NAPI poll code observes an update to
queue->used_buffers while the TX doorbell hasn't been issued yet.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 23 Aug 2019 09:48:49 +0000 (11:48 +0200)]
s390/qeth: collect accurate TX statistics
This consolidates the SW statistics code, and improves it to
(1) account for the header overhead of each segment on a TSO skb,
(2) count dangling packets as in-error (during eg. shutdown), and
(3) only count offloads when the skb was successfully transmitted.
We also count each segment of an TSO skb as one packet - except for
tx_dropped, to be consistent with dev->tx_dropped.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 23 Aug 2019 09:48:48 +0000 (11:48 +0200)]
s390/qdio: let drivers opt-out from Output Queue scanning
If a driver wants to use the new Output Queue poll code, then the qdio
layer must disable its internal Queue scanning. Let the driver select
this mode by passing a special scan_threshold of 0.
As the scan_threshold is the same for all Output Queues, also move it
into the main qdio_irq struct. This allows for fast opt-out checking, a
driver is expected to operate either _all_ or none of its Output Queues
in polling mode.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 23 Aug 2019 09:48:47 +0000 (11:48 +0200)]
s390/qdio: enable drivers to poll for Output completions
While commit d36deae75011 ("qdio: extend API to allow polling") enhanced
the qdio layer so that drivers can poll their Input Queues, we don't
have the corresponding infrastructure for Output Queues yet.
Factor out a helper that scans a single QDIO Queue, so that qeth can
implement TX NAPI on top of it.
While doing so, remove the duplicated tracking of the next-to-scan index
(q->first_to_check vs q->first_to_kick) in this code path.
qdio_handle_aobs() needs to move slightly upwards in the code hierarchy,
so that it's still called from the polling path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 23 Aug 2019 05:51:41 +0000 (01:51 -0400)]
bnxt_en: Fix allocation of zero statistics block size regression.
Recent commit added logic to determine the appropriate statistics block
size to allocate and the size is stored in bp->hw_ring_stats_size. But
if the firmware spec is older than 1.6.0, it is 0 and not initialized.
This causes the allocation to fail with size 0 and bnxt_open() to
abort. Fix it by always initializing bp->hw_ring_stats_size to the
legacy default size value.
Fixes: 4e7485066373 ("bnxt_en: Allocate the larger per-ring statistics block for 57500 chips.") Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Thu, 22 Aug 2019 16:00:40 +0000 (18:00 +0200)]
net/core/skmsg: Delete an unnecessary check before the function call “consume_skb”
The consume_skb() function performs also input parameter validation.
Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 22 Aug 2019 14:49:37 +0000 (22:49 +0800)]
net: hns3: Fix -Wunused-const-variable warning
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h:542:30:
warning: meta_data_key_info defined but not used [-Wunused-const-variable=]
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h:553:30:
warning: tuple_key_info defined but not used [-Wunused-const-variable=]
The two variable is only used in hclge_main.c,
so just move the definition over there.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 23 Aug 2019 18:07:26 +0000 (20:07 +0200)]
r8169: fix DMA issue on MIPS platform
As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.
Fixes: f072218cca5b ("r8169: remove not needed call to dma_sync_single_for_device") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 23 Aug 2019 15:47:21 +0000 (18:47 +0300)]
drop_monitor: Make timestamps y2038 safe
Timestamps are currently communicated to user space as 'struct
timespec', which is not considered y2038 safe since it uses a 32-bit
signed value for seconds.
Fix this while the API is still not part of any official kernel release
by using 64-bit nanoseconds timestamps instead.
Fixes: ca30707dee2b ("drop_monitor: Add packet alert mode") Fixes: 5e58109b1ea4 ("drop_monitor: Add support for packet alert mode for hardware drops") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dag Moxnes [Fri, 23 Aug 2019 14:03:18 +0000 (16:03 +0200)]
net/rds: Whitelist rdma_cookie and rx_tstamp for usercopy
Add the RDMA cookie and RX timestamp to the usercopy whitelist.
After the introduction of hardened usercopy whitelisting
(https://lwn.net/Articles/727322/), a warning is displayed when the
RDMA cookie or RX timestamp is copied to userspace:
When the whitelisting feature was introduced, the memory for the RDMA
cookie and RX timestamp in RDS was not added to the whitelist, causing
the warning above.
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Tested-by: Jenny <jenny.x.xu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Fri, 23 Aug 2019 12:34:47 +0000 (15:34 +0300)]
net/mlx5: Fix return code in case of hyperv wrong size read
Return code value could be non deterministic in case of wrong size read.
With this patch, if such error occurs, set rc to be -EIO.
In addition, mlx5_hv_config_common() supports reading of
HV_CONFIG_BLOCK_SIZE_MAX bytes only, fix to early return error with
bad input.
Fixes: 913d14e86657 ("net/mlx5: Add wrappers for HyperV PCIe operations") Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 23 Aug 2019 11:33:03 +0000 (19:33 +0800)]
net: ipv6: fix listify ip6_rcv_finish in case of forwarding
We need a similar fix for ipv6 as Commit 0761680d5215 ("net: ipv4: fix
listify ip_rcv_finish in case of forwarding") does for ipv4.
This issue can be reprocuded by syzbot since Commit 323ebb61e32b ("net:
use listified RX for handling GRO_NORMAL skbs") on net-next. The call
trace was:
Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs") Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Aug 2019 21:25:23 +0000 (14:25 -0700)]
Merge branch 'r8152-save-EEE'
Hayes Wang says:
====================
r8152: save EEE
v4:
For patch #2, remove redundant calling of "ocp_reg_write(tp, OCP_EEE_ADV, 0)".
v3:
For patch #2, fix the mistake caused by copying and pasting.
v2:
Adjust patch #1. The EEE has been disabled in the beginning of
r8153_hw_phy_cfg() and r8153b_hw_phy_cfg(), so only check if
it is necessary to enable EEE.
Add the patch #2 for the helper function.
v1:
Saving the settings of EEE to avoid they become the default settings
after reset_resume().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Fri, 23 Aug 2019 07:33:41 +0000 (15:33 +0800)]
r8152: add a helper function about setting EEE
Add a helper function "rtl_eee_enable" for setting EEE. Besides, I
move r8153_eee_en() and r8153b_eee_en(). And, I remove r8152b_enable_eee(),
r8153_set_eee(), and r8153b_set_eee().
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ice: Don't allow VSI to remove unassociated ucast filter
If a VSI is not using a unicast filter or did not configure that
particular unicast filter, driver should not allow it to be removed
by the rogue VSI.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
VSI, especially VF could request to add or remove filter for another VSI,
driver should really guide such request and disallow it.
However, instead of returning error for such malicious request, driver
can simply return success.
In addition, we are not tracking number of MAC filters configured per
VF correctly - and this leads to issue updating VF MAC filters whenever
they were removed and re-configured via bringing VF interface down and
up. Also, since VF could send request to update multiple MAC filters at
once, driver should program those filters individually in the switch, in
order to determine which action resulted to error, and communicate
accordingly to the VF.
So, with this changes, we now track number of filters added right from
when VF resources allocation is done, and could properly add filters for
both trusted and non_trusted VFs, without MAC filters mis-match issue in
the switch...
Also refactor code, so that driver can use new function to add or remove
MAC filters.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Thu, 25 Jul 2019 09:53:50 +0000 (02:53 -0700)]
ice: update ethtool stats on-demand
Users expect ethtool statistics to be updated on-demand when invoking
'ethtool -S <iface>' instead of providing a snapshot of statistics taken
once a second (the frequency of the watchdog task where stats are currently
updated). Update stats every time 'ethtool -S <iface>' is run.
Also, fix an indentation style issue and an unnecessary local variable
initialization in ice_get_ethtool_stats() discovered while investigating
the subject issue.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Chinh T Cao [Mon, 29 Jul 2019 09:04:53 +0000 (02:04 -0700)]
ice: Don't clear auto_fec bit in ice_cfg_phy_fec()
The driver should never clear the auto_fec_enable bit.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Chinh T Cao [Mon, 29 Jul 2019 09:04:52 +0000 (02:04 -0700)]
ice: Fix flag used for module query
When checking the PHY for status, by specification, the driver
should be using "topology" mode when querying the module type.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 29 Jul 2019 09:04:51 +0000 (02:04 -0700)]
ice: silence some bogus error messages
In some circumstances, VF devices can be deactivated while a message is
in-flight. In that case, a series of scary error message will be
printed in the log. Since these are actually harmless, check for this
case and suppress them. No harm, no foul.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Mon, 29 Jul 2019 09:04:50 +0000 (02:04 -0700)]
ice: Rename ethtool private flag for lldp
The current flag name of "enable-fw-lldp" is a bit cumbersome.
Change priv-flag name to "fw-lldp-agent" with a value of on or
off. This is more straight-forward in meaning.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 29 Jul 2019 09:04:49 +0000 (02:04 -0700)]
ice: reject VF attempts to enable head writeback
The virtchnl interface provides a mechanism for a VF driver to request
head writeback support. This feature is deprecated as of AVF 1.0, but
older versions of a VF driver may still attempt to request the mode.
Since the ice hardware does not support head writeback, we should not
accept Tx queue configuration which attempts to enable it.
Currently, the driver simply assumes that the headwb_enabled bit will
never be set.
If a VF driver does request head writeback, the configuration will
return successfully, even though head writeback is not enabled. This
leaves the VF driver in a non functional state since it is assuming to
be operating in head writeback mode.
Fix the PF driver to reject any attempt to setup headwb_enabled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice: Copy dcbx configuration only if mode is correct
In rebuild DCB desired_dcbx_cfg was copy to local_dcbx_cfg, but
if DCBX mode is IEEE desired_dcbx_cfg is not initialized by DCBX
config from FW. Change logic to copy config value only if mode is
set to CEE.
If driver copy desired_dcbx_cfg to local_dcbx_cfg in IEEE mode there
is problem with globr. System is frozen after two or more globr.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Mon, 29 Jul 2019 09:04:46 +0000 (02:04 -0700)]
ice: Treat DCBx state NOT_STARTED as valid
When a port is not cabled, but DCBx is enabled in the
firmware, the status of DCBx will be NOT_STARTED. This
is a valid state for FW enabled and should not be
treated as a is_fw_lldp true automatically.
Add the code to treat NOT_STARTED as another valid state.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Mon, 29 Jul 2019 09:04:44 +0000 (02:04 -0700)]
ice: Account for all states of FW DCBx and LLDP
Currently, only the DCBx status is taken into account to
determine if FW LLDP is possible. But there are NVM version
coming out with DCBx enabled, and FW LLDP disabled. This
is causing errors where the driver sees that DCBx is not
disabled, and then tries to register for LLDP MIB change
events, and fails.
Change the logic to detect both DCBx and LLDP states in the
FW engine.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dave Ertman [Mon, 29 Jul 2019 09:04:43 +0000 (02:04 -0700)]
ice: Allow egress control packets from PF_VSI
For control packets (i.e. LLDP packets) to be able to egress
from the main VSI, a bit has to be set in the TX_descriptor.
This should only be done for the main VSI and only if the
FW LLDP agent is disabled. A bit to allow this also has to
be set in the VSI context.
Add the logic to add the necessary bits in the VSI context
for the PF_VSI and the TX_descriptors for control packets
egressing the PF_VSI.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ben Wei [Wed, 21 Aug 2019 22:08:49 +0000 (22:08 +0000)]
net/ncsi: update response packet length for GCPS/GNS/GNPTS commands
Update response packet length for the following commands per NC-SI spec
- Get Controller Packet Statistics
- Get NC-SI Statistics
- Get NC-SI Pass-through Statistics command
Signed-off-by: Ben Wei <benwei@fb.com> Reviewed-by: Justin Lee <justin.lee1@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Wed, 21 Aug 2019 19:16:15 +0000 (21:16 +0200)]
can: Delete unnecessary checks before the macro call “dev_kfree_skb”
The dev_kfree_skb() function performs also input parameter validation.
Thus the test around the shown calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Sean Nyekjaer <sean@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marco Hartmann [Wed, 21 Aug 2019 11:43:49 +0000 (11:43 +0000)]
net: fec: add C45 MDIO read/write support
IEEE 802.3ae clause 45 defines a modified MDIO protocol that uses a two
staged access model in order to increase the address space.
This patch adds support for C45 MDIO read and write accesses, which are
used whenever the MII_ADDR_C45 flag in the regnum argument is set.
In case it is not set, C22 accesses are used as before.
Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Tue, 20 Aug 2019 07:57:42 +0000 (09:57 +0200)]
dt-bindings: net: meson-dwmac: convert to yaml
Now that we have the DT validation in place, let's convert the device tree
bindings for the Synopsys DWMAC Glue for Amlogic SoCs over to a YAML schemas.
Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Amlogic Meson DWMAC glue bindings needs a second reg cells for the
glue registers, thus update the reg minItems/maxItems to allow more
than a single reg cell.
Also update the allwinner,sun7i-a20-gmac.yaml derivative schema to specify
maxItems to 1.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Aug 2019 22:29:39 +0000 (15:29 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2019-08-22
This series contains updates to i40e driver only.
Arnd Bergmann reduces the stack usage which was causing warnings on
32-bit architectures due to large structure sizes for 2 functions
getting inlined, so use noinline_for_stack to prevent the compilers from
combining the 2 functions.
Mauro S. M. Rodrigues fixes an issue when reading an EEPROM from SFP
modules that comply with SFF-8472 but do not implement the Digital
Diagnostic Monitoring (DDM) interface for i40e.
Huhai found we were not checking the return value for configuring the
transmit ring and continuing with XDP configuration of the transmit
ring.
Beilei fixes an issue of shifting signed 32-bit integers.
Sylwia adds support for "packet drop mode" to the MAC configuration for
admin queue command. This bit controls the behavior when a no-drop
packet is blocking a TC queue. Adds support for persistent LLDP by
checking the LLDP flag and reading the LLDP from the NVM when enabled.
Adrian fixes the "recovery mode" check to take into account which device
we are on, since x710 devices have 4 register values to check for status
and x722 devices only have 2 register values to check.
Piotr Azarewicz bumps the supported firmware API version to 1.9 which
extends the PHY access admin queue command support.
Jake makes sure the traffic class stats for a VEB are reset when the VEB
stats are reset.
Slawomir fixes a NULL pointer dereference where the VSI pointer was not
updated before passing it to the i40e_set_vf_mac() when the VF is in a
reset state, so wait for the reset to complete.
Grzegorz removes the i40e_update_dcb_config() which was not using the
correct NVM reads, so call i40e_init_dcb() in its place to correctly
update the DCB configuration.
Piotr Kwapulinski expands the scope of i40e_set_mac_type() since this is
needed during probe to determine if we are in recovery mode. Fixed the
driver reset path when in recovery mode.
Marcin fixed an issue where we were breaking out of a loop too early
when trying to get the PHY capabilities.
v2: Combined patch 7 & 9 in the original series, since both patches
bumped firmware API version. Also combined patches 12 & 13 in the
original series, since one increased the scope of checking for MAC
and the follow-on patch made use of function within the new scope.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Formela [Tue, 23 Jul 2019 10:01:44 +0000 (06:01 -0400)]
i40e: fix retrying in i40e_aq_get_phy_capabilities
Fixed a bug where driver was breaking out of the loop and
reporting an error without retrying first.
Signed-off-by: Marcin Formela <marcin.formela@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Driver waits after issuing a reset. When a reset takes too long a driver
gives up. Implemented by invoking PF reset in a loop. After defined
number of unsuccessful PF reset trials it returns error.
Without this patch PF reset fails when NIC is in recovery mode.
So make i40e_set_mac_type() public. i40e driver requires i40e_set_mac_type()
to be public. It is required for recovery mode handling. Without this patch
recovery mode could not be detected in i40e_probe().
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Grzegorz Siwik [Tue, 23 Jul 2019 10:01:40 +0000 (06:01 -0400)]
i40e: Remove function i40e_update_dcb_config()
This patch removes function i40e_update_dcb_config(). Instead of
i40e_update_dcb_config() we use i40e_init_dcb(), which implements the
correct NVM read.
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Slawomir Laba [Tue, 23 Jul 2019 10:01:39 +0000 (06:01 -0400)]
i40e: Fix crash caused by stress setting of VF MAC addresses
Add update to the VSI pointer passed to the i40e_set_vf_mac function.
If VF is in reset state the driver waits in i40e_set_vf_mac function
for the reset to be complete, yet after reset the vsi pointer
that was passed into this function is no longer valid.
The patch updates local VSI pointer directly from pf->vsi array,
by using the id stored in VF pointer (lan_vsi_idx).
Without this commit the driver might occasionally invoke general
protection fault in kernel and disable the OS entirely.
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 23 Jul 2019 10:01:37 +0000 (06:01 -0400)]
i40e: reset veb.tc_stats when resetting veb.stats
The stats structure for the VEB switch statistics is reset periodically,
but the tc_stats are not reset at the same time.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Piotr Azarewicz [Tue, 23 Jul 2019 10:01:36 +0000 (06:01 -0400)]
i40e: Update FW API version to 1.9
Upcoming FW increment API version to 1.9 due to Extend PHY access AQ
command support. SW is ready for that support as well.
Signed-off-by: Piotr Azarewicz <piotr.azarewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adrian Podlawski [Tue, 23 Jul 2019 10:01:35 +0000 (06:01 -0400)]
i40e: check_recovery_mode had wrong if statement
Function check_recovery_mode had wrong if statement.
Now we check proper FWS1B register values, which are responsible for
the recovery mode. Recovery mode has 4 values for x710 and 2 for x722.
That's why we need 6 different flags which are defined in the code.
Now in the if statement, we recognize type of mac address
and register value.
Without those changes driver could show wrong state.
Signed-off-by: Adrian Podlawski <adrian.podlawski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds "drop mode" parameter to set mac config AQ command.
This bit controls the behavior when a no-drop packet is blocking a TC
queue.
0 – The PF driver is notified.
1 – The blocking packet is dropped and then the PF driver is notified.
Signed-off-by: Sylwia Wnuczko <sylwia.wnuczko@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: add check on i40e_configure_tx_ring() return value
When i40e_configure_tx_ring(vsi->tx_rings[i]) returns an error, we should
exit from i40e_vsi_configure_tx and return the error, instead of continuing
to check whether xdp is enable, and configure the xdp transmit ring.
Signed-off-by: huhai <huhai@kylinos.cn> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: Check if transceiver implements DDM before access
Similar to the ixgbe issue fixed in: 655c91414579 ("ixgbe: Check DDM existence in transceiver before access)
i40e has the same issue when reading eeprom from SFP's module that comply
with SFF-8472 but not implement the Digital Diagnostic Monitoring (DDM)
interface described in it. The existence of such area is specified by bit
6 of byte 92, set to 1 if implemented.
Without this patch, due to not checking this bit i40e fails to read SFP
module's eeprom with the follow message:
ethtool -m enP51p1s0f0
Cannot get Module EEPROM data: Input/output error
Because it fails to read the additional 256 bytes in which it was assumed
to exist the DDM data.
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The functions i40e_aq_get_phy_abilities_resp() and i40e_set_fc() both
have giant structure on the stack, which makes each one use stack frames
larger than 500 bytes.
As clang decides one function into the other, we get a warning for
exceeding the frame size limit on 32-bit architectures:
drivers/net/ethernet/intel/i40e/i40e_common.c:1654:23: error: stack frame size of 1116 bytes in function 'i40e_set_fc' [-Werror,-Wframe-larger-than=]
When building with gcc, the inlining does not happen, but i40e_set_fc()
calls i40e_aq_get_phy_abilities_resp() anyway, so they add up on the
kernel stack just as much.
The parts that actually use large stacks don't overlap, so make sure
each one is a separate function, and mark them as noinline_for_stack to
prevent the compilers from combining them again.
Fixes: 0a862b43acc6 ("i40e/i40evf: Add module_types and update_link_info") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Colin Ian King [Thu, 22 Aug 2019 12:53:40 +0000 (13:53 +0100)]
nexthops: remove redundant assignment to variable err
Variable err is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused Value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Thu, 22 Aug 2019 05:06:00 +0000 (05:06 +0000)]
net/mlx5e: Add mlx5e HV VHCA stats agent
HV VHCA stats agent is responsible on running a preiodic rx/tx
packets/bytes stats update. Currently the supported format is version
MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated for statistics data
transfer from the VF to the PF.
The reporter fetch the statistics data from all opened channels, fill it
in a buffer and send it to mlx5_hv_vhca_write_agent.
As the stats layer should include some metadata per block (sequence and
offset), the HV VHCA layer shall modify the buffer before actually send it
over block 1.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Thu, 22 Aug 2019 05:05:56 +0000 (05:05 +0000)]
net/mlx5: Add HV VHCA control agent
Control agent is responsible over of the control block (ID 0). It should
update the PF via this block about every capability change. In addition,
upon block 0 invalidate, it should activate all other supported agents
with data requests from the PF.
Upon agent create/destroy, the invalidate callback of the control agent
is being called in order to update the PF driver about this change.
The control agent is an integral part of HV VHCA and will be created
and destroy as part of the HV VHCA init/cleanup flow.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Thu, 22 Aug 2019 05:05:51 +0000 (05:05 +0000)]
net/mlx5: Add HV VHCA infrastructure
HV VHCA is a layer which provides PF to VF communication channel based on
HyperV PCI config channel. It implements Mellanox's Inter VHCA control
communication protocol. The protocol contains control block in order to
pass messages between the PF and VF drivers, and data blocks in order to
pass actual data.
The infrastructure is agent based. Each agent will be responsible of
contiguous buffer blocks in the VHCA config space. This infrastructure will
bind agents to their blocks, and those agents can only access read/write
the buffer blocks assigned to them. Each agent will provide three
callbacks (control, invalidate, cleanup). Control will be invoked when
block-0 is invalidated with a command that concerns this agent. Invalidate
callback will be invoked if one of the blocks assigned to this agent was
invalidated. Cleanup will be invoked before the agent is being freed in
order to clean all of its open resources or deferred works.
Block-0 serves as the control block. All execution commands from the PF
will be written by the PF over this block. VF will ack on those by
writing on block-0 as well. Its format is described by struct
mlx5_hv_vhca_control_block layout.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Thu, 22 Aug 2019 05:05:47 +0000 (05:05 +0000)]
net/mlx5: Add wrappers for HyperV PCIe operations
Add wrapper functions for HyperV PCIe read / write /
block_invalidate_register operations. This will be used as an
infrastructure in the downstream patch for software communication.
This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Thu, 22 Aug 2019 05:05:41 +0000 (05:05 +0000)]
PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface
This interface driver is a helper driver allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dexuan Cui [Thu, 22 Aug 2019 05:05:37 +0000 (05:05 +0000)]
PCI: hv: Add a paravirtual backchannel in software
Windows SR-IOV provides a backchannel mechanism in software for communication
between a VF driver and a PF driver. These "configuration blocks" are
similar in concept to PCI configuration space, but instead of doing reads and
writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
can be sent or received asynchronously.
Nearly every SR-IOV device contains just such a communications channel in
hardware, so using this one in software is usually optional. Using the
software channel, however, allows driver implementers to leverage software
tools that fuzz the communications channel looking for vulnerabilities.
The usage model for these packets puts the responsibility for reading or
writing on the VF driver. The VF driver sends a read or a write packet,
indicating which "block" is being referred to by number.
If the PF driver wishes to initiate communication, it can "invalidate" one or
more of the first 64 blocks. This invalidation is delivered via a callback
supplied by the VF driver by this driver.
No protocol is implied, except that supplied by the PF and VF drivers.
Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Aug 2019 03:23:29 +0000 (20:23 -0700)]
Merge tag 'mlx5-updates-2019-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 tc flow handling for concurrent execution (Part 3)
This series includes updates to mlx5 ethernet and core driver:
Vlad submits part 3 of 3 part series to allow TC flow handling
for concurrent execution.
Vlad says:
==========
Structure mlx5e_neigh_hash_entry code that uses it are refactored in
following ways:
- Extend neigh_hash_entry with rcu and modify its users to always take
reference to the structure when using it (neigh_hash_entry has already
had atomic reference counter which was only used when scheduling neigh
update on workqueue from atomic context of neigh update netevent).
- Always use mlx5e_neigh_update_table->encap_lock when modifying neigh
update hash table and list. Originally, this lock was only used to
synchronize with netevent handler function, which is called from bh
context and cannot use rtnl lock for synchronization. Use rcu read lock
instead of encap_lock to lookup nhe in atomic context of netevent even
handler function. Convert encap_lock to mutex to allow creating new
neigh hash entries while holding it, which is safe to do because the
lock is no longer used in atomic context.
- Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on
encap list to their rcu counterparts and extending encap structure
with rcu_head to free the encap instances after rcu grace period. This
allows fast traversal of list of encaps attached to nhe under rcu read
lock protection.
- Take encap_table_lock when accessing encap entries in neigh update and
neigh stats update code to protect from concurrent encap entry
insertion or removal.
This approach leads to potential race condition when neigh update and
neigh stats update code can access encap and flow entries that are not
fully initialized or are being destroyed, or neigh can change state
without updating encaps that are created concurrently. Prevent these
issues by following changes in flow and encap initialization:
- Extend mlx5e_tc_flow with 'init_done' completion. Modify neigh update
to wait for both encap and flow completions to prevent concurrent
access to a structure that is being initialized by tc.
- Skip structures that failed during initialization: encaps with
encap_id<0 and flows that don't have OFFLOADED flag set.
- To ensure that no new flows are added to encap when it is being
accessed by neigh update or neigh stats update, take encap_table_lock
mutex.
- To prevent concurrent deletion by tc, ensure that neigh update and
neigh stats update hold references to encap and flow instances while
using them.
With changes presented in this patch set it is now safe to execute tc
concurrently with neigh update and neigh stats update. However, these
two workqueue tasks modify same flow "tmp_list" field to store flows
with reference taken in temporary list to release the references after
update operation finishes and should not be executed concurrently with
each other.
Last 3 patches of this series provide 3 new mlx5 trace points to track
mlx5 tc requests and mlx5 neigh updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>