]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
7 years agomlxsw: spectrum_acl: Add support for C-TCAM eRPs
Ido Schimmel [Wed, 25 Jul 2018 06:23:57 +0000 (09:23 +0300)] 
mlxsw: spectrum_acl: Add support for C-TCAM eRPs

The number of eRPs that can be used by a single A-TCAM region is limited
to 16. When more eRPs are needed, an ordinary circuit TCAM (C-TCAM) can
be used to hold the extra eRPs.

Unlike the A-TCAM, only a single (last) lookup is performed in the
C-TCAM and not a lookup per-eRP. However, modeling the C-TCAM as extra
eRPs will allow us to easily introduce support for pruning in a
follow-up patch set and is also logically correct.

The following diagram depicts the relation between both TCAMs:
                                                                 C-TCAM
+-------------------+               +--------------------+    +-----------+
|                   |               |                    |    |           |
|  eRP #1 (A-TCAM)  +----> ... +----+  eRP #16 (A-TCAM)  +----+  eRP #17  |
|                   |               |                    |    |    ...    |
+-------------------+               +--------------------+    |  eRP #N   |
                                                              |           |
                                                              +-----------+
Lookup order is from left to right.

Extend the eRP core APIs with a C-TCAM parameter which indicates whether
the requested eRP is to be used with the C-TCAM or not.

Since the C-TCAM is only meant to absorb rules that can't fit in the
A-TCAM due to exceeded number of eRPs or key collision, an error is
returned when a C-TCAM eRP needs to be created when the eRP state
machine is in its initial state (i.e., 'no masks'). This should only
happen in the face of very unlikely errors when trying to push rules
into the A-TCAM.

In order not to perform unnecessary lookups, the eRP core will only
enable a C-TCAM lookup for a given region if it knows there are C-TCAM
eRPs present.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_acl: Enable C-TCAM only mode in eRP core
Ido Schimmel [Wed, 25 Jul 2018 06:23:56 +0000 (09:23 +0300)] 
mlxsw: spectrum_acl: Enable C-TCAM only mode in eRP core

Currently, no calls are performed into the eRP core, but in order to
make review easier we would like to gradually add these calls.

Have the eRP core initialize a region's master mask to all ones and
allow it to use an empty eRP table. This directs the lookup to the
C-TCAM and allows the C-TCAM only mode to continue working.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_acl: Implement common eRP core
Ido Schimmel [Wed, 25 Jul 2018 06:23:55 +0000 (09:23 +0300)] 
mlxsw: spectrum_acl: Implement common eRP core

When rules are inserted into the A-TCAM they are associated with a mask,
which is part of the lookup key: { masked key, mask ID, region ID }.

These masks are called rule patterns (RP) and the aggregation of several
masks into one (to be introduced in follow-up patch sets) is called an
extended RP (eRP).

When a packet undergoes a lookup in an ACL region it is masked by the
current set of eRPs used by the region, looking for an exact match.
Eventually, the rule with the highest priority is picked.

These eRPs are stored in several global banks to allow for lookup to
occur using several eRPs simultaneously.

At first, an ACL region will only require a single mask - upon the
insertion of the first rule. In this case, the region can use the
"master RP" which is composed by OR-ing all the masks used by the
region. This mask is a property of the region and thus there is no need
to use the above mentioned banks.

At some point, a second mask will be needed. In this case, the region
will need to allocate an eRP table from the above mentioned banks and
insert its masks there.

>From now on, upon lookup, the eRP table used by the region will be
fetched from the eRP banks - using {eRP bank, Index within the bank} -
and the eRPs present in the table will be used to mask the packet. Note
that masks with consecutive indexes are inserted into consecutive banks.

When rules are deleted and a region only needs a single mask once again
it can free its eRP table and use the master RP.

The above logic is implemented in the eRP core and represented using the
following state machine:

    +------------+   create mask - as master RP   +---------------+
    |            +-------------------------------->               |
    |  no masks  |                                |  single mask  |
    |            <--------------------------------+               |
    +------------+          delete mask           +-----+--^------+
                                                        |  |
                                                        |  |
                                  create mask -         |  |  delete mask -
    create mask                   transition to use eRP |  |  transition to
     +--------+                   table                 |  |  use master RP
     |        |                                         |  |
     |        |                                         |  |
+----v--------+----+         create mask           +----v--+-----+
|                  <-------------------------------+             |
|  multiple masks  |                               |  two masks  |
|                  +------------------------------->             |
+------------------+      delete mask - if two     +-------------+
                          remaining

The code that actually configures rules in the A-TCAM will interface
with the eRP core by getting or putting an eRP based on the required
mask used by the rule.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: resources: Add Spectrum-2 eRP resources
Ido Schimmel [Wed, 25 Jul 2018 06:23:54 +0000 (09:23 +0300)] 
mlxsw: resources: Add Spectrum-2 eRP resources

Add the following resources to be used by A-TCAM code:
* Maximum number of eRP banks
* Maximum size of eRP bank
* Number of eRP entries required for a 2/4/8/12 key blocks mask

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: resources: Add Spectrum-2 maximum large key ID resource
Ido Schimmel [Wed, 25 Jul 2018 06:23:53 +0000 (09:23 +0300)] 
mlxsw: resources: Add Spectrum-2 maximum large key ID resource

Add a resource to make sure we do not exceed the maximum number of
supported large key IDs in a region.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: reg: Add Policy-Engine eRP Table Register
Ido Schimmel [Wed, 25 Jul 2018 06:23:52 +0000 (09:23 +0300)] 
mlxsw: reg: Add Policy-Engine eRP Table Register

The register is used to add and delete eRPs from the eRP table.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: reg: Add Policy-Engine TCAM Entry Register Version 3
Ido Schimmel [Wed, 25 Jul 2018 06:23:51 +0000 (09:23 +0300)] 
mlxsw: reg: Add Policy-Engine TCAM Entry Register Version 3

The register is used to configure rules in the A-TCAM.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: reg: Prepare PERERP register for A-TCAM usage
Ido Schimmel [Wed, 25 Jul 2018 06:23:50 +0000 (09:23 +0300)] 
mlxsw: reg: Prepare PERERP register for A-TCAM usage

Before introducing A-TCAM support we need to make sure all the necessary
fields are configurable and not hard coded to values that worked for the
C-TCAM only use case.

This includes - for example - the ability to configure the eRP table
used by the TCAM region.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agolan743x: Make symbol lan743x_pm_ops static
Wei Yongjun [Wed, 25 Jul 2018 06:11:16 +0000 (06:11 +0000)] 
lan743x: Make symbol lan743x_pm_ops static

Fixes the following sparse warning:

drivers/net/ethernet/microchip/lan743x_main.c:2944:25: warning:
 symbol 'lan743x_pm_ops' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: make function tcp_retransmit_stamp() static
Wei Yongjun [Wed, 25 Jul 2018 06:06:07 +0000 (06:06 +0000)] 
tcp: make function tcp_retransmit_stamp() static

Fixes the following sparse warnings:

net/ipv4/tcp_timer.c:25:5: warning:
 symbol 'tcp_retransmit_stamp' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/sched: cls_flower: Use correct inline function for assignment of vlan tpid
Jianbo Liu [Wed, 25 Jul 2018 02:31:25 +0000 (02:31 +0000)] 
net/sched: cls_flower: Use correct inline function for assignment of vlan tpid

This fixes the following sparse warning:

net/sched/cls_flower.c:1356:36: warning: incorrect type in argument 3 (different base types)
net/sched/cls_flower.c:1356:36: expected unsigned short [unsigned] [usertype] value
net/sched/cls_flower.c:1356:36: got restricted __be16 [usertype] vlan_tpid

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx4_core: Allow MTTs starting at any index
Tariq Toukan [Tue, 24 Jul 2018 11:31:45 +0000 (14:31 +0300)] 
net/mlx4_core: Allow MTTs starting at any index

Allow obtaining MTTs starting at any index,
thus give a better cache utilization.

For this, allow setting log_mtts_per_seg to 0, and use
this in default.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Anaty Rahamim Bar Kat <anaty@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlx5-Offload-setting-matching-on-tunnel-tos-ttl'
David S. Miller [Wed, 25 Jul 2018 23:28:58 +0000 (16:28 -0700)] 
Merge branch 'mlx5-Offload-setting-matching-on-tunnel-tos-ttl'

Or Gerlitz says:

====================
net/mlx5: Offload setting/matching on tunnel tos/ttl

This series enables mlx5 offloading of tc eswitch rules that set
tos/ttl (encap) or match on them (decap) for tunnels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Offload TC matching on tos/ttl for ip tunnels
Or Gerlitz [Tue, 24 Jul 2018 10:59:35 +0000 (13:59 +0300)] 
net/mlx5e: Offload TC matching on tos/ttl for ip tunnels

Enable offloading of TC matching on tos/ttl for ipv4/6 tunnels.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Support setup of tos and ttl for tunnel key TC action offload
Or Gerlitz [Tue, 24 Jul 2018 10:59:34 +0000 (13:59 +0300)] 
net/mlx5e: Support setup of tos and ttl for tunnel key TC action offload

Use the values provided by user-space for the encapsulation headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Use ttl from route lookup on tc encap offload only if needed
Or Gerlitz [Tue, 24 Jul 2018 10:59:33 +0000 (13:59 +0300)] 
net/mlx5e: Use ttl from route lookup on tc encap offload only if needed

Currnetly, the ttl for the encapsulation headers is taken from the
route lookup result. As a pre-step to allow for an offload case when
the user specifies the ttl, take it from the route lookup only if
not zero. While here, also move to use u8 instead int for the ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovxge: Remove unnecessary include of <linux/pci_hotplug.h>
Bjorn Helgaas [Mon, 23 Jul 2018 20:59:46 +0000 (15:59 -0500)] 
vxge: Remove unnecessary include of <linux/pci_hotplug.h>

The vxge driver doesn't need anything provided by pci_hotplug.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: add helper phy_polling_mode
Heiner Kallweit [Mon, 23 Jul 2018 19:40:07 +0000 (21:40 +0200)] 
net: phy: add helper phy_polling_mode

Add a helper for checking whether polling is used to detect PHY status
changes.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: fs-enet: Use generic CRC32 implementation
Krzysztof Kozlowski [Mon, 23 Jul 2018 16:20:20 +0000 (18:20 +0200)] 
net: ethernet: fs-enet: Use generic CRC32 implementation

Use generic kernel CRC32 implementation because it:
1. Should be faster (uses lookup tables),
2. Removes duplicated CRC generation code,
3. Uses well-proven algorithm instead of coding it one more time.

Suggested-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: freescale: Use generic CRC32 implementation
Krzysztof Kozlowski [Mon, 23 Jul 2018 16:19:14 +0000 (18:19 +0200)] 
net: ethernet: freescale: Use generic CRC32 implementation

Use generic kernel CRC32 implementation because it:
1. Should be faster (uses lookup tables),
2. Removes duplicated CRC generation code,
3. Uses well-proven algorithm instead of coding it one more time.

Suggested-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg
Camelia Groza [Mon, 23 Jul 2018 15:06:15 +0000 (18:06 +0300)] 
net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg

genphy_config_aneg() should be called only by PHYs that implement
the Clause 22 register set. Prevent Clause 45 PHYs that don't implement
the register set from calling the genphy function.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'virtio_net-Add-ethtool-stat-items'
David S. Miller [Wed, 25 Jul 2018 19:53:38 +0000 (12:53 -0700)] 
Merge branch 'virtio_net-Add-ethtool-stat-items'

Toshiaki Makita says:

====================
virtio_net: Add ethtool stat items

Add some ethtool stat items useful for performance analysis.
====================

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Add kick stats
Toshiaki Makita [Mon, 23 Jul 2018 14:36:09 +0000 (23:36 +0900)] 
virtio_net: Add kick stats

So we can infer the number of VM-Exits.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Add XDP related stats
Toshiaki Makita [Mon, 23 Jul 2018 14:36:08 +0000 (23:36 +0900)] 
virtio_net: Add XDP related stats

Add counters below:
* Tx
 - xdp_tx: frames sent by ndo_xdp_xmit or XDP_TX.
 - xdp_tx_drops: dropped frames out of xdp_tx ones.
* Rx
 - xdp_packets: frames went through xdp program.
 - xdp_tx: XDP_TX frames.
 - xdp_redirects: XDP_REDIRECT frames.
 - xdp_drops: any dropped frames out of xdp_packets ones.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Factor out the logic to determine xdp sq
Toshiaki Makita [Mon, 23 Jul 2018 14:36:07 +0000 (23:36 +0900)] 
virtio_net: Factor out the logic to determine xdp sq

Make sure to use the same logic in all places to determine xdp sq. This
is useful for xdp counters which the following commit will introduce as
well.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Make drop counter per-queue
Toshiaki Makita [Mon, 23 Jul 2018 14:36:06 +0000 (23:36 +0900)] 
virtio_net: Make drop counter per-queue

Since when XDP was introduced, drop counter has been able to be updated
much more frequently than before, as XDP_DROP increments the counter.
Thus for performance analysis per-queue drop counter would be useful.

Also this avoids cache contention and race on updating the counter. It
is currently racy because napi handlers read-modify-write it without any
locks.

There are more counters in dev->stats that are racy, but I left them
per-device, because they are rarely updated and does not worth being
per-queue counters IMHO. To fix them we need atomic ops or some kind of
locks.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Use temporary storage for accounting rx stats
Toshiaki Makita [Mon, 23 Jul 2018 14:36:05 +0000 (23:36 +0900)] 
virtio_net: Use temporary storage for accounting rx stats

The purpose is to keep receive_buf arguments simple when more per-queue
counter items are added later.
Also XDP_TX related sq counters will be updated in the following changes
so create a container struct virtnet_rx_stats which will includes both
rq and sq statistics. For now it only covers rq stats.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovirtio_net: Fix incosistent received bytes counter
Toshiaki Makita [Mon, 23 Jul 2018 14:36:04 +0000 (23:36 +0900)] 
virtio_net: Fix incosistent received bytes counter

When received packets are dropped in virtio_net driver, received packets
counter is incremented but bytes counter is not.
As a result, for instance if we drop all packets by XDP, only received
is counted and bytes stays 0, which looks inconsistent.
IMHO received packets/bytes should be counted if packets are produced by
the hypervisor, like what common NICs on physical machines are doing.
So fix the bytes counter.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 25 Jul 2018 02:21:58 +0000 (19:21 -0700)] 
Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net

7 years agoMerge tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Wed, 25 Jul 2018 01:11:15 +0000 (18:11 -0700)] 
Merge tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A couple more MIPS fixes for 4.18:

   - Fix an off-by-one in reporting PCI resource sizes to userland which
     regressed in v3.12.

   - Fix writes to DDR controller registers used to flush write buffers,
     which regressed with some refactoring in v4.2"

* tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: ath79: fix register address in ath79_ddr_wb_flush()
  MIPS: Fix off-by-one in pci_resource_to_user()

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 25 Jul 2018 00:31:47 +0000 (17:31 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Handle stations tied to AP_VLANs properly during mac80211 hw
    reconfig. From Manikanta Pubbisetty.

 2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.

 3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
    Ben Elisha.

 4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.

 5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.

 6) Fix cached netdev name leak in nf_tables, from Florian Westphal.

 7) Fix memory leaks on chain rename, also from Florian Westphal.

 8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
    Cheng.

 9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.

10) Fix link local address handling with VRF, from David Ahern.

11) Don't clobber 'err' on a successful call to __skb_linearize() in
    skb_segment(). From Eric Dumazet.

12) Fix vxlan fdb notification races, from Roopa Prabhu.

13) Hash UDP fragments consistently, from Paolo Abeni.

14) If TCP receives lots of out of order tiny packets, we do really
    silly stuff. Make the out-of-order queue ending more robust to this
    kind of behavior, from Eric Dumazet.

15) Don't leak netlink dump state in nf_tables, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: axienet: Fix double deregister of mdio
  qmi_wwan: fix interface number for DW5821e production firmware
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  bnx2x: Fix invalid memory access in rss hash config path.
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  r8169: restore previous behavior to accept BIOS WoL settings
  cfg80211: never ignore user regulatory hint
  sock: fix sg page frag coalescing in sk_alloc_sg
  netfilter: nf_tables: move dumper state allocation into ->start
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  ip: hash fragments consistently
  ipv6: use fib6_info_hold_safe() when necessary
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  ...

7 years agonet: axienet: Fix double deregister of mdio
Shubhrajyoti Datta [Tue, 24 Jul 2018 04:39:53 +0000 (10:09 +0530)] 
net: axienet: Fix double deregister of mdio

If the registration fails then mdio_unregister is called.
However at unbind the unregister ia attempted again resulting
in the below crash

[   73.544038] kernel BUG at drivers/net/phy/mdio_bus.c:415!
[   73.549362] Internal error: Oops - BUG: 0 [#1] SMP
[   73.554127] Modules linked in:
[   73.557168] CPU: 0 PID: 2249 Comm: sh Not tainted 4.14.0 #183
[   73.562895] Hardware name: xlnx,zynqmp (DT)
[   73.567062] task: ffffffc879e41180 task.stack: ffffff800cbe0000
[   73.572973] PC is at mdiobus_unregister+0x84/0x88
[   73.577656] LR is at axienet_mdio_teardown+0x18/0x30
[   73.582601] pc : [<ffffff80085fa4cc>] lr : [<ffffff8008616858>]
pstate: 20000145
[   73.589981] sp : ffffff800cbe3c30
[   73.593277] x29: ffffff800cbe3c30 x28: ffffffc879e41180
[   73.598573] x27: ffffff8008a21000 x26: 0000000000000040
[   73.603868] x25: 0000000000000124 x24: ffffffc879efe920
[   73.609164] x23: 0000000000000060 x22: ffffffc879e02000
[   73.614459] x21: ffffffc879e02800 x20: ffffffc87b0b8870
[   73.619754] x19: ffffffc879e02800 x18: 000000000000025d
[   73.625050] x17: 0000007f9a719ad0 x16: ffffff8008195bd8
[   73.630345] x15: 0000007f9a6b3d00 x14: 0000000000000010
[   73.635640] x13: 74656e7265687465 x12: 0000000000000030
[   73.640935] x11: 0000000000000030 x10: 0101010101010101
[   73.646231] x9 : 241f394f42533300 x8 : ffffffc8799f6e98
[   73.651526] x7 : ffffffc8799f6f18 x6 : ffffffc87b0ba318
[   73.656822] x5 : ffffffc87b0ba498 x4 : 0000000000000000
[   73.662117] x3 : 0000000000000000 x2 : 0000000000000008
[   73.667412] x1 : 0000000000000004 x0 : ffffffc8799f4000
[   73.672708] Process sh (pid: 2249, stack limit = 0xffffff800cbe0000)

Fix the same by making the bus NULL on unregister.

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqmi_wwan: fix interface number for DW5821e production firmware
Aleksander Morgado [Mon, 23 Jul 2018 23:31:07 +0000 (01:31 +0200)] 
qmi_wwan: fix interface number for DW5821e production firmware

The original mapping for the DW5821e was done using a development
version of the firmware. Confirmed with the vendor that the final
USB layout ends up exposing the QMI control/data ports in USB
config #1, interface #0, not in interface #1 (which is now a HID
interface).

T:  Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#= 16 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  2
P:  Vendor=413c ProdID=81d7 Rev=03.18
S:  Manufacturer=DELL
S:  Product=DW5821e Snapdragon X20 LTE
S:  SerialNumber=0123456789ABCDEF
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#= 1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option

Fixes: e7e197edd09c25 ("qmi_wwan: add support for the Dell Wireless 5821e module")
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
Willem de Bruijn [Mon, 23 Jul 2018 23:36:48 +0000 (19:36 -0400)] 
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull

Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:

  BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
  CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
  Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.

With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.

Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.

Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnx2x: Fix invalid memory access in rss hash config path.
Sudarsana Reddy Kalluru [Tue, 24 Jul 2018 09:43:52 +0000 (02:43 -0700)] 
bnx2x: Fix invalid memory access in rss hash config path.

Rx hash/filter table configuration uses rss_conf_obj to configure filters
in the hardware. This object is initialized only when the interface is
brought up.
This patch adds driver changes to configure rss params only when the device
is in opened state. In port disabled case, the config will be cached in the
driver structure which will be applied in the successive load path.

Please consider applying it to 'net' branch.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
Jack Morgenstein [Tue, 24 Jul 2018 11:27:55 +0000 (14:27 +0300)] 
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper

Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
context, rather than the one passed in the input modifier.

However, the qp number in the qp context is not defined as a
required parameter by the FW. Therefore, drivers may choose to not
specify the qp number in the qp context for the reset-to-init transition.

Thus, we must save the qp number passed in the command input modifier --
which is always present. (This saved qp number is used as the input
modifier for command 2RST_QP when a slave's qp's are destroyed).

Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/sched: add skbprio scheduler
Nishanth Devarajan [Mon, 23 Jul 2018 14:07:41 +0000 (19:37 +0530)] 
net/sched: add skbprio scheduler

Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets
according to their skb->priority field. Under congestion, already-enqueued lower
priority packets will be dropped to make space available for higher priority
packets. Skbprio was conceived as a solution for denial-of-service defenses that
need to route packets with different priorities as a means to overcome DoS
attacks.

v5
*Do not reference qdisc_dev(sch)->tx_queue_len for setting limit. Instead set
default sch->limit to 64.

v4
*Drop Documentation/networking/sch_skbprio.txt doc file to move it to tc man
page for Skbprio, in iproute2.

v3
*Drop max_limit parameter in struct skbprio_sched_data and instead use
sch->limit.

*Reference qdisc_dev(sch)->tx_queue_len only once, during initialisation for
qdisc (previously being referenced every time qdisc changes).

*Move qdisc's detailed description from in-code to Documentation/networking.

*When qdisc is saturated, enqueue incoming packet first before dequeueing
lowest priority packet in queue - improves usage of call stack registers.

*Introduce and use overlimit stat to keep track of number of dropped packets.

v2
*Use skb->priority field rather than DS field. Rename queueing discipline as
SKB Priority Queue (previously Gatekeeper Priority Queue).

*Queueing discipline is made classful to expose Skbprio's internal priority
queues.

Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: add GBit master / slave error detection
Heiner Kallweit [Sat, 21 Jul 2018 13:48:47 +0000 (15:48 +0200)] 
net: phy: add GBit master / slave error detection

Certain PHY's have issues when operating in GBit slave mode and can
be forced to master mode. Examples are RTL8211C, also the Micrel PHY
driver has a DT setting to force master mode.
If two such chips are link partners the autonegotiation will fail.
Standard defines a self-clearing on read, latched-high bit to
indicate this error. Check this bit to inform the user.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-whitespace-cleanups'
David S. Miller [Tue, 24 Jul 2018 21:10:43 +0000 (14:10 -0700)] 
Merge branch 'net-whitespace-cleanups'

Stephen Hemminger says:

====================
net whitespace cleanups

Ran script that I use to check for trailing whitespace and
blank lines at end of files across all files in net/ directory.
These are errors that checkpatch reports and git flags.

These are the resulting fixes broken up mostly by subsystem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: remove blank lines at end of file
Stephen Hemminger [Tue, 24 Jul 2018 19:29:18 +0000 (12:29 -0700)] 
net: remove blank lines at end of file

Several files have extra line at end of file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agol2tp: remove trailing newline
Stephen Hemminger [Tue, 24 Jul 2018 19:29:17 +0000 (12:29 -0700)] 
l2tp: remove trailing newline

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpfilter: remove trailing newline
Stephen Hemminger [Tue, 24 Jul 2018 19:29:16 +0000 (12:29 -0700)] 
bpfilter: remove trailing newline

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodecnet: whitespace fixes
Stephen Hemminger [Tue, 24 Jul 2018 19:29:14 +0000 (12:29 -0700)] 
decnet: whitespace fixes

Remove trailing whitespace and extra lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agox25: remove blank lines at EOF
Stephen Hemminger [Tue, 24 Jul 2018 19:29:13 +0000 (12:29 -0700)] 
x25: remove blank lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoax25: remove blank line at EOF
Stephen Hemminger [Tue, 24 Jul 2018 19:29:12 +0000 (12:29 -0700)] 
ax25: remove blank line at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: remove blank lines at EOF
Stephen Hemminger [Tue, 24 Jul 2018 19:29:11 +0000 (12:29 -0700)] 
atm: remove blank lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoila: remove blank lines at EOF
Stephen Hemminger [Tue, 24 Jul 2018 19:29:09 +0000 (12:29 -0700)] 
ila: remove blank lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: whitespace fixes
Stephen Hemminger [Tue, 24 Jul 2018 19:29:08 +0000 (12:29 -0700)] 
sctp: whitespace fixes

Remove blank line at EOF and trailing whitespace.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoxfrm: remove blank lines at EOF
Stephen Hemminger [Tue, 24 Jul 2018 19:29:06 +0000 (12:29 -0700)] 
xfrm: remove blank lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agompls: remove trailing whitepace
Stephen Hemminger [Tue, 24 Jul 2018 19:29:05 +0000 (12:29 -0700)] 
mpls: remove trailing whitepace

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agollc: fix whitespace issues
Stephen Hemminger [Tue, 24 Jul 2018 19:29:04 +0000 (12:29 -0700)] 
llc: fix whitespace issues

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agords: remove trailing whitespace and blank lines
Stephen Hemminger [Tue, 24 Jul 2018 19:29:03 +0000 (12:29 -0700)] 
rds: remove trailing whitespace and blank lines

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agowimax: remove blank lines at EOF
Stephen Hemminger [Tue, 24 Jul 2018 19:29:02 +0000 (12:29 -0700)] 
wimax: remove blank lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosched: fix trailing whitespace
Stephen Hemminger [Tue, 24 Jul 2018 19:29:01 +0000 (12:29 -0700)] 
sched: fix trailing whitespace

Remove trailing whitespace and blank lines at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8169: restore previous behavior to accept BIOS WoL settings
Heiner Kallweit [Tue, 24 Jul 2018 20:21:04 +0000 (22:21 +0200)] 
r8169: restore previous behavior to accept BIOS WoL settings

Commit 7edf6d314cd0 tried to resolve an inconsistency (BIOS WoL
settings are accepted, but device isn't wakeup-enabled) resulting
from a previous broken-BIOS workaround by making disabled WoL the
default.
This however had some side effects, most likely due to a broken BIOS
some systems don't properly resume from suspend when the MagicPacket
WoL bit isn't set in the chip, see
https://bugzilla.kernel.org/show_bug.cgi?id=200195
Therefore restore the WoL behavior from 4.16.

Reported-by: Albert Astals Cid <aacid@kde.org>
Fixes: 7edf6d314cd0 ("r8169: disable WOL per default")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: remove redundant input checks in SIOCSIFTXQLEN case of dev_ifsioc
Tariq Toukan [Tue, 24 Jul 2018 10:53:00 +0000 (13:53 +0300)] 
net: remove redundant input checks in SIOCSIFTXQLEN case of dev_ifsioc

The cited patch added a call to dev_change_tx_queue_len in
SIOCSIFTXQLEN case.
This obsoletes the new len comparison check done before the function call.
Remove it here.

For the desicion of keep/remove the negative value check, we examine the
range check in dev_change_tx_queue_len.
On 64-bit we will fail with -ERANGE.  The 32-bit int ifr_qlen will be sign
extended to 64-bits when it is passed into dev_change_tx_queue_len(). And
then for negative values this test triggers:

if (new_len != (unsigned int)new_len)
return -ERANGE;

because:
if (0xffffffffWHATEVER != 0x00000000WHATEVER)

On 32-bit the signed value will be accepted, changing behavior.

Therefore, the negative value check is kept.

Fixes: 3f76df198288 ("net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Tue, 24 Jul 2018 17:44:24 +0000 (10:44 -0700)] 
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Martin Schwidefsky.

Guenter Roeck reports that the s390 allmodconfig build fails because of
a gcc plugin problem.  The fix won't be in-tree until 4.19, so for now
disable the gcc plugins on s390.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: disable gcc plugins

7 years agomedia: staging: omap4iss: Include asm/cacheflush.h after generic includes
Guenter Roeck [Mon, 23 Jul 2018 21:39:33 +0000 (14:39 -0700)] 
media: staging: omap4iss: Include asm/cacheflush.h after generic includes

Including asm/cacheflush.h first results in the following build error
when trying to build sparc32:allmodconfig, because 'struct page' has not
been declared, and the function declaration ends up creating a separate
(private) declaration of struct page (as a result of function arguments
being in the scope of the function declaration and definition, not in
global scope).

The C scoping rules do not just affect variable visibility, they also
affect type declaration visibility.

The end result is that when the actual call site is seen in
<linux/highmem.h>, the 'struct page' type in the caller is not the same
'struct page' that the function was declared with, resulting in:

  In file included from arch/sparc/include/asm/page.h:10:0,
                   ...
                   from drivers/staging/media/omap4iss/iss_video.c:15:
  include/linux/highmem.h: In function 'clear_user_highpage':
  include/linux/highmem.h:137:31: error:
passing argument 1 of 'sparc_flush_page_to_ram' from incompatible
pointer type

Include generic includes files first to fix the problem.

Fixes: fc96d58c10162 ("[media] v4l: omap4iss: Add support for OMAP4 camera interface - Video devices")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[ Added explanation of C scope rules - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge branch 'cxgb4-collect-free-Tx-Rx-pages-and-page-pointers'
David S. Miller [Tue, 24 Jul 2018 17:12:22 +0000 (10:12 -0700)] 
Merge branch 'cxgb4-collect-free-Tx-Rx-pages-and-page-pointers'

Rahul Lakkireddy says:

====================
cxgb4: collect free Tx/Rx pages and page pointers

Patch 1 collects number of free PSTRUCT page pointers in context
memory.

Patch 2 moves the collection logic for Tx/Rx free pages to common
code, since this information needs to be collected in vmcore device
dump as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: move Tx/Rx free pages collection to common code
Rahul Lakkireddy [Tue, 24 Jul 2018 14:47:10 +0000 (20:17 +0530)] 
cxgb4: move Tx/Rx free pages collection to common code

This information needs to be collected in vmcore device dump as well.
So, move to common code.

Fixes: fa145d5dfd61 ("cxgb4: display number of rx and tx pages free")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: collect number of free PSTRUCT page pointers
Rahul Lakkireddy [Tue, 24 Jul 2018 14:47:09 +0000 (20:17 +0530)] 
cxgb4: collect number of free PSTRUCT page pointers

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-Add-extack-messages-for-tc-flower'
David S. Miller [Tue, 24 Jul 2018 17:10:33 +0000 (10:10 -0700)] 
Merge branch 'mlxsw-Add-extack-messages-for-tc-flower'

Ido Schimmel says:

====================
mlxsw: Add extack messages for tc flower

Nir says:

This patch set adds extack messages support to tc flower part of mlxsw.
The messages provide clear reasoning to failures, as some of the available
actions and keys are not supported in driver or HW and resources may get
exhausted.

The first patch deals with propagation of the extack pointer among the functions
dealing with key parsing and action sets handling.

Following patches 2-4 add appropriate messages across the different layers of
mlxsw tc flower implementation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_flower: Add extack messages
Nir Dotan [Tue, 24 Jul 2018 14:13:14 +0000 (17:13 +0300)] 
mlxsw: spectrum_flower: Add extack messages

Return extack messages in order to explain failures
of unsupported actions, keys and invalid user input.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_acl: Add extack messages
Nir Dotan [Tue, 24 Jul 2018 14:13:13 +0000 (17:13 +0300)] 
mlxsw: spectrum_acl: Add extack messages

Return extack messages for failures in action set creation.
Messages provide reasons for not being able to implement
the action in HW.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: core_acl_flex_actions: Add extack messages
Nir Dotan [Tue, 24 Jul 2018 14:13:12 +0000 (17:13 +0300)] 
mlxsw: core_acl_flex_actions: Add extack messages

Return extack messages for failures in action set creation.
Errors may occur when action is not currently supported or due
to lack of resources.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_acl: Propagate extack pointer
Nir Dotan [Tue, 24 Jul 2018 14:13:11 +0000 (17:13 +0300)] 
mlxsw: spectrum_acl: Propagate extack pointer

Propagate extack pointer in order to add extack messages for ACL.
In the follow-up patches, appropriate messages will be added
in various points.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetlink: do not store start function in netlink_cb
Florian Westphal [Tue, 24 Jul 2018 10:47:56 +0000 (12:47 +0200)] 
netlink: do not store start function in netlink_cb

->start() is called once when dump is being initialized, there is no
need to store it in netlink_cb.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'mac80211-next-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Tue, 24 Jul 2018 17:00:54 +0000 (10:00 -0700)] 
Merge tag 'mac80211-next-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Only a few things:
 * HE (802.11ax) support in HWSIM
 * bypass TXQ with NDP frames as they're special
 * convert ahash -> shash in lib80211 TKIP
 * avoid playing with tailroom counter defer unless
   needed to avoid issues in some cases
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Tue, 24 Jul 2018 16:56:50 +0000 (09:56 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Make sure we don't go over the maximum jump stack boundary,
   from Taehee Yoo.

2) Missing rcu_barrier() in hash and rbtree sets, also from Taehee.

3) Missing check to nul-node in rbtree timeout routine, from Taehee.

4) Use dev->name from flowtable to fix a memleak, from Florian.

5) Oneliner to free flowtable object on removal, from Florian.

6) Memleak in chain rename transaction, again from Florian.

7) Don't allow two chains to use the same name in the same
   transaction, from Florian.

8) handle DCCP SYNC/SYNCACK as invalid, this triggers an
   uninitialized timer in conntrack reported by syzbot, from Florian.

9) Fix leak in case netlink_dump_start() fails, from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'mac80211-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Tue, 24 Jul 2018 16:38:50 +0000 (09:38 -0700)] 
Merge tag 'mac80211-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Only a few fixes:
 * always keep regulatory user hint
 * add missing break statement in station flags parsing
 * fix non-linear SKBs in port-control-over-nl80211
 * reconfigure VLAN stations during HW restart
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomac80211: restrict delayed tailroom needed decrement
Manikanta Pubbisetty [Tue, 10 Jul 2018 11:18:27 +0000 (16:48 +0530)] 
mac80211: restrict delayed tailroom needed decrement

As explained in ieee80211_delayed_tailroom_dec(), during roam,
keys of the old AP will be destroyed and new keys will be
installed. Deletion of the old key causes
crypto_tx_tailroom_needed_cnt to go from 1 to 0 and the new key
installation causes a transition from 0 to 1.

Whenever crypto_tx_tailroom_needed_cnt transitions from 0 to 1,
we invoke synchronize_net(); the reason for doing this is to avoid
a race in the TX path as explained in increment_tailroom_need_count().
This synchronize_net() operation can be slow and can affect the station
roam time. To avoid this, decrementing the crypto_tx_tailroom_needed_cnt
is delayed for a while so that upon installation of new key the
transition would be from 1 to 2 instead of 0 to 1 and thereby
improving the roam time.

This is all correct for a STA iftype, but deferring the tailroom_needed
decrement for other iftypes may be unnecessary.

For example, let's consider the case of a 4-addr client connecting to
an AP for which AP_VLAN interface is also created, let the initial
value for tailroom_needed on the AP be 1.

* 4-addr client connects to the AP (AP: tailroom_needed = 1)
* AP will clear old keys, delay decrement of tailroom_needed count
* AP_VLAN is created, it takes the tailroom count from master
  (AP_VLAN: tailroom_needed = 1, AP: tailroom_needed = 1)
* Install new key for the station, assume key is plumbed in the HW,
  there won't be any change in tailroom_needed count on AP iface
* Delayed decrement of tailroom_needed count on AP
  (AP: tailroom_needed = 0, AP_VLAN: tailroom_needed = 1)

Because of the delayed decrement on AP iface, tailroom_needed count goes
out of sync between AP(master iface) and AP_VLAN(slave iface) and
there would be unnecessary tailroom created for the packets going
through AP_VLAN iface.

Also, WARN_ONs were observed while trying to bring down the AP_VLAN
interface:
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (ieee80211_free_keys+0x114/0x1e4)
(ieee80211_free_keys) (ieee80211_del_virtual_monitor+0x51c/0x850)
(ieee80211_del_virtual_monitor) (ieee80211_stop+0x30/0x3c)
(ieee80211_stop) (__dev_close_many+0x94/0xb8)
(__dev_close_many) (dev_close_many+0x5c/0xc8)

Restricting delayed decrement to station interface alone fixes the problem
and it makes sense to do so because delayed decrement is done to improve
roam time which is applicable only for client devices.

Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agowireless/lib80211: Convert from ahash to shash
Kees Cook [Mon, 16 Jul 2018 03:52:26 +0000 (20:52 -0700)] 
wireless/lib80211: Convert from ahash to shash

In preparing to remove all stack VLA usage from the kernel[1], this
removes the discouraged use of AHASH_REQUEST_ON_STACK in favor of
the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash
to direct shash. The stack allocation will be made a fixed size in a
later patch to the crypto subsystem.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agocfg80211: never ignore user regulatory hint
Amar Singhal [Fri, 20 Jul 2018 19:15:18 +0000 (12:15 -0700)] 
cfg80211: never ignore user regulatory hint

Currently user regulatory hint is ignored if all wiphys
in the system are self managed. But the hint is not ignored
if there is no wiphy in the system. This affects the global
regulatory setting. Global regulatory setting needs to be
maintained so that it can be applied to a new wiphy entering
the system. Therefore, do not ignore user regulatory setting
even if all wiphys in the system are self managed.

Signed-off-by: Amar Singhal <asinghal@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agos390: disable gcc plugins
Martin Schwidefsky [Tue, 24 Jul 2018 05:55:18 +0000 (07:55 +0200)] 
s390: disable gcc plugins

The s390 build currently fails with the latent entropy plugin:

arch/s390/kernel/als.o: In function `verify_facilities':
als.c:(.init.text+0x24): undefined reference to `latent_entropy'
als.c:(.init.text+0xae): undefined reference to `latent_entropy'
make[3]: *** [arch/s390/boot/compressed/vmlinux] Error 1
make[2]: *** [arch/s390/boot/compressed/vmlinux] Error 2
make[1]: *** [bzImage] Error 2

This will be fixed with the early boot rework from Vasily, which
is planned for the 4.19 merge window.

For 4.18 the simplest solution is to disable the gcc plugins and
reenable them after the early boot rework is upstream.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
7 years agoMerge tag 'wireless-drivers-next-for-davem-2018-07-23' of git://git.kernel.org/pub...
David S. Miller [Tue, 24 Jul 2018 04:30:03 +0000 (21:30 -0700)] 
Merge tag 'wireless-drivers-next-for-davem-2018-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.19

The first set of patches for 4.19. Only smaller features and bug
fixes, not really anything major. Also included are changes to
include/linux/bitfield.h, we agreed with Johannes that it makes sense
to apply them via wireless-drivers-next.

Major changes:

ath10k

* support channel 173

* fix spectral scan for QCA9984 and QCA9888 chipsets

ath6kl

* add support for Dell Wireless 1537

ti wlcore

* add support for runtime PM

* enable runtime PM autosuspend support

qtnfmac

* support changing MAC address

* enable source MAC address randomization support

libertas

* fix suspend and resume for SDIO cards

mt76

* add software DFS radar pattern detector for mt76x2 based devices
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosock: fix sg page frag coalescing in sk_alloc_sg
Daniel Borkmann [Mon, 23 Jul 2018 20:37:54 +0000 (22:37 +0200)] 
sock: fix sg page frag coalescing in sk_alloc_sg

Current sg coalescing logic in sk_alloc_sg() (latter is used by tls and
sockmap) is not quite correct in that we do fetch the previous sg entry,
however the subsequent check whether the refilled page frag from the
socket is still the same as from the last entry with prior offset and
length matching the start of the current buffer is comparing always the
first sg list entry instead of the prior one.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'rds-ipv6'
David S. Miller [Tue, 24 Jul 2018 04:17:44 +0000 (21:17 -0700)] 
Merge branch 'rds-ipv6'

Ka-Cheong Poon says:

====================
rds: IPv6 support

This patch set adds IPv6 support to the kernel RDS and related
modules.  Existing RDS apps using IPv4 address continue to run without
any problem.  New RDS apps which want to use IPv6 address can do so by
passing the address in struct sockaddr_in6 to bind(), connect() or
sendmsg().  And those apps also need to use the new IPv6 equivalents
of some of the existing socket options as the existing options use a
32 bit integer to store IP address.

All RDS code now use struct in6_addr to store IP address.  IPv4
address is stored as an IPv4 mapped address.

Header file changes

There are many data structures (RDS socket options) used by RDS apps
which use a 32 bit integer to store IP address. To support IPv6,
struct in6_addr needs to be used. To ensure backward compatibility, a
new data structure is introduced for each of those data structures
which use a 32 bit integer to represent an IP address. And new socket
options are introduced to use those new structures. This means that
existing apps should work without a problem with the new RDS module.
For apps which want to use IPv6, those new data structures and socket
options can be used. IPv4 mapped address is used to represent IPv4
address in the new data structures.

Internally, all RDS data structures which contain an IP address are
changed to use struct in6_addr to store the address. IPv4 address is
stored as an IPv4 mapped address. All the functions which take an IP
address as argument are also changed to use struct in6_addr.

RDS/RDMA/IB uses a private data (struct rds_ib_connect_private)
exchange between endpoints at RDS connection establishment time to
support RDMA. This private data exchange uses a 32 bit integer to
represent an IP address. This needs to be changed in order to support
IPv6. A new private data struct rds6_ib_connect_private is introduced
to handle this. To ensure backward compatibility, an IPv6 capable RDS
stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6
connection. And it continues to use the original RDS_PORT for IPv4 RDS
connections. When it needs to communicate with an IPv6 peer, it uses
the RDS_TCP_PORT to send the connection set up request.

RDS/TCP changes

TCP related code is changed to support IPv6.  Note that only an IPv6
TCP listener on port RDS_TCP_PORT is created as it can accept both
IPv4 and IPv6 connection requests.

IB/RDMA changes

The initial private data exchange between IB endpoints using RDMA is
changed to support IPv6 address instead, if the peer address is IPv6.
To ensure backward compatibility, annother RDMA listener port
(RDS_CM_PORT) is used to accept IPv6 connection. An IPv6 capable RDS
module continues to use the original RDS_PORT for IPv4 RDS
connections. When it needs to communicate with an IPv6 peer, it uses
the RDS_CM_PORT to send the connection set up request.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agords: Extend RDS API for IPv6 support
Ka-Cheong Poon [Tue, 24 Jul 2018 03:51:23 +0000 (20:51 -0700)] 
rds: Extend RDS API for IPv6 support

There are many data structures (RDS socket options) used by RDS apps
which use a 32 bit integer to store IP address. To support IPv6,
struct in6_addr needs to be used. To ensure backward compatibility, a
new data structure is introduced for each of those data structures
which use a 32 bit integer to represent an IP address. And new socket
options are introduced to use those new structures. This means that
existing apps should work without a problem with the new RDS module.
For apps which want to use IPv6, those new data structures and socket
options can be used. IPv4 mapped address is used to represent IPv4
address in the new data structures.

v4: Revert changes to SO_RDS_TRANSPORT

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agords: Enable RDS IPv6 support
Ka-Cheong Poon [Tue, 24 Jul 2018 03:51:22 +0000 (20:51 -0700)] 
rds: Enable RDS IPv6 support

This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
connection requests.  RDS/RDMA/IB uses a private data (struct
rds_ib_connect_private) exchange between endpoints at RDS connection
establishment time to support RDMA. This private data exchange uses a
32 bit integer to represent an IP address. This needs to be changed in
order to support IPv6. A new private data struct
rds6_ib_connect_private is introduced to handle this. To ensure
backward compatibility, an IPv6 capable RDS stack uses another RDMA
listener port (RDS_CM_PORT) to accept IPv6 connection. And it
continues to use the original RDS_PORT for IPv4 RDS connections. When
it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
send the connection set up request.

v5: Fixed syntax problem (David Miller).

v4: Changed port history comments in rds.h (Sowmini Varadhan).

v3: Added support to set up IPv4 connection using mapped address
    (David Miller).
    Added support to set up connection between link local and non-link
    addresses.
    Various review comments from Santosh Shilimkar and Sowmini Varadhan.

v2: Fixed bound and peer address scope mismatched issue.
    Added back rds_connect() IPv6 changes.

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agords: Changing IP address internal representation to struct in6_addr
Ka-Cheong Poon [Tue, 24 Jul 2018 03:51:21 +0000 (20:51 -0700)] 
rds: Changing IP address internal representation to struct in6_addr

This patch changes the internal representation of an IP address to use
struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
All the functions which take an IP address as argument are also
changed to use struct in6_addr.  But RDS socket layer is not modified
such that it still does not accept IPv6 address from an application.
And RDS layer does not accept nor initiate IPv6 connections.

v2: Fixed sparse warnings.

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sched-introduce-chain-templates-support-with-offloading-to-mlxsw'
David S. Miller [Tue, 24 Jul 2018 03:44:13 +0000 (20:44 -0700)] 
Merge branch 'sched-introduce-chain-templates-support-with-offloading-to-mlxsw'

Jiri Pirko says:

====================
sched: introduce chain templates support with offloading to mlxsw

For the TC clsact offload these days, some of HW drivers need
to hold a magic ball. The reason is, with the first inserted rule inside
HW they need to guess what fields will be used for the matching. If
later on this guess proves to be wrong and user adds a filter with a
different field to match, there's a problem. Mlxsw resolves it now with
couple of patterns. Those try to cover as many match fields as possible.
This aproach is far from optimal, both performance-wise and scale-wise.
Also, there is a combination of filters that in certain order won't
succeed.

Most of the time, when user inserts filters in chain, he knows right away
how the filters are going to look like - what type and option will they
have. For example, he knows that he will only insert filters of type
flower matching destination IP address. He can specify a template that
would cover all the filters in the chain.

This patchset is providing the possibility to user to provide such
template to kernel and propagate it all the way down to device
drivers.

See the examples below.

Create dummy device with clsact first:

There is no chain present by by default:

Add chain number 11 by explicit command:
chain parent ffff: chain 11

Add filter to chain number 12 which does not exist. That will create
implicit chain 12:
chain parent ffff: chain 11
chain parent ffff: chain 12

Delete both chains:

Add a chain with template of type flower allowing to insert rules matching
on last 2 bytes of destination mac address:

The chain with template is now showed in the list:
chain parent ffff: flower chain 0
  dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
  eth_type ipv4

Add another chain (number 22) with template:
chain parent ffff: flower chain 0
  dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
  eth_type ipv4
chain parent ffff: flower chain 22
  eth_type ipv4
  dst_ip 0.0.0.0/16

Add a filter that fits the template:

Addition of filters that does not fit the template would fail:
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1

Additions of filters to chain 22:
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1

---
v3->v4:
- patch 2:
  - new patch
- patch 3:
  - new patch, derived from the previous v3 chaintemplate obj patch
- patch 4:
  - only templates part as chains creation/deletion is now a separate patch
  - don't pass template priv as arg of "change" op
- patch 6:
  - rebased on top of flower cvlan patch and ip tos/ttl patch
- patch 7:
  - templave priv is no longer passed as an arg to "change" op
- patch 11:
  - split from the originally single patch
- patch 12:
  - split from the originally single patch
v2->v3:
- patch 7:
  - rebase on top of the reoffload patchset
- patch 8:
  - rebase on top of the reoffload patchset
v1->v2:
- patch 8:
  - remove leftover extack arg in fl_hw_create_tmplt()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselftests: forwarding: add tests for TC chain templates
Jiri Pirko [Mon, 23 Jul 2018 07:24:07 +0000 (09:24 +0200)] 
selftests: forwarding: add tests for TC chain templates

Add basic sanity tests for TC chain templates.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselftests: forwarding: add tests for TC chains creation adn destruction
Jiri Pirko [Mon, 23 Jul 2018 07:24:06 +0000 (09:24 +0200)] 
selftests: forwarding: add tests for TC chains creation adn destruction

Add basic sanity tests for TC chains.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselftests: forwarding: move shblock tc support check to a separate helper
Jiri Pirko [Mon, 23 Jul 2018 07:24:05 +0000 (09:24 +0200)] 
selftests: forwarding: move shblock tc support check to a separate helper

The shared block support is only needed for tc_shblock.sh. No need to
require that for other test.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Implement chain template hinting
Jiri Pirko [Mon, 23 Jul 2018 07:23:12 +0000 (09:23 +0200)] 
mlxsw: spectrum: Implement chain template hinting

Since cld_flower provides information about the filter template for
specific chain, use this information in order to prepare a region.
Use the template to find out what elements are going to be used
and pass that down to mlxsw_sp_acl_tcam_group_add(). Later on, when the
first filter is inserted, the mlxsw_sp_acl_tcam_group_use_patterns()
function would use this element usage information instead of looking
up a pattern.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_flower: propagate chain teplate creation and destruction to drivers
Jiri Pirko [Mon, 23 Jul 2018 07:23:11 +0000 (09:23 +0200)] 
net: sched: cls_flower: propagate chain teplate creation and destruction to drivers

Introduce a couple of flower offload commands in order to propagate
template creation/destruction events down to device drivers.
Drivers may use this information to prepare HW in an optimal way
for future filter insertions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_flower: implement chain templates
Jiri Pirko [Mon, 23 Jul 2018 07:23:10 +0000 (09:23 +0200)] 
net: sched: cls_flower: implement chain templates

Use the previously introduced template extension and implement
callback to create, destroy and dump chain template. The existing
parsing and dumping functions are re-used. Also, check if newly added
filters fit the template if it is set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_flower: change fl_init_dissector to accept mask and dissector
Jiri Pirko [Mon, 23 Jul 2018 07:23:09 +0000 (09:23 +0200)] 
net: sched: cls_flower: change fl_init_dissector to accept mask and dissector

This function is going to be used for templates as well, so we need to
pass the pointer separately.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: cls_flower: move key/mask dumping into a separate function
Jiri Pirko [Mon, 23 Jul 2018 07:23:08 +0000 (09:23 +0200)] 
net: sched: cls_flower: move key/mask dumping into a separate function

Push key/mask dumping from fl_dump() into a separate function
fl_dump_key(), that will be reused for template dumping.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: introduce chain templates
Jiri Pirko [Mon, 23 Jul 2018 07:23:07 +0000 (09:23 +0200)] 
net: sched: introduce chain templates

Allow user to set a template for newly created chains. Template lock
down the chain for particular classifier type/options combinations.
The classifier needs to support templates, otherwise kernel would
reply with error.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: introduce chain object to uapi
Jiri Pirko [Mon, 23 Jul 2018 07:23:06 +0000 (09:23 +0200)] 
net: sched: introduce chain object to uapi

Allow user to create, destroy, get and dump chain objects. Do that by
extending rtnl commands by the chain-specific ones. User will now be
able to explicitly create or destroy chains (so far this was done only
automatically according the filter/act needs and refcounting). Also, the
user will receive notification about any chain creation or destuction.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: Avoid implicit chain 0 creation
Jiri Pirko [Mon, 23 Jul 2018 07:23:05 +0000 (09:23 +0200)] 
net: sched: Avoid implicit chain 0 creation

Currently, chain 0 is implicitly created during block creation. However
that does not align with chain object exposure, creation and destruction
api introduced later on. So make the chain 0 behave the same way as any
other chain and only create it when it is needed. Since chain 0 is
somehow special as the qdiscs need to hold pointer to the first chain
tp, this requires to move the chain head change callback infra to the
block structure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: push ops lookup bits into tcf_proto_lookup_ops()
Jiri Pirko [Mon, 23 Jul 2018 07:23:04 +0000 (09:23 +0200)] 
net: sched: push ops lookup bits into tcf_proto_lookup_ops()

Push all bits that take care of ops lookup, including module loading
outside tcf_proto_create() function, into tcf_proto_lookup_ops()

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'cpsw-add-MQPRIO-and-CBS-Qdisc-offload'
David S. Miller [Tue, 24 Jul 2018 03:34:36 +0000 (20:34 -0700)] 
Merge branch 'cpsw-add-MQPRIO-and-CBS-Qdisc-offload'

Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: add MQPRIO and CBS Qdisc offload

This series adds MQPRIO and CBS Qdisc offload for TI cpsw driver.
It potentially can be used in audio video bridging (AVB) and time
sensitive networking (TSN).

Patchset was tested on AM572x EVM and BBB boards. Last patch from this
series adds detailed description of configuration with examples. For
consistency reasons, in role of talker and listener, tools from
patchset "TSN: Add qdisc based config interface for CBS" were used and
can be seen here: https://www.spinics.net/lists/netdev/msg460869.html

Based on net-next/master

v5..v4:
- corrected typo of "am57xx" board name, no functional changes

v4..v3:
 - nothing, just rebase

v3..v2:
 - corrected typo of "shaper" word, no functional changes

v2..v1:
 - changed name cpsw.txt on ti-cpsw.txt
 - changed name cpsw_set_tc() on cpsw_set_mqprio()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoDocumentation: networking: cpsw: add MQPRIO & CBS offload examples
Ivan Khoronzhuk [Mon, 23 Jul 2018 21:26:34 +0000 (00:26 +0300)] 
Documentation: networking: cpsw: add MQPRIO & CBS offload examples

This document describes MQPRIO and CBS Qdisc offload configuration
for cpsw driver based on examples. It potentially can be used in
audio video bridging (AVB) and time sensitive networking (TSN).

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: cpsw: restore shaper configuration while down/up
Ivan Khoronzhuk [Mon, 23 Jul 2018 21:26:33 +0000 (00:26 +0300)] 
net: ethernet: ti: cpsw: restore shaper configuration while down/up

Need to restore shapers configuration after interface was down/up.
This is needed as appropriate configuration is still replicated in
kernel settings. This only shapers context restore, so vlan
configuration should be restored by user if needed, especially for
devices with one port where vlan frames are sent via ALE.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: cpsw: add CBS Qdisc offload
Ivan Khoronzhuk [Mon, 23 Jul 2018 21:26:32 +0000 (00:26 +0300)] 
net: ethernet: ti: cpsw: add CBS Qdisc offload

The cpsw has up to 4 FIFOs per port and upper 3 FIFOs can feed rate
limited queue with shaping. In order to set and enable shaping for
those 3 FIFOs queues the network device with CBS qdisc attached is
needed. The CBS configuration is added for dual-emac/single port mode
only, but potentially can be used in switch mode also, based on
switchdev for instance.

Despite the FIFO shapers can work w/o cpdma level shapers the base
usage must be in combine with cpdma level shapers as described in TRM,
that are set as maximum rates for interface queues with sysfs.

One of the possible configuration with txq shapers and CBS shapers:

                      Configured with echo RATE >
                  /sys/class/net/eth0/queues/tx-0/tx_maxrate
             /---------------------------------------------------
            /
           /            cpdma level shapers
        +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
        | c7 | | c6 | | c5 | | c4 | | c3 | | c2 | | c1 | | c0 |
        \    / \    / \    / \    / \    / \    / \    / \    /
         \  /   \  /   \  /   \  /   \  /   \  /   \  /   \  /
          \/     \/     \/     \/     \/     \/     \/     \/
+---------|------|------|------|-------------------------------------+
|    +----+      |      |  +---+                                     |
|    |      +----+      |  |                                         |
|    v      v           v  v                                         |
| +----+ +----+ +----+ +----+ p        p+----+ +----+ +----+ +----+  |
| |    | |    | |    | |    | o        o|    | |    | |    | |    |  |
| | f3 | | f2 | | f1 | | f0 | r  CPSW  r| f3 | | f2 | | f1 | | f0 |  |
| |    | |    | |    | |    | t        t|    | |    | |    | |    |  |
| \    / \    / \    / \    / 0        1\    / \    / \    / \    /  |
|  \  X   \  /   \  /   \  /             \  /   \  /   \  /   \  /   |
|   \/ \   \/     \/     \/               \/     \/     \/     \/    |
+-------\------------------------------------------------------------+
         \
          \ FIFO shaper, set with CBS offload added in this patch,
           \ FIFO0 cannot be rate limited
            ------------------------------------------------------

CBS shaper configuration is supposed to be used with root MQPRIO Qdisc
offload allowing to add sk_prio->tc->txq maps that direct traffic to
appropriate tx queue and maps L2 priority to FIFO shaper.

The CBS shaper is intended to be used for AVB where L2 priority
(pcp field) is used to differentiate class of traffic. So additionally
vlan needs to be created with appropriate egress sk_prio->l2 prio map.

If CBS has several tx queues assigned to it, the sum of their
bandwidth has not overlap bandwidth set for CBS. It's recomended the
CBS bandwidth to be a little bit more.

The CBS shaper is configured with CBS qdisc offload interface using tc
tool from iproute2 packet.

For instance:

$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1

$ tc -g class show dev eth0
+---(100:ffe2) mqprio
|    +---(100:3) mqprio
|    +---(100:4) mqprio
|    
+---(100:ffe1) mqprio
|    +---(100:2) mqprio
|    
+---(100:ffe0) mqprio
     +---(100:1) mqprio

$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1440 \
hicredit 60 sendslope -960000 idleslope 40000 offload 1

$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
hicredit 62 sendslope -980000 idleslope 20000 offload 1

The above code set CBS shapers for tc0 and tc1, for that txq0 and
txq1 is used. Pay attention, the real set bandwidth can differ a bit
due to discreteness of configuration parameters.

Here parameters like locredit, hicredit and sendslope are ignored
internally and are supposed to be set with assumption that maximum
frame size for frame - 1500.

It's supposed that interface speed is not changed while reconnection,
not always is true, so inform user in case speed of interface was
changed, as it can impact on dependent shapers configuration.

For more examples see Documentation.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: cpsw: add MQPRIO Qdisc offload
Ivan Khoronzhuk [Mon, 23 Jul 2018 21:26:31 +0000 (00:26 +0300)] 
net: ethernet: ti: cpsw: add MQPRIO Qdisc offload

That's possible to offload vlan to tc priority mapping with
assumption sk_prio == L2 prio.

Example:
$ ethtool -L eth0 rx 1 tx 4

$ qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1

$ tc -g class show dev eth0
+---(100:ffe2) mqprio
|    +---(100:3) mqprio
|    +---(100:4) mqprio
|    
+---(100:ffe1) mqprio
|    +---(100:2) mqprio
|    
+---(100:ffe0) mqprio
     +---(100:1) mqprio

Here, 100:1 is txq0, 100:2 is txq1, 100:3 is txq2, 100:4 is txq3
txq0 belongs to tc0, txq1 to tc1, txq2 and txq3 to tc2
The offload part only maps L2 prio to classes of traffic, but not
to transmit queues, so to direct traffic to traffic class vlan has
to be created with appropriate egress map.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: cpdma: fit rated channels in backward order
Ivan Khoronzhuk [Mon, 23 Jul 2018 21:26:30 +0000 (00:26 +0300)] 
net: ethernet: ti: cpdma: fit rated channels in backward order

According to TRM tx rated channels should be in 7..0 order,
so correct it.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: ti: cpsw: use cpdma channels in backward order for txq
Ivan Khoronzhuk [Mon, 23 Jul 2018 21:26:29 +0000 (00:26 +0300)] 
net: ethernet: ti: cpsw: use cpdma channels in backward order for txq

The cpdma channel highest priority is from hi to lo number.
The driver has limited number of descriptors that are shared between
number of cpdma channels. Number of queues can be tuned with ethtool,
that allows to not spend descriptors on not needed cpdma channels.
In AVB usually only 2 tx queues can be enough with rate limitation.
The rate limitation can be used only for hi priority queues. Thus, to
use only 2 queues the 8 has to be created. It's wasteful.

So, in order to allow using only needed number of rate limited
tx queues, save resources, and be able to set rate limitation for
them, let assign tx cpdma channels in backward order to queues.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>