]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
5 years agotcp: tcp_grow_window() needs to respect tcp_space()
Eric Dumazet [Tue, 16 Apr 2019 17:55:20 +0000 (10:55 -0700)] 
tcp: tcp_grow_window() needs to respect tcp_space()

For some reason, tcp_grow_window() correctly tests if enough room
is present before attempting to increase tp->rcv_ssthresh,
but does not prevent it to grow past tcp_space()

This is causing hard to debug issues, like failing
the (__tcp_select_window(sk) >= tp->rcv_wnd) test
in __tcp_ack_snd_check(), causing ACK delays and possibly
slow flows.

Depending on tcp_rmem[2], MTU, skb->len/skb->truesize ratio,
we can see the problem happening on "netperf -t TCP_RR -- -r 2000,2000"
after about 60 round trips, when the active side no longer sends
immediate acks.

This bug predates git history.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoocelot: Clean up stats update deferred work
Claudiu Manoil [Tue, 16 Apr 2019 14:51:59 +0000 (17:51 +0300)] 
ocelot: Clean up stats update deferred work

This is preventive cleanup that may save troubles later.
No need to cancel repeateadly queued work if code is properly
refactored.
Don't let the ethtool -s process interfere with the stat workqueue
scheduling.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoocelot: Don't sleep in atomic context (irqs_disabled())
Claudiu Manoil [Tue, 16 Apr 2019 14:51:58 +0000 (17:51 +0300)] 
ocelot: Don't sleep in atomic context (irqs_disabled())

Preemption disabled at:
 [<ffff000008cabd54>] dev_set_rx_mode+0x1c/0x38
 Call trace:
 [<ffff00000808a5c0>] dump_backtrace+0x0/0x3d0
 [<ffff00000808a9a4>] show_stack+0x14/0x20
 [<ffff000008e6c0c0>] dump_stack+0xac/0xe4
 [<ffff0000080fe76c>] ___might_sleep+0x164/0x238
 [<ffff0000080fe890>] __might_sleep+0x50/0x88
 [<ffff0000082261e4>] kmem_cache_alloc+0x17c/0x1d0
 [<ffff000000ea0ae8>] ocelot_set_rx_mode+0x108/0x188 [mscc_ocelot_common]
 [<ffff000008cabcf0>] __dev_set_rx_mode+0x58/0xa0
 [<ffff000008cabd5c>] dev_set_rx_mode+0x24/0x38

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: fix netlink export of vlan_stats_per_port option
Nikolay Aleksandrov [Tue, 16 Apr 2019 13:15:56 +0000 (16:15 +0300)] 
net: bridge: fix netlink export of vlan_stats_per_port option

Since the introduction of the vlan_stats_per_port option the netlink
export of it has been broken since I made a typo and used the ifla
attribute instead of the bridge option to retrieve its state.
Sysfs export is fine, only netlink export has been affected.

Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: fix spelling mistake "faspath" -> "fastpath"
Colin Ian King [Tue, 16 Apr 2019 11:43:17 +0000 (12:43 +0100)] 
qed: fix spelling mistake "faspath" -> "fastpath"

There is a spelling mistake in a DP_INFO message, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: set sysctl_tipc_rmem and named_timeout right range
Jie Liu [Tue, 16 Apr 2019 05:10:09 +0000 (13:10 +0800)] 
tipc: set sysctl_tipc_rmem and named_timeout right range

We find that sysctl_tipc_rmem and named_timeout do not have the right minimum
setting. sysctl_tipc_rmem should be larger than zero, like sysctl_tcp_rmem.
And named_timeout as a timeout setting should be not less than zero.

Fixes: cc79dd1ba9c10 ("tipc: change socket buffer overflow control to respect sk_rcvbuf")
Fixes: a5325ae5b8bff ("tipc: add name distributor resiliency queue")
Signed-off-by: Jie Liu <liujie165@huawei.com>
Reported-by: Qiang Ning <ningqiang1@huawei.com>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: fix link established but not in session
Tuong Lien [Tue, 16 Apr 2019 03:48:07 +0000 (10:48 +0700)] 
tipc: fix link established but not in session

According to the link FSM, when a link endpoint got RESET_MSG (- a
traditional one without the stopping bit) from its peer, it moves to
PEER_RESET state and raises a LINK_DOWN event which then resets the
link itself. Its state will become ESTABLISHING after the reset event
and the link will be re-established soon after this endpoint starts to
send ACTIVATE_MSG to the peer.

There is no problem with this mechanism, however the link resetting has
cleared the link 'in_session' flag (along with the other important link
data such as: the link 'mtu') that was correctly set up at the 1st step
(i.e. when this endpoint received the peer RESET_MSG). As a result, the
link will become ESTABLISHED, but the 'in_session' flag is not set, and
all STATE_MSG from its peer will be dropped at the link_validate_msg().
It means the link not synced and will sooner or later face a failure.

Since the link reset action is obviously needed for a new link session
(this is also true in the other situations), the problem here is that
the link is re-established a bit too early when the link endpoints are
not really in-sync yet. The commit forces a resync as already done in
the previous commit 91986ee166cf ("tipc: fix link session and
re-establish issues") by simply varying the link 'peer_session' value
at the link_reset().

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Fix missing meta data in skb with vlan packet
Yuya Kusakabe [Tue, 16 Apr 2019 01:22:28 +0000 (10:22 +0900)] 
net: Fix missing meta data in skb with vlan packet

skb_reorder_vlan_header() should move XDP meta data with ethernet header
if XDP meta data exists.

Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
Signed-off-by: Takeru Hayasaka <taketarou2@gmail.com>
Co-developed-by: Takeru Hayasaka <taketarou2@gmail.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: atm: Fix potential Spectre v1 vulnerabilities
Gustavo A. R. Silva [Mon, 15 Apr 2019 20:57:23 +0000 (15:57 -0500)] 
net: atm: Fix potential Spectre v1 vulnerabilities

arg is controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

net/atm/lec.c:715 lec_mcast_attach() warn: potential spectre issue 'dev_lec' [r] (local cap)

Fix this by sanitizing arg before using it to index dev_lec.

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/core: work around section mismatch warning for ptp_classifier
Ard Biesheuvel [Mon, 15 Apr 2019 17:14:45 +0000 (19:14 +0200)] 
net/core: work around section mismatch warning for ptp_classifier

The routine ptp_classifier_init() uses an initializer for an
automatic struct type variable which refers to an __initdata
symbol. This is perfectly legal, but may trigger a section
mismatch warning when running the compiler in -fpic mode, due
to the fact that the initializer may be emitted into an anonymous
.data section thats lack the __init annotation. So work around it
by using assignments instead.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: fix per-port af_packet sockets
Nikolay Aleksandrov [Thu, 11 Apr 2019 10:56:39 +0000 (13:56 +0300)] 
net: bridge: fix per-port af_packet sockets

When the commit below was introduced it changed two visible things:
 - the skb was no longer passed through the protocol handlers with the
   original device
 - the skb was passed up the stack with skb->dev = bridge

The first change broke af_packet sockets on bridge ports. For example we
use them for hostapd which listens for ETH_P_PAE packets on the ports.
We discussed two possible fixes:
 - create a clone and pass it through NF_HOOK(), act on the original skb
   based on the result
 - somehow signal to the caller from the okfn() that it was called,
   meaning the skb is ok to be passed, which this patch is trying to
   implement via returning 1 from the bridge link-local okfn()

Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and
drop/error would return < 0 thus the okfn() is called only when the
return was 1, so we signal to the caller that it was called by preserving
the return value from nf_hook().

Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnx2x: fix spelling mistake "dicline" -> "decline"
Colin Ian King [Mon, 15 Apr 2019 15:47:03 +0000 (16:47 +0100)] 
bnx2x: fix spelling mistake "dicline" -> "decline"

There is a spelling mistake in a BNX2X_ERR message, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoroute: Avoid crash from dereferencing NULL rt->from
Jonathan Lemon [Sun, 14 Apr 2019 21:21:29 +0000 (14:21 -0700)] 
route: Avoid crash from dereferencing NULL rt->from

When __ip6_rt_update_pmtu() is called, rt->from is RCU dereferenced, but is
never checked for null - rt6_flush_exceptions() may have removed the entry.

[ 1913.989004] RIP: 0010:ip6_rt_cache_alloc+0x13/0x170
[ 1914.209410] Call Trace:
[ 1914.214798]  <IRQ>
[ 1914.219226]  __ip6_rt_update_pmtu+0xb0/0x190
[ 1914.228649]  ip6_tnl_xmit+0x2c2/0x970 [ip6_tunnel]
[ 1914.239223]  ? ip6_tnl_parse_tlv_enc_lim+0x32/0x1a0 [ip6_tunnel]
[ 1914.252489]  ? __gre6_xmit+0x148/0x530 [ip6_gre]
[ 1914.262678]  ip6gre_tunnel_xmit+0x17e/0x3c7 [ip6_gre]
[ 1914.273831]  dev_hard_start_xmit+0x8d/0x1f0
[ 1914.283061]  sch_direct_xmit+0xfa/0x230
[ 1914.291521]  __qdisc_run+0x154/0x4b0
[ 1914.299407]  net_tx_action+0x10e/0x1f0
[ 1914.307678]  __do_softirq+0xca/0x297
[ 1914.315567]  irq_exit+0x96/0xa0
[ 1914.322494]  smp_apic_timer_interrupt+0x68/0x130
[ 1914.332683]  apic_timer_interrupt+0xf/0x20
[ 1914.341721]  </IRQ>

Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMAINTAINERS: normalize Woojung Huh's email address
Lukas Bulwahn [Sat, 13 Apr 2019 07:52:15 +0000 (09:52 +0200)] 
MAINTAINERS: normalize Woojung Huh's email address

MAINTAINERS contains a lower-case and upper-case variant of
Woojung Huh' s email address.

Only keep the lower-case variant in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobonding: fix event handling for stacked bonds
Sabrina Dubroca [Fri, 12 Apr 2019 13:04:10 +0000 (15:04 +0200)] 
bonding: fix event handling for stacked bonds

When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.

This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.

Reproducer:
  ip link add bondL type bond
  ip link add bondU type bond
  ip link set bondL master bondU
  ip link del bondL

No "Fixes:" tag, this code is older than git history.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "net-sysfs: Fix memory leak in netdev_register_kobject"
Wang Hai [Fri, 12 Apr 2019 20:36:33 +0000 (16:36 -0400)] 
Revert "net-sysfs: Fix memory leak in netdev_register_kobject"

This reverts commit 6b70fc94afd165342876e53fc4b2f7d085009945.

The reverted bugfix will cause another issue.
Reported by syzbot+6024817a931b2830bc93@syzkaller.appspotmail.com.
See https://syzkaller.appspot.com/x/log.txt?x=1737671b200000 for
details.

Signed-off-by: Wang Hai <wanghai26@huawei.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'wireless-drivers-for-davem-2019-04-15' of git://git.kernel.org/pub/scm...
David S. Miller [Mon, 15 Apr 2019 19:02:29 +0000 (12:02 -0700)] 
Merge tag 'wireless-drivers-for-davem-2019-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 5.1

Second set of fixes for 5.1.

iwlwifi

* add some new PCI IDs (plus a struct name change they depend on)

* fix crypto with new devices, namely 22560 and above

* fix for a potential deadlock in the TX path

* a fix for offloaded rate-control

* support new PCI HW IDs which use a new FW

mt76

* fix lock initialisation and a possible deadlock

* aggregation fixes

rt2x00

* fix sequence numbering during retransmits
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Sun, 14 Apr 2019 22:07:30 +0000 (15:07 -0700)] 
Merge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-04-09

This series provides some fixes to mlx5 driver.

I've cc'ed some of the checksum fixes to Eric Dumazet and i would like to get
his feedback before you pull.

For -stable v4.19
('net/mlx5: FPGA, tls, idr remove on flow delete')
('net/mlx5: FPGA, tls, hold rcu read lock a bit longer')

For -stable v4.20
('net/mlx5e: Rx, Check ip headers sanity')
('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"')
('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding')

For -stable v5.0
('net/mlx5e: Switch to Toeplitz RSS hash by default')
('net/mlx5e: Protect against non-uplink representor for encap')
('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agortnetlink: fix rtnl_valid_stats_req() nlmsg_len check
Eric Dumazet [Sun, 14 Apr 2019 18:02:05 +0000 (11:02 -0700)] 
rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check

Jakub forgot to either use nlmsg_len() or nlmsg_msg_size(),
allowing KMSAN to detect a possible uninit-value in rtnl_stats_get

BUG: KMSAN: uninit-value in rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
CPU: 0 PID: 10428 Comm: syz-executor034 Not tainted 5.1.0-rc2+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:619
 __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
 rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
 rtnetlink_rcv_msg+0x115b/0x1550 net/core/rtnetlink.c:5192
 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2485
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5210
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1925
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg net/socket.c:632 [inline]
 ___sys_sendmsg+0xdb3/0x1220 net/socket.c:2137
 __sys_sendmsg net/socket.c:2175 [inline]
 __do_sys_sendmsg net/socket.c:2184 [inline]
 __se_sys_sendmsg+0x305/0x460 net/socket.c:2182
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2182
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Fixes: 51bc860d4a99 ("rtnetlink: stats: validate attributes in get as well as dumps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-doorbell-overflow-recovery'
David S. Miller [Sun, 14 Apr 2019 20:59:49 +0000 (13:59 -0700)] 
Merge branch 'qed-doorbell-overflow-recovery'

Denis Bolotin says:

====================
qed: Fix the Doorbell Overflow Recovery mechanism

This patch series fixes and improves the doorbell recovery mechanism.
The main goals of this series are to fix missing attentions from the
doorbells block (DORQ) or not handling them properly, and execute the
recovery from periodic handler instead of the attention handler.

Please consider applying the series to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix the DORQ's attentions handling
Denis Bolotin [Sun, 14 Apr 2019 14:23:08 +0000 (17:23 +0300)] 
qed: Fix the DORQ's attentions handling

Separate the overflow handling from the hardware interrupt status analysis.
The interrupt status is a single register and is common for all PFs. The
first PF reading the register is not necessarily the one who overflowed.
All PFs must check their overflow status on every attention.
In this change we clear the sticky indication in the attention handler to
allow doorbells to be processed again as soon as possible, but running
the doorbell recovery is scheduled for the periodic handler to reduce the
time spent in the attention handler.
Checking the need for DORQ flush was changed to "db_bar_no_edpm" because
qed_edpm_enabled()'s result could change dynamically and might have
prevented a needed flush.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix missing DORQ attentions
Denis Bolotin [Sun, 14 Apr 2019 14:23:07 +0000 (17:23 +0300)] 
qed: Fix missing DORQ attentions

When the DORQ (doorbell block) is overflowed, all PFs get attentions at the
same time. If one PF finished handling the attention before another PF even
started, the second PF might miss the DORQ's attention bit and not handle
the attention at all.
If the DORQ attention is missed and the issue is not resolved, another
attention will not be sent, therefore each attention is treated as a
potential DORQ attention.
As a result, the attention callback is called more frequently so the debug
print was moved to reduce its quantity.
The number of periodic doorbell recovery handler schedules was reduced
because it was the previous way to mitigating the missed attention issue.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix the doorbell address sanity check
Denis Bolotin [Sun, 14 Apr 2019 14:23:06 +0000 (17:23 +0300)] 
qed: Fix the doorbell address sanity check

Fix the condition which verifies that doorbell address is inside the
doorbell bar by checking that the end of the address is within range
as well.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Delete redundant doorbell recovery types
Denis Bolotin [Sun, 14 Apr 2019 14:23:05 +0000 (17:23 +0300)] 
qed: Delete redundant doorbell recovery types

DB_REC_DRY_RUN (running doorbell recovery without sending doorbells) is
never used. DB_REC_ONCE (send a single doorbell from the doorbell recovery)
is not needed anymore because by running the periodic handler we make sure
we check the overflow status later instead.
This patch is needed because in the next patches, the only doorbell
recovery type being used is DB_REC_REAL_DEAL, and the fixes are much
cleaner without this enum.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: ensure rcu_read_lock() in ipv4_link_failure()
Eric Dumazet [Sun, 14 Apr 2019 00:32:21 +0000 (17:32 -0700)] 
ipv4: ensure rcu_read_lock() in ipv4_link_failure()

fib_compute_spec_dst() needs to be called under rcu protection.

syzbot reported :

WARNING: suspicious RCU usage
5.1.0-rc4+ #165 Not tainted
include/linux/inetdevice.h:220 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by swapper/0/0:
 #0: 0000000051b67925 ((&n->timer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:170 [inline]
 #0: 0000000051b67925 ((&n->timer)){+.-.}, at: call_timer_fn+0xda/0x720 kernel/time/timer.c:1315

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.1.0-rc4+ #165
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5162
 __in_dev_get_rcu include/linux/inetdevice.h:220 [inline]
 fib_compute_spec_dst+0xbbd/0x1030 net/ipv4/fib_frontend.c:294
 spec_dst_fill net/ipv4/ip_options.c:245 [inline]
 __ip_options_compile+0x15a7/0x1a10 net/ipv4/ip_options.c:343
 ipv4_link_failure+0x172/0x400 net/ipv4/route.c:1195
 dst_link_failure include/net/dst.h:427 [inline]
 arp_error_report+0xd1/0x1c0 net/ipv4/arp.c:297
 neigh_invalidate+0x24b/0x570 net/core/neighbour.c:995
 neigh_timer_handler+0xc35/0xf30 net/core/neighbour.c:1081
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:293
 invoke_softirq kernel/softirq.c:374 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:414
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807

Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: recompile ip options in ipv4_link_failure
Stephen Suryaputra [Fri, 12 Apr 2019 20:19:27 +0000 (16:19 -0400)] 
ipv4: recompile ip options in ipv4_link_failure

Recompile IP options since IPCB may not be valid anymore when
ipv4_link_failure is called from arp_error_report.

Refer to the commit 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
and the commit before that (9ef6b42ad6fd) for a similar issue.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'rxrpc-fixes'
David S. Miller [Fri, 12 Apr 2019 23:57:23 +0000 (16:57 -0700)] 
Merge branch 'rxrpc-fixes'

David Howells says:

====================
rxrpc: Fixes

Here is a collection of fixes for rxrpc:

 (1) rxrpc_error_report() needs to call sock_error() to clear the error
     code from the UDP transport socket, lest it be unexpectedly revisited
     on the next kernel_sendmsg() call.  This has been causing all sorts of
     weird effects in AFS as the effects have typically been felt by the
     wrong RxRPC call.

 (2) Allow a kernel user of AF_RXRPC to easily detect if an rxrpc call has
     completed.

 (3) Allow errors incurred by attempting to transmit data through the UDP
     socket to get back up the stack to AFS.

 (4) Make AFS use (2) to abort the synchronous-mode call waiting loop if
     the rxrpc-level call completed.

 (5) Add a missing tracepoint case for tracing abort reception.

 (6) Fix detection and handling of out-of-order ACKs.

====================

Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Fix detection of out of order acks
Jeffrey Altman [Fri, 12 Apr 2019 15:34:16 +0000 (16:34 +0100)] 
rxrpc: Fix detection of out of order acks

The rxrpc packet serial number cannot be safely used to compute out of
order ack packets for several reasons:

 1. The allocation of serial numbers cannot be assumed to imply the order
    by which acks are populated and transmitted.  In some rxrpc
    implementations, delayed acks and ping acks are transmitted
    asynchronously to the receipt of data packets and so may be transmitted
    out of order.  As a result, they can race with idle acks.

 2. Serial numbers are allocated by the rxrpc connection and not the call
    and as such may wrap independently if multiple channels are in use.

In any case, what matters is whether the ack packet provides new
information relating to the bounds of the window (the firstPacket and
previousPacket in the ACK data).

Fix this by discarding packets that appear to wind back the window bounds
rather than on serial number procession.

Fixes: 298bc15b2079 ("rxrpc: Only take the rwind and mtu values from latest ACK")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Trace received connection aborts
David Howells [Fri, 12 Apr 2019 15:34:09 +0000 (16:34 +0100)] 
rxrpc: Trace received connection aborts

Trace received calls that are aborted due to a connection abort, typically
because of authentication failure.  Without this, connection aborts don't
show up in the trace log.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoafs: Check for rxrpc call completion in wait loop
Marc Dionne [Fri, 12 Apr 2019 15:34:02 +0000 (16:34 +0100)] 
afs: Check for rxrpc call completion in wait loop

Check the state of the rxrpc call backing an afs call in each iteration of
the call wait loop in case the rxrpc call has already been terminated at
the rxrpc layer.

Interrupt the wait loop and mark the afs call as complete if the rxrpc
layer call is complete.

There were cases where rxrpc errors were not passed up to afs, which could
result in this loop waiting forever for an afs call to transition to
AFS_CALL_COMPLETE while the rx call was already complete.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Allow errors to be returned from rxrpc_queue_packet()
Marc Dionne [Fri, 12 Apr 2019 15:33:54 +0000 (16:33 +0100)] 
rxrpc: Allow errors to be returned from rxrpc_queue_packet()

Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.

This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Make rxrpc_kernel_check_life() indicate if call completed
Marc Dionne [Fri, 12 Apr 2019 15:33:47 +0000 (16:33 +0100)] 
rxrpc: Make rxrpc_kernel_check_life() indicate if call completed

Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.

Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Clear socket error
Marc Dionne [Fri, 12 Apr 2019 15:33:40 +0000 (16:33 +0100)] 
rxrpc: Clear socket error

When an ICMP or ICMPV6 error is received, the error will be attached
to the socket (sk_err) and the report function will get called.
Clear any pending error here by calling sock_error().

This would cause the following attempt to use the socket to fail with
the error code stored by the ICMP error, resulting in unexpected errors
with various side effects depending on the context.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqede: fix write to free'd pointer error and double free of ptp
Colin Ian King [Fri, 12 Apr 2019 14:13:27 +0000 (15:13 +0100)] 
qede: fix write to free'd pointer error and double free of ptp

The err2 error return path calls qede_ptp_disable that cleans up
on an error and frees ptp. After this, the free'd ptp is dereferenced
when ptp->clock is set to NULL and the code falls-through to error
path err1 that frees ptp again.

Fix this by calling qede_ptp_disable and exiting via an error
return path that does not set ptp->clock or kfree ptp.

Addresses-Coverity: ("Write to pointer after free")
Fixes: 035744975aec ("qede: Add support for PTP resource locking.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxge: fix return of a free'd memblock on a failed dma mapping
Colin Ian King [Fri, 12 Apr 2019 13:45:12 +0000 (14:45 +0100)] 
vxge: fix return of a free'd memblock on a failed dma mapping

Currently if a pci dma mapping failure is detected a free'd
memblock address is returned rather than a NULL (that indicates
an error). Fix this by ensuring NULL is returned on this error case.

Addresses-Coverity: ("Use after free")
Fixes: 528f727279ae ("vxge: code cleanup and reorganization")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'iwlwifi-for-kalle-2019-04-03' of git://git.kernel.org/pub/scm/linux/kernel...
Kalle Valo [Fri, 12 Apr 2019 18:34:27 +0000 (21:34 +0300)] 
Merge tag 'iwlwifi-for-kalle-2019-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

Second batch of iwlwifi fixes intended for v5.1

* fix for a potential deadlock in the TX path;
* a fix for offloaded rate-control;
* support new PCI HW IDs which use a new FW;

5 years agomt76x02: avoid status_list.lock and sta->rate_ctrl_lock dependency
Stanislaw Gruszka [Fri, 5 Apr 2019 11:42:56 +0000 (13:42 +0200)] 
mt76x02: avoid status_list.lock and sta->rate_ctrl_lock dependency

Move ieee80211_tx_status_ext() outside of status_list lock section
in order to avoid locking dependency and possible deadlock reposed by
LOCKDEP in below warning.

Also do mt76_tx_status_lock() just before it's needed.

[  440.224832] WARNING: possible circular locking dependency detected
[  440.224833] 5.1.0-rc2+ #22 Not tainted
[  440.224834] ------------------------------------------------------
[  440.224835] kworker/u16:28/2362 is trying to acquire lock:
[  440.224836] 0000000089b8cacf (&(&q->lock)->rlock#2){+.-.}, at: mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.224842]
               but task is already holding lock:
[  440.224842] 000000002cfedc59 (&(&sta->lock)->rlock){+.-.}, at: ieee80211_stop_tx_ba_cb+0x32/0x1f0 [mac80211]
[  440.224863]
               which lock already depends on the new lock.

[  440.224863]
               the existing dependency chain (in reverse order) is:
[  440.224864]
               -> #3 (&(&sta->lock)->rlock){+.-.}:
[  440.224869]        _raw_spin_lock_bh+0x34/0x40
[  440.224880]        ieee80211_start_tx_ba_session+0xe4/0x3d0 [mac80211]
[  440.224894]        minstrel_ht_get_rate+0x45c/0x510 [mac80211]
[  440.224906]        rate_control_get_rate+0xc1/0x140 [mac80211]
[  440.224918]        ieee80211_tx_h_rate_ctrl+0x195/0x3c0 [mac80211]
[  440.224930]        ieee80211_xmit_fast+0x26d/0xa50 [mac80211]
[  440.224942]        __ieee80211_subif_start_xmit+0xfc/0x310 [mac80211]
[  440.224954]        ieee80211_subif_start_xmit+0x38/0x390 [mac80211]
[  440.224956]        dev_hard_start_xmit+0xb8/0x300
[  440.224957]        __dev_queue_xmit+0x7d4/0xbb0
[  440.224968]        ip6_finish_output2+0x246/0x860 [ipv6]
[  440.224978]        mld_sendpack+0x1bd/0x360 [ipv6]
[  440.224987]        mld_ifc_timer_expire+0x1a4/0x2f0 [ipv6]
[  440.224989]        call_timer_fn+0x89/0x2a0
[  440.224990]        run_timer_softirq+0x1bd/0x4d0
[  440.224992]        __do_softirq+0xdb/0x47c
[  440.224994]        irq_exit+0xfa/0x100
[  440.224996]        smp_apic_timer_interrupt+0x9a/0x220
[  440.224997]        apic_timer_interrupt+0xf/0x20
[  440.224999]        cpuidle_enter_state+0xc1/0x470
[  440.225000]        do_idle+0x21a/0x260
[  440.225001]        cpu_startup_entry+0x19/0x20
[  440.225004]        start_secondary+0x135/0x170
[  440.225006]        secondary_startup_64+0xa4/0xb0
[  440.225007]
               -> #2 (&(&sta->rate_ctrl_lock)->rlock){+.-.}:
[  440.225009]        _raw_spin_lock_bh+0x34/0x40
[  440.225022]        rate_control_tx_status+0x4f/0xb0 [mac80211]
[  440.225031]        ieee80211_tx_status_ext+0x142/0x1a0 [mac80211]
[  440.225035]        mt76x02_send_tx_status+0x2e4/0x340 [mt76x02_lib]
[  440.225037]        mt76x02_tx_status_data+0x31/0x40 [mt76x02_lib]
[  440.225040]        mt76u_tx_status_data+0x51/0xa0 [mt76_usb]
[  440.225042]        process_one_work+0x237/0x5d0
[  440.225043]        worker_thread+0x3c/0x390
[  440.225045]        kthread+0x11d/0x140
[  440.225046]        ret_from_fork+0x3a/0x50
[  440.225047]
               -> #1 (&(&list->lock)->rlock#8){+.-.}:
[  440.225049]        _raw_spin_lock_bh+0x34/0x40
[  440.225052]        mt76_tx_status_skb_add+0x51/0x100 [mt76]
[  440.225054]        mt76x02u_tx_prepare_skb+0xbd/0x116 [mt76x02_usb]
[  440.225056]        mt76u_tx_queue_skb+0x5f/0x180 [mt76_usb]
[  440.225058]        mt76_tx+0x93/0x190 [mt76]
[  440.225070]        ieee80211_tx_frags+0x148/0x210 [mac80211]
[  440.225081]        __ieee80211_tx+0x75/0x1b0 [mac80211]
[  440.225092]        ieee80211_tx+0xde/0x110 [mac80211]
[  440.225105]        __ieee80211_tx_skb_tid_band+0x72/0x90 [mac80211]
[  440.225122]        ieee80211_send_auth+0x1f3/0x360 [mac80211]
[  440.225141]        ieee80211_auth.cold.40+0x6c/0x100 [mac80211]
[  440.225156]        ieee80211_mgd_auth.cold.50+0x132/0x15f [mac80211]
[  440.225171]        cfg80211_mlme_auth+0x149/0x360 [cfg80211]
[  440.225181]        nl80211_authenticate+0x273/0x2e0 [cfg80211]
[  440.225183]        genl_family_rcv_msg+0x196/0x3a0
[  440.225184]        genl_rcv_msg+0x47/0x8e
[  440.225185]        netlink_rcv_skb+0x3a/0xf0
[  440.225187]        genl_rcv+0x24/0x40
[  440.225188]        netlink_unicast+0x16d/0x210
[  440.225189]        netlink_sendmsg+0x204/0x3b0
[  440.225191]        sock_sendmsg+0x36/0x40
[  440.225193]        ___sys_sendmsg+0x259/0x2b0
[  440.225194]        __sys_sendmsg+0x47/0x80
[  440.225196]        do_syscall_64+0x60/0x1f0
[  440.225197]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  440.225198]
               -> #0 (&(&q->lock)->rlock#2){+.-.}:
[  440.225200]        lock_acquire+0xb9/0x1a0
[  440.225202]        _raw_spin_lock_bh+0x34/0x40
[  440.225204]        mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225215]        ieee80211_agg_start_txq+0xe8/0x2b0 [mac80211]
[  440.225225]        ieee80211_stop_tx_ba_cb+0xb8/0x1f0 [mac80211]
[  440.225235]        ieee80211_ba_session_work+0x1c1/0x2f0 [mac80211]
[  440.225236]        process_one_work+0x237/0x5d0
[  440.225237]        worker_thread+0x3c/0x390
[  440.225239]        kthread+0x11d/0x140
[  440.225240]        ret_from_fork+0x3a/0x50
[  440.225240]
               other info that might help us debug this:

[  440.225241] Chain exists of:
                 &(&q->lock)->rlock#2 --> &(&sta->rate_ctrl_lock)->rlock --> &(&sta->lock)->rlock

[  440.225243]  Possible unsafe locking scenario:

[  440.225244]        CPU0                    CPU1
[  440.225244]        ----                    ----
[  440.225245]   lock(&(&sta->lock)->rlock);
[  440.225245]                                lock(&(&sta->rate_ctrl_lock)->rlock);
[  440.225246]                                lock(&(&sta->lock)->rlock);
[  440.225247]   lock(&(&q->lock)->rlock#2);
[  440.225248]
                *** DEADLOCK ***

[  440.225249] 5 locks held by kworker/u16:28/2362:
[  440.225250]  #0: 0000000048fcd291 ((wq_completion)phy0){+.+.}, at: process_one_work+0x1b5/0x5d0
[  440.225252]  #1: 00000000f1c6828f ((work_completion)(&sta->ampdu_mlme.work)){+.+.}, at: process_one_work+0x1b5/0x5d0
[  440.225254]  #2: 00000000433d2b2c (&sta->ampdu_mlme.mtx){+.+.}, at: ieee80211_ba_session_work+0x5c/0x2f0 [mac80211]
[  440.225265]  #3: 000000002cfedc59 (&(&sta->lock)->rlock){+.-.}, at: ieee80211_stop_tx_ba_cb+0x32/0x1f0 [mac80211]
[  440.225276]  #4: 000000009d7b9a44 (rcu_read_lock){....}, at: ieee80211_agg_start_txq+0x33/0x2b0 [mac80211]
[  440.225286]
               stack backtrace:
[  440.225288] CPU: 2 PID: 2362 Comm: kworker/u16:28 Not tainted 5.1.0-rc2+ #22
[  440.225289] Hardware name: LENOVO 20KGS23S0P/20KGS23S0P, BIOS N23ET55W (1.30 ) 08/31/2018
[  440.225300] Workqueue: phy0 ieee80211_ba_session_work [mac80211]
[  440.225301] Call Trace:
[  440.225304]  dump_stack+0x85/0xc0
[  440.225306]  print_circular_bug.isra.38.cold.58+0x15c/0x195
[  440.225307]  check_prev_add.constprop.48+0x5f0/0xc00
[  440.225309]  ? check_prev_add.constprop.48+0x39d/0xc00
[  440.225311]  ? __lock_acquire+0x41d/0x1100
[  440.225312]  __lock_acquire+0xd98/0x1100
[  440.225313]  ? __lock_acquire+0x41d/0x1100
[  440.225315]  lock_acquire+0xb9/0x1a0
[  440.225317]  ? mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225319]  _raw_spin_lock_bh+0x34/0x40
[  440.225321]  ? mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225323]  mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225334]  ieee80211_agg_start_txq+0xe8/0x2b0 [mac80211]
[  440.225344]  ieee80211_stop_tx_ba_cb+0xb8/0x1f0 [mac80211]
[  440.225354]  ieee80211_ba_session_work+0x1c1/0x2f0 [mac80211]
[  440.225356]  process_one_work+0x237/0x5d0
[  440.225358]  worker_thread+0x3c/0x390
[  440.225359]  ? wq_calc_node_cpumask+0x70/0x70
[  440.225360]  kthread+0x11d/0x140
[  440.225362]  ? kthread_create_on_node+0x40/0x40
[  440.225363]  ret_from_fork+0x3a/0x50

Cc: stable@vger.kernel.org
Fixes: 88046b2c9f6d ("mt76: add support for reporting tx status with skb")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agort2x00: do not increment sequence number while re-transmitting
Vijayakumar Durai [Wed, 27 Mar 2019 10:03:17 +0000 (11:03 +0100)] 
rt2x00: do not increment sequence number while re-transmitting

Currently rt2x00 devices retransmit the management frames with
incremented sequence number if hardware is assigning the sequence.

This is HW bug fixed already for non-QOS data frames, but it should
be fixed for management frames except beacon.

Without fix retransmitted frames have wrong SN:

 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1648, FN=0, Flags=........C Frame is not being retransmitted 1648 1
 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1649, FN=0, Flags=....R...C Frame is being retransmitted 1649 1
 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1650, FN=0, Flags=....R...C Frame is being retransmitted 1650 1

With the fix SN stays correctly the same:

 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=........C
 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C
 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C

Cc: stable@vger.kernel.org
Signed-off-by: Vijayakumar Durai <vijayakumar.durai1@vivint.com>
[sgruszka: simplify code, change comments and changelog]
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agomt76: mt7603: send BAR after powersave wakeup
Felix Fietkau [Tue, 26 Mar 2019 08:34:21 +0000 (09:34 +0100)] 
mt76: mt7603: send BAR after powersave wakeup

Now that the sequence number allocation is fixed, we can finally send a BAR
at powersave wakeup time to refresh the receiver side reorder window

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agomt76: mt7603: fix sequence number assignment
Felix Fietkau [Tue, 26 Mar 2019 08:34:20 +0000 (09:34 +0100)] 
mt76: mt7603: fix sequence number assignment

If the MT_TXD3_SN_VALID flag is not set in the tx descriptor, the hardware
assigns the sequence number. However, the rest of the code assumes that the
sequence number specified in the 802.11 header gets transmitted.
This was causing issues with the aggregation setup, which worked for the
initial one (where the sequence numbers were still close), but not for
further teardown/re-establishing of sessions.

Additionally, the overwrite of the TID sequence number in WTBL2 was resetting
the hardware assigned sequence numbers, causing them to drift further apart.

Fix this by using the software assigned sequence numbers

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agomt76: mt7603: add missing initialization for dev->ps_lock
Felix Fietkau [Tue, 26 Mar 2019 08:34:19 +0000 (09:34 +0100)] 
mt76: mt7603: add missing initialization for dev->ps_lock

Fixes lockdep complaint and a potential race condition

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agoudpv6: Check address length before reading address family
Tetsuo Handa [Fri, 12 Apr 2019 10:56:39 +0000 (19:56 +0900)] 
udpv6: Check address length before reading address family

KMSAN will complain if valid address length passed to udpv6_pre_connect()
is shorter than sizeof("struct sockaddr"->sa_family) bytes.

(This patch is bogus if it is guaranteed that udpv6_pre_connect() is
always called after checking "struct sockaddr"->sa_family. In that case,
we want a comment why we don't need to check valid address length here.)

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobpf: Check address length before reading address family
Tetsuo Handa [Fri, 12 Apr 2019 10:55:47 +0000 (19:55 +0900)] 
bpf: Check address length before reading address family

KMSAN will complain if valid address length passed to bpf_bind() is
shorter than sizeof("struct sockaddr"->sa_family) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agollc: Check address length before reading address field
Tetsuo Handa [Fri, 12 Apr 2019 10:55:14 +0000 (19:55 +0900)] 
llc: Check address length before reading address field

KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_llc) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoBluetooth: Check address length before reading address field
Tetsuo Handa [Fri, 12 Apr 2019 10:54:33 +0000 (19:54 +0900)] 
Bluetooth: Check address length before reading address field

KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_sco) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Check address length before reading srx_service field
Tetsuo Handa [Fri, 12 Apr 2019 10:54:05 +0000 (19:54 +0900)] 
rxrpc: Check address length before reading srx_service field

KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_rxrpc) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: netlink: Check address length before reading groups field
Tetsuo Handa [Fri, 12 Apr 2019 10:53:38 +0000 (19:53 +0900)] 
net: netlink: Check address length before reading groups field

KMSAN will complain if valid address length passed to bind() is shorter
than sizeof(struct sockaddr_nl) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: Check address length before reading address family
Tetsuo Handa [Fri, 12 Apr 2019 10:53:10 +0000 (19:53 +0900)] 
sctp: Check address length before reading address family

KMSAN will complain if valid address length passed to connect() is shorter
than sizeof("struct sockaddr"->sa_family) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomISDN: Check address length before reading address family
Tetsuo Handa [Fri, 12 Apr 2019 10:52:36 +0000 (19:52 +0900)] 
mISDN: Check address length before reading address family

KMSAN will complain if valid address length passed to bind() is shorter
than sizeof("struct sockaddr_mISDN"->family) bytes.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/rds: Check address length before reading address family
Tetsuo Handa [Fri, 12 Apr 2019 10:51:52 +0000 (19:51 +0900)] 
net/rds: Check address length before reading address family

syzbot is reporting uninitialized value at rds_connect() [1] and
rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
these functions expect that it is safe to access sockaddr->family field
in order to determine minimal address length for validation.

[1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
[2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38f

Reported-by: syzbot <syzbot+0049bebbf3042dbd2e8f@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+915c9f99f3dbc4bd6cd1@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodctcp: more accurate tracking of packets delivery
Eric Dumazet [Thu, 11 Apr 2019 12:55:23 +0000 (05:55 -0700)] 
dctcp: more accurate tracking of packets delivery

After commit e21db6f69a95 ("tcp: track total bytes delivered with ECN CE marks")
core TCP stack does a very good job tracking ECN signals.

The "sender's best estimate of CE information" Yuchung mentioned in his
patch is indeed the best we can do.

DCTCP can use tp->delivered_ce and tp->delivered to not duplicate the logic,
and use the existing best estimate.

This solves some problems, since current DCTCP logic does not deal with losses
and/or GRO or ack aggregation very well.

This also removes a dubious use of inet_csk(sk)->icsk_ack.rcv_mss
(this should have been tp->mss_cache), and a 64 bit divide.

Finally, we can see that the DCTCP logic, calling dctcp_update_alpha() for
every ACK could be done differently, calling it only once per RTT.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Abdul Kabbani <akabbani@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: fib_tests: Fix 'Command line is not complete' errors
David Ahern [Tue, 9 Apr 2019 21:23:10 +0000 (14:23 -0700)] 
selftests: fib_tests: Fix 'Command line is not complete' errors

A couple of tests are verifying a route has been removed. The helper
expects the prefix as the first part of the expected output. When
checking that a route has been deleted the prefix is empty leading
to an invalid ip command:

  $ ip ro ls match
  Command line is not complete. Try option "help"

Fix by moving the comparison of expected output and output to a new
function that is used by both check_route and check_route6. Use the
new helper for the 2 checks on route removal.

Also, remove the reset of 'set -x' in route_setup which overrides the
user managed setting.

Fixes: d69faad76584c ("selftests: fib_tests: Add prefix route tests with metric")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: netrom: Fix error cleanup path of nr_proto_init
YueHaibing [Tue, 9 Apr 2019 11:53:55 +0000 (19:53 +0800)] 
net: netrom: Fix error cleanup path of nr_proto_init

Syzkaller report this:

BUG: unable to handle kernel paging request at fffffbfff830524b
PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
Oops: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
FS:  00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 __list_add include/linux/list.h:60 [inline]
 list_add include/linux/list.h:79 [inline]
 proto_register+0x444/0x8f0 net/core/sock.c:3375
 nr_proto_init+0x73/0x4b3 [netrom]
 ? 0xffffffffc1628000
 ? 0xffffffffc1628000
 do_one_initcall+0xbc/0x47d init/main.c:887
 do_init_module+0x1b5/0x547 kernel/module.c:3456
 load_module+0x6405/0x8c10 kernel/module.c:3804
 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
 bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: fffffbfff830524b
---[ end trace 039ab24b305c4b19 ]---

If nr_proto_init failed, it may forget to call proto_unregister,
tiggering this issue.This patch rearrange code of nr_proto_init
to avoid such issues.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fec: manage ahb clock in runtime pm
Andy Duan [Tue, 9 Apr 2019 03:40:56 +0000 (03:40 +0000)] 
net: fec: manage ahb clock in runtime pm

Some SOC like i.MX6SX clock have some limits:
- ahb clock should be disabled before ipg.
- ahb and ipg clocks are required for MAC MII bus.
So, move the ahb clock to runtime management together with
ipg clock.

Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: multicast: use rcu to access port list from br_multicast_start_querier
Nikolay Aleksandrov [Thu, 11 Apr 2019 12:08:25 +0000 (15:08 +0300)] 
net: bridge: multicast: use rcu to access port list from br_multicast_start_querier

br_multicast_start_querier() walks over the port list but it can be
called from a timer with only multicast_lock held which doesn't protect
the port list, so use RCU to walk over it.

Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'thunderx-xdp-mtu'
David S. Miller [Thu, 11 Apr 2019 18:10:34 +0000 (11:10 -0700)] 
Merge branch 'thunderx-xdp-mtu'

Matteo Croce says:

====================
Fix thunderx MTU with XDP

The thunderx driver can't use XDP with all MTU values.
This patches sets the right MTU values, and add a check to avoid setting
a wrong value which will not function.

v3: Fix a copy-paste from two functions, tested on proper hardware:

2: enP2p1s0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 1c:1b:0d:0d:52:a4 brd ff:ff:ff:ff:ff:ff
[  787.019730] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1800.
RTNETLINK answers: Operation not supported
[  800.574568] nicvf 0002:01:00.1 enP2p1s0v0: Link is Up 10000 Mbps Full duplex
[  807.248321] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1500.
RTNETLINK answers: Invalid argument
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: thunderx: don't allow jumbo frames with XDP
Matteo Croce [Thu, 11 Apr 2019 10:26:33 +0000 (12:26 +0200)] 
net: thunderx: don't allow jumbo frames with XDP

The thunderx driver forbids to load an eBPF program if the MTU is too high,
but this can be circumvented by loading the eBPF, then raising the MTU.

Fix this by limiting the MTU if an eBPF program is already loaded.

Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: thunderx: raise XDP MTU to 1508
Matteo Croce [Thu, 11 Apr 2019 10:26:32 +0000 (12:26 +0200)] 
net: thunderx: raise XDP MTU to 1508

The thunderx driver splits frames bigger than 1530 bytes to multiple
pages, making impossible to run an eBPF program on it.
This leads to a maximum MTU of 1508 if QinQ is in use.

The thunderx driver forbids to load an eBPF program if the MTU is higher
than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
protocols which need some more headroom.

Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'smc-fixes'
David S. Miller [Thu, 11 Apr 2019 18:04:08 +0000 (11:04 -0700)] 
Merge branch 'smc-fixes'

Ursula Braun says:

====================
net/smc: fixes 2019-04-11

here are some fixes in different areas of the smc code for the net
tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: move unhash before release of clcsock
Ursula Braun [Thu, 11 Apr 2019 09:17:34 +0000 (11:17 +0200)] 
net/smc: move unhash before release of clcsock

Commit <26d92e951fe0>
("net/smc: move unhash as early as possible in smc_release()")
fixes one occurrence in the smc code, but the same pattern exists
in other places. This patch covers the remaining occurrences and
makes sure, the unhash operation is done before the smc->clcsock is
released. This avoids a potential use-after-free in smc_diag_dump().

Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: fix return code from FLUSH command
Karsten Graul [Thu, 11 Apr 2019 09:17:33 +0000 (11:17 +0200)] 
net/smc: fix return code from FLUSH command

The FLUSH command is used to empty the pnet table. No return code is
expected from the command. Commit a9d8b0b1e3d6 added namespace support
for the pnet table and changed the FLUSH command processing to call
smc_pnet_remove_by_pnetid() to remove the pnet entries. This function
returns -ENOENT when no entry was deleted, which is now the return code
of the FLUSH command. As a result the FLUSH command will return an error
when the pnet table is already empty.
Restore the expected behavior and let FLUSH always return 0.

Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: propagate file from SMC to TCP socket
Ursula Braun [Thu, 11 Apr 2019 09:17:32 +0000 (11:17 +0200)] 
net/smc: propagate file from SMC to TCP socket

fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
that are delivered when out-of-band data arrives on socket fd.
If an SMC socket program makes use of such an fcntl() call, it fails
in case of fallback to TCP-mode. In case of fallback the traffic is
processed with the internal TCP socket. Propagating field "file" from the
SMC socket to the internal TCP socket fixes the issue.

Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: fix a NULL pointer dereference
Kangjie Lu [Thu, 11 Apr 2019 09:17:31 +0000 (11:17 +0200)] 
net/smc: fix a NULL pointer dereference

In case alloc_ordered_workqueue fails, the fix returns NULL
to avoid NULL pointer dereference.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: wait for pending work before clcsock release_sock
Karsten Graul [Thu, 11 Apr 2019 09:17:30 +0000 (11:17 +0200)] 
net/smc: wait for pending work before clcsock release_sock

When the clcsock is already released using sock_release() and a pending
smc_listen_work accesses the clcsock than that will fail. Solve this
by canceling and waiting for the work to complete first. Because the
work holds the sock_lock it must make sure that the lock is not hold
before the new helper smc_clcsock_release() is invoked. And before the
smc_listen_work starts working check if the parent listen socket is
still valid, otherwise stop the work early.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
Lorenzo Bianconi [Tue, 9 Apr 2019 09:47:20 +0000 (11:47 +0200)] 
net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv

gue tunnels run iptunnel_pull_offloads on received skbs. This can
determine a possible use-after-free accessing guehdr pointer since
the packet will be 'uncloned' running pskb_expand_head if it is a
cloned gso skb (e.g if the packet has been sent though a veth device)

Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: missing entries in name table of publications
Hoang Le [Tue, 9 Apr 2019 07:59:24 +0000 (14:59 +0700)] 
tipc: missing entries in name table of publications

When binding multiple services with specific type 1Ki, 2Ki..,
this leads to some entries in the name table of publications
missing when listed out via 'tipc name show'.

The problem is at identify zero last_type conditional provided
via netlink. The first is initial 'type' when starting name table
dummping. The second is continuously with zero type (node state
service type). Then, lookup function failure to finding node state
service type in next iteration.

To solve this, adding more conditional to marked as dirty type and
lookup correct service type for the next iteration instead of select
the first service as initial 'type' zero.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovhost: reject zero size iova range
Jason Wang [Tue, 9 Apr 2019 04:10:25 +0000 (12:10 +0800)] 
vhost: reject zero size iova range

We used to accept zero size iova range which will lead a infinite loop
in translate_desc(). Fixing this by failing the request in this case.

Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()
Jakub Kicinski [Tue, 9 Apr 2019 00:59:50 +0000 (17:59 -0700)] 
net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()

Unlike '&&' operator, the '&' does not have short-circuit
evaluation semantics.  IOW both sides of the operator always
get evaluated.  Fix the wrong operator in
tls_is_sk_tx_device_offloaded(), which would lead to
out-of-bounds access for for non-full sockets.

Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofailover: allow name change on IFF_UP slave interfaces
Si-Wei Liu [Mon, 8 Apr 2019 23:45:27 +0000 (19:45 -0400)] 
failover: allow name change on IFF_UP slave interfaces

When a netdev appears through hot plug then gets enslaved by a failover
master that is already up and running, the slave will be opened
right away after getting enslaved. Today there's a race that userspace
(udev) may fail to rename the slave if the kernel (net_failover)
opens the slave earlier than when the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.

As the failover slave interfaces are not designed to be operated
directly by userspace apps: IP configuration, filter rules with
regard to network traffic passing and etc., should all be done on master
interface. In general, userspace apps only care about the
name of master interface, while slave names are less important as long
as admin users can see reliable names that may carry
other information describing the netdev. For e.g., they can infer that
"ens3nsby" is a standby slave of "ens3", while for a
name like "eth0" they can't tell which master it belongs to.

Historically the name of IFF_UP interface can't be changed because
there might be admin script or management software that is already
relying on such behavior and assumes that the slave name can't be
changed once UP. But failover is special: with the in-kernel
auto-enslavement mechanism, the userspace expectation for device
enumeration and bring-up order is already broken. Previously initramfs
and various userspace config tools were modified to bypass failover
slaves because of auto-enslavement and duplicate MAC address. Similarly,
in case that users care about seeing reliable slave name, the new type
of failover slaves needs to be taken care of specifically in userspace
anyway.

It's less risky to lift up the rename restriction on failover slave
which is already UP. Although it's possible this change may potentially
break userspace component (most likely configuration scripts or
management software) that assumes slave name can't be changed while
UP, it's relatively a limited and controllable set among all userspace
components, which can be fixed specifically to listen for the rename
events on failover slaves. Userspace component interacting with slaves
is expected to be changed to operate on failover master interface
instead, as the failover slave is dynamic in nature which may come and
go at any point.  The goal is to make the role of failover slaves less
relevant, and userspace components should only deal with failover master
in the long run.

Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoteam: set slave to promisc if team is already in promisc mode
Hangbin Liu [Mon, 8 Apr 2019 08:45:17 +0000 (16:45 +0800)] 
team: set slave to promisc if team is already in promisc mode

After adding a team interface to bridge, the team interface will enter
promisc mode. Then if we add a new slave to team0, the slave will keep
promisc off. Fix it by setting slave to promisc on if team master is
already in promisc mode, also do the same for allmulti.

v2: add promisc and allmulti checking when delete ports

Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: fix build without CONFIG_TLS_DEVICE
Jakub Kicinski [Wed, 10 Apr 2019 23:23:39 +0000 (16:23 -0700)] 
net/tls: fix build without CONFIG_TLS_DEVICE

buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n.
Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx()
wouldn't be compiled either in this case.

Fixes: 35b71a34ada6 ("net/tls: don't leak partially sent record in device mode")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'tls-leaks'
David S. Miller [Wed, 10 Apr 2019 20:07:02 +0000 (13:07 -0700)] 
Merge branch 'tls-leaks'

Jakub Kicinski says:

====================
net: tls: fix memory leaks and freeing skbs

This series fixes two memory issues and a stack overflow.
First two patches are fairly simple leaks.  Third patch
partially reverts an optimization made to the strparser
which causes creation of skb->frag_list->skb->frag_list...
chains of 100s of skbs, leading to recursive kfree_skb()
filling up the kernel stack.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: strparser: partially revert "strparser: Call skb_unclone conditionally"
Jakub Kicinski [Wed, 10 Apr 2019 18:04:32 +0000 (11:04 -0700)] 
net: strparser: partially revert "strparser: Call skb_unclone conditionally"

This reverts the first part of commit 4e485d06bb8c ("strparser: Call
skb_unclone conditionally").  To build a message with multiple
fragments we need our own root of frag_list.  We can't simply
use the frag_list of orig_skb, because it will lead to linking
all orig_skbs together creating very long frag chains, and causing
stack overflow on kfree_skb() (which is called recursively on
the frag_lists).

BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
RIP: 0010:free_one_page+0x2b/0x490

Call Trace:
  __free_pages_ok+0x143/0x2c0
  skb_release_data+0x8e/0x140
  ? skb_release_data+0xad/0x140
  kfree_skb+0x32/0xb0

  [...]

  skb_release_data+0xad/0x140
  ? skb_release_data+0xad/0x140
  kfree_skb+0x32/0xb0
  skb_release_data+0xad/0x140
  ? skb_release_data+0xad/0x140
  kfree_skb+0x32/0xb0
  skb_release_data+0xad/0x140
  ? skb_release_data+0xad/0x140
  kfree_skb+0x32/0xb0
  skb_release_data+0xad/0x140
  ? skb_release_data+0xad/0x140
  kfree_skb+0x32/0xb0
  skb_release_data+0xad/0x140
  __kfree_skb+0xe/0x20
  tcp_disconnect+0xd6/0x4d0
  tcp_close+0xf4/0x430
  ? tcp_check_oom+0xf0/0xf0
  tls_sk_proto_close+0xe4/0x1e0 [tls]
  inet_release+0x36/0x60
  __sock_release+0x37/0xa0
  sock_close+0x11/0x20
  __fput+0xa2/0x1d0
  task_work_run+0x89/0xb0
  exit_to_usermode_loop+0x9a/0xa0
  do_syscall_64+0xc0/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Let's leave the second unclone conditional, as I'm not entirely
sure what is its purpose :)

Fixes: 4e485d06bb8c ("strparser: Call skb_unclone conditionally")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: don't leak partially sent record in device mode
Jakub Kicinski [Wed, 10 Apr 2019 18:04:31 +0000 (11:04 -0700)] 
net/tls: don't leak partially sent record in device mode

David reports that tls triggers warnings related to
sk->sk_forward_alloc not being zero at destruction time:

WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170

When sender fills up the write buffer and dies from
SIGPIPE.  This is due to the device implementation
not cleaning up the partially_sent_record.

This is because commit a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
moved the partial record cleanup to the SW-only path.

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: fix the IV leaks
Jakub Kicinski [Wed, 10 Apr 2019 18:04:30 +0000 (11:04 -0700)] 
net/tls: fix the IV leaks

Commit f66de3ee2c16 ("net/tls: Split conf to rx + tx") made
freeing of IV and record sequence number conditional to SW
path only, but commit e8f69799810c ("net/tls: Add generic NIC
offload infrastructure") also allocates that state for the
device offload configuration.  Remember to free it.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ibmvnic-features'
David S. Miller [Wed, 10 Apr 2019 19:29:37 +0000 (12:29 -0700)] 
Merge branch 'ibmvnic-features'

Thomas Falcon says:

====================
ibmvnic: Fix netdev features settings on reset

In its current state, a driver reset clobbers any feature settings
a user may have toggled and will disable GRO as it is not explicitly
enabled in the driver. This patch set enables GRO and tries to retain
user settings after a reset. If the underlying carrier changes, however,
the driver will disable features unsupported by the new carrier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoibmvnic: Fix netdev feature clobbering during a reset
Thomas Falcon [Wed, 10 Apr 2019 16:07:00 +0000 (11:07 -0500)] 
ibmvnic: Fix netdev feature clobbering during a reset

While determining offload capabilities of backing hardware during
a device reset, the driver is clobbering current feature settings.
Update hw_features on reset instead of features unless a feature
is enabled that is no longer supported on the current backing device.
Also enable features that were not supported prior to the reset but
were previously enabled or requested by the user.

This can occur if the reset is the result of a carrier change, such
as a device failover or partition migration.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoibmvnic: Enable GRO
Thomas Falcon [Wed, 10 Apr 2019 16:06:59 +0000 (11:06 -0500)] 
ibmvnic: Enable GRO

Enable Generic Receive Offload in the ibmvnic driver.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Various-fixes'
David S. Miller [Wed, 10 Apr 2019 18:57:08 +0000 (11:57 -0700)] 
Merge branch 'mlxsw-Various-fixes'

Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains various small fixes for mlxsw.

Patch #1 fixes a warning generated by switchdev core when the driver
fails to insert an MDB entry in the commit phase.

Patches #2-#4 fix a warning in check_flush_dependency() that can be
triggered when a work item in a WQ_MEM_RECLAIM workqueue tries to flush
a non-WQ_MEM_RECLAIM workqueue.

It seems that the semantics of the WQ_MEM_RECLAIM flag are not very
clear [1] and that various patches have been sent to remove it from
various workqueues throughout the kernel [2][3][4] in order to silence
the warning.

These patches do the same for the workqueues created by mlxsw that
probably should not have been created with this flag in the first place.

Patch #5 fixes a regression where an IP address cannot be assigned to a
VRF upper due to erroneous MAC validation check. Patch #6 adds a test
case.

Patch #7 adjusts Spectrum-2 shared buffer configuration to be compatible
with Spectrum-1. The problem and fix are described in detail in the
commit message.

Please consider patches #1-#5 for 5.0.y. I verified they apply cleanly.

[1] https://patchwork.kernel.org/patch/10791315/
[2] Commit ce162bfbc0b6 ("mac80211_hwsim: don't use WQ_MEM_RECLAIM")
[3] Commit 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
[4] Commit 75215e5bb22c ("iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2
Ido Schimmel [Wed, 10 Apr 2019 06:58:17 +0000 (06:58 +0000)] 
mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2

In Spectrum-1, when a multicast packet is admitted to the shared buffer
it increases the quotas of all the ports and {port, TC} to which it is
forwarded to.

The above means that multicast packets are accounted multiple times in
the shared buffer and can therefore cause the associated shared buffer
pool to fill up very quickly.

To work around this issue, commit e83c045e53d7 ("mlxsw:
spectrum_buffers: Configure MC pool") added a dedicated multicast pool
in which multicast packets are accounted.

The issue is not present in Spectrum-2, but in order to be backward
compatible with Spectrum-1, its default behavior is to allow a multicast
packet to increase multiple egress quotas instead of one.

Until the new (non-backward compatible) mode is supported, configure a
dedicated multicast pool as in Spectrum-1.

Fixes: fe099bf682ab ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Test VRF MAC vetoing
Ido Schimmel [Wed, 10 Apr 2019 06:58:16 +0000 (06:58 +0000)] 
selftests: mlxsw: Test VRF MAC vetoing

Test that it is possible to set an IP address on a VRF and that it is
not vetoed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_router: Do not check VRF MAC address
Ido Schimmel [Wed, 10 Apr 2019 06:58:15 +0000 (06:58 +0000)] 
mlxsw: spectrum_router: Do not check VRF MAC address

Commit 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
addresses") enabled the driver to veto router interface (RIF) MAC
addresses that it cannot support.

This check should only be performed for interfaces for which the driver
actually configures a RIF. A VRF upper is not one of them, so ignore it.

Without this patch it is not possible to set an IP address on the VRF
device and use it as a loopback.

Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue
Ido Schimmel [Wed, 10 Apr 2019 06:58:15 +0000 (06:58 +0000)] 
mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue

The workqueue is used to periodically update the networking stack about
activity / statistics of various objects such as neighbours and TC
actions.

It should not be called as part of memory reclaim path, so remove the
WQ_MEM_RECLAIM flag.

Fixes: 3d5479e92087 ("mlxsw: core: Remove deprecated create_workqueue")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue
Ido Schimmel [Wed, 10 Apr 2019 06:58:14 +0000 (06:58 +0000)] 
mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue

The ordered workqueue is used to offload various objects such as routes
and neighbours in the order they are notified.

It should not be called as part of memory reclaim path, so remove the
WQ_MEM_RECLAIM flag. This can also result in a warning [1], if a worker
tries to flush a non-WQ_MEM_RECLAIM workqueue.

[1]
[97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
[97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
...
[97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
[97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
[97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
...
[97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
[97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
[97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
[97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
[97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
[97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
[97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
[97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
[97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[97703.543115] Call Trace:
[97703.543129] __flush_work+0xbd/0x1e0
[97703.543137] ? __cancel_work_timer+0x136/0x1b0
[97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0
[97703.543154] __cancel_work_timer+0x136/0x1b0
[97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
[97703.543184] cancel_work_sync+0x10/0x20
[97703.543191] rhashtable_free_and_destroy+0x23/0x140
[97703.543198] rhashtable_destroy+0xd/0x10
[97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
[97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
[97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
[97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
[97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
[97703.543484] process_one_work+0x1fd/0x400
[97703.543493] worker_thread+0x34/0x410
[97703.543500] kthread+0x121/0x140
[97703.543507] ? process_one_work+0x400/0x400
[97703.543512] ? kthread_park+0x90/0x90
[97703.543523] ret_from_fork+0x35/0x40

Fixes: a3832b31898f ("mlxsw: core: Create an ordered workqueue for FIB offload")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Semion Lisyansky <semionl@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue
Ido Schimmel [Wed, 10 Apr 2019 06:58:13 +0000 (06:58 +0000)] 
mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue

The EMAD workqueue is used to handle retransmission of EMAD packets that
contain configuration data for the device's firmware.

Given the workers need to allocate these packets and that the code is
not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM
flag.

Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_switchdev: Add MDB entries in prepare phase
Ido Schimmel [Wed, 10 Apr 2019 06:58:12 +0000 (06:58 +0000)] 
mlxsw: spectrum_switchdev: Add MDB entries in prepare phase

The driver cannot guarantee in the prepare phase that it will be able to
write an MDB entry to the device. In case the driver returned success
during the prepare phase, but then failed to add the entry in the commit
phase, a WARNING [1] will be generated by the switchdev core.

Fix this by doing the work in the prepare phase instead.

[1]
[  358.544486] swp12s0: Commit of object (id=2) failed.
[  358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
[  358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
[  358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  358.580582] Workqueue: events switchdev_deferred_process_work
[  358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
...
[  358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
[  358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
[  358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
[  358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
[  358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
[  358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
[  358.659790] FS:  0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
[  358.668820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
[  358.683188] Call Trace:
[  358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
[  358.691655]  switchdev_deferred_process+0x6b/0xf0
[  358.696907]  switchdev_deferred_process_work+0xa/0x10
[  358.702548]  process_one_work+0x1f5/0x3f0
[  358.707022]  worker_thread+0x28/0x3c0
[  358.711099]  ? process_one_work+0x3f0/0x3f0
[  358.715768]  kthread+0x10d/0x130
[  358.719369]  ? __kthread_create_on_node+0x180/0x180
[  358.724815]  ret_from_fork+0x35/0x40

Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Switch to Toeplitz RSS hash by default
Konstantin Khlebnikov [Fri, 31 Aug 2018 11:29:16 +0000 (14:29 +0300)] 
net/mlx5e: Switch to Toeplitz RSS hash by default

Although XOR hash function can perform very well on some special use
cases, to align with all drivers, mlx5 driver should use Toeplitz hash
by default.
Toeplitz is more stable for the general use case and it is more standard
and reliable.

On top of that, since XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only a
repeated 8 bits pattern. When used for udp tunneling RSS source port
manipulation it results in fixed source port, which will cause bad RSS
spread.

Fixes: 2be6967cdbc9 ("net/mlx5e: Support ETH_RSS_HASH_XOR")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoRevert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"
Or Gerlitz [Sun, 31 Mar 2019 12:53:03 +0000 (12:53 +0000)] 
Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"

This reverts commit b820e6fb0978f9c2ac438c199d2bb2f35950e9c9.

Prior the commit we are reverting, checksum unnecessary was only set when
both the L3 OK and L4 OK bits are set on the CQE. This caused packets
of IP protocols such as SCTP which are not dealt by the current HW L4
parser (hence the L4 OK bit is not set, but the L4 header type none bit
is set) to go through the checksum none code, where currently we wrongly
report checksum unnecessary for them, a regression. Fix this by a revert.

Note that on our usual track we report checksum complete, so the revert
isn't expected to have any notable performance impact. Also, when we are
not on the checksum complete track, the L4 protocols for which we report
checksum none are not high performance ones, we will still report
checksum unnecessary for UDP/TCP.

Fixes: b820e6fb0978 ("net/mlx5e: Enable reporting checksum unnecessary also for L3 packets")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Avi Urman <aviu@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Protect against non-uplink representor for encap
Dmytro Linkin [Fri, 29 Mar 2019 12:50:37 +0000 (12:50 +0000)] 
net/mlx5e: Protect against non-uplink representor for encap

TC encap offload is supported only for the physical uplink
representor. Fail for non uplink representor.

Fixes: 3e621b19b0bb ("net/mlx5e: Support TC encapsulation offloads with upper devices")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Rx, Check ip headers sanity
Saeed Mahameed [Tue, 26 Mar 2019 05:10:59 +0000 (22:10 -0700)] 
net/mlx5e: Rx, Check ip headers sanity

In the two places is_last_ethertype_ip is being called, the caller will
be looking inside the ip header, to be safe, add ip{4,6} header sanity
check. And return true only on valid ip headers, i.e: the whole header
is contained in the linear part of the skb.

Note: Such situation is very rare and hard to reproduce, since mlx5e
allocates a large enough headroom to contain the largest header one can
imagine.

Fixes: fe1dc069990c ("net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Rx, Fixup skb checksum for packets with tail padding
Saeed Mahameed [Tue, 12 Mar 2019 07:24:52 +0000 (00:24 -0700)] 
net/mlx5e: Rx, Fixup skb checksum for packets with tail padding

When an ethernet frame with ip payload is padded, the padding octets are
not covered by the hardware checksum.

Prior to the cited commit, skb checksum was forced to be CHECKSUM_NONE
when padding is detected. After it, the kernel will try to trim the
padding bytes and subtract their checksum from skb->csum.

In this patch we fixup skb->csum for any ip packet with tail padding of
any size, if any padding found.
FCS case is just one special case of this general purpose patch, hence,
it is removed.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded
Saeed Mahameed [Fri, 22 Mar 2019 02:07:20 +0000 (19:07 -0700)] 
net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded

XDP programs might change packets data contents which will make the
reported skb checksum (checksum complete) invalid.

When XDP programs are loaded/unloaded set/clear rx RQs
MLX5E_RQ_STATE_NO_CSUM_COMPLETE flag.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Use fail-safe channels reopen in tx reporter recover
Eran Ben Elisha [Thu, 28 Mar 2019 12:26:47 +0000 (14:26 +0200)] 
net/mlx5e: Use fail-safe channels reopen in tx reporter recover

When requested to recover from error, the tx reporter might open new
channels and close the existing ones. Use safe channels switch flow in
order to guarantee opened channels at the end of the recover flow.
For this purpose, define mlx5e_safe_reopen_channels function and use it
within those flows.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Skip un-needed tx recover if interface state is down
Eran Ben Elisha [Thu, 28 Mar 2019 08:00:35 +0000 (10:00 +0200)] 
net/mlx5e: Skip un-needed tx recover if interface state is down

Skip recover operation if interface is in down state as TX objects are
not open. This fixes a bug were the recover flow re-opened TX objects
which were not opened before, leading to a possible memory leak at
driver unload.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: FPGA, tls, idr remove on flow delete
Saeed Mahameed [Tue, 19 Mar 2019 08:05:41 +0000 (01:05 -0700)] 
net/mlx5: FPGA, tls, idr remove on flow delete

Flow is kfreed on mlx5_fpga_tls_del_flow but kept in the idr data
structure, this is risky and can cause use-after-free, since the
idr_remove is delayed until tls_send_teardown_cmd completion.

Instead of delaying idr_remove, in this patch we do it on
mlx5_fpga_tls_del_flow, before actually kfree(flow).

Added synchronize_rcu before kfree(flow)

Fixes: ab412e1dd7db ("net/mlx5: Accel, add TLS rx offload routines")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: FPGA, tls, hold rcu read lock a bit longer
Saeed Mahameed [Wed, 20 Mar 2019 05:09:05 +0000 (22:09 -0700)] 
net/mlx5: FPGA, tls, hold rcu read lock a bit longer

To avoid use-after-free, hold the rcu read lock until we are done copying
flow data into the command buffer.

Fixes: ab412e1dd7db ("net/mlx5: Accel, add TLS rx offload routines")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge tag 'mac80211-for-davem-2019-04-09' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Tue, 9 Apr 2019 17:57:10 +0000 (10:57 -0700)] 
Merge tag 'mac80211-for-davem-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Various fixes:
 * iTXQ fixes from Felix
 * tracing fix - increase message length
 * fix SW_CRYPTO_CONTROL enforcement
 * WMM rule handling for regdomain intersection
 * max_interfaces in hwsim - reported by syzbot
 * clear private data in some more commands
 * a clang compiler warning fix

I added a patch with two new (unused) macros for
rate-limited printing to simplify getting the users
into the tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomac80211: Honor SW_CRYPTO_CONTROL for unicast keys in AP VLAN mode
Alexander Wetzel [Sat, 9 Feb 2019 14:01:38 +0000 (15:01 +0100)] 
mac80211: Honor SW_CRYPTO_CONTROL for unicast keys in AP VLAN mode

Restore SW_CRYPTO_CONTROL operation on AP_VLAN interfaces for unicast
keys, the original override was intended to be done for group keys as
those are treated specially by mac80211 and would always have been
rejected.

Now the situation is that AP_VLAN support must be enabled by the driver
if it can support it (meaning it can support software crypto GTK TX).

Thus, also simplify the code - if we get here with AP_VLAN and non-
pairwise key, software crypto must be used (driver doesn't know about
the interface) and can be used (driver must've advertised AP_VLAN if
it also uses SW_CRYPTO_CONTROL).

Fixes: db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 9 Apr 2019 03:10:46 +0000 (17:10 -1000)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.

 2) There have been many weird regressions in r8169 since we turned ASPM
    support on, some are still not understood nor completely resolved.
    Let's turn this back off for now. From Heiner Kallweit.

 3) Signess fixes for ethtool speed value handling, from Michael
    Zhivich.

 4) Handle timestamps properly in macb driver, from Paul Thomas.

 5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
    and we're holding a stale protocol header pointer". From Lorenzo
    Bianconi.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bnxt_en: Reset device on RX buffer errors.
  bnxt_en: Improve RX consumer index validity check.
  net: macb driver, check for SKBTX_HW_TSTAMP
  qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
  broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
  ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
  net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
  net: ip_gre: fix possible use-after-free in erspan_rcv
  r8169: disable ASPM again
  MAINTAINERS: ieee802154: update documentation file pattern
  net: vrf: Fix ping failed when vrf mtu is set to 0
  selftests: add a tc matchall test case
  nfc: nci: Potential off by one in ->pipes[] array
  NFC: nci: Add some bounds checking in nci_hci_cmd_received()

5 years agoMerge branch 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 9 Apr 2019 03:06:43 +0000 (17:06 -1000)] 
Merge branch 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull TPM fixes from James Morris:
 "From Jarkko: These are critical fixes for v5.1. Contains also couple
  of new selftests for v5.1 features (partial reads in /dev/tpm0)"

* 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  selftests/tpm2: Open tpm dev in unbuffered mode
  selftests/tpm2: Extend tests to cover partial reads
  KEYS: trusted: fix -Wvarags warning
  tpm: Fix the type of the return value in calc_tpm2_event_size()
  KEYS: trusted: allow trusted.ko to initialize w/o a TPM
  tpm: fix an invalid condition in tpm_common_poll
  tpm: turn on TPM on suspend for TPM 1.x