]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
4 years agoigc: reinit_locked() should be called with rtnl_lock
Sasha Neftin [Tue, 20 Oct 2020 13:34:00 +0000 (16:34 +0300)] 
igc: reinit_locked() should be called with rtnl_lock

This commit applies to the igc_reset_task the same changes that
were applied to the igb driver in commit 024a8168b749 ("igb:
reinit_locked() should be called with rtnl_lock")
and fix possible race in reset subtask.

Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agonet: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
Florian Fainelli [Wed, 10 Mar 2021 22:17:58 +0000 (14:17 -0800)] 
net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port

Similar to commit 92696286f3bb37ba50e4bd8d1beb24afb759a799 ("net:
bcmgenet: Set phydev->dev_flags only for internal PHYs") we need to
qualify the phydev->dev_flags based on whether the port is connected to
an internal or external PHY otherwise we risk having a flags collision
with a completely different interpretation depending on the driver.

Fixes: aa9aef77c761 ("net: dsa: bcm_sf2: communicate integrated PHY revision to PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: b53: VLAN filtering is global to all users
Florian Fainelli [Wed, 10 Mar 2021 18:46:10 +0000 (10:46 -0800)] 
net: dsa: b53: VLAN filtering is global to all users

The bcm_sf2 driver uses the b53 driver as a library but does not make
usre of the b53_setup() function, this made it fail to inherit the
vlan_filtering_is_global attribute. Fix this by moving the assignment to
b53_switch_alloc() which is used by bcm_sf2.

Fixes: 7228b23e68f7 ("net: dsa: b53: Let DSA handle mismatched VLAN filtering settings")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sched: validate stab values
Eric Dumazet [Wed, 10 Mar 2021 16:26:41 +0000 (08:26 -0800)] 
net: sched: validate stab values

iproute2 package is well behaved, but malicious user space can
provide illegal shift values and trigger UBSAN reports.

Add stab parameter to red_check_params() to validate user input.

syzbot reported:

UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18
shift exponent 111 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
 red_calc_qavg_from_idle_time include/net/red.h:312 [inline]
 red_calc_qavg include/net/red.h:353 [inline]
 choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221
 __dev_xmit_skb net/core/dev.c:3837 [inline]
 __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150
 neigh_hh_output include/net/neighbour.h:499 [inline]
 neigh_output include/net/neighbour.h:508 [inline]
 ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117
 __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
 __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161
 ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192
 NF_HOOK_COND include/linux/netfilter.h:290 [inline]
 ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215
 dst_output include/net/dst.h:448 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 NF_HOOK include/linux/netfilter.h:295 [inline]
 ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320
 inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135
 dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138
 dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535
 dccp_finish_passive_close net/dccp/proto.c:123 [inline]
 dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118
 dccp_terminate_connection net/dccp/proto.c:958 [inline]
 dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478
 __sock_release+0xcd/0x280 net/socket.c:599
 sock_close+0x18/0x20 net/socket.c:1258
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]

Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mlx5-fixes-2021-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 10 Mar 2021 23:23:55 +0000 (15:23 -0800)] 
Merge tag 'mlx5-fixes-2021-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-fixes-2021-03-10

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: bcm_sf2: use 2 Gbps IMP port link on BCM4908
Rafał Miłecki [Wed, 10 Mar 2021 12:51:59 +0000 (13:51 +0100)] 
net: dsa: bcm_sf2: use 2 Gbps IMP port link on BCM4908

BCM4908 uses 2 Gbps link between switch and the Ethernet interface.
Without this BCM4908 devices were able to achieve only 2 x ~895 Mb/s.
This allows handling e.g. NAT traffic with 940 Mb/s.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: pxa168_eth: Fix a potential data race in pxa168_eth_remove
Pavel Andrianov [Wed, 10 Mar 2021 08:10:46 +0000 (11:10 +0300)] 
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove

pxa168_eth_remove() firstly calls unregister_netdev(),
then cancels a timeout work. unregister_netdev() shuts down a device
interface and removes it from the kernel tables. If the timeout occurs
in parallel, the timeout work (pxa168_eth_tx_timeout_task) performs stop
and open of the device. It may lead to an inconsistent state and memory
leaks.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Pavel Andrianov <andrianov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomacvlan: macvlan_count_rx() needs to be aware of preemption
Eric Dumazet [Wed, 10 Mar 2021 09:56:36 +0000 (01:56 -0800)] 
macvlan: macvlan_count_rx() needs to be aware of preemption

macvlan_count_rx() can be called from process context, it is thus
necessary to disable preemption before calling u64_stats_update_begin()

syzbot was able to spot this on 32bit arch:

WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline]
WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269
Modules linked in:
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: ARM-Versatile Express
Workqueue: events macvlan_process_broadcast
Backtrace:
[<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
 r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4
[<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline])
[<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
[<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231)
 r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0
[<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605)
 r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4
 r7:0000010f
[<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628)
 r7:81363f70 r6:0000010f r5:83018e50 r4:00000000
[<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline])
[<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269)
 r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001
[<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline])
[<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline])
[<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291)
 r5:89802300 r4:8a927740
[<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317)
 r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000
 r4:89802840
[<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275)
 r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80
 r4:8a86d43c
[<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421)
 r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40
 r4:898dac80
[<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292)
 r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40
 r4:8a97bd00
[<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158)
Exception stack(0x8a9c7fb0 to 0x8a9c7ff8)

Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodrop_monitor: Perform cleanup upon probe registration failure
Ido Schimmel [Wed, 10 Mar 2021 10:28:01 +0000 (12:28 +0200)] 
drop_monitor: Perform cleanup upon probe registration failure

In the rare case that drop_monitor fails to register its probe on the
'napi_poll' tracepoint, it will not deactivate its hysteresis timer as
part of the error path. If the hysteresis timer was armed by the shortly
lived 'kfree_skb' probe and user space retries to initiate tracing, a
warning will be emitted for trying to initialize an active object [1].

Fix this by properly undoing all the operations that were done prior to
probe registration, in both software and hardware code paths.

Note that syzkaller managed to fail probe registration by injecting a
slab allocation failure [2].

[1]
ODEBUG: init active (active state 0) object type: timer_list hint: sched_send_work+0x0/0x60 include/linux/list.h:135
WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
[...]
Call Trace:
 __debug_object_init+0x524/0xd10 lib/debugobjects.c:588
 debug_timer_init kernel/time/timer.c:722 [inline]
 debug_init kernel/time/timer.c:770 [inline]
 init_timer_key+0x2d/0x340 kernel/time/timer.c:814
 net_dm_trace_on_set net/core/drop_monitor.c:1111 [inline]
 set_all_monitor_traces net/core/drop_monitor.c:1188 [inline]
 net_dm_monitor_start net/core/drop_monitor.c:1295 [inline]
 net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339
 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
 genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2348
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2402
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

[2]
 FAULT_INJECTION: forcing a failure.
 name failslab, interval 1, probability 0, space 0, times 1
 CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Call Trace:
  dump_stack+0xfa/0x151
  should_fail.cold+0x5/0xa
  should_failslab+0x5/0x10
  __kmalloc+0x72/0x3f0
  tracepoint_add_func+0x378/0x990
  tracepoint_probe_register+0x9c/0xe0
  net_dm_cmd_trace+0x7fc/0x1220
  genl_family_rcv_msg_doit+0x228/0x320
  genl_rcv_msg+0x328/0x580
  netlink_rcv_skb+0x153/0x420
  genl_rcv+0x24/0x40
  netlink_unicast+0x533/0x7d0
  netlink_sendmsg+0x856/0xd90
  sock_sendmsg+0xcf/0x120
  ____sys_sendmsg+0x6e8/0x810
  ___sys_sendmsg+0xf3/0x170
  __sys_sendmsg+0xe5/0x1b0
  do_syscall_64+0x2d/0x70
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 70c69274f354 ("drop_monitor: Initialize timer and work item upon tracing enable")
Fixes: 8ee2267ad33e ("drop_monitor: Convert to using devlink tracepoint")
Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Wed, 10 Mar 2021 23:14:56 +0000 (15:14 -0800)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-03-10

The following pull-request contains BPF updates for your *net* tree.

We've added 8 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 136 insertions(+), 17 deletions(-).

The main changes are:

1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.

2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
   percpu cgroup storage from tracing programs when nested, from Yonghong Song.

3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.

4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
   L3 input length, from Jesper Dangaard Brouer.

5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.

6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: fix suspecious RCU usage warning
Wei Wang [Wed, 10 Mar 2021 02:20:35 +0000 (18:20 -0800)] 
ipv6: fix suspecious RCU usage warning

Syzbot reported the suspecious RCU usage in nexthop_fib6_nh() when
called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start()
calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls
rcu_dereference_rtnl().
The fix proposed is to add a variant of nexthop_fib6_nh() to use
rcu_dereference_bh_rtnl() for ipv6_route_seq_show().

The reported trace is as follows:
./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz-executor.0/17895:
     at: seq_read+0x71/0x12a0 fs/seq_file.c:169
     at: seq_file_net include/linux/seq_file_net.h:19 [inline]
     at: ipv6_route_seq_start+0xaf/0x300 net/ipv6/ip6_fib.c:2616

stack backtrace:
CPU: 1 PID: 17895 Comm: syz-executor.0 Not tainted 4.15.0-syzkaller #0
Call Trace:
 [<ffffffff849edf9e>] __dump_stack lib/dump_stack.c:17 [inline]
 [<ffffffff849edf9e>] dump_stack+0xd8/0x147 lib/dump_stack.c:53
 [<ffffffff8480b7fa>] lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5745
 [<ffffffff8459ada6>] nexthop_fib6_nh include/net/nexthop.h:416 [inline]
 [<ffffffff8459ada6>] ipv6_route_native_seq_show net/ipv6/ip6_fib.c:2488 [inline]
 [<ffffffff8459ada6>] ipv6_route_seq_show+0x436/0x7a0 net/ipv6/ip6_fib.c:2673
 [<ffffffff81c556df>] seq_read+0xccf/0x12a0 fs/seq_file.c:276
 [<ffffffff81dbc62c>] proc_reg_read+0x10c/0x1d0 fs/proc/inode.c:231
 [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:714 [inline]
 [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:701 [inline]
 [<ffffffff81bc28ae>] do_iter_read+0x49e/0x660 fs/read_write.c:935
 [<ffffffff81bc81ab>] vfs_readv+0xfb/0x170 fs/read_write.c:997
 [<ffffffff81c88847>] kernel_readv fs/splice.c:361 [inline]
 [<ffffffff81c88847>] default_file_splice_read+0x487/0x9c0 fs/splice.c:416
 [<ffffffff81c86189>] do_splice_to+0x129/0x190 fs/splice.c:879
 [<ffffffff81c86f66>] splice_direct_to_actor+0x256/0x890 fs/splice.c:951
 [<ffffffff81c8777d>] do_splice_direct+0x1dd/0x2b0 fs/splice.c:1060
 [<ffffffff81bc4747>] do_sendfile+0x597/0xce0 fs/read_write.c:1459
 [<ffffffff81bca205>] SYSC_sendfile64 fs/read_write.c:1520 [inline]
 [<ffffffff81bca205>] SyS_sendfile64+0x155/0x170 fs/read_write.c:1506
 [<ffffffff81015fcf>] do_syscall_64+0x1ff/0x310 arch/x86/entry/common.c:305
 [<ffffffff84a00076>] entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Petr Machata <petrm@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ip6ip6-crash'
David S. Miller [Wed, 10 Mar 2021 20:24:19 +0000 (12:24 -0800)] 
Merge branch 'ip6ip6-crash'

Daniel Borkmann says:

====================
Fix ip6ip6 crash for collect_md skbs

Fix a NULL pointer deref panic I ran into for regular ip6ip6 tunnel devices
when collect_md populated skbs were redirected to them for xmit. See patches
for further details, thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet, bpf: Fix ip6ip6 crash with collect_md populated skbs
Daniel Borkmann [Wed, 10 Mar 2021 00:38:10 +0000 (01:38 +0100)] 
net, bpf: Fix ip6ip6 crash with collect_md populated skbs

I ran into a crash where setting up a ip6ip6 tunnel device which was /not/
set to collect_md mode was receiving collect_md populated skbs for xmit.

The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is
assigning special metadata dst entry and then redirecting the skb to the
device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit()
and in the latter it performs a neigh lookup based on skb_dst(skb) where
we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since
the md_dst_ops do not populate neigh_lookup callback with a fake handler.

Transform the md_dst_ops into generic dst_blackhole_ops that can also be
reused elsewhere when needed, and use them for the metadata dst entries as
callback ops.

Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}()
from dst_init() which free the skb the same way modulo the splat. Given we
will be able to recover just fine from there, avoid any potential splats
iff this gets ever triggered in future (or worse, panic on warns when set).

Fixes: f38a9eb1f77b ("dst: Metadata destinations")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: Consolidate common blackhole dst ops
Daniel Borkmann [Wed, 10 Mar 2021 00:38:09 +0000 (01:38 +0100)] 
net: Consolidate common blackhole dst ops

Move generic blackhole dst ops to the core and use them from both
ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
functional change otherwise. We need these also in other locations
and having to define them over and over again is not great.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/mlx5: DR, Fix potential shift wrapping of 32-bit value in STEv1 getter
Yevgeny Kliteynik [Sat, 6 Feb 2021 20:11:52 +0000 (22:11 +0200)] 
net/mlx5: DR, Fix potential shift wrapping of 32-bit value in STEv1 getter

Fix 32-bit variable shift wrapping in dr_ste_v1_get_miss_addr.

Fixes: a6098129c781 ("net/mlx5: DR, Add STEv1 setters and getters")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: SF: Fix error flow of SFs allocation flow
Shay Drory [Thu, 11 Feb 2021 13:41:35 +0000 (15:41 +0200)] 
net/mlx5: SF: Fix error flow of SFs allocation flow

When SF id is unavailable, code jumps to wrong label that accesses
sw id array outside of its range.
Hence, when SF id is not allocated, avoid accessing such array.

Fixes: 8f0105418668 ("net/mlx5: SF, Add port add delete functionality")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: SF: Fix memory leak of work item
Shay Drory [Tue, 2 Feb 2021 10:21:04 +0000 (12:21 +0200)] 
net/mlx5: SF: Fix memory leak of work item

Cited patch in the fixes tag missed to free the allocated work.
Fix it by freeing the work after work execution.

Fixes: f3196bb0f14c ("net/mlx5: Introduce vhca state event notifier")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: SF, Correct vhca context size
Parav Pandit [Fri, 26 Feb 2021 08:15:16 +0000 (10:15 +0200)] 
net/mlx5: SF, Correct vhca context size

Fix vhca context size as defined by device interface specification.

Fixes: f3196bb0f14c ("net/mlx5: Introduce vhca state event notifier")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: E-switch, Fix rate calculation division
Parav Pandit [Wed, 17 Feb 2021 06:23:29 +0000 (08:23 +0200)] 
net/mlx5e: E-switch, Fix rate calculation division

do_div() returns reminder, while cited patch wanted to use
quotient.
Fix it by using quotient.

Fixes: 0e22bfb7c046 ("net/mlx5e: E-switch, Fix rate calculation for overflow")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agoRDMA/mlx5: Fix timestamp default mode
Maor Gottlieb [Wed, 3 Mar 2021 12:39:20 +0000 (14:39 +0200)] 
RDMA/mlx5: Fix timestamp default mode

1. Don't set the ts_format bit to default when it reserved - device is
   running in the old mode (free running).
2. XRC doesn't have a CQ therefore the ts format in the QP
   context should be default / free running.
3. Set ts_format to WQ.

Fixes: 2fe8d4b87802 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: Set QP timestamp mode to default
Maor Gottlieb [Wed, 3 Mar 2021 12:36:16 +0000 (14:36 +0200)] 
net/mlx5: Set QP timestamp mode to default

QPs which don't care from timestamp mode, should set the ts_format
to default, otherwise the QP creation could be failed if the timestamp
mode is not supported.

Fixes: 2fe8d4b87802 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Fix error flow in change profile
Roi Dayan [Wed, 10 Feb 2021 08:33:13 +0000 (10:33 +0200)] 
net/mlx5e: Fix error flow in change profile

Move priv memset from init to cleanup to avoid double priv cleanup
that can happen on profile change if also roolback fails.
Add missing cleanup flow in mlx5e_netdev_attach_profile().

Fixes: c4d7eb57687f ("net/mxl5e: Add change profile method")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supported
Maor Dickman [Mon, 1 Mar 2021 11:07:08 +0000 (13:07 +0200)] 
net/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supported

VF tunnel TX traffic offload is adding flow which forward to flow
tables with lower level, which isn't support on all FW versions
and may cause firmware to fail with syndrome.

Fixed by enabling VF tunnel TX offload only if flow table capability
ignore_flow_level is enabled.

Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Check correct ip_version in decapsulation route resolution
Roi Dayan [Tue, 2 Mar 2021 09:31:08 +0000 (11:31 +0200)] 
net/mlx5e: Check correct ip_version in decapsulation route resolution

flow_attr->ip_version has the matching that should be done inner/outer.
When working with chains, decapsulation is done on chain0 and next chain
match on outer header which is the original inner which could be ipv4.
So in tunnel route resolution we cannot use that to know which ip version
we are at so save tun_ip_version when parsing the tunnel match and use
that.

Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: Fix turn-off PPS command
Aya Levin [Mon, 1 Mar 2021 15:54:21 +0000 (17:54 +0200)] 
net/mlx5: Fix turn-off PPS command

Fix a bug of uninitialized pin index when trying to turn off PPS out.

Fixes: de19cd6cc977 ("net/mlx5: Move some PPS logic into helper functions")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Don't match on Geneve options in case option masks are all zero
Maor Dickman [Tue, 16 Feb 2021 11:39:18 +0000 (13:39 +0200)] 
net/mlx5e: Don't match on Geneve options in case option masks are all zero

The cited change added offload support for Geneve options without verifying
the validity of the options masks, this caused offload of rules with match
on Geneve options with class,type and data masks which are zero to fail.

Fix by ignoring the match on Geneve options in case option masks are
all zero.

Fixes: 9272e3df3023 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Revert parameters on errors when changing PTP state without reset
Maxim Mikityanskiy [Fri, 29 Jan 2021 12:04:34 +0000 (14:04 +0200)] 
net/mlx5e: Revert parameters on errors when changing PTP state without reset

Port timestamping for PTP can be enabled/disabled while the channels are
closed. In that case mlx5e_safe_switch_channels is skipped, and the
preactivate hook is called directly. However, if that hook returns an
error, the channel parameters must be reverted back to their old values.
This commit adds missing handling on this case.

Fixes: 145e5637d941 ("net/mlx5e: Add TX PTP port object support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: When changing XDP program without reset, take refs for XSK RQs
Maxim Mikityanskiy [Thu, 11 Feb 2021 13:51:11 +0000 (15:51 +0200)] 
net/mlx5e: When changing XDP program without reset, take refs for XSK RQs

Each RQ (including XSK RQs) takes a reference to the XDP program. When
an XDP program is attached or detached, the channels and queues are
recreated, however, there is a special flow for changing an active XDP
program to another one. In that flow, channels and queues stay alive,
but the refcounts of the old and new XDP programs are adjusted. This
flow didn't increment refcount by the number of active XSK RQs, and this
commit fixes it.

Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Set PTP channel pointer explicitly to NULL
Aya Levin [Tue, 12 Jan 2021 11:34:29 +0000 (13:34 +0200)] 
net/mlx5e: Set PTP channel pointer explicitly to NULL

When closing the PTP channel, set its pointer explicitly to NULL. PTP
channel is opened on demand, the code verify the pointer validity before
access. Nullify it when closing the PTP channel to avoid unexpected
behavior.

Fixes: 145e5637d941 ("net/mlx5e: Add TX PTP port object support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Accumulate port PTP TX stats with other channels stats
Aya Levin [Sun, 28 Feb 2021 16:40:18 +0000 (18:40 +0200)] 
net/mlx5e: Accumulate port PTP TX stats with other channels stats

In addition to .get_ethtool_stats, add port PTP TX stats to
.ndo_get_stats64.

Fixes: 145e5637d941 ("net/mlx5e: Add TX PTP port object support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: RX, Mind the MPWQE gaps when calculating offsets
Tariq Toukan [Tue, 12 Jan 2021 11:21:17 +0000 (13:21 +0200)] 
net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets

Since cited patch, MLX5E_REQUIRED_WQE_MTTS is not a power of two.
Hence, usage of MLX5E_LOG_ALIGNED_MPWQE_PPW should be replaced,
as it lost some accuracy. Use the designated macro to calculate
the number of required MTTs.

This makes sure the solution in cited patch works properly.

While here, un-inline mlx5e_get_mpwqe_offset(), and remove the
unused RQ parameter.

Fixes: c3c9402373fe ("net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Enforce minimum value check for ICOSQ size
Tariq Toukan [Tue, 12 Jan 2021 11:29:14 +0000 (13:29 +0200)] 
net/mlx5e: Enforce minimum value check for ICOSQ size

The ICOSQ size should not go below MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE.
Enforce this where it's missing.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agoMerge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Wed, 10 Mar 2021 01:15:56 +0000 (17:15 -0800)] 
Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

 2) TX skb error handling fix in mt76 driver, also from Felix.

 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

 4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

 5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

 7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...

4 years agoMerge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Wed, 10 Mar 2021 01:08:41 +0000 (17:08 -0800)] 
Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
 "Fix opcode filtering for exceptions, and clean up defconfig"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc: sparc64_defconfig: remove duplicate CONFIGs
  sparc64: Fix opcode filtering in handling of no fault loads

4 years agosparc: sparc64_defconfig: remove duplicate CONFIGs
Corentin Labbe [Mon, 8 Mar 2021 09:51:26 +0000 (09:51 +0000)] 
sparc: sparc64_defconfig: remove duplicate CONFIGs

After my patch there is CONFIG_ATA defined twice.
Remove the duplicate one.
Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
test with NFS.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a57cdeb369ef ("sparc: sparc64_defconfig: add necessary configs for qemu")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosparc64: Fix opcode filtering in handling of no fault loads
Rob Gardner [Mon, 1 Mar 2021 05:48:16 +0000 (22:48 -0700)] 
sparc64: Fix opcode filtering in handling of no fault loads

is_no_fault_exception() has two bugs which were discovered via random
opcode testing with stress-ng. Both are caused by improper filtering
of opcodes.

The first bug can be triggered by a floating point store with a no-fault
ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.

The code first tests op3[5] (0x1000000), which denotes a floating
point instruction, and then tests op3[2] (0x200000), which denotes a
store instruction. But these bits are not mutually exclusive, and the
above mentioned opcode has both bits set. The intent is to filter out
stores, so the test for stores must be done first in order to have
any effect.

The second bug can be triggered by a floating point load with one of
the invalid ASI values 0x8e or 0x8f, which pass this check in
is_no_fault_exception():
     if ((asi & 0xf2) == ASI_PNF)

An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
in handle_ldf_stq(), and is_no_fault_exception() must not allow these
invalid asi values to make it that far.

In both of these cases, handle_ldf_stq() reacts by calling
sun4v_data_access_exception() or spitfire_data_access_exception(),
which call is_no_fault_exception() and results in an infinite
recursion.

Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 's390-qeth-fixes'
David S. Miller [Wed, 10 Mar 2021 00:14:54 +0000 (16:14 -0800)] 
Merge branch 's390-qeth-fixes'

Julian Wiedmann says:

====================
s390/qeth: fixes 2021-03-09

please apply the following patch series to netdev's net tree.

This brings one fix for a memleak in an error path of the setup code.
Also several fixes for dealing with pending TX buffers - two for old
bugs in their completion handling, and one recent regression in a
teardown path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: fix notification for pending buffers during teardown
Julian Wiedmann [Tue, 9 Mar 2021 16:52:21 +0000 (17:52 +0100)] 
s390/qeth: fix notification for pending buffers during teardown

The cited commit reworked the state machine for pending TX buffers.
In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
uses NEED_QAOB for buffers that get parked while waiting for their QAOB
completion.

But it missed to adjust the check in qeth_tx_complete_buf(). So if
qeth_tx_complete_pending_bufs() is called during teardown to drain
the parked TX buffers, we no longer raise a notification for af_iucv.

Instead of updating the checked state, just move this code into
qeth_tx_complete_pending_bufs() itself. This also gets rid of the
special-case in the common TX completion path.

Fixes: 8908f36d20d8 ("s390/qeth: fix af_iucv notification race")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: schedule TX NAPI on QAOB completion
Julian Wiedmann [Tue, 9 Mar 2021 16:52:20 +0000 (17:52 +0100)] 
s390/qeth: schedule TX NAPI on QAOB completion

When a QAOB notifies us that a pending TX buffer has been delivered, the
actual TX completion processing by qeth_tx_complete_pending_bufs()
is done within the context of a TX NAPI instance. We shouldn't rely on
this instance being scheduled by some other TX event, but just do it
ourselves.

qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
instance. To avoid touching the TX queue's NAPI instance
before/after it is (un-)registered, reorder the code in qeth_open()
and qeth_stop() accordingly.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: improve completion of pending TX buffers
Julian Wiedmann [Tue, 9 Mar 2021 16:52:19 +0000 (17:52 +0100)] 
s390/qeth: improve completion of pending TX buffers

The current design attaches a pending TX buffer to a custom
single-linked list, which is anchored at the buffer's slot on the
TX ring. The buffer is then checked for final completion whenever
this slot is processed during a subsequent TX NAPI poll cycle.

But if there's insufficient traffic on the ring, we might never make
enough progress to get back to this ring slot and discover the pending
buffer's final TX completion. In particular if this missing TX
completion blocks the application from sending further traffic.

So convert the custom single-linked list code to a per-queue list_head,
and scan this list on every TX NAPI cycle.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: fix memory leak after failed TX Buffer allocation
Julian Wiedmann [Tue, 9 Mar 2021 16:52:18 +0000 (17:52 +0100)] 
s390/qeth: fix memory leak after failed TX Buffer allocation

When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
back an Output Queue, the 'out_freeoutqbufs' path will free all
previously allocated buffers for this queue. But it misses to free the
half-finished queue struct itself.

Move the buffer allocation into qeth_alloc_output_queue(), and deal with
such errors internally.

Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'virtio_net-infinite-loop'
David S. Miller [Wed, 10 Mar 2021 00:12:20 +0000 (16:12 -0800)] 
Merge branch 'virtio_net-infinite-loop'

Balazs Nemeth says:

====================
net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto

These patches prevent an infinite loop for gso packets with a protocol
from virtio net hdr that doesn't match the protocol in the packet.
Note that packets coming from a device without
header_ops->parse_protocol being implemented will not be caught by
the check in virtio_net_hdr_to_skb, but the infinite loop will still
be prevented by the check in the gso layer.

Changes from v2 to v3:
  - Remove unused *eth.
  - Use MPLS_HLEN to also check if the MPLS header length is a multiple
    of four.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
Balazs Nemeth [Tue, 9 Mar 2021 11:31:01 +0000 (12:31 +0100)] 
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0

A packet with skb_inner_network_header(skb) == skb_network_header(skb)
and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
from the packet. Subsequently, the call to skb_mac_gso_segment will
again call mpls_gso_segment with the same packet leading to an infinite
loop. In addition, ensure that the header length is a multiple of four,
which should hold irrespective of the number of stacked labels.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: check if protocol extracted by virtio_net_hdr_set_proto is correct
Balazs Nemeth [Tue, 9 Mar 2021 11:31:00 +0000 (12:31 +0100)] 
net: check if protocol extracted by virtio_net_hdr_set_proto is correct

For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
set) based on the type in the virtio net hdr, but the skb could contain
anything since it could come from packet_snd through a raw socket. If
there is a mismatch between what virtio_net_hdr_set_proto sets and
the actual protocol, then the skb could be handled incorrectly later
on.

An example where this poses an issue is with the subsequent call to
skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
correctly. A specially crafted packet could fool
skb_flow_dissect_flow_keys_basic preventing EINVAL to be returned.

Avoid blindly trusting the information provided by the virtio net header
by checking that the protocol in the packet actually matches the
protocol set by virtio_net_hdr_set_proto. Note that since the protocol
is only checked if skb->dev implements header_ops->parse_protocol,
packets from devices without the implementation are not checked at this
stage.

Fixes: 9274124f023b ("net: stricter validation of untrusted gso packets")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: xrs700x: check if partner is same as port in hsr join
George McCollister [Mon, 8 Mar 2021 23:38:22 +0000 (17:38 -0600)] 
net: dsa: xrs700x: check if partner is same as port in hsr join

Don't assign dp to partner if it's the same port that xrs700x_hsr_join
was called with. The partner port is supposed to be the other port in
the HSR/PRP redundant pair not the same port. This fixes an issue
observed in testing where forwarding between redundant HSR ports on this
switch didn't work depending on the order the ports were added to the
hsr device.

Fixes: bd62e6f5e6a9 ("net: dsa: xrs700x: add HSR offloading support")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp
Yonghong Song [Tue, 9 Mar 2021 01:56:47 +0000 (17:56 -0800)] 
bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp

x86 bpf_jit_comp.c used kmalloc_array to store jited addresses
for each bpf insn. With a large bpf program, we have see the
following allocation failures in our production server:

    page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
                             nodemask=(null),cpuset=/,mems_allowed=0"
    Call Trace:
    dump_stack+0x50/0x70
    warn_alloc.cold.120+0x72/0xd2
    ? __alloc_pages_direct_compact+0x157/0x160
    __alloc_pages_slowpath+0xcdb/0xd00
    ? get_page_from_freelist+0xe44/0x1600
    ? vunmap_page_range+0x1ba/0x340
    __alloc_pages_nodemask+0x2c9/0x320
    kmalloc_order+0x18/0x80
    kmalloc_order_trace+0x1d/0xa0
    bpf_int_jit_compile+0x1e2/0x484
    ? kmalloc_order_trace+0x1d/0xa0
    bpf_prog_select_runtime+0xc3/0x150
    bpf_prog_load+0x480/0x720
    ? __mod_memcg_lruvec_state+0x21/0x100
    __do_sys_bpf+0xc31/0x2040
    ? close_pdeo+0x86/0xe0
    do_syscall_64+0x42/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f2f300f7fa9
    Code: Bad RIP value.

Dumped assembly:

    ffffffff810b6d70 <bpf_int_jit_compile>:
    ; {
    ffffffff810b6d70: e8 eb a5 b4 00        callq   0xffffffff81c01360 <__fentry__>
    ffffffff810b6d75: 41 57                 pushq   %r15
    ...
    ffffffff810b6f39: e9 72 fe ff ff        jmp     0xffffffff810b6db0 <bpf_int_jit_compile+0x40>
    ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
    ffffffff810b6f3e: 8b 45 0c              movl    12(%rbp), %eax
    ;       return __kmalloc(bytes, flags);
    ffffffff810b6f41: be c0 0c 00 00        movl    $3264, %esi
    ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
    ffffffff810b6f46: 8d 78 01              leal    1(%rax), %edi
    ;       if (unlikely(check_mul_overflow(n, size, &bytes)))
    ffffffff810b6f49: 48 c1 e7 02           shlq    $2, %rdi
    ;       return __kmalloc(bytes, flags);
    ffffffff810b6f4d: e8 8e 0c 1d 00        callq   0xffffffff81287be0 <__kmalloc>
    ;       if (!addrs) {
    ffffffff810b6f52: 48 85 c0              testq   %rax, %rax

Change kmalloc_array() to kvmalloc_array() to avoid potential
allocation error for big bpf programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210309015647.3657852-1-yhs@fb.com
4 years agobpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs
Yonghong Song [Tue, 9 Mar 2021 18:50:28 +0000 (10:50 -0800)] 
bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs

For kuprobe and tracepoint bpf programs, kernel calls
trace_call_bpf() which calls BPF_PROG_RUN_ARRAY_CHECK()
to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK()
also calls bpf_cgroup_storage_set() to set percpu
cgroup local storage with NULL value. This is
due to Commit 394e40a29788 ("bpf: extend bpf_prog_array to store
pointers to the cgroup storage") which modified
__BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set()
and this macro is also used by BPF_PROG_RUN_ARRAY_CHECK().

kuprobe and tracepoint programs are not allowed to call
bpf_get_local_storage() helper hence does not
access percpu cgroup local storage. Let us
change BPF_PROG_RUN_ARRAY_CHECK() not to
modify percpu cgroup local storage.

The issue is observed when I tried to debug [1] where
percpu data is overwritten due to
  preempt_disable -> migration_disable
change. This patch does not completely fix the above issue,
which will be addressed separately, e.g., multiple cgroup
prog runs may preempt each other. But it does fix
any potential issue caused by tracing program
overwriting percpu cgroup storage:
 - in a busy system, a tracing program is to run between
   bpf_cgroup_storage_set() and the cgroup prog run.
 - a kprobe program is triggered by a helper in cgroup prog
   before bpf_get_local_storage() is called.

 [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T

Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
4 years agoMerge tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 9 Mar 2021 20:02:18 +0000 (12:02 -0800)] 
Merge tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "A bunch of fixes for the GPIO subsystem. We have two regressions in
  the core code spotted right after the merge window, a series of fixes
  for ACPI GPIO and a subsequent fix for a related regression in
  gpio-pca953x + a minor tweak in .gitignore and a rework of handling of
  the gpio-line-names to remedy a regression in stm32mp151.

  Summary:

   - fix two regressions in core GPIO subsystem code: one NULL-pointer
     dereference and one list corruption

   - read GPIO line names from fwnode instead of using the generic
     device properties to fix a regression on stm32mp151

   - fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ
     handling on Intel Galileo

   - update .gitignore in GPIO selftests"

* tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Read "gpio-line-names" from a firmware node
  gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
  gpiolib: acpi: Allow to find GpioInt() resource by name and index
  gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
  gpiolib: acpi: Add missing IRQF_ONESHOT
  gpio: fix gpio-device list corruption
  gpio: fix NULL-deref-on-deregistration regression
  selftests: gpio: update .gitignore

4 years agoMerge tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Tue, 9 Mar 2021 19:56:41 +0000 (11:56 -0800)] 
Merge tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fixes for boot breakage because of misaligned FDTs

 - fix for overwritten exception handlers

 - enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
   performance

* tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kernel: Reserve exception base early to prevent corruption
  MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
  crypto: mips/poly1305 - enable for all MIPS processors
  MIPS: boot/compressed: Copy DTB to aligned address

4 years agonet: lapbether: Remove netif_start_queue / netif_stop_queue
Xie He [Sun, 7 Mar 2021 11:33:07 +0000 (03:33 -0800)] 
net: lapbether: Remove netif_start_queue / netif_stop_queue

For the devices in this driver, the default qdisc is "noqueue",
because their "tx_queue_len" is 0.

In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
"noqueue" qdisc are specially handled. Packets are transmitted without
being queued after a "dev->flags & IFF_UP" check. However, it's possible
that even if this check succeeds, "ops->ndo_stop" may still have already
been called. This is because in "__dev_close_many", "ops->ndo_stop" is
called before clearing the "IFF_UP" flag.

If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible in
"__dev_queue_xmit", it sees the "IFF_UP" flag is present, and then it
checks "netif_xmit_stopped" and finds that the queue is already stopped.
In this case, it will complain that:
"Virtual device ... asks to queue packet!"

To prevent "__dev_queue_xmit" from generating this complaint, we should
not call "netif_stop_queue" in "ops->ndo_stop".

We also don't need to call "netif_start_queue" in "ops->ndo_open",
because after a netdev is allocated and registered, the
"__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
to call "netif_start_queue" to clear it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMIPS: kernel: Reserve exception base early to prevent corruption
Thomas Bogendoerfer [Mon, 8 Mar 2021 09:24:47 +0000 (10:24 +0100)] 
MIPS: kernel: Reserve exception base early to prevent corruption

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMerge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Tue, 9 Mar 2021 06:01:58 +0000 (22:01 -0800)] 
Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc

Pull sparc updates from David Miller:
 "Just some more random bits from Al, including a conversion over to
  generic extables"

* git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
  sparc32: take ->thread.flags out
  sparc32: get rid of fake_swapper_regs
  sparc64: get rid of fake_swapper_regs
  sparc32: switch to generic extables
  sparc32: switch copy_user.S away from range exception table entries
  sparc32: get rid of range exception table entries in checksum_32.S
  sparc32: switch __bzero() away from range exception table entries
  sparc32: kill lookup_fault()
  sparc32: don't bother with lookup_fault() in __bzero()

4 years agoatm: idt77252: fix null-ptr-dereference
Tong Zhang [Mon, 8 Mar 2021 03:25:30 +0000 (22:25 -0500)] 
atm: idt77252: fix null-ptr-dereference

this one is similar to the phy_data allocation fix in uPD98402, the
driver allocate the idt77105_priv and store to dev_data but later
dereference using dev->dev_data, which will cause null-ptr-dereference.

fix this issue by changing dev_data to phy_data so that PRIV(dev) can
work correctly.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoatm: uPD98402: fix incorrect allocation
Tong Zhang [Mon, 8 Mar 2021 03:25:29 +0000 (22:25 -0500)] 
atm: uPD98402: fix incorrect allocation

dev->dev_data is set in zatm.c, calling zatm_start() will overwrite this
dev->dev_data in uPD98402_start() and a subsequent PRIV(dev)->lock
(i.e dev->phy_data->lock) will result in a null-ptr-dereference.

I believe this is a typo and what it actually want to do is to allocate
phy_data instead of dev_data.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoatm: fix a typo in the struct description
Tong Zhang [Mon, 8 Mar 2021 03:25:28 +0000 (22:25 -0500)] 
atm: fix a typo in the struct description

phy_data means private PHY data not date

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: qrtr: fix error return code of qrtr_sendmsg()
Jia-Ju Bai [Mon, 8 Mar 2021 09:13:55 +0000 (01:13 -0800)] 
net: qrtr: fix error return code of qrtr_sendmsg()

When sock_alloc_send_skb() returns NULL to skb, no error return code of
qrtr_sendmsg() is assigned.
To fix this bug, rc is assigned with -ENOMEM in this case.

Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: fix length of ADD_ADDR with port sub-option
Davide Caratti [Mon, 8 Mar 2021 09:00:04 +0000 (10:00 +0100)] 
mptcp: fix length of ADD_ADDR with port sub-option

in current Linux, MPTCP peers advertising endpoints with port numbers use
a sub-option length that wrongly accounts for the trailing TCP NOP. Also,
receivers will only process incoming ADD_ADDR with port having such wrong
sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1.

this can be verified running tcpdump on the kselftests artifacts:

 unpatched kernel:
 [root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr
 reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
 IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0
 IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]]

 patched kernel:
 [root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr
 reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
 IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0
 IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0

Fixes: 22fb85ffaefb ("mptcp: add port support for ADD_ADDR suboption writing")
CC: stable@vger.kernel.org # 5.11+
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param
Jesper Dangaard Brouer [Mon, 8 Mar 2021 14:59:23 +0000 (15:59 +0100)] 
selftests/bpf: Tests using bpf_check_mtu BPF-helper input mtu_len param

Add tests that use mtu_len as input parameter in BPF-helper
bpf_check_mtu().

The BPF-helper is avail from both XDP and TC context. Add two tests
per context, one that tests below MTU and one that exceeds the MTU.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161521556358.3515614.5915221479709358964.stgit@firesoul
4 years agobpf: BPF-helper for MTU checking add length input
Jesper Dangaard Brouer [Mon, 8 Mar 2021 14:59:18 +0000 (15:59 +0100)] 
bpf: BPF-helper for MTU checking add length input

The FIB lookup example[1] show how the IP-header field tot_len
(iph->tot_len) is used as input to perform the MTU check.

This patch extend the BPF-helper bpf_check_mtu() with the same ability
to provide the length as user parameter input, via mtu_len parameter.

This still needs to be done before the bpf_check_mtu() helper API
becomes frozen.

  [1] samples/bpf/xdp_fwd_kern.c

Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161521555850.3515614.6533850861569774444.stgit@firesoul
4 years agonet: bonding: fix error return code of bond_neigh_init()
Jia-Ju Bai [Mon, 8 Mar 2021 03:11:02 +0000 (19:11 -0800)] 
net: bonding: fix error return code of bond_neigh_init()

When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code of bond_neigh_init() is assigned.
To fix this bug, ret is assigned with -EINVAL in these cases.

Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: enetc: allow hardware timestamping on TX queues with tc-etf enabled
Vladimir Oltean [Sun, 7 Mar 2021 13:23:39 +0000 (15:23 +0200)] 
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled

The txtime is passed to the driver in skb->skb_mstamp_ns, which is
actually in a union with skb->tstamp (the place where software
timestamps are kept).

Since commit b50a5c70ffa4 ("net: allow simultaneous SW and HW transmit
timestamping"), __sock_recv_timestamp has some logic for making sure
that the two calls to skb_tstamp_tx:

skb_tx_timestamp(skb) # Software timestamp in the driver
-> skb_tstamp_tx(skb, NULL)

and

skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver

will both do the right thing and in a race-free manner, meaning that
skb_tx_timestamp will deliver a cmsg with the software timestamp only,
and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
with the hardware timestamp only.

Why are races even possible? Well, because although the software timestamp
skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
lives in skb_shinfo(skb), an area which is shared between skbs and their
clones. And skb_tstamp_tx works by cloning the packets when timestamping
them, therefore attempting to perform hardware timestamping on an skb's
clone will also change the hardware timestamp of the original skb. And
the original skb might have been yet again cloned for software
timestamping, at an earlier stage.

So the logic in __sock_recv_timestamp can't be as simple as saying
"does this skb have a hardware timestamp? if yes I'll send the hardware
timestamp to the socket, otherwise I'll send the software timestamp",
precisely because the hardware timestamp is shared.
Instead, it's quite the other way around: __sock_recv_timestamp says
"does this skb have a software timestamp? if yes, I'll send the software
timestamp, otherwise the hardware one". This works because the software
timestamp is not shared with clones.

But that means we have a problem when we attempt hardware timestamping
with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp
will say "oh, yeah, this must be some sort of odd clone" and will not
deliver the hardware timestamp to the socket. And this is exactly what
is happening when we have txtime enabled on the socket: as mentioned,
that is put in a union with skb->tstamp, so it is quite easy to mistake
it.

Do what other drivers do (intel igb/igc) and write zero to skb->tstamp
before taking the hardware timestamp. It's of no use to us now (we're
already on the TX confirmation path).

Fixes: 0d08c9ec7d6e ("enetc: add support time specific departure base on the qos etf")
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: enetc: set MAC RX FIFO to recommended value
Alex Marginean [Sun, 7 Mar 2021 13:23:38 +0000 (15:23 +0200)] 
net: enetc: set MAC RX FIFO to recommended value

On LS1028A, the MAC RX FIFO defaults to the value 2, which is too high
and may lead to RX lock-up under traffic at a rate higher than 6 Gbps.
Set it to 1 instead, as recommended by the hardware design team and by
later versions of the ENETC block guide.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Jason Liu <jason.hui.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: davicom: Use platform_get_irq_optional()
Paul Cercueil [Sun, 7 Mar 2021 13:17:49 +0000 (13:17 +0000)] 
net: davicom: Use platform_get_irq_optional()

The second IRQ line really is optional, so use
platform_get_irq_optional() to obtain it.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: davicom: Fix regulator not turned off on driver removal
Paul Cercueil [Sun, 7 Mar 2021 13:17:48 +0000 (13:17 +0000)] 
net: davicom: Fix regulator not turned off on driver removal

We must disable the regulator that was enabled in the probe function.

Fixes: 7994fe55a4a2 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: davicom: Fix regulator not turned off on failed probe
Paul Cercueil [Sun, 7 Mar 2021 13:17:47 +0000 (13:17 +0000)] 
net: davicom: Fix regulator not turned off on failed probe

When the probe fails or requests to be defered, we must disable the
regulator that was previously enabled.

Fixes: 7994fe55a4a2 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
Vladimir Oltean [Sun, 7 Mar 2021 10:21:56 +0000 (12:21 +0200)] 
net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports

Tobias reports that after the blamed patch, VLAN objects being added to
a bridge device are being added to all slave ports instead (swp2, swp3).

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br0
bridge vlan add dev br0 vid 100 self

This is because the fix was too broad: we made dsa_port_offloads_netdev
say "yes, I offload the br0 bridge" for all slave ports, but we didn't
add the checks whether the switchdev object was in fact meant for the
physical port or for the bridge itself. So we are reacting on events in
a way in which we shouldn't.

The reason why the fix was too broad is because the question itself,
"does this DSA port offload this netdev", was too broad in the first
place. The solution is to disambiguate the question and separate it into
two different functions, one to be called for each switchdev attribute /
object that has an orig_dev == net_bridge (dsa_port_offloads_bridge),
and the other for orig_dev == net_bridge_port (*_offloads_bridge_port).

In the case of VLAN objects on the bridge interface, this solves the
problem because we know that VLAN objects are per bridge port and not
per bridge. And when orig_dev is equal to the net_bridge, we offload it
as a bridge, but not as a bridge port; that's how we are able to skip
reacting on those events. Note that this is compatible with future plans
to have explicit offloading of VLAN objects on the bridge interface as a
bridge port (in DSA, this signifies that we should add that VLAN towards
the CPU port).

Fixes: 99b8202b179f ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: wan: fix error return code of uhdlc_init()
Jia-Ju Bai [Sun, 7 Mar 2021 09:12:56 +0000 (01:12 -0800)] 
net: wan: fix error return code of uhdlc_init()

When priv->rx_skbuff or priv->tx_skbuff is NULL, no error return code of
uhdlc_init() is assigned.
To fix this bug, ret is assigned with -ENOMEM in these cases.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
Jia-Ju Bai [Sun, 7 Mar 2021 08:40:12 +0000 (00:40 -0800)] 
net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()

When hns_assemble_skb() returns NULL to skb, no error return code of
hns_nic_clear_all_rx_fetch() is assigned.
To fix this bug, ret is assigned with -ENOMEM in this case.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: usb: log errors to dmesg/syslog
Grant Grundler [Sat, 6 Mar 2021 22:12:32 +0000 (14:12 -0800)] 
net: usb: log errors to dmesg/syslog

Errors in protocol should be logged when the driver aborts operations.
If the driver can carry on and "humor" the device, then emitting
the message as debug output level is fine.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: usb: cdc_ncm: emit dev_err on error paths
Grant Grundler [Sat, 6 Mar 2021 22:12:31 +0000 (14:12 -0800)] 
net: usb: cdc_ncm: emit dev_err on error paths

Several error paths in bind/probe code will only emit
output using dev_dbg. But if we are going to fail the
bind/probe, emit related output with "err" priority.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: chelsio: inline_crypto: Mundane typos fixed throughout the file chcr_k...
Bhaskar Chowdhury [Sat, 6 Mar 2021 21:20:28 +0000 (02:50 +0530)] 
net: ethernet: chelsio: inline_crypto: Mundane typos fixed throughout the file chcr_ktls.c

Mundane typos fixes throughout the file.

s/establised/established/
s/availbale/available/
s/vaues/values/
s/Incase/In case/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: bcm_sf2: simplify optional reset handling
Philipp Zabel [Fri, 5 Mar 2021 09:14:48 +0000 (10:14 +0100)] 
net: dsa: bcm_sf2: simplify optional reset handling

As of commit bb475230b8e5 ("reset: make optional functions really
optional"), the reset framework API calls use NULL pointers to describe
optional, non-present reset controls.

This allows to unconditionally return errors from
devm_reset_control_get_optional_exclusive.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agolibbpf: Fix INSTALL flag order
Georgi Valkov [Mon, 8 Mar 2021 18:30:38 +0000 (10:30 -0800)] 
libbpf: Fix INSTALL flag order

It was reported ([0]) that having optional -m flag between source and
destination arguments in install command breaks bpftools cross-build
on MacOS. Move -m to the front to fix this issue.

  [0] https://github.com/openwrt/openwrt/pull/3959

Fixes: 7110d80d53f4 ("libbpf: Makefile set specified permission mode")
Signed-off-by: Georgi Valkov <gvalkov@abv.bg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210308183038.613432-1-andrii@kernel.org
4 years agoMIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
Bjørn Mork [Sun, 7 Mar 2021 18:23:01 +0000 (19:23 +0100)] 
MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes

The devicetree specification requires 8-byte alignment in
memory. This is now enforced by libfdt since commit 79edff12060f
("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
which included the upstream commit 5e735860c478 ("libfdt: Check for
8-byte address alignment in fdt_ro_probe_()").

This broke the MIPS raw appended DTBs which would be appended to
the image immediately following the initramfs section.  This ends
with a 32bit size, resulting in a 4-byte alignment of the DTB.

Fix by padding with zeroes to 8-bytes when MIPS_RAW_APPENDED_DTB
is defined.

Fixes: 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agobpf: Change inode_storage's lookup_elem return value from NULL to -EBADF
Tal Lossos [Sun, 7 Mar 2021 12:09:48 +0000 (14:09 +0200)] 
bpf: Change inode_storage's lookup_elem return value from NULL to -EBADF

bpf_fd_inode_storage_lookup_elem() returned NULL when getting a bad FD,
which caused -ENOENT in bpf_map_copy_value. -EBADF error is better than
-ENOENT for a bad FD behaviour.

The patch was partially contributed by CyberArk Software, Inc.

Fixes: 8ea636848aca ("bpf: Implement bpf_local_storage for inodes")
Signed-off-by: Tal Lossos <tallossos@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210307120948.61414-1-tallossos@gmail.com
4 years agobpf: Dont allow vmlinux BTF to be used in map_create and prog_load.
Alexei Starovoitov [Sun, 7 Mar 2021 22:52:48 +0000 (14:52 -0800)] 
bpf: Dont allow vmlinux BTF to be used in map_create and prog_load.

The syzbot got FD of vmlinux BTF and passed it into map_create which caused
crash in btf_type_id_size() when it tried to access resolved_ids. The vmlinux
BTF doesn't have 'resolved_ids' and 'resolved_sizes' initialized to save
memory. To avoid such issues disallow using vmlinux BTF in prog_load and
map_create commands.

Fixes: 5329722057d4 ("bpf: Assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO")
Reported-by: syzbot+8bab8ed346746e7540e8@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210307225248.79031-1-alexei.starovoitov@gmail.com
4 years agogpiolib: Read "gpio-line-names" from a firmware node
Andy Shevchenko [Fri, 5 Mar 2021 12:02:40 +0000 (14:02 +0200)] 
gpiolib: Read "gpio-line-names" from a firmware node

On STM32MP1, the GPIO banks are subnodes of pin-controller@50002000,
see arch/arm/boot/dts/stm32mp151.dtsi. The driver for
pin-controller@50002000 is in drivers/pinctrl/stm32/pinctrl-stm32.c
and iterates over all of its DT subnodes when registering each GPIO
bank gpiochip. Each gpiochip has:

  - gpio_chip.parent = dev,
    where dev is the device node of the pin controller
  - gpio_chip.of_node = np,
    which is the OF node of the GPIO bank

Therefore, dev_fwnode(chip->parent) != of_fwnode_handle(chip.of_node),
i.e. pin-controller@50002000 != pin-controller@50002000/gpio@5000*000.

The original code behaved correctly, as it extracted the "gpio-line-names"
from of_fwnode_handle(chip.of_node) = pin-controller@50002000/gpio@5000*000.

To achieve the same behaviour, read property from the firmware node.

Fixes: 7cba1a4d5e162 ("gpiolib: generalize devprop_gpiochip_set_names() for device properties")
Reported-by: Marek Vasut <marex@denx.de>
Reported-by: Roman Guskov <rguskov@dh-electronics.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
4 years agogpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
Andy Shevchenko [Thu, 25 Feb 2021 16:33:20 +0000 (18:33 +0200)] 
gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2

The commit 0ea683931adb ("gpio: dwapb: Convert driver to using the
GPIO-lib-based IRQ-chip") indeliberately made a regression on how
IRQ line from GPIO I²C expander is handled. I.e. it reveals that
the quirk for Intel Galileo Gen 2 misses the part of setting IRQ type
which previously was predefined by gpio-dwapb driver. Now, we have to
reorganize the approach to call necessary parts, which can be done via
ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk.

Without this fix and with above mentioned change the kernel hangs
on the first IRQ event with:

    gpio gpiochip3: Persistence not supported for GPIO 1
    irq 32, desc: 62f8fb50, depth: 0, count: 0, unhandled: 0
    ->handle_irq():  41c7b0ab, handle_bad_irq+0x0/0x40
    ->irq_data.chip(): e03f1e72, 0xc2539218
    ->action(): 0ecc7e6f
    ->action->handler(): 8a3db21e, irq_default_primary_handler+0x0/0x10
       IRQ_NOPROBE set
    unexpected IRQ trap at vector 20

Fixes: ba8c90c61847 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931adb ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
4 years agogpiolib: acpi: Allow to find GpioInt() resource by name and index
Andy Shevchenko [Thu, 25 Feb 2021 16:33:19 +0000 (18:33 +0200)] 
gpiolib: acpi: Allow to find GpioInt() resource by name and index

Currently only search by index is supported. However, in some cases
we might need to pass the quirks to the acpi_dev_gpio_irq_get().

For this, split out acpi_dev_gpio_irq_get_by() and replace
acpi_dev_gpio_irq_get() by calling above with NULL for name parameter.

Fixes: ba8c90c61847 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931adb ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
4 years agogpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
Andy Shevchenko [Thu, 25 Feb 2021 16:33:18 +0000 (18:33 +0200)] 
gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk

On some systems the ACPI tables has wrong pin number and instead of
having a relative one it provides an absolute one in the global GPIO
number space.

Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk to cope with such cases.

Fixes: ba8c90c61847 ("gpio: pca953x: Override IRQ for one of the expanders on Galileo Gen 2")
Depends-on: 0ea683931adb ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
4 years agogpiolib: acpi: Add missing IRQF_ONESHOT
Yang Li [Tue, 23 Feb 2021 08:35:58 +0000 (16:35 +0800)] 
gpiolib: acpi: Add missing IRQF_ONESHOT

fixed the following coccicheck:
./drivers/gpio/gpiolib-acpi.c:176:7-27: ERROR: Threaded IRQ with no
primary handler requested without IRQF_ONESHOT

Make sure threaded IRQs without a primary handler are always request
with IRQF_ONESHOT

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
4 years agogpio: fix gpio-device list corruption
Johan Hovold [Mon, 1 Mar 2021 09:05:19 +0000 (10:05 +0100)] 
gpio: fix gpio-device list corruption

Make sure to hold the gpio_lock when removing the gpio device from the
gpio_devices list (when dropping the last reference) to avoid corrupting
the list when there are concurrent accesses.

Fixes: ff2b13592299 ("gpio: make the gpiochip a real device")
Cc: stable@vger.kernel.org # 4.6
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
4 years agogpio: fix NULL-deref-on-deregistration regression
Johan Hovold [Mon, 1 Mar 2021 09:05:18 +0000 (10:05 +0100)] 
gpio: fix NULL-deref-on-deregistration regression

Fix a NULL-pointer deference when deregistering the gpio character
device that was introduced by the recent stub-driver hack. When the new
"driver" is unbound as part of deregistration, driver core clears the
driver-data pointer which is used to retrieve the struct gpio_device in
its release callback.

Fix this by using container_of() in the release callback as should have
been done all along.

Fixes: 4731210c09f5 ("gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default")
Cc: Saravana Kannan <saravanak@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: syzbot+d27b4c8adbbff70fbfde@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
4 years agoselftests: gpio: update .gitignore
Bartosz Golaszewski [Wed, 24 Feb 2021 18:53:16 +0000 (19:53 +0100)] 
selftests: gpio: update .gitignore

The executable that we build for GPIO selftests was renamed to
gpio-mockup-cdev. Let's update .gitignore so that we don't show it
as an untracked file.

Fixes: 8bc395a6a2e2 ("selftests: gpio: rework and simplify test implementation")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Kent Gibson <warthog618@gmail.com>
4 years agocrypto: mips/poly1305 - enable for all MIPS processors
Maciej W. Rozycki [Wed, 3 Mar 2021 01:16:04 +0000 (02:16 +0100)] 
crypto: mips/poly1305 - enable for all MIPS processors

The MIPS Poly1305 implementation is generic MIPS code written such as to
support down to the original MIPS I and MIPS III ISA for the 32-bit and
64-bit variant respectively.  Lift the current limitation then to enable
code for MIPSr1 ISA or newer processors only and have it available for
all MIPS processors.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: a11d055e7a64 ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation")
Cc: stable@vger.kernel.org # v5.5+
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: boot/compressed: Copy DTB to aligned address
Paul Cercueil [Wed, 3 Mar 2021 19:33:05 +0000 (19:33 +0000)] 
MIPS: boot/compressed: Copy DTB to aligned address

Since 5.12-rc1, the Device Tree blob must now be properly aligned.

Therefore, the decompress routine must be careful to copy the blob at
the next aligned address after the kernel image.

This commit fixes the kernel sometimes not booting with a Device Tree
blob appended to it.

Fixes: 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMerge tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 7 Mar 2021 21:35:49 +0000 (13:35 -0800)] 
Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:
 "Perf tool fixes:

   - Fix wrong skipping for per-die aggregation in 'perf stat'.

   - Fix race in signal handling on large core count machines, setting
     up signal handlers earlier.

   - Fix -F for branch & mem modes in 'perf report'.

   - Fix the condition checks for max number of NUMA nodes in 'perf
     bench numa'.

   - Fix crash in 'perf diff' error path.

   - Fix filtering of empty build-ids in 'perf archive'.

   - Ensure read cmdlines from libtraceevent are null terminated.

  Recent regressions:

   - Fix control fifo permissions in 'perf daemon'.

   - Fix 'perf daemon' compile error with ASAN.

   - Fix running 'perf daemon' test for non root user.

   - Fix PERF_SAMPLE_WEIGHT_STRUCT 'perf test' failure on non-x86
     arches.

   - Fix event's PMU name parsing related to new drm/i915/gt
     software-gt-awake-time event.

  Fixes from compiler instrumentation:

   - Fix leaks in 'perf test' entries, found using ASAN.

   - Fix use-after-free when 'perf stat -r' option is used.

  Arch specific:

   - Fix bitmap for option om ARM's CS-ETM.

  Documentation:

   - Fix documentation of verbose options.

  Build:

   - Clean 'generated' directory used for creating the syscall table on
     x86.

   - Fix ccache usage in $(CC) when generating arch errno table.

   - Cast (struct timeval).tv_sec when printing, fixing the build with
     MUSL libc.

   - Tighten snprintf() string precision to pass gcc check on some
     32-bit arches.

   - Update UAPI copies from the kernel sources.

   - Fix regression on feature detection 'make clean' target"

* tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (44 commits)
  perf cs-etm: Fix bitmap for option
  perf trace: Fix race in signal handling
  perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
  perf report: Fix -F for branch & mem modes
  perf tests x86: Move insn.h include to make sure it finds stddef.h
  perf test: Support the ins_lat check in the X86 specific test
  perf test: Fix sample-parsing failure on non-x86 platforms
  perf archive: Fix filtering of empty build-ids
  perf daemon: Fix compile error with Asan
  perf stat: Fix use-after-free when -r option is used
  libperf: Add perf_evlist__reset_id_hash()
  perf stat: Fix wrong skipping for per-die aggregation
  tools headers UAPI: Sync KVM's kvm.h and vmx.h headers with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Update tools' copy of linux/coresight-pmu.h
  tools headers: Update syscall.tbl files to support mount_setattr
  perf test: Fix cpu and thread map leaks in perf_time_to_tsc test
  perf test: Fix cpu map leaks in cpu_map_print test
  perf test: Fix a memory leak in thread_map_remove test
  perf test: Fix a thread map leak in thread_map_synthesize test
  ...

4 years agoMerge branch 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sun, 7 Mar 2021 21:28:31 +0000 (13:28 -0800)] 
Merge branch 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Two small parisc architecture fixes: fix a linking failure reported by
  the kernel test robot and remove a duplicate include"

* 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  arch/parisc/kernel: remove duplicate include in ptrace
  parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST

4 years agoMerge tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 7 Mar 2021 21:24:44 +0000 (13:24 -0800)] 
Merge tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One non-fix, the conversion of vio_driver->remove() to return void,
  which touches various powerpc specific drivers.

  Fix the privilege checks we do in our perf handling, which could cause
  soft/hard lockups in some configurations.

  Fix a bug with IRQ affinity seen on kdump kernels when CPU 0 is
  offline in the second kernel.

  Fix missed page faults after mprotect(..., PROT_NONE) on 603 (32-bit).

  Fix a bug in our VSX (vector) instruction emulation, which should only
  be seen when doing VSX ops to cache inhibited mappings.

  Three commits fixing various build issues with obscure configurations.

  Thanks to Athira Rajeev, Cédric Le Goater, Christophe Leroy, Christoph
  Plattner, Greg Kurz, Jordan Niethe, Laurent Vivier, Ravi Bangoria,
  Tyrel Datwyler, and Uwe Kleine-König"

* tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/sstep: Fix VSX instruction emulation
  powerpc/perf: Fix handling of privilege level checks in perf interrupt context
  powerpc: Force inlining of mmu_has_feature to fix build failure
  vio: make remove callback return void
  powerpc/syscall: Force inlining of __prep_irq_for_enabled_exit()
  powerpc/603: Fix protection of user pages mapped with PROT_NONE
  powerpc/pseries: Don't enforce MSI affinity with kdump
  powerpc/4xx: Fix build errors from mfdcr()

4 years agoMerge tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 7 Mar 2021 21:20:50 +0000 (13:20 -0800)] 
Merge tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k fix from Geert Uytterhoeven:
 "Fix virt_addr_valid() W=1 compiler warnings.

  This is a single non-critical fix. As the build bots are now testing
  all new code with W=1, these warnings are popping up everywhere,
  confusing people. Hence I think it makes sense to silence it as soon
  as possible"

* tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Fix virt_addr_valid() W=1 compiler warnings

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Sun, 7 Mar 2021 01:02:40 +0000 (17:02 -0800)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix incorrect enum type definition in nfnetlink_cthelper UAPI,
   from Dmitry V. Levin.

2) Remove extra space in deprecated automatic helper assignment
   notice, from Klemen Košir.

3) Drop early socket demux socket after NAT mangling, from
   Florian Westphal. Add a test to exercise this bug.

4) Fix bogus invalid packet report in the conntrack TCP tracker,
   also from Florian.

5) Fix access to xt[NFPROTO_UNSPEC] list with no mutex
   in target/match_revfn(), from Vasily Averin.

6) Disallow updates on the table ownership flag.

7) Fix double hook unregistration of tables with owner.

8) Remove bogus check on the table owner in __nft_release_tables().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoperf cs-etm: Fix bitmap for option
Suzuki K Poulose [Sat, 6 Feb 2021 15:08:29 +0000 (23:08 +0800)] 
perf cs-etm: Fix bitmap for option

When set option with macros ETM_OPT_CTXTID and ETM_OPT_TS, it wrongly
takes these two values (14 and 28 prespectively) as bit masks, but
actually both are the offset for bits.  But this doesn't lead to
further failure due to the AND logic operation will be always true for
ETM_OPT_CTXTID / ETM_OPT_TS.

This patch defines new independent macros (rather than using the
"config" bits) for requesting the "contextid" and "timestamp" for
cs_etm_set_option().

Signed-off-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Cc: Al Grant <al.grant@arm.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210206150833.42120-5-leo.yan@linaro.org
[ Extract the change as a separate patch for easier review ]
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Fix race in signal handling
Michael Petlan [Tue, 2 Mar 2021 14:41:20 +0000 (15:41 +0100)] 
perf trace: Fix race in signal handling

Since a lot of stuff happens before the SIGINT signal handler is registered
(scanning /proc/*, etc.), on bigger systems, such as Cavium Sabre CN99xx,
it may happen that first interrupt signal is lost and perf isn't correctly
terminated.

The reproduction code might look like the following:

    perf trace -a &
    PERF_PID=$!
    sleep 4
    kill -INT $PERF_PID

The issue has been found on a CN99xx machine with RHEL-8 and the patch fixes
it by registering the signal handlers earlier in the init stage.

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/lkml/YEJnaMzH2ctp3PPx@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Arnaldo Carvalho de Melo [Fri, 5 Mar 2021 13:02:09 +0000 (10:02 -0300)] 
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches

Noticed on a debian:experimental mips and mipsel cross build build
environment:

  perfbuilder@ec265a086e9b:~$ mips-linux-gnu-gcc --version | head -1
  mips-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224
  perfbuilder@ec265a086e9b:~$

    CC       /tmp/build/perf/util/map.o
  util/map.c: In function 'map__new':
  util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=]
    109 |    "%s/platforms/%s/arch-%s/usr/lib/%s",
        |     ^~
  In file included from /usr/mips-linux-gnu/include/stdio.h:867,
                   from util/symbol.h:11,
                   from util/map.c:2:
  /usr/mips-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096
     67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
        |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     68 |        __bos (__s), __fmt, __va_arg_pack ());
        |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Since we have the lenghts for what lands in that place, use it to give
the compiler more info and make it happy.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf report: Fix -F for branch & mem modes
Ravi Bangoria [Thu, 4 Mar 2021 06:29:58 +0000 (11:59 +0530)] 
perf report: Fix -F for branch & mem modes

perf report fails to add valid additional fields with -F when
used with branch or mem modes. Fix it.

Before patch:

  $ perf record -b
  $ perf report -b -F +srcline_from --stdio
  Error:
  Invalid --fields key: `srcline_from'

After patch:

  $ perf report -b -F +srcline_from --stdio
  # Samples: 8K of event 'cycles'
  # Event count (approx.): 8784
  ...

Committer notes:

There was an inversion: when looking at branch stack dimensions (keys)
it was checking if the sort mode was 'mem', not 'branch'.

Fixes: aa6b3c99236b ("perf report: Make -F more strict like -s")
Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210304062958.85465-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tests x86: Move insn.h include to make sure it finds stddef.h
Arnaldo Carvalho de Melo [Thu, 4 Mar 2021 12:51:52 +0000 (09:51 -0300)] 
perf tests x86: Move insn.h include to make sure it finds stddef.h

In some versions of alpine Linux the perf build is broken since commit
1d509f2a6ebca1ae ("x86/insn: Support big endian cross-compiles"):

  In file included from /usr/include/linux/byteorder/little_endian.h:13,
                   from /usr/include/asm/byteorder.h:5,
                   from arch/x86/util/../../../../arch/x86/include/asm/insn.h:10,
                   from arch/x86/util/archinsn.c:2:
  /usr/include/linux/swab.h:161:8: error: unknown type name '__always_inline'
   static __always_inline __u16 __swab16p(const __u16 *p)

So move the inclusion of arch/x86/include/asm/insn.h to later in the
places where linux/stddef.h (that conditionally defines
__always_inline) to workaround this problem on Alpine Linux 3.9 to 3.11,
3.12 onwards works.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf test: Support the ins_lat check in the X86 specific test
Kan Liang [Wed, 3 Mar 2021 16:01:25 +0000 (08:01 -0800)] 
perf test: Support the ins_lat check in the X86 specific test

The ins_lat of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction
latency, which is only available for X86. Add a X86 specific test for
the ins_lat and PERF_SAMPLE_WEIGHT_STRUCT type.

The test__x86_sample_parsing() uses the same way as the
test__sample_parsing() to verify a sample type. Since the ins_lat and
PERF_SAMPLE_WEIGHT_STRUCT are the only X86 specific sample type for now,
the test__x86_sample_parsing() only verify the PERF_SAMPLE_WEIGHT_STRUCT
type. Other sample types are still verified in the generic test.

  $ perf test 77 -v
  77: x86 Sample parsing                                              :
  --- start ---
  test child forked, pid 102370
  test child finished with 0
  ---- end ----
  x86 Sample parsing: Ok

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/1614787285-104151-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf test: Fix sample-parsing failure on non-x86 platforms
Kan Liang [Wed, 3 Mar 2021 16:01:24 +0000 (08:01 -0800)] 
perf test: Fix sample-parsing failure on non-x86 platforms

Executing 'perf test 27' fails on s390:

  [root@t35lp46 perf]# ./perf test -Fv 27
  27: Sample parsing
  --- start ---
  ---- end ----
  Sample parsing: FAILED!
  [root@t35lp46 perf]#

The commit fbefe9c2f87fd392 ("perf tools: Support arch specific
PERF_SAMPLE_WEIGHT_STRUCT processing") changes the ins_lat to a
model-specific variable only for X86, but perf test still verify the
variable in the generic test.

Remove the ins_lat check in the generic test. The following patch will
add it in the X86 specific test.

Fixes: fbefe9c2f87fd392 ("perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/1614787285-104151-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf archive: Fix filtering of empty build-ids
Nicholas Fraser [Fri, 19 Feb 2021 16:09:32 +0000 (11:09 -0500)] 
perf archive: Fix filtering of empty build-ids

A non-existent build-id used to be treated as all-zero SHA-1 hash.
Build-ids are now variable width. A non-existent build-id is an empty
string and "perf buildid-list" pads this with spaces. This is true even
when using old perf.data files recorded from older versions of perf;
"perf buildid-list" never reports an all-zero hash anymore.

This fixes "perf-archive" to skip missing build-ids by skipping lines
that start with a padding space rather than with zeroes.

Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
Link: https://lore.kernel.org/r/442bffc7-ac5c-0975-b876-a549efce2413@codeweavers.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf daemon: Fix compile error with Asan
Namhyung Kim [Wed, 24 Feb 2021 07:14:38 +0000 (16:14 +0900)] 
perf daemon: Fix compile error with Asan

I'm seeing a build failure when build with address sanitizer.  It seems
we could write to the name[100] if the var is longer.

  $ make EXTRA_CFLAGS=-fsanitize=address
  ...
    CC       builtin-daemon.o
  In function ‘get_session_name’,
    inlined from ‘session_config’ at builtin-daemon.c:164:6,
    inlined from ‘server_config’ at builtin-daemon.c:223:10:
  builtin-daemon.c:155:11: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
    155 |  *session = 0;
        |  ~~~~~~~~~^~~
  builtin-daemon.c: In function ‘server_config’:
  builtin-daemon.c:162:7: note: at offset 100 to object ‘name’ with size 100 declared here
    162 |  char name[100];
        |       ^~~~

Fixes: c0666261ff38 ("perf daemon: Add config file support")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210224071438.686677-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>