Since v4.9 i2c-tiny-usb generates the below call trace
and longer works, since it can't communicate with the
USB device. The reason is, that since v4.9 the USB
stack checks, that the buffer it should transfer is DMA
capable. This was a requirement since v2.2 days, but it
usually worked nevertheless.
Commit fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base")
modified the probing logic for PNP0501 devices, to remove a collision
between the generic 16550A driver and the Fintek driver, which reused
the same ACPI _HID.
The Fintek device probe is now incorporated into the common 8250 probe
path, and gets called for all discovered 16550A compatible devices,
including ones that are MMIO mapped rather than IO mapped. However,
the Fintek driver assumes the port base is a I/O address, and proceeds
to probe some arbitrary offsets above it.
This is generally a wrong thing to do, but on ARM systems (having no
native port I/O), this may result in faulting accesses of completely
unrelated MMIO regions in the PCI I/O space. Given that this is at
serial probe time, this results in hard to diagnose crashes at boot.
So let's restrict the Fintek probe to devices that we know are using
port I/O in the first place.
The port client data must be set when registering the serdev controller
or client deregistration will fail (and the serdev devices are left
registered and allocated) if the port was never opened in between.
Make sure to clear the port client data on any probe errors to avoid a
use-after-free when the client is later deregistered unconditionally
(e.g. in a tty-port deregistration helper).
Also move port client operation initialisation to registration. Note
that the client ops must be restored on failed probe.
Fixes: bed35c6dfa6a ("serdev: add a tty port controller driver") Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The new serdev bus hooked into the tty layer in
tty_port_register_device() by registering a serdev controller instead of
a tty device whenever a serdev client is present, and by deregistering
the controller in the tty-port destructor. This is broken in several
ways:
Firstly, it leads to a NULL-pointer dereference whenever a tty driver
later deregisters its devices as no corresponding character device will
exist.
Secondly, far from every tty driver uses tty-port refcounting (e.g.
serial core) so the serdev devices might never be deregistered or
deallocated.
Thirdly, deregistering at tty-port destruction is too late as the
underlying device and structures may be long gone by then. A port is not
released before an open tty device is closed, something which a
registered serdev client can prevent from ever happening. A driver
callback while the device is gone typically also leads to crashes.
Many tty drivers even keep their ports around until the driver is
unloaded (e.g. serial core), something which even if a late callback
never happens, leads to leaks if a device is unbound from its driver and
is later rebound.
The right solution here is to add a new tty_port_unregister_device()
helper and to never call tty_device_unregister() whenever the port has
been claimed by serdev, but since this requires modifying just about
every tty driver (and multiple subsystems) it will need to be done
incrementally.
Reverting the offending patch is the first step in fixing the broken
lifetime assumptions. A follow-up patch will add a new pair of
tty-device registration helpers, which a vetted tty driver can use to
support serdev (initially serial core). When every tty driver uses the
serdev helpers (at least for deregistration), we can add serdev
registration to tty_port_register_device() again.
Note that this also fixes another issue with serdev, which currently
allocates and registers a serdev controller for every tty device
registered using tty_port_device_register() only to immediately
deregister and deallocate it when the corresponding OF node or serdev
child node is missing. This should be addressed before enabling serdev
for hot-pluggable buses.
Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.
However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.
This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.
Fixes: ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reported-by: Sombat Tragolgosol <sombat3960@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
a P9. This is because we still set MMU_FTR_TYPE_RADIX via
ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
in much of the asm code (ie. slb_miss_realmode)
This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
set from ibm.pa-features.
We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
completely but until then this fixes the issue.
Fixes: 17a3dd2f5fc7 ("powerpc/mm/radix: Use firmware feature to enable Radix MMU") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The incorrect UFS s_maxsize limit became a problem as of commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().
That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.
Here I simply increase the default to a common value used by other file
systems.
Signed-off-by: Richard Narron <comet.berkeley@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will B <will.brokenbourgh2877@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ftrace function_graph time measurements of a given function is not
accurate according to those recorded by ftrace using the function
filters. This change pulls the x86_64 fix from 'commit 722b3c746953
("ftrace/graph: Trace function entry before updating index")' into the
sparc specific prepare_ftrace_return which stops ftrace from
counting interrupted tasks in the time measurement.
Example measurements for select_task_rq_fair running "hackbench 100
process 1000":
| tracing/trace_stat/function0 | function_graph
Before patch | 2.802 us | 4.255 us
After patch | 2.749 us | 3.094 us
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
in calls to string handling functions [1][2]. Due to the way
``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
causes a warning to trigger at compile time in the function mem_init(),
which is subsequently converted to an error. The ensuing patch fixes
this issue and aligns the declaration of empty_zero_page to that of
other architectures. Thank you.
Signed-off-by: Orlando Arias <oarias@knights.ucf.edu>
-------------------------------------------------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.
This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.
This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.
Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation") Reported-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.
Fixes: 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2) At the same time run following loop :
while :
do
ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done
Cong Wang attempted to add back rt->fi in commit 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.
Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)
I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.
Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.
Fixes: 2860583fe840 ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fix addresses two problems in the way the DSCP field is formulated
on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
1) The IPv6 tunneling code was manipulating the DSCP field of the
encapsulating packet using the 32b flowlabel. Since the flowlabel is
only the lower 20b it was incorrect to assume that the upper 12b
containing the DSCP and ECN fields would remain intact when formulating
the encapsulating header. This fix handles the 'inherit' and
'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
incorrect and resulted in the DSCP value always being set to 0.
Commit 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class
is non-0") caused the regression by masking out the flowlabel
which exposed the incorrect handling of the DSCP portion of the
flowlabel in ip6_tunnel and ip6_gre.
Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0") Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.
v2:
- don't use skb_header_pointer() to read the transport header, since
icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.
One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A: Thread B:
------------------------------------------------------------------------
sendto()
- tcp_sendmsg()
- sk_stream_memory_free() = 0
- goto wait_for_sndbuf
- sk_stream_wait_memory()
- sk_wait_event() // sleep
| sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
| - tcp_sendmsg()
| - tcp_sendmsg_fastopen()
| - __inet_stream_connect()
| - tcp_disconnect() //because of AF_UNSPEC
| - tcp_transmit_skb()// send RST
| - return 0; // no reconnect!
| - sk_stream_wait_connect()
| - sock_error()
| - xchg(&sk->sk_err, 0)
| - return -ECONNRESET
- ... // wake up, see sk->sk_err == 0
- skb_entail() on TCP_CLOSE socket
If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.
When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.
Fixes: cf60af03ca4e7 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.
Fix by checking for the presence of the actual sockets.
Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since virtio does not provide it's own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it. This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.
Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At least some of the be2net cards do not seem to be capabled
of performing checksum offload computions on Q-in-Q packets.
In these case, the recevied checksum on the remote is invalid
and TCP syn packets are dropped.
This patch adds a call to check disbled acceleration features
on Q-in-Q tagged traffic.
CC: Sathya Perla <sathya.perla@broadcom.com> CC: Ajit Khaparde <ajit.khaparde@broadcom.com> CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> CC: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It appears that TCP checksum offloading has been broken for
Q-in-Q vlans. The behavior was execerbated by the
series
commit afb0bc972b52 ("Merge branch 'stacked_vlan_tso'")
that that enabled accleleration features on stacked vlans.
However, event without that series, it is possible to trigger
this issue. It just requires a lot more specialized configuration.
The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.
The issue starts when netdev_interesect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums. This happens
for tagged and multi-tagged packets. However, HW that enables
IP|IPV6 checksum offloading doesn't gurantee that packets with
arbitrarily long headers can be checksummed.
This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.
CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> CC: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 88m1101 has an errata when configuring autoneg. However, it was
being applied to many other Marvell PHYs as well. Limit its scope to
just the 88m1101.
Fixes: 76884679c644 ("phylib: Add support for Marvell 88e1111S and 88e1145") Reported-by: Daniel Walker <danielwa@cisco.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Harini Katakam <harinik@xilinx.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.
As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not
removed from the aggregator when they are down, and the active slave count
is NOT equal to number of ports in the aggregator, but rather the number
of ports in the aggregator that are still enabled. The sysfs spew for
bonding_show_ad_num_ports() has a comment that says "Show number of active
802.3ad ports.", but it's currently showing total number of ports, both
active and inactive. Remedy it by using the same logic introduced in 0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
netlink all report the number of active ports. Note that this means that
IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
NUM_PORTS, and thus perhaps should be renamed for clarity.
Lightly tested on a dual i40e lacp bond, simulating link downs with an ip
link set dev <slave2> down, was able to produce the state where I could
see both in the same aggregator, but a number of ports count of 1.
MII Status: up
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2 <---
Slave Interface: ens10
MII Status: up <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1
MII Status: up
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1 <---
Slave Interface: ens10
MII Status: down <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.
The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.
This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.
As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.
Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers") Reported-by: Haidong Li <haili@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In their infinite wisdom, and never ending quest for end user frustration,
Lenovo has decided to use a new USB device ID for the wwan modules in
their 2017 laptops. The actual hardware is still the Sierra Wireless
EM7455 or EM7430, depending on region.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
returns -EINVAL in case the pvid is out of bounds.
Reproduce by calling:
[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port vlan ids
bridge0 9999 PVID Egress Untagged
dummy0 9999 PVID Egress Untagged
Fixes: 0f963b7592ef ("bridge: netlink: add support for default_pvid") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not use unsigned variables to see if it returns a negative
error or not.
Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options") Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The KASAN warning repoted below was discovered with a syzkaller
program. The reproducer is basically:
int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
send(s, &one_byte_of_data, 1, MSG_MORE);
send(s, &more_than_mtu_bytes_data, 2000, 0);
The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.
The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option. Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.
This fix makes ip6_find_1stfrag return an error if it detects
running out-of-bounds.
In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.
netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is <= 0 so userspace is notified of the failure. The missing
piece is the rtnetlink dump functions returning the error.
Fix route and link dump functions to return the errors if no object is
added to an skb (detected by skb->len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be ajusted as well.
Reported-by: Jan Moskyto Matejka <mq@ucw.cz> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.
This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than
sysctl_tcp_max_reordering.
Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.
Fixes: c7caf8d3ed7a ("[TCP]: Fix reord detection due to snd_una covered holes") Reported-by: Rebecca Isaacs <risaacs@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not
released.
Fixes c14ac9451c348 ("sock: enable timestamping using control messages") Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0ca50d12fe46 ("sctp: fix src address selection if using secondary
addresses") has fixed a src address selection issue when using secondary
addresses for ipv4.
Now sctp ipv6 also has the similar issue. When using a secondary address,
sctp_v6_get_dst tries to choose the saddr which has the most same bits
with the daddr by sctp_v6_addr_match_len. It may make some cases not work
as expected.
route from hostA to hostB:
fd21:356b:459a:cf30::/64 dev eth1 metric 1024 mtu 1500
The expected path should be:
fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
But addr[2] matches addr[a] more bits than addr[1] does, according to
sctp_v6_addr_match_len. It causes the path to be:
fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2
This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
if the saddr is in a dev instead.
Note that for backwards compatibility, it will still do the addr_match_len
check here when no optimal is found.
Reported-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
to fulfil its task. The latter, in turn, is evaluating the stated
condition outside the socket lock context. This is problematic if
the condition is accessing non-trivial data structures which may be
altered by incoming interrupts, as is the case with the cong_links()
linked list, used by socket to keep track of the current set of
congested links. We sometimes see crashes when this list is accessed
by a condition function at the same time as a SOCK_WAKEUP interrupt
is removing an element from the list.
We fix this by expanding selected parts of sk_wait_event() into the
outer macro, while ensuring that all evaluations of a given condition
are performed under socket lock protection.
Fixes: commit 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.
The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size. Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.
Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If CONFIG_INET is not set, net/core/sock.c can not compile :
net/core/sock.c: In function ‘skb_orphan_partial’:
net/core/sock.c:1810:2: error: implicit declaration of function
‘skb_is_tcp_pure_ack’ [-Werror=implicit-function-declaration]
if (skb_is_tcp_pure_ack(skb))
^
Fix this by always including <net/tcp.h>
Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I should have known that lowering skb->truesize was dangerous :/
In case packets are not leaving the host via a standard Ethernet device,
but looped back to local sockets, bad things can happen, as reported
by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )
So instead of tweaking skb->truesize, lets change skb->destructor
and keep a reference on the owner socket via its sk_refcnt.
Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Michael Madsen <mkm@nabto.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shubham was recently asking on netdev why in arm64 JIT we don't multiply
the index for accessing the tail call map by 8. That led me into testing
out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
dereference on the tail call.
The buggy access is at:
prog = array->ptrs[index];
if (prog == NULL)
goto out;
The code triggering the crash is f862694b. x1 at the time contains the
address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
above we load the pointer to the program at map slot 0 into x10. x10
can then be NULL if the slot is not occupied, which we later on try to
access with a user given offset in x2 that is the map index.
This basically adds the offset to ptrs to the base address of the bpf
array we got and we later on access the map with an index * 8 offset
relative to that. The tail call map itself is basically one large area
with meta data at the head followed by the array of prog pointers.
This makes tail calls working again, tested on Cavium ThunderX ARMv8.
Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") Reported-by: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4d72c08b358 ("qeth: bridgeport support - basic control")
broke the support for OSM and OSN devices as follows:
As OSM and OSN are L2 only, qeth_core_probe_device() does an early
setup by loading the l2 discipline and calling qeth_l2_probe_device().
In this context, adding the l2-specific bridgeport sysfs attributes
via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
since the basic sysfs infrastructure for the device hasn't been
established yet.
Note that OSN actually has its own unique sysfs attributes
(qeth_osn_devtype), so the additional attributes shouldn't be created
at all.
For OSM, add a new qeth_l2_devtype that contains all the common
and l2-specific sysfs attributes.
When qeth_core_probe_device() does early setup for OSM or OSN, assign
the corresponding devtype so that the ccwgroup probe code creates the
full set of sysfs attributes.
This allows us to skip qeth_l2_create_device_attributes() in case
of an early setup.
Any device that can't do early setup will initially have only the
generic sysfs attributes, and when it's probed later
qeth_l2_probe_device() adds the l2-specific attributes.
If an early-setup device is removed (by calling ccwgroup_ungroup()),
device_unregister() will - using the devtype - delete the
l2-specific attributes before qeth_l2_remove_device() is called.
So make sure to not remove them twice.
What complicates the issue is that qeth_l2_probe_device() and
qeth_l2_remove_device() is also called on a device when its
layer2 attribute changes (ie. its layer mode is switched).
For early-setup devices this wouldn't work properly - we wouldn't
remove the l2-specific attributes when switching to L3.
But switching the layer mode doesn't actually make any sense;
we already decided that the device can only operate in L2!
So just refuse to switch the layer mode on such devices. Note that
OSN doesn't have a layer2 attribute, so we only need to special-case
OSM.
Based on an initial patch by Ursula Braun.
Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When setting up the device from within the layer discipline's
probe routine, creating the layer-specific sysfs attributes can fail.
Report this error back to the caller, and handle it by
releasing the layer discipline.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
[jwi: updated commit msg, moved an OSN change to a subsequent patch] Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Like commit 657831ffc38e ("dccp/tcp: do not inherit mc_list from parent")
we should clear ipv6_mc_list etc. for IPv6 sockets too.
Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current codes only deal with the case that the skb is dropped, it
may meet one use-after-free issue when NF_HOOK returns 0 that means
the skb is stolen by one netfilter rule or hook.
When one netfilter rule or hook stoles the skb and return NF_STOLEN,
it means the skb is taken by the rule, and other modules should not
touch this skb ever. Maybe the skb is queued or freed directly by the
rule.
Now uses the nf_hook instead of NF_HOOK to get the result of netfilter,
and check the return value of nf_hook. Only when its value equals 1, it
means the skb could go ahead. Or reset the skb as NULL.
BTW, because vrf_rcv_finish is empty function, so needn't invoke it
even though nf_hook returns 1. But we need to modify vrf_rcv_finish
to deal with the NF_STOLEN case.
There are two cases when skb is stolen.
1. The skb is stolen and freed directly.
There is nothing we need to do, and vrf_rcv_finish isn't invoked.
2. The skb is queued and reinjected again.
The vrf_rcv_finish would be invoked as okfn, so need to free the
skb in it.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Protect the global dev_cntr_names and port_cntr_names with the global
mutex as they are allocated and freed in a function called per device.
Otherwise there is a danger of double free and memory leaks.
Fixes: Commit b7481944b06e ("IB/hfi1: Show statistics counters under IB stats interface") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Explicitly disable stolen memory when running as a guest in a virtual
machine, since the memory is not mediated between clients and reserved
entirely for the host. The actual size should be reported as zero, but
like every other quirk we want to tell the user what is happening.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99028 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161109103905.17860-1-chris@chris-wilson.co.uk Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/dev/mem currently allows mmap() mappings that wrap around the end of
the physical address space, which should probably be illegal. It
circumvents the existing STRICT_DEVMEM permission check because the loop
immediately terminates (as the start address is already higher than the
end address). On the x86_64 architecture it will then cause a panic
(from the BUG(start >= end) in arch/x86/mm/pat.c:reserve_memtype()).
This patch adds an explicit check to make sure offset + size will not
wrap around in the physical address type.
Signed-off-by: Julius Werner <jwerner@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In error cases, lgp->lg_layout_type may be out of bounds; so we
shouldn't be using it until after the check of nfserr.
This was seen to crash nfsd threads when the server receives a LAYOUTGET
request with a large layout type.
GETDEVICEINFO has the same problem.
Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34
shift exponent 128 is too large for 32-bit type 'int'
Depending on compiler+architecture, this may cause the check for
layout_type to succeed for overly large values (which seems to be the
case with amd64). The large value will be later used in de-referencing
nfsd4_layout_ops for function pointers.
Reported-by: Jani Tuovila <tuovila@synopsys.com> Signed-off-by: Ari Kauppi <ari@synopsys.com>
[colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32] Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the server fails to return the attributes as part of an OPEN
reply, and then reboots, we can end up hanging. The reason is that
the client attempts to send a GETATTR in order to pick up the
missing OPEN call, but fails to release the slot first, causing
reboot recovery to deadlock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: 2e80dbe7ac51a ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The builtin eDP panel in the HP zBook 17 G2 supports 10 bpc,
as advertised by the Laptops product specs and verified via
injecting a fixed edid + photometer measurements, but edid
reports unknown depth, so drivers fall back to 6 bpc.
The old 1-bit hamming layout requires ECC data to be placed at a
fixed offset, and not necessarily at the end of the OOB area.
Add this old layout back in order to fix legacy setups.
Fixes: 41b207a70d3a ("mtd: nand: implement the default mtd_ooblayout_ops") Signed-off-by: Alexander Couzens <lynxis@fe80.eu> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9711ec5250b ("mtd: nand: omap: Clean up device tree support")
caused the parent device name to be changed from "omap2-nand.0"
to "<base address>.nand" (e.g. 30000000.nand on omap3 platforms).
This caused mtd->name to be changed as well. This breaks partition
creation via mtdparts passed by u-boot as it uses "omap2-nand.0"
for the mtd-id.
Fix this by explicitly setting the mtd->name to "omap2-nand.<CS number>"
if it isn't already set by nand_set_flash_node(). CS number is the
NAND controller instance ID.
Fixes: c9711ec5250b ("mtd: nand: omap: Clean up device tree support") Reported-by: Leto Enrico <enrico.leto@siemens.com> Reported-by: Adam Ford <aford173@gmail.com> Suggested-by: Boris Brezillon <boris.brezillon@free-electrons.com> Tested-by: Adam Ford <aford173@gmail.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
crashes during suspend tests. Geert Uytterhoeven managed to reproduce the
issue on an M2-W Koelsch board (r8a7791):
It occurs when the PME scan runs, once per second. During PME scan, the
PCI host bridge (rcar-pci) registers are accessed while its module clock
has already been disabled, leading to the crash.
One reproducer is to configure s2ram to use "s2idle" instead of "deep"
suspend:
Another reproducer is to write either "platform" or "processors" to
/sys/power/pm_test. It does not (or is less likely) to happen during full
system suspend ("core" or "none") because system suspend also disables
timers, and thus the workqueue handling PME scans no longer runs. Geert
believes the issue may still happen in the small window between disabling
module clocks and disabling timers:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test # Or "processors"
# echo mem > /sys/power/state
(Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)
Rafael Wysocki agrees that PME scans should be suspended before the host
bridge registers become inaccessible. To that end, queue the task on a
workqueue that gets frozen before devices suspend.
Rafael notes however that as a result, some wakeup events may be missed if
they are delivered via PME from a device without working IRQ (which hence
must be polled) and occur after the workqueue has been frozen. If that
turns out to be an issue in practice, it may be possible to solve it by
calling pci_pme_list_scan() once directly from one of the host bridge's
pm_ops callbacks.
In the PCI_MMAP_PROCFS case when the address being passed by the user is a
'user visible' resource address based on the bus window, and not the actual
contents of the resource, that's what we need to be checking it against.
When we have 32 or more CPUs in the affinity mask, we should use a special
constant to specify that to the host. Fix this issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The memory allocation here needs to be non-blocking. Fix the issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently SoCs pass2.x do not emulate EA headers for ACPI boot method at
all. However, for pass2.x some devices (like EDAC) advertise incorrect
base addresses in their BARs which results in driver probe failure during
resource request. Since all problematic blocks are on 2nd NUMA node under
domain 10 add necessary quirk entry to obtain BAR addresses correction
using EA header emulation.
Fixes: 44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host controller") Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With no blank lines, it's not obvious where the macro definitions end and
the uses begin. Add some blank lines and reorder the ThunderX definitions.
No functional change intended.
If thermal bank with 4 sensors, thermal driver should read TEMP_MSR3.
However, currently thermal driver would not read TEMP_MSR3 since mt8173
thermal driver only use 3 sensors on each thermal bank at the same time,
so this patch would not effect temperature.
Only if mt mt8173 thermal driver use 4 sensors on any thermal bank, would
read third sensor two times, and lose fourth sensor of vale.
Enabling the tracer selftest triggers occasionally the warning in
text_poke(), which warns when the to be modified page is not marked
reserved.
The reason is that the tracer selftest installs kprobes on functions marked
__init for testing. These probes are removed after the tests, but that
removal schedules the delayed kprobes_optimizer work, which will do the
actual text poke. If the work is executed after the init text is freed,
then the warning triggers. The bug can be reproduced reliably when the work
delay is increased.
Flush the optimizer work and wait for the optimizing/unoptimizing lists to
become empty before returning from the kprobes tracer selftest. That
ensures that all operations which were queued due to the probes removal
have completed.
Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 6274de498 ("kprobes: Support delayed unoptimizing") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcc-7 notices that the length we pass to strncat is wrong:
drivers/firmware/ti_sci.c: In function 'ti_sci_probe':
drivers/firmware/ti_sci.c:204:32: error: specified bound 50 equals the size of the destination [-Werror=stringop-overflow=]
Instead of the total length, we must pass the length of the
remaining space here.
Fixes: aa276781a64a ("firmware: Add basic support for TI System Control Interface (TI-SCI) protocol") Acked-by: Nishanth Menon <nm@ti.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When killing kref_sub(), the unconditional additional kref_get()
was not properly paired with the necessary kref_put(), causing
a leak of struct drbd_requests (~ 224 Bytes) per submitted bio,
and breaking DRBD in general, as the destructor of those "drbd_requests"
does more than just the mempoll_free().
We yield the kvm->mmu_lock occassionaly while performing an operation
(e.g, unmap or permission changes) on a large area of stage2 mappings.
However this could possibly cause another thread to clear and free up
the stage2 page tables while we were waiting for regaining the lock and
thus the original thread could end up in accessing memory that was
freed. This patch fixes the problem by making sure that the stage2
pagetable is still valid after we regain the lock. The fact that
mmu_notifer->release() could be called twice (via __mmu_notifier_release
and mmu_notifier_unregsister) enhances the possibility of hitting
this race where there are two threads trying to unmap the entire guest
shadow pages.
While at it, cleanup the redudant checks around cond_resched_lock in
stage2_wp_range(), as cond_resched_lock already does the same checks.
Cc: Mark Rutland <mark.rutland@arm.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: andreyknvl@google.com Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In kvm_free_stage2_pgd() we check the stage2 PGD before holding
the lock and proceed to take the lock if it is valid. And we unmap
the page tables, followed by releasing the lock. We reset the PGD
only after dropping this lock, which could cause a race condition
where another thread waiting on or even holding the lock, could
potentially see that the PGD is still valid and proceed to perform
a stage2 operation and later encounter a NULL PGD.
CMB doesn't get unmapped until removal while getting remapped on every
reset. Add the unmapping and sysfs file removal to the reset path in
nvme_pci_disable to match the mapping path in nvme_pci_enable.
Fixes: 202021c1a ("nvme : Add sysfs entry for NVMe CMBs when appropriate") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-By: Stephen Bates <sbates@raithlin.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
irq_set_chained_handler_and_data() sets up the chained interrupt and then
stores the handler data.
That's racy against an immediate interrupt which gets handled before the
store of the handler data happened. The handler will dereference a NULL
pointer and crash.
Cure it by storing handler data before installing the chained handler.
The metag implementation of strncpy_from_user() doesn't validate the src
pointer, which could allow reading of arbitrary kernel memory. Add a
short access_ok() check to prevent that.
Its still possible for it to read across the user/kernel boundary, but
it will invariably reach a NUL character after only 9 bytes, leaking
only a static kernel address being loaded into D0Re0 at the beginning of
__start, which is acceptable for the immediate fix.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The __user_bad() macro used by access_ok() has a few corner cases
noticed by Al Viro where it doesn't behave correctly:
- The kernel range check has off by 1 errors which permit access to the
first and last byte of the kernel mapped range.
- The kernel range check ends at LINCORE_BASE rather than
META_MEMORY_LIMIT, which is ineffective when the kernel is in global
space (an extremely uncommon configuration).
There are a couple of other shortcomings here too:
- Access to the whole of the other address space is permitted (i.e. the
global half of the address space when the kernel is in local space).
This isn't ideal as it could theoretically still contain privileged
mappings set up by the bootloader.
- The size argument is unused, permitting user copies which start on
valid pages at the end of the user address range and cross the
boundary into the kernel address space (e.g. addr = 0x3ffffff0, size
> 0x10).
It isn't very convenient to add size checks when disallowing certain
regions, and it seems far safer to be sure and explicit about what
userland is able to access, so invert the logic to allow certain regions
instead, and fix the off by 1 errors and missing size checks. This also
allows the get_fs() == KERNEL_DS check to be more easily optimised into
the user address range case.
We now have 3 such allowed regions:
- The user address range (incorporating the get_fs() == KERNEL_DS
check).
- NULL (some kernel code expects this to work, and we'll always catch
the fault anyway).
- The core code memory region.
Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of there is no cpuidle devices registered, dev will be null, and
panic will be triggered like below;
In this patch, add checking of dev before usage, like that done in
cpuidle_idle_call.
Ever since commit 091d42e43d ("iommu/vt-d: Copy translation tables from
old kernel") the kdump kernel copies the IOMMU context tables from the
previous kernel. Each device mappings will be destroyed once the driver
for the respective device takes over.
This unfortunately breaks the workflow of mapping and unmapping a new
context to the IOMMU. The mapping function assumes that either:
1) Unmapping did the proper IOMMU flushing and it only ever flush if the
IOMMU unit supports caching invalid entries.
2) The system just booted and the initialization code took care of
flushing all IOMMU caches.
This assumption is not true for the kdump kernel since the context
tables have been copied from the previous kernel and translations could
have been cached ever since. So make sure to flush the IOTLB as well
when we destroy these old copied mappings.
Cc: Joerg Roedel <joro@8bytes.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Fixes: 091d42e43d ("iommu/vt-d: Copy translation tables from old kernel") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vchiq_arm supports transfers less than one page and at arbitrary
alignment, using the dma-mapping API to perform its cache maintenance
(even though the VPU drives the DMA hardware). Read (DMA_FROM_DEVICE)
operations use cache invalidation for speed, falling back to
clean+invalidate on partial cache lines, with writes (DMA_TO_DEVICE)
using flushes.
If a read transfer has ends which aren't page-aligned, performing cache
maintenance as if they were whole pages can lead to memory corruption
since the partial cache lines at the ends (and any cache lines before or
after the transfer area) will be invalidated. This bug was masked until
the disabling of the cache flush in flush_dcache_page().
Honouring the requested transfer start- and end-points prevents the
corruption.
Fixes: cf9caf192988 ("staging: vc04_services: Replace dmac_map_area with dmac_map_sg") Signed-off-by: Phil Elwell <phil@raspberrypi.org> Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some kernel features don't currently work if a task puts a non-zero
address tag in its stack pointer, frame pointer, or frame record entries
(FP, LR).
For example, with a tagged stack pointer, the kernel can't deliver
signals to the process, and the task is killed instead. As another
example, with a tagged frame pointer or frame records, perf fails to
generate call graphs or resolve symbols.
For now, just document these limitations, instead of finding and fixing
everything that doesn't work, as it's not known if anyone needs to use
tags in these places anyway.
In addition, as requested by Dave Martin, generalize the limitations
into a general kernel address tag policy, and refactor
tagged-pointers.txt to include it.
Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0") Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>