The SDM says that exiting system management mode from 64-bit mode
is invalid, but that would be too good to be true. But actually,
most of the code is already there to support exiting from compat
mode (EFER.LME=1, EFER.LMA=0). Getting all the way from 64-bit
mode to real mode only requires clearing CS.L and CR4.PCIDE.
GET_SMSTATE depends on real mode to ensure that smbase+offset is treated
as a physical address, which has already caused a bug after shuffling
the code. Enforce physical addressing.
Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
without clearing EFER.LME first (which it should not know about).
Yang Zhang from Intel confirmed that the manual is wrong and EFER is
cleared to zero on INIT.
Fixes: d28bc9dd25ce023270d2e039e7c98d38ecbf7758 Cc: Yang Z Zhang <yang.z.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If either of the memory allocations in kvm_arch_vcpu_create() fail, the
vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd
in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.
Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The immediate field of the CACHE instruction is signed, so ensure that
it gets sign extended by casting it to an int16_t rather than just
masking the low 16 bits.
ASID restoration on guest resume should determine the guest execution
mode based on the guest Status register rather than bit 30 of the guest
PC.
Fix the two places in locore.S that do this, loading the guest status
from the cop0 area. Note, this assembly is specific to the trap &
emulate implementation of KVM, so it doesn't need to check the
supervisor bit as that mode is not implemented in the guest.
Fixes: b680f70fc111 ("KVM/MIPS32: Entry point for trampolining to...") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DDR control initialization needs to know the SoC type, however
ath79_detect_sys_type() was called after ath79_ddr_ctrl_init().
Reverse the order to fix the DDR control initialization on ar71xx and
ar934x.
Signed-off-by: Alban Bedel <albeu@free.fr> Cc: Felix Fietkau <nbd@openwrt.org> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: Andrew Bresticker <abrestic@chromium.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/11500/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add helper macro builtin_mips_cdmm_driver() for builtin CDMM drivers
that don't do anything special in init and have no exit. The
module_mips_cdmm_driver() helper isn't really appropriate for drivers
that can't be built as a module.
This adds a basic implementation of clk_round_rate()
The clk_round_rate() function is called by multiple drivers and
subsystems now and the lantiq clk driver is supposed to export this,
but doesn't do so, this causes linking problems like this one:
ERROR: "clk_round_rate" [drivers/media/v4l2-core/videodev.ko] undefined!
The z2 machine calls pxa27x_set_pwrmode() in order to power off
the machine, but this function gets discarded early at boot because
it is marked __init, as pointed out by kbuild:
WARNING: vmlinux.o(.text+0x145c4): Section mismatch in reference from the function z2_power_off() to the function .init.text:pxa27x_set_pwrmode()
The function z2_power_off() references
the function __init pxa27x_set_pwrmode().
This is often because z2_power_off lacks a __init
annotation or the annotation of pxa27x_set_pwrmode is wrong.
This removes the __init section modifier to fix rebooting and the
build error.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: ba4a90a6d86a ("ARM: pxa/z2: fix building error of pxa27x_cpu_suspend() no longer available") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 99f84cae43df ("ARM: dts: add wl12xx/wl18xx bindings") added
device tree bindings for the TI WLAN SDIO on many omap variants.
I recall wondering how come omap5-uevm did not have the WLAN
added and this issue has been bugging me for a while now, and
I finally tracked it down to a bad pinmux regression, and a missing
deferred probe handling for the 32k clock from palmas that's
requested by twl6040.
Basically 392adaf796b9 ("ARM: dts: omap5-evm: Add mcspi data")
added pin muxing for mcspi4 that conflicts with the onboard
WLAN. While some omap5-uevm don't have WLAN populated, the
pins are not reused for other devices. And as the SDIO bus
should be probed, let's try to enable WLAN by default.
Let's fix the regression and add the WLAN configuration as
done for the other boards in 99f84cae43df ("ARM: dts: add
wl12xx/wl18xx bindings"). And let's use the new MMC pwrseq for
the 32k clock as suggested by Javier Martinez Canillas
<javier@dowhile0.org>.
Note that without a related deferred probe fix for twl6040,
the 32k clock is not initialized if palmas-clk is a module
and twl6040 is built-in.
Let's also use the generic "non-removable" instead of the
legacy "ti,non-removable" property while at it.
And finally, note that omap5 seems to require WAKEUP_EN for
the WLAN GPIO interrupt.
fncpy() requires that the source and the destination are both 8-byte
aligned.
Signed-off-by: Patrick Doyle <pdoyle@irobot.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Fixes: d94e688cae56 ("ARM: at91/pm: move the copying the sram function to the sram initialization phase") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 72daceb9a10a ("net: rfkill: gpio: Add default GPIO driver mappings
for ACPI") removed possibility to request GPIO by table index for non-ACPI
platforms without changing its users. As result "shutdown" GPIO request
will fail if request for "reset" GPIO succeeded or "reset" will be
requested instead of "shutdown" if "reset" wasn't defined. Fix it by
making gpiod_lookup_table use con_id's instead of indexes.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Fixes: 72daceb (net: rfkill: gpio: Add default GPIO driver mappings for ACPI) Acked-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Marc Dietrich <marvin24@gmx.de> Tested-by: Marc Dietrich <marvin24@gmx.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For imx27, it needs three clocks to let the controller work,
the old code is wrong, and usbmisc has not included clock handling
code any more. Without this patch, it will cause below data
abort when accessing usbmisc registers.
usbcore: registered new interface driver usb-storage
Unhandled fault: external abort on non-linefetch (0x008) at 0xf4424600
pgd = c0004000
[f4424600] *pgd=10000452(bad)
Internal error: : 8 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.1.0-next-20150701-dirty #3089
Hardware name: Freescale i.MX27 (Device Tree Support)
task: c7832b60 ti: c783e000 task.ti: c783e000
PC is at usbmisc_imx27_init+0x4c/0xbc
LR is at usbmisc_imx27_init+0x40/0xbc
pc : [<c03cb5c0>] lr : [<c03cb5b4>] psr: 60000093
sp : c783fe08 ip : 00000000 fp : 00000000
r10: c0576434 r9 : 0000009c r8 : c7a773a0
r7 : 01000000 r6 : 60000013 r5 : c7a776f0 r4 : c7a773f0
r3 : f4424600 r2 : 00000000 r1 : 00000001 r0 : 00000001
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 0005317f Table: a0004000 DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc783e190)
Stack: (0xc783fe08 to 0xc7840000)
IOMMU-based dma_mmap() implementation lacked proper support for offset
parameter used in mmap call (it always assumed that mapping starts from
offset zero). This patch adds support for offset parameter to IOMMU-based
implementation.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dma_mmap() function in IOMMU-based dma-mapping implementation lacked
a check for valid range of mmap parameters (offset and buffer size), what
might have caused access beyond the allocated buffer. This patch fixes
this issue.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Only cpu seeing dst refcount going to 0 can safely
dereference dst->flags.
Otherwise an other cpu might already have freed the dst.
Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 7d5cd2ce529b, when bond_enslave fails on devices that
are not ARPHRD_ETHER, if needed, it resets the bonding device back to
ARPHRD_ETHER by calling ether_setup.
Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP
if the bond device is up, leaving it in a quasi-down state without
having actually gone through dev_close. For bonding, if any periodic
work queue items are active (miimon, arp_interval, etc), those will
remain running, as they are stopped by bond_close. At this point, if
the bonding module is unloaded or the bond is deleted, the system will
panic when the work function is called.
This panic is resolved by calling dev_close on the bond itself
prior to calling ether_setup.
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race conditions between packet_notifier and packet_bind{_spkt}.
It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice to be moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace' &net->packet.sklist.
It can be reproduced with the script below.
This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More in general it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_dev.
import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId
os.system('taskset -p 0x10 %d' % os.getpid())
s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
(realDev, dev, vlanId))
os.system('ip netns add dummy')
pid=os.fork()
if pid == 0:
# dev should be moved while packet_do_bind is in synchronize net
os.system('taskset -p 0x20000 %d' % os.getpid())
os.system('ip link set %s netns dummy' % dev)
os.system('ip netns exec dummy ip link del %s' % dev)
s.close()
sys.exit(0)
time.sleep(.004)
try:
s.bind(('%s' % dev, proto+1))
except:
print 'Could not bind socket'
s.close()
os.system('ip netns del dummy')
sys.exit(0)
os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.
Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate on all possible cpus and call inet_ctl_sock_destroy(),
with eventual NULL pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the IP stack passes SKBs the sfc driver puts them in 2 different TX
queues (called partners), one for checksummed and one for not checksummed.
If the SKB has xmit_more set the driver will delay pushing the work to the
NIC.
When later it does decide to push the buffers this patch ensures it also
pushes the partner queue, if that also has any delayed work. Before this
fix the work in the partner queue would be left for a long time and cause
a netdev watchdog.
Fixes: 70b33fb ("sfc: add support for skb->xmit_more") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sit0 device allocates its percpu storage twice :
- One time in ipip6_tunnel_init()
- One time in ipip6_fb_tunnel_init()
Thus we leak 48 bytes per possible cpu per network namespace dismantle.
ipip6_fb_tunnel_init() can be much simpler and does not
return an error, and should be called after register_netdev()
Note that ipip6_tunnel_clone_6rd() also needs to be called
after register_netdev() (calling ipip6_tunnel_init())
Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ani Sinha <ani@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
priv->hwts_*_en indicate if timestamping is enabled/disabled at run
time. But priv->dma_cap.time_stamp and priv->dma_cap.atime_stamp
indicates HW is support for PTPv1/PTPv2.
Signed-off-by: Phil Reid <preid@electromag.com.au> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When nexthop is part of multipath route we should clear the
LINKDOWN flag when link goes UP or when first address is added.
This is needed because we always set LINKDOWN flag when DEAD flag
was set but now on UP the nexthop is not dead anymore. Examples when
LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered:
- link goes down (LINKDOWN is set), then link goes UP and device
shows carrier OK but LINKDOWN remains set
- last address is deleted (LINKDOWN is set), then address is
added and device shows carrier OK but LINKDOWN remains set
Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
here add a multipath route where one nexthop is for dummy0:
ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE
ifconfig dummy0 down
ifconfig dummy0 up
now ip route shows nexthop that is not dead. Now set the sysctl var:
When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event
we should not delete the local routes if the local address
is still present. The confusion comes from the fact that both
fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
constant. Fix it by returning back the variable 'force'.
Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
ifconfig dummy0 down
ip route list table local | grep dummy | grep host
local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1
Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Testing of the new UDP bearer has revealed that reception of
NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
message buffers is not prepared for the case that those may be
non-linear.
We now linearize all such buffers before they are delivered up to the
generic reception layer.
In order for the commit to apply cleanly to 'net' and 'stable', we do
the change in the function tipc_udp_recv() for now. Later, we will post
a commit to 'net-next' moving the linearization to generic code, in
tipc_named_rcv() and tipc_link_proto_rcv().
Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When doing memcpy/memset of EQEs, we should use sizeof struct
mlx4_eqe as the base size and not caps.eqe_size which could be bigger.
If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt
data in the master context.
When using a 64 byte stride, the memcpy copied over 63 bytes to the
slave_eq structure. This resulted in copying over the entire eqe of
interest, including its ownership bit -- and also 31 bytes of garbage
into the next WQE in the slave EQ -- which did NOT include the ownership
bit (and therefore had no impact).
However, once the stride is increased to 128, we are overwriting the
ownership bits of *three* eqes in the slave_eq struct. This results
in an incorrect ownership bit for those eqes, which causes the eq to
seem to be full. The issue therefore surfaced only once 128-byte EQEs
started being used in SRIOV and (overarchitectures that have 128/256
byte cache-lines such as PPC) - e.g after commit 77507aa249ae
"net/mlx4_core: Enable CQE/EQE stride support".
Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support') Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.
Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We were computing the child index in cases where the key value we were
looking for was actually less than the base key of the tnode. As a result
we were getting incorrect index values that would cause us to skip over
some children.
To fix this I have added a test that will force us to use child index 0 if
the key we are looking for is less than the key of the current tnode.
Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: Brian Rak <brak@gameservers.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If fec MDIO write method succeeds its return value comes from
call to pm_runtime_get_sync().
But pm_runtime_get_sync() can also return 1.
In case of Micrel KSZ9031 PHY this value will then
be returned along the call chain of phy_write() ->
ksz9031_extended_write() -> ksz9031_center_flp_timing() ->
ksz9031_config_init() -> phy_init_hw() -> phy_attach_direct() ->
phy_connect_direct().
Then phy_connect() will cast it into a pointer using ERR_PTR(),
which then fec_enet_mii_probe() will try to dereference
resulting in an oops.
Fix it by normalizing return value of pm_runtime_get_sync()
to be zero if positive in MDIO write method.
Fixes: 8fff755e9f8d ("net: fec: Ensure clocks are enabled while using mdio bus") Signed-off-by: Maciej Szmigiero <mail@maciej.szmigiero.name> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gre_gso_segment() chokes if SIT frames were aggregated by GRO engine.
Fixes: 61c1db7fae21e ("ipv6: sit: add GSO/TSO support") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no need to use the IS_ERR_VALUE() macro for checking
the return value from pm_runtime_* functions.
Just do a simple negative test instead.
The semantic patch that makes this change is available
in scripts/coccinelle/api/pm_runtime.cocci.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During Tx cleanup it's still possible for the descriptor data to be
read ahead of the descriptor index. A memory barrier is required between
the read of the descriptor index and the start of the Tx cleanup loop.
This allows a change to a lighter-weight barrier in the Tx transmit
routine just before updating the current descriptor index.
Since the memory barrier does result in extra overhead on arm64, keep
the previous change to not chase the current descriptor value. This
prevents the execution of the barrier for each loop performed.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code currently uses the lightweight dma_wmb barrier before updating
the current descriptor count. Under heavy load, the Tx cleanup routine
was seeing the updated current descriptor count before the updated
descriptor information. As a result, the Tx descriptor was being cleaned
up before it was used because it was not "owned" by the hardware yet,
resulting in a Tx queue hang.
Using the wmb barrier insures that the descriptor is updated before the
descriptor counter preventing the Tx queue hang. For extra insurance,
the Tx cleanup routine is changed to grab the current decriptor count on
entry and uses that initial value in the processing loop rather than
trying to chase the current value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.
Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't have fraglist support in TAP_FEATURES. This will lead
software segmentation of gro skb with frag list. Fixes by having
frag list support in TAP_FEATURES.
With this patch single session of netperf receiving were restored from
about 5Gb/s to about 12Gb/s on mlx4.
Fixes a567dd6252 ("macvtap: simplify usage of tap_features") Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
New device IDs shamelessly lifted from the vendor driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.
This can be easily reproduced by the following snippet:
int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));
This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().
Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code for message reassembly is erroneously assuming that
the the first arriving fragment buffer always is linear, and then goes
ahead resetting the fragment list of that buffer in anticipation of
more arriving fragments.
However, if the buffer already happens to be non-linear, we will
inadvertently drop the already attached fragment list, and later
on trig a BUG() in __pskb_pull_tail().
We see this happen when running fragmented TIPC multicast across UDP,
something made possible since
commit d0f91938bede ("tipc: add ip/udp media type")
We fix this by not resetting the fragment list when the buffer is non-
linear, and by initiatlizing our private fragment list tail pointer to
the tail of the existing fragment list.
Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is decrementing the pointer, instead of the value stored in the
pointer. KASan detects it as an out of bounds reference.
Reported-by: "Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0b34a166f291d255755be46e43ed5497cdd194f2 "x86/xen: Support
kexec/kdump in HVM guests by doing a soft reset" has been added to the
4.2-stable tree" needed to correct the CONFIG variable, as
CONFIG_KEXEC_CORE only showed up in 4.3.
Reported-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Intel Baytrail pinctrl driver implements irqchip callbacks which are
called with desc->lock raw_spinlock held. In mainline this is fine because
spinlock resolves to raw_spinlock. However, running the same code in -rt we
get:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/0
Preemption disabled at:[<ffffffff81092e9f>] cpu_startup_entry+0x17f/0x480
There is a hardware issue in Intel Baytrail where concurrent GPIO register
access might result reads of 0xffffffff and writes might get dropped
completely.
Prevent this from happening by taking the serializing lock in all places
where it is possible that more than one thread might be accessing the
hardware concurrently.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use is_zero_pfn() on pteval only after pte_present() check on pteval
(It might be better idea to introduce is_zero_pte() which checks
pte_present() first).
Otherwise when working on a swap or migration entry and if pte_pfn's
result is equal to zero_pfn by chance, we lose user's data in
__collapse_huge_page_copy(). So if you're unlucky, the application
segfaults and finally you could see below message on exit:
If user space calls unreference on a user_dmabuf it will typically
kill the struct ttm_base_object member which is responsible for the
user-space visibility. However the dmabuf part may still be alive and
refcounted. In some situations, like for shared guest-backed surface
referencing/opening, the driver may try to reference the
struct ttm_base_object member again, causing an immediate kernel warning
and a later kernel NULL pointer dereference.
Fix this by always maintaining a reference on the struct
ttm_base_object member, in situations where it might subsequently be
referenced.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the STXR instruction fails in the SWP emulation code, we leave *data
overwritten with the loaded value, therefore corrupting the data written
by a subsequent, successful attempt.
This patch re-jigs the code so that we only write back to *data once we
know that the update has happened.
Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm") Reported-by: Shengjiu Wang <shengjiu.wang@freescale.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a workaround for KNL platform, where in some cases MPERF counter
will not have updated value before next read of MSR_IA32_MPERF. In this
case divide by zero will occur. This change ignores current sample for
busy calculation in this case.
Fixes: b34ef932d79a (intel_pstate: Knights Landing support) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target")
broke select_task_rq_dl() and find_lock_later_rq(), because it introduced
a comparison between the local task's deadline and dl.earliest_dl.curr of
the remote queue.
However, if the remote runqueue does not contain any SCHED_DEADLINE
task its earliest_dl.curr is 0 (always smaller than the deadline of
the local task) and the remote runqueue is not selected for pushing.
As a result, if an application creates multiple SCHED_DEADLINE
threads, they will never be pushed to runqueues that do not already
contain SCHED_DEADLINE tasks.
This patch fixes the issue by checking if dl.dl_nr_running == 0.
Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") Link: http://lkml.kernel.org/r/1444982781-15608-1-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.
tags is freed in blk_mq_free_rq_map() and should not be used after that.
The problem doesn't manifest if CONFIG_CPUMASK_OFFSTACK is false because
free_cpumask_var() is nop.
tags->cpumask is allocated in blk_mq_init_tags() so it's natural to
free cpumask in its counter part, blk_mq_free_tags().
Fixes: f26cdc8536ad ("blk-mq: Shared tag enhancements") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Keith Busch <keith.busch@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have to exclude memory locations <= PAGE_SIZE from
the condition and let the kernel mode fault path catch it.
Otherwise a kernel NULL pointer exception will be reported
as a kernel user space access.
Fixes: d2313084e2c (um: Catch unprotected user memory access) Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The value of emul_con was getting overwritten if the selected soc is
SOC_ARCH_EXYNOS5260. And so as a result we were reading from the wrong
register in the case of SOC_ARCH_EXYNOS5260.
Fixes: 488c7455d74c ("thermal: exynos: Add the support for Exynos5433 TMU") Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Acked-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kukjin Kim <kgene@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8eb934591f8b ("btrfs: check unsupported filters in balance
arguments") adds a jump to exit label out_bargs in case the argument
check fails. At this point in addition to the bargs memory, the
memory for struct btrfs_balance_control has already been allocated.
Ownership of bctl is passed to btrfs_balance() in the good case,
thus the memory is not freed due to the introduced jump. Make sure
that the memory gets freed in any case as necessary. Detected by
Coverity CID 1328378.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 00590fdd5be0 introduced RCU locking in list type and in
doing so introduced a memory allocation in list_set_add, which
is done in an atomic context, due to the fact that ipset rcu
list modifications are serialised with a spin lock. The reason
why we can't use a mutex is that in addition to modifying the
list with ipset commands, it's also being modified when a
particular ipset rule timeout expires aka garbage collection.
This gc is triggered from set_cleanup_entries, which in turn
is invoked from a timer thus requiring the lock to be bh-safe.
Concretely the following call chain can lead to "sleeping function
called in atomic context" splat:
call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
And since GFP_KERNEL allows initiating direct reclaim thus
potentially sleeping in the allocation path.
To fix the issue change the allocation type to GFP_ATOMIC, to
correctly reflect that it is occuring in an atomic context.
Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type") Signed-off-by: Nikolay Borisov <kernel@kyup.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When pci_pool_alloc fails in mvs_task_prep then task->lldd_task stays
NULL but it's later used in mvs_abort_task as slot which is passed
to mvs_slot_task_free causing NULL pointer dereference.
Just return from mvs_slot_task_free when passed with NULL slot.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101891 Signed-off-by: Dāvis Mosāns <davispuh@gmail.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The LIC doesn't deal with the different types of interrupts itself
but needs to forward calls to set the appropriate type to its parent
IRQ controller.
Without this fix all IRQs routed through the LIC will stay at the
initial EDGE type, while most of them should actually be level triggered.
Fixes: 1eec582158e2 "irqchip: tegra: Add Tegra210 support" Signed-off-by: Lucas Stach <dev@lynxeye.de> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Alexandre Courbot <gnurou@gmail.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/1445787552-13062-1-git-send-email-dev@lynxeye.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")
NUM_CHANNELS was changed to 8 and the channel space was renumerated to
handle EN, EP, and EX configurations.
The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
got a new device presence check in the form of saw_chan_mask. However,
sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.
With the increase in NUM_CHANNELS, this loop fails at index 4 since
SB only has 4 TADs. This results in the following error on SB machines:
EDAC sbridge: Some needed devices are missing
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handle
This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
well.
After this patch:
EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)
This commit is poorly justified, I can find not discusison in email,
and it clearly causes a problem.
If a device which is being recovered fails and is subsequently
re-added to an array, there could easily have been changes to the
array *before* the point where the recovery was up to. So the
recovery must start again from the beginning.
If a spare is being recovered and fails, then when it is re-added we
really should do a bitmap-based recovery up to the recovery-offset,
and then a full recovery from there. Before this reversion, we only
did the "full recovery from there" which is not corect. After this
reversion with will do a full recovery from the start, which is safer
but not ideal.
It will be left to a future patch to arrange the two different styles
of recovery.
Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: NeilBrown <neilb@suse.com> Fixes: 7eb418851f32 ("md: allow a partially recovered device to be hot-added to an array.") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()")
__find_stripe() is called under conf->hash_locks + hash.
But handle_stripe_clean_event() calls remove_hash() under
conf->device_lock.
Under some cirscumstances the hash chain can be circuited,
and we get an infinite loop with disabled interrupts and locked hash
lock in __find_stripe(). This leads to hard lockup on multiple CPUs
and following system crash.
I was able to reproduce this behavior on raid6 over 6 ssd disks.
The devices_handle_discard_safely option should be set to enable trim
support. The following script was used:
for i in `seq 1 32`; do
dd if=/dev/zero of=large$i bs=10M count=100 &
done
neilb: original was against a 3.x kernel. I forward-ported
to 4.3-rc. This verison is suitable for any kernel since
Commit: 59fc630b8b5f ("RAID5: batch adjacent full stripe write")
(v4.1+). I'll post a version for earlier kernels to stable.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Fixes: 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()") Signed-off-by: NeilBrown <neilb@suse.com> Cc: Shaohua Li <shli@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.
This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.
Currently a number of Crypto API operations may fail when a signal
occurs. This causes nasty problems as the caller of those operations
are often not in a good position to restart the operation.
In fact there is currently no need for those operations to be
interrupted by user signals at all. All we need is for them to
be killable.
This patch replaces the relevant calls of signal_pending with
fatal_signal_pending, and wait_for_completion_interruptible with
wait_for_completion_killable, respectively.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92bac83dd79e ("Input: alps - non interleaved V2 dualpoint has
separate stick button bits") assumes that all alps v2 non-interleaved
dual point setups have the separate stick button bits.
Later we limited this to Dell laptops only because of reports that this
broke things on non Dell laptops. Now it turns out that this breaks things
on the Dell Latitude D600 too. So it seems that only the Dell Latitude
D420/430/620/630, which all share the same touchpad / stick combo,
have these separate bits.
This patch limits the checking of the separate bits to only these models
fixing regressions with other models.
Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-By: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If two overlayfs filesystems are stacked on top of each other, then we need
recursion in ovl_d_select_inode().
I guess d_backing_inode() is supposed to do that. But currently it doesn't
and that functionality is open coded in vfs_open(). This is now copied
into ovl_d_select_inode() to fix this regression.
Reported-by: Alban Crequy <alban.crequy@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...") Cc: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ovl_copy_up_locked(), newdentry is leaked if the function exits through
out_cleanup as this just to out after calling ovl_cleanup() - which doesn't
actually release the ref on newdentry.
The out_cleanup segment should instead exit through out2 as certainly
newdentry leaks - and possibly upper does also, though this isn't caught
given the catch of newdentry.
Without this fix, something like the following is seen:
Open the lower file with O_LARGEFILE in ovl_copy_up().
Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for
catching 32-bit userspace dealing with a file large enough that it'll be
mishandled if the application isn't aware that there might be an integer
overflow. Inside the kernel, there shouldn't be any problems.
63692df103e9 ("PCI: Allow numa_node override via sysfs") didn't check that
the numa node provided by userspace is valid. Passing a node number too
high would attempt to access invalid memory and trigger a kernel panic.
Laura Abbott <labbott@redhat.com> produced a patch which lead us to
inspect symbol_put_addr(). This function has a comment claiming it
doesn't need to disable preemption around the module lookup
because it holds a reference to the module it wants to find, which
therefore cannot go away.
This is wrong (and a false optimization too, preempt_disable() is really
rather cheap, and I doubt any of this is on uber critical paths,
otherwise it would've retained a pointer to the actual module anyway and
avoided the second lookup).
While its true that the module cannot go away while we hold a reference
on it, the data structure we do the lookup in very much _CAN_ change
while we do the lookup. Therefore fix the comment and add the
required preempt_disable().
xen-blkfront will crash if the check to talk_to_blkback()
in blkback_changed()(XenbusStateInitWait) returns an error.
The driver data is freed and info is set to NULL. Later during
the close process via talk_to_blkback's call to xenbus_dev_fatal()
the null pointer is passed to and dereference in blkfront_closing.
We received several reports of systems rebooting and powering on
after an attempted shutdown. Testing showed that setting
XHCI_SPURIOUS_WAKEUP quirk in addition to the XHCI_SPURIOUS_REBOOT
quirk allowed the system to shutdown as expected for LynxPoint-LP
xHCI controllers. Set the quirk back.
Note that the quirk was originally introduced for LynxPoint and
LynxPoint-LP just for this same reason. See:
commit 638298dc66ea ("xhci: Fix spurious wakeups after S5 on Haswell")
It was later limited to only concern HP machines as it caused
regression on some machines, see both bug and commit:
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171
commit 6962d914f317 ("xhci: Limit the spurious wakeup fix only to HP machines")
Later it was discovered that the powering on after shutdown
was limited to LynxPoint-LP (Haswell-ULT) and that some non-LP HP
machine suffered from spontaneous resume from S3 (which should
not be related to the SPURIOUS_WAKEUP quirk at all). An attempt
to fix this then removed the SPURIOUS_WAKEUP flag usage completely.
commit b45abacde3d5 ("xhci: no switching back on non-ULT Haswell")
Current understanding is that LynxPoint-LP (Haswell ULT) machines
need the SPURIOUS_WAKEUP quirk, otherwise they will restart, and
plain Lynxpoint (Haswell) machines may _not_ have the quirk
set otherwise they again will restart.
Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Cc: Takashi Iwai <tiwai@suse.de> Cc: Oliver Neukum <oneukum@suse.com>
[Added more history to commit message -Mathias] Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a host fails to wake up a isochronous SuperSpeed device from U1/U2
in time for a isoch transfer it will generate a "No ping response error"
Host will then move to the next transfer descriptor.
Handle this case in the same way as missed service errors, tag the
current TD as skipped and handle it on the next transfer event.
a PPC64LE kernel fails to boot when fbcon_add_cursor_timer uses an
uninitialized ops->cur_blink_jiffies. Prevent by initializing
in fbcon_init before the call to info->fbops->fb_set_par.
clk_add_alias() was not correctly handling the case where alias_dev_name
was NULL: rather than producing an entry with a NULL dev_id pointer,
it would produce a device name of (null). Fix this.
Fixes: 2568999835d7 ("clkdev: add clkdev_create() helper") Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 00d8689b85a7 ("i2c: mv64xxx: rework offload support to fix
several problems") completely reworked the offload support, but left a
debugging-related "return false" at the beginning of the
mv64xxx_i2c_can_offload() function. This has the unfortunate consequence
that offloading is in fact never used, which wasn't really the
intention.
This commit fixes that problem by removing the bogus "return false".
Fixes: 00d8689b85a7 ("i2c: mv64xxx: rework offload support to fix several problems") Signed-off-by: Hezi Shahmoon <hezi@marvell.com>
[Thomas: reworked commit log and title.] Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit prevents from sending "big" file using Bluetooth.
When sending a lot of data quickly through the Bluetooth interface, and
after a variable amount of data sent, transfer fails with error:
kernel: [ 415.247453] Bluetooth: hci0 hardware error 0x00
Found on T100TA.
After reverting this commit, send works fine for any file size.
Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com> Fixes: 9119fba0cfed (serial: 8250_dma: don't bother DMA with small transfers) Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Compiling the nvme driver on 32-bit warns about a cast from a __u64
variable to a pointer:
drivers/block/nvme-core.c: In function 'nvme_submit_io':
drivers/block/nvme-core.c:1847:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(void __user *)io.addr, length, NULL, 0);
The cast here is intentional and safe, so we can shut up the
gcc warning by adding an intermediate cast to 'uintptr_t'.
I had previously submitted a patch to fix this problem in the
nvme driver, but it was accepted on the same day that two new
warnings got added.
For clarification, I also change the third instance of this cast
to use uintptr_t instead of unsigned long now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: d29ec8241c10e ("nvme: submit internal commands through the block layer") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.
Fix this by releasing the previously allocated bufio block using
unlock_block().
Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache
blocks are marked as dirty and a full writeback occurs.
__commit_transaction() is responsible for setting/clearing
CLEAN_SHUTDOWN (based the flags_mutator that is passed in).
Fix this issue, of the cache's on-disk flags being wrong, by making sure
__commit_transaction() does not reset the flags after the mutator has
altered the flags in preparation for them being serialized to disk.
Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().
The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them. If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.
Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.
Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bdi's are initialized in two steps, bdi_init() and bdi_register(), but
destroyed in a single step by bdi_destroy() which, for a bdi embedded
in a request_queue, is called during blk_cleanup_queue() which makes
the queue invisible and starts the draining of remaining usages.
A request_queue's user can access the congestion state of the embedded
bdi as long as it holds a reference to the queue. As such, it may
access the congested state of a queue which finished
blk_cleanup_queue() but hasn't reached blk_release_queue() yet.
Because the congested state was embedded in backing_dev_info which in
turn is embedded in request_queue, accessing the congested state after
bdi_destroy() was called was fine. The bdi was destroyed but the
memory region for the congested state remained accessible till the
queue got released.
a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in
bdi_writeback") changed the situation. Now, the root congested state
which is expected to be pinned while request_queue remains accessible
is separately reference counted and the base ref is put during
bdi_destroy(). This means that the root congested state may go away
prematurely while the queue is between bdi_dstroy() and
blk_cleanup_queue(), which was detected by Andrey's KASAN tests.
The root cause of this problem is that bdi doesn't distinguish the two
steps of destruction, unregistration and release, and now the root
congested state actually requires a separate release step. To fix the
issue, this patch separates out bdi_unregister() and bdi_exit() from
bdi_destroy(). bdi_unregister() is called from blk_cleanup_queue()
and bdi_exit() from blk_release_queue(). bdi_destroy() is now just a
simple wrapper calling the two steps back-to-back.
While at it, the prototype of bdi_destroy() is moved right below
bdi_setup_and_register() so that the counterpart operations are
located together.
Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com> Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com Reviewed-by: Jan Kara <jack@suse.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit dd006da21646 ("arm64: mm: increase VA range of identity map")
introduced a mechanism to extend the virtual memory map range
to support arm64 systems with system RAM located at very high offset,
where the identity mapping used to enable/disable the MMU requires
additional translation levels to map the physical memory at an equal
virtual offset.
The kernel detects at boot time the tcr_el1.t0sz value required by the
identity mapping and sets-up the tcr_el1.t0sz register field accordingly,
any time the identity map is required in the kernel (ie when enabling the
MMU).
After enabling the MMU, in the cold boot path the kernel resets the
tcr_el1.t0sz to its default value (ie the actual configuration value for
the system virtual address space) so that after enabling the MMU the
memory space translated by ttbr0_el1 is restored as expected.
Commit dd006da21646 ("arm64: mm: increase VA range of identity map")
also added code to set-up the tcr_el1.t0sz value when the kernel resumes
from low-power states with the MMU off through cpu_resume() in order to
effectively use the identity mapping to enable the MMU but failed to add
the code required to restore the tcr_el1.t0sz to its default value, when
the core returns to the kernel with the MMU enabled, so that the kernel
might end up running with tcr_el1.t0sz value set-up for the identity
mapping which can be lower than the value required by the actual virtual
address space, resulting in an erroneous set-up.
This patchs adds code in the resume path that restores the tcr_el1.t0sz
default value upon core resume, mirroring this way the cold boot path
behaviour therefore fixing the issue.
Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: dd006da21646 ("arm64: mm: increase VA range of identity map") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With this patch applied, we were the only architecture making this sort
of adjustment to the PC calculation in the unwinder. This causes
problems for ftrace, where the PC values are matched against the
contents of the stack frames in the callchain and fail to match any
records after the address adjustment.
Whilst there has been some effort to change ftrace to workaround this,
those patches are not yet ready for mainline and, since we're the odd
architecture in this regard, let's just step in line with other
architectures (like arch/arm/) for now.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8a603f91cc48 ("ARM: 8445/1: fix vdsomunge not to depend on
glibc specific byteswap.h") unfortunately introduced a bug created but
not found during discussion and patch simplification.
Reported-by: Efraim Yawitz <efraim.yawitz@gmail.com> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Fixes: 8a603f91cc48 ("ARM: 8445/1: fix vdsomunge not to depend on glibc specific byteswap.h") Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>