Sudip Mukherjee [Wed, 28 Oct 2015 09:11:32 +0000 (14:41 +0530)]
parport: EXPORT_SYMBOL should follow function
All symbols were exported at the end of the file but they are supposed
to be exported just after the function. And checkpatch was complaining
about it.
Merge tag 'extcon-next-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next
Chanwoo writes:
Update extcon for 4.5
Detailed description for patchset:
1. Add new MAX3355 extcon driver
- Maxim Integrated MAX3355E chip integrates a charge pump
and comparator to enable a system with an integrated
USB OTG dual-role transceiver to function as an USB OTG
dual-role device.
2. Update the extcon-arizona driver for jack detection
- Add the device binding for the jack detection and add
the documentation of extcon-arizona.c.
3. Fix the minor issue of extcon driver
- Add IRQF_ONESHOT to interrupt flags of extcon-rt8973.
- Fix the return value regmap_irq_get_virq() of
extcon-max(14577|77693|77843).c driver by using script[1].
[1] http://permalink.gmane.org/gmane.linux.kernel/2046107
Andrew F. Davis [Thu, 17 Dec 2015 15:47:02 +0000 (08:47 -0700)]
coresight: Fix a typo in Kconfig
Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathieu Poirier [Thu, 17 Dec 2015 15:47:02 +0000 (08:47 -0700)]
coresight: checking for NULL string in coresight_name_match()
Connection child names associated to ports can sometimes be NULL,
which is the case when booting a system on QEMU or when the Coresight
power domain isn't switched on.
This patch is adding a check to make sure a NULL string isn't fed
to strcmp(), something that avoid crashing the system.
K. Y. Srinivasan [Wed, 16 Dec 2015 00:27:27 +0000 (16:27 -0800)]
Drivers: hv: vmbus: Treat Fibre Channel devices as performance critical
For performance critical devices, we distribute the incoming
channel interrupt load across available CPUs in the guest.
Include Fibre channel devices in the set of devices for which
we would distribute the interrupt load.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC [M] drivers/input/serio/hyperv-keyboard.o
drivers/input/serio/hyperv-keyboard.c:427:2: warning: missing braces around
initializer [-Wmissing-braces]
{ HV_KBD_GUID, },
^
drivers/input/serio/hyperv-keyboard.c:427:2: warning: (near initialization
for .id_table[0].guid.b.) [-Wmissing-braces]
The patch fixes the warning.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sergei Shtylyov [Fri, 18 Dec 2015 23:17:41 +0000 (08:17 +0900)]
extcon: add Maxim MAX3355 driver
Maxim Integrated MAX3355E chip integrates a charge pump and comparators to
enable a system with an integrated USB OTG dual-role transceiver to
function as an USB OTG dual-role device. In addition to sensing/controlling
Vbus, the chip also passes thru the ID signal from the USB OTG connector.
On some Renesas boards, this signal is just fed into the SoC thru a GPIO
pin -- there's no real OTG controller, only host and gadget USB controllers
sharing the same USB bus; however, we'd like to allow host or gadget
drivers to be loaded depending on the cable type, hence the need for the
MAX3355 extcon driver. The Vbus status signals are also wired to GPIOs
(however, we aren't currently interested in them), the OFFVBUS# signal is
controlled by the host controllers, there's also the SHDN# signal wired to
a GPIO, it should be driven high for the normal operation.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Acked-by: Rob Herring <robh@kernel.org>
[cw00.choi: Add the GPIOLIB dependency] Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Currently, there is only one user for hv_ringbuffer_read()/
hv_ringbuffer_peak() functions and the usage of these functions is:
- insecure as we drop ring_lock between them, someone else (in theory
only) can acquire it in between;
- non-optimal as we do a number of things (acquire/release the above
mentioned lock, calculate available space on the ring, ...) twice and
this path is performance-critical.
Remove hv_ringbuffer_peek() moving the logic from __vmbus_recvpacket() to
hv_ringbuffer_read().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:02:00 +0000 (19:02 -0800)]
Drivers: hv: remove code duplication between vmbus_recvpacket()/vmbus_recvpacket_raw()
vmbus_recvpacket() and vmbus_recvpacket_raw() are almost identical but
there are two discrepancies:
1) vmbus_recvpacket() doesn't propagate errors from hv_ringbuffer_read()
which looks like it is not desired.
2) There is an error message printed in packetlen > bufferlen case in
vmbus_recvpacket(). I'm removing it as it is usless for users to see
such messages and /vmbus_recvpacket_raw() doesn't have it.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:01:59 +0000 (19:01 -0800)]
Drivers: hv: ring_buffer: remove code duplication from hv_ringbuffer_peek/read()
hv_ringbuffer_peek() does the same as hv_ringbuffer_read() without
advancing the read index. The only functional change this patch brings
is moving hv_need_to_signal_on_read() call under the ring_lock but this
function is just a couple of comparisons.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
smp_read_barrier_depends() does nothing on almost all arcitectures
including x86 and having it in the beginning of
hv_get_ringbuffer_availbytes() does not provide any guarantees anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vitaly Kuznetsov [Tue, 15 Dec 2015 03:01:57 +0000 (19:01 -0800)]
Drivers: hv: ring_buffer.c: fix comment style
Convert 6+-string comments repeating function names to normal kernel-style
comments and fix a couple of other comment style issues. No textual or
functional changes intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.
In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.
As we're setting hvt->mode from two separate contexts now we need to use
a proper locking.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Tue, 15 Dec 2015 00:01:57 +0000 (16:01 -0800)]
Drivers: hv: utils: Invoke the poll function after handshake
When the handshake with daemon is complete, we should poll the channel since
during the handshake, we will not be processing any messages. This is a
potential bug if the host is waiting for a response from the guest.
I would like to thank Dexuan for pointing this out.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Tue, 15 Dec 2015 00:01:56 +0000 (16:01 -0800)]
Drivers: hv: vmbus: Force all channel messages to be delivered on CPU 0
Force all channel messages to be delivered on CPU0. These messages are not
performance critical and are used during the setup and teardown of the
channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Smetanin [Tue, 15 Dec 2015 00:01:55 +0000 (16:01 -0800)]
drivers/hv: correct tsc page sequence invalid value
Hypervisor Top Level Functional Specification v3/4 says
that TSC page sequence value = -1(0xFFFFFFFF) is used to
indicate that TSC page no longer reliable source of reference
timer. Unfortunately, we found that Windows Hyper-V guest
side implementation uses sequence value = 0 to indicate
that Tsc page no longer valid. This is clearly visible
inside Windows 2012R2 ntoskrnl.exe HvlGetReferenceTime()
function dissassembly:
HvlGetReferenceTime proc near
xchg ax, ax
loc_1401C3132:
mov rax, cs:HvlpReferenceTscPage
mov r9d, [rax]
test r9d, r9d
jz short loc_1401C3176
rdtsc
mov rcx, cs:HvlpReferenceTscPage
shl rdx, 20h
or rdx, rax
mov rax, [rcx+8]
mov rcx, cs:HvlpReferenceTscPage
mov r8, [rcx+10h]
mul rdx
mov rax, cs:HvlpReferenceTscPage
add rdx, r8
mov ecx, [rax]
cmp ecx, r9d
jnz short loc_1401C3132
jmp short loc_1401C3184
loc_1401C3176:
mov ecx, 40000020h
rdmsr
shl rdx, 20h
or rdx, rax
loc_1401C3184:
mov rax, rdx
retn
HvlGetReferenceTime endp
This patch aligns Tsc page invalid sequence value with
Windows Hyper-V guest implementation which is more
compatible with both Hyper-V hypervisor and KVM hypervisor.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Tue, 15 Dec 2015 00:01:54 +0000 (16:01 -0800)]
Drivers: hv: vmbus: Fix a Host signaling bug
Currently we have two policies for deciding when to signal the host:
One based on the ring buffer state and the other based on what the
VMBUS client driver wants to do. Consider the case when the client
wants to explicitly control when to signal the host. In this case,
if the client were to defer signaling, we will not be able to signal
the host subsequently when the client does want to signal since the
ring buffer state will prevent the signaling. Implement logic to
have only one signaling policy in force for a given channel.
Jake Oshins [Tue, 15 Dec 2015 00:01:52 +0000 (16:01 -0800)]
drivers:hv: Allow for MMIO claims that span ACPI _CRS records
This patch makes 16GB GPUs work in Hyper-V VMs, since, for
compatibility reasons, the Hyper-V BIOS lists MMIO ranges in 2GB
chunks in its root bus's _CRS object.
Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dexuan Cui [Tue, 15 Dec 2015 00:01:51 +0000 (16:01 -0800)]
Drivers: hv: vmbus: channge vmbus_connection.channel_lock to mutex
spinlock is unnecessary here.
mutex is enough.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dexuan Cui [Tue, 15 Dec 2015 00:01:50 +0000 (16:01 -0800)]
Drivers: hv: vmbus: release relid on error in vmbus_process_offer()
We want to simplify vmbus_onoffer_rescind() by not invoking
hv_process_channel_removal(NULL, ...).
Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dexuan Cui [Tue, 15 Dec 2015 00:01:49 +0000 (16:01 -0800)]
Drivers: hv: vmbus: fix rescind-offer handling for device without a driver
In the path vmbus_onoffer_rescind() -> vmbus_device_unregister() ->
device_unregister() -> ... -> __device_release_driver(), we can see for a
device without a driver loaded: dev->driver is NULL, so
dev->bus->remove(dev), namely vmbus_remove(), isn't invoked.
As a result, vmbus_remove() -> hv_process_channel_removal() isn't invoked
and some cleanups(like sending a CHANNELMSG_RELID_RELEASED message to the
host) aren't done.
We can demo the issue this way:
1. rmmod hv_utils;
2. disable the Heartbeat Integration Service in Hyper-V Manager and lsvmbus
shows the device disappears.
3. re-enable the Heartbeat in Hyper-V Manager and modprobe hv_utils, but
lsvmbus shows the device can't appear again.
This is because, the host thinks the VM hasn't released the relid, so can't
re-offer the device to the VM.
We can fix the issue by moving hv_process_channel_removal()
from vmbus_close_internal() to vmbus_device_release(), since the latter is
always invoked on device_unregister(), whether or not the dev has a driver
loaded.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dexuan Cui [Tue, 15 Dec 2015 00:01:48 +0000 (16:01 -0800)]
Drivers: hv: vmbus: do sanity check of channel state in vmbus_close_internal()
This fixes an incorrect assumption of channel state in the function.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dexuan Cui [Tue, 15 Dec 2015 00:01:47 +0000 (16:01 -0800)]
Drivers: hv: vmbus: serialize process_chn_event() and vmbus_close_internal()
process_chn_event(), running in the tasklet, can race with
vmbus_close_internal() in the case of SMP guest, e.g., when the former is
accessing channel->inbound.ring_buffer, the latter could be freeing the
ring_buffer pages.
To resolve the race, we can serialize them by disabling the tasklet when
the latter is running here.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olaf Hering [Tue, 15 Dec 2015 00:01:42 +0000 (16:01 -0800)]
Drivers: hv: vss: run only on supported host versions
The Backup integration service on WS2012 has appearently trouble to
negotiate with a guest which does not support the provided util version.
Currently the VSS driver supports only version 5/0. A WS2012 offers only
version 1/x and 3/x, and vmbus_prep_negotiate_resp correctly returns an
empty icframe_vercnt/icmsg_vercnt. But the host ignores that and
continues to send ICMSGTYPE_NEGOTIATE messages. The result are weird
errors during boot and general misbehaviour.
Check the Windows version to work around the host bug, skip hv_vss_init
on WS2012 and older.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jake Oshins [Tue, 15 Dec 2015 00:01:40 +0000 (16:01 -0800)]
drivers:hv: Export the API to invoke a hypercall on Hyper-V
This patch exposes the function that hv_vmbus.ko uses to make hypercalls. This
is necessary for retargeting an interrupt when it is given a new affinity.
Since we are exporting this API, rename the API as it will be visible outside
the hv.c file.
Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jake Oshins [Tue, 15 Dec 2015 00:01:39 +0000 (16:01 -0800)]
drivers:hv: Export a function that maps Linux CPU num onto Hyper-V proc num
This patch exposes the mapping between Linux CPU number and Hyper-V virtual
processor number. This is necessary because the hypervisor needs to know which
virtual processors to target when making a mapping in the Interrupt Redirection
Table in the I/O MMU.
Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Smetanin [Tue, 15 Dec 2015 00:01:38 +0000 (16:01 -0800)]
drivers/hv: cleanup synic msrs if vmbus connect failed
Before vmbus_connect() synic is setup per vcpu - this means
hypervisor receives writes at synic msr's and probably allocate
hypervisor resources per synic setup.
If vmbus_connect() failed for some reason it's neccessary to cleanup
synic setup by call hv_synic_cleanup() at each vcpu to get a chance
to free allocated resources by hypervisor per synic.
This patch does appropriate cleanup in case of vmbus_connect() failure.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olaf Hering [Tue, 15 Dec 2015 00:01:35 +0000 (16:01 -0800)]
tools: hv: remove repeated HV_FCOPY string
HV_FCOPY is already used as identifier in syslog.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olaf Hering [Tue, 15 Dec 2015 00:01:34 +0000 (16:01 -0800)]
tools: hv: report ENOSPC errors in hv_fcopy_daemon
Currently some "Unspecified error 0x80004005" is reported on the Windows
side if something fails. Handle the ENOSPC case and return
ERROR_DISK_FULL, which allows at least Copy-VMFile to report a meaning
full error.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olaf Hering [Tue, 15 Dec 2015 00:01:33 +0000 (16:01 -0800)]
Drivers: hv: utils: run polling callback always in interrupt context
All channel interrupts are bound to specific VCPUs in the guest
at the point channel is created. While currently, we invoke the
polling function on the correct CPU (the CPU to which the channel
is bound to) in some cases we may run the polling function in
a non-interrupt context. This potentially can cause an issue as the
polling function can be interrupted by the channel callback function.
Fix the issue by running the polling function on the appropriate CPU
at interrupt level. Additional details of the issue being addressed by
this patch are given below:
Currently hv_fcopy_onchannelcallback is called from interrupts and also
via the ->write function of hv_utils. Since the used global variables to
maintain state are not thread safe the state can get out of sync.
This affects the variable state as well as the channel inbound buffer.
As suggested by KY adjust hv_poll_channel to always run the given
callback on the cpu which the channel is bound to. This avoids the need
for locking because all the util services are single threaded and only
one transaction is active at any given point in time.
Additionally, remove the context variable, they will always be the same as
recv_channel.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Tue, 15 Dec 2015 00:01:32 +0000 (16:01 -0800)]
Drivers: hv: util: Increase the timeout for util services
Util services such as KVP and FCOPY need assistance from daemon's running
in user space. Increase the timeout so we don't prematurely terminate
the transaction in the kernel. Host sets up a 60 second timeout for
all util driver transactions. The host will retry the transaction if it
times out. Set the guest timeout at 30 seconds.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Charles Keepax [Mon, 14 Dec 2015 10:37:12 +0000 (10:37 +0000)]
extcon: arizona: Update device tree binding for mic detect configurations
Update the device tree binding documentation to include documentation for
the wlf,micd-configs property that is used to specify the configurations
for headset polarity detection (CTIA / OTMP).
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Charles Keepax [Mon, 14 Dec 2015 10:37:11 +0000 (10:37 +0000)]
extcon: arizona: Add device bindings for the micd configurations
Add device bindings to support configuring the jack detection
configurations. Each configuration needs to specify the connection of
the mic det pins, which micbias should be used and the value of the
micd polarity GPIO required to activate that configuration.
Add support to find string-similar symbols. When option --sim SYM is
specified, checkkconfigsymbols.py will print at most 10 symbols defined
in Kconfig that are string similar to SYM in the following format:
Similar symbols: $COMMA_SEPARATED_LIST_OF_SYMBOLS
Note, if no similar symbols are found it is indicated as follows:
Similar symbols: no similar symbols found
Since the implemented functionality is also useful when searching the
entire source or when diffing two commits, a list of similar symbols is
printed unconditionally with the other data. In order to make the
output more readable, the format now looks as follows:
$UNDEFINED_SYMBOL
Referencing files: $COMMA_SEPARATED_LIST_OF_FILES
Similar symbols: $COMMA_SEPARATED_LIST_OF_SYMBOLS
[Optional with '--find']
Commits changing symbol:
- $COMMIT_1_HASH ("$COMMIT_1_MESSAGE")
- $COMMIT_2_HASH ("$COMMIT_2_MESSAGE")
or
- no commit found
Peter Zijlstra [Sun, 13 Dec 2015 21:11:16 +0000 (22:11 +0100)]
sched/wait: Fix the signal handling fix
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/
His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().
We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.
Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Chris Mason <clm@fb.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Chris Mason <clm@fb.com> Reviewed-by: Paul Turner <pjt@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: hpa@zytor.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 13 Dec 2015 20:41:10 +0000 (12:41 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixlets from Thomas Gleixner:
"Two trivial fixes which add missing header fileas and forward
declarations so the code will compile even when the magic include
chains are different"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Add missing include for barrier.h
irqchip/gic-v3: Add missing struct device_node declaration
Linus Torvalds [Sun, 13 Dec 2015 20:29:22 +0000 (12:29 -0800)]
Merge tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull fpga driver fixes from Greg KH:
"Only two small fpga driver fixes here, both have been in linux-next
for a while, and resolve some reported issues"
* tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
fpga manager: Fix firmware resource leak on error
fpga manager: remove label
Linus Torvalds [Sun, 13 Dec 2015 19:58:18 +0000 (11:58 -0800)]
Merge tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are a number of small USB fixes for 4.4-rc5. All of them have
been in linux-next. The majority are gadget and phy issues, with a
few new quirks and device ids added as well"
* tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (32 commits)
USB: add quirk for devices with broken LPM
xhci: fix usb2 resume timing and races.
usb: musb: fail with error when no DMA controller set
usb: gadget: uvc: fix permissions of configfs attributes
usb: musb: core: Fix pm runtime for deferred probe
usb: phy: msm: fix a possible NULL dereference
USB: host: ohci-at91: fix a crash in ohci_hcd_at91_overcurrent_irq
usb: Quiet down false peer failure messages
usb: xhci: fix config fail of FS hub behind a HS hub with MTT
xhci: Fix memory leak in xhci_pme_acpi_rtd3_enable()
usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message
USB: whci-hcd: add check for dma mapping error
usb: core : hub: Fix BOS 'NULL pointer' kernel panic
USB: quirks: Apply ALWAYS_POLL to all ELAN devices
usb-storage: Fix scsi-sd failure "Invalid field in cdb" for USB adapter JMicron
USB: quirks: Fix another ELAN touchscreen
usb: dwc3: gadget: don't prestart interrupt endpoints
USB: serial: Another Infineon flash loader USB ID
USB: cdc_acm: Ignore Infineon Flash Loader utility
USB: cp210x: Remove CP2110 ID from compatibility list
...
Linus Torvalds [Sun, 13 Dec 2015 00:43:44 +0000 (16:43 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a bunch of small bug fixes for various ARM platforms, nothing
really sticks out this week, most of either fixes bugs in code that
was just added in 4.4, or that has been broken for many years without
anyone noticing.
at91/sama5d2:
- fix sama5de hardware setup of sd/mmc interface
- proper selection of pinctrl drivers. PIO4 is necessary for sama5d2
imx:
- Fix vf610 SAI clock configuration bug which is discovered by the
newly added master mode support in SAI audio driver.
- Fix buggy L2 cache latency values in vf610 device trees, which may
cause system hang when cpu runs at a higher frequency.
ixp4xx:
- fix prototypes for readl/writel functions
ls2080a:
- use little-endian register access for GPIO and SDHCI
omap:
- Fix clock source for ARM TWD and global timers on am437x
- Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when
MACH_OMAP3_PANDORA is selected
- Fix SPI DMA handles for dm816x as only some were mapped
- Fix up mbox cells for dm816x to make mailbox usable
Linus Torvalds [Sat, 12 Dec 2015 21:39:59 +0000 (13:39 -0800)]
Merge tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- opal-irqchip: Fix double endian conversion from Alistair Popple
- cxl: Set endianess of kernel contexts from Frederic Barrat
- sbc8641: drop bogus PHY IRQ entries from DTS file from Paul Gortmaker
- Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" from Andrew
Donnellan
* tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"
powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file
cxl: Set endianess of kernel contexts
powerpc/opal-irqchip: Fix double endian conversion
Linus Torvalds [Sat, 12 Dec 2015 18:44:49 +0000 (10:44 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MIPS: fix DMA contiguous allocation
sh64: fix __NR_fgetxattr
ocfs2: fix SGID not inherited issue
mm/oom_kill.c: avoid attempting to kill init sharing same memory
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
tmpfs: fix shmem_evict_inode() warnings on i_blocks
mm/hugetlb.c: fix resv map memory leak for placeholder entries
mm: hugetlb: call huge_pte_alloc() only if ptep is null
kernel: remove stop_machine() Kconfig dependency
mm: kmemleak: mark kmemleak_init prototype as __init
mm: fix kerneldoc on mem_cgroup_replace_page
osd fs: __r4w_get_page rely on PageUptodate for uptodate
MAINTAINERS: make Vladimir co-maintainer of the memory controller
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
memcg: fix memory.high target
mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Linus Torvalds [Sat, 12 Dec 2015 18:34:20 +0000 (10:34 -0800)]
Merge branch 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Fix the boot crash on Mako machines with Huge Pages, prevent a panic
with SATA controllers (and others) by correctly calculating the IOMMU
space, hook up the mlock2 syscall and drop unneeded code in the parisc
pci code"
* 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Disable huge pages on Mako machines
parisc: Wire up mlock2 syscall
parisc: Remove unused pcibios_init_bus()
parisc iommu: fix panic due to trying to allocate too large region
Linus Torvalds [Sat, 12 Dec 2015 18:24:00 +0000 (10:24 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A set of fixes for the current series. This contains:
- A bunch of fixes for lightnvm, should be the last round for this
series. From Matias and Wenwei.
- A writeback detach inode fix from Ilya, also marked for stable.
- A block (though it says SCSI) fix for an OOPS in SCSI runtime power
management.
- Module init error path fixes for null_blk from Minfei"
* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: Fix error path in module initialization
lightnvm: do not compile in debugging by default
lightnvm: prevent gennvm module unload on use
lightnvm: fix media mgr registration
lightnvm: replace req queue with nvmdev for lld
lightnvm: comments on constants
lightnvm: check mm before use
lightnvm: refactor spin_unlock in gennvm_get_blk
lightnvm: put blks when luns configure failed
lightnvm: use flags in rrpc_get_blk
block: detach bdev inode from its wb in __blkdev_put()
SCSI: Fix NULL pointer dereference in runtime PM
Linus Torvalds [Sat, 12 Dec 2015 18:16:26 +0000 (10:16 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Update the linker script to use L1_CACHE_BYTES instead of hard-coded
64. We recently changed L1_CACHE_BYTES to 128
- Improve race condition reporting on set_pte_at() and change the BUG
to WARN_ONCE. With hardware update of the accessed/dirty state, we
need to ensure that set_pte_at() does not inadvertently override
hardware updated state. The patch also makes the checks ignore
!pte_valid() new entries
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Improve error reporting on set_pte_at() checks
arm64: update linker script to increased L1_CACHE_BYTES value
Qais Yousef [Fri, 11 Dec 2015 21:41:09 +0000 (13:41 -0800)]
MIPS: fix DMA contiguous allocation
Recent changes to how GFP_ATOMIC is defined seems to have broken the
condition to use mips_alloc_from_contiguous() in
mips_dma_alloc_coherent().
I couldn't bottom out the exact change but I think it's this commit d0164adc89f6 ("mm, page_alloc: distinguish between being unable to
sleep, unwilling to sleep and avoiding waking kswapd").
GFP_ATOMIC has multiple bits set and the check for !(gfp & GFP_ATOMIC)
isn't enough.
The reason behind this condition is to check whether we can potentially
do a sleeping memory allocation. Use gfpflags_allow_blocking() instead
which should be more robust.
Dmitry V. Levin [Fri, 11 Dec 2015 21:41:06 +0000 (13:41 -0800)]
sh64: fix __NR_fgetxattr
According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
has to be defined to 259, but it doesn't. Instead, it's defined to 269,
which is of course used by another syscall, __NR_sched_setaffinity in this
case.
This bug was found by strace test suite.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 11 Dec 2015 21:41:03 +0000 (13:41 -0800)]
ocfs2: fix SGID not inherited issue
Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir. It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.
Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Jie [Fri, 11 Dec 2015 21:41:00 +0000 (13:41 -0800)]
mm/oom_kill.c: avoid attempting to kill init sharing same memory
It's possible that an oom killed victim shares an ->mm with the init
process and thus oom_kill_process() would end up trying to kill init as
well.
This has been shown in practice:
Out of memory: Kill process 9134 (init) score 3 or sacrifice child
Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB
Kill process 1 (init) sharing same memory
...
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
And this will result in a kernel panic.
If a process is forked by init and selected for oom kill while still
sharing init_mm, then it's likely this system is in a recoverable state.
However, it's better not to try to kill init and allow the machine to
panic due to unkillable processes.
[rientjes@google.com: rewrote changelog]
[akpm@linux-foundation.org: fix inverted test, per Ben] Signed-off-by: Chen Jie <chenjie6@huawei.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Seth Jennings [Fri, 11 Dec 2015 21:40:57 +0000 (13:40 -0800)]
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory
x86-64 systems") and 982792c782ef ("x86, mm: probe memory block size for
generic x86 64bit") introduced large block sizes for x86. This made it
possible to have multiple sections per memory block where previously,
there was a only every one section per block.
Since blocks consist of contiguous ranges of section, there can be holes
in the blocks where sections are not present. If one attempts to
offline such a block, a crash occurs since the code is not designed to
deal with this.
This patch is a quick fix to gaurd against the crash by not allowing
blocks with non-present sections to be offlined.
Hugh Dickins [Fri, 11 Dec 2015 21:40:55 +0000 (13:40 -0800)]
tmpfs: fix shmem_evict_inode() warnings on i_blocks
Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).
(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently. The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)
v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.
shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache(). delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced take it one
too low: leading to the WARNING when the object is finally evicted.
Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:"). Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.
While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after). (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.
I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether. Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.
Dmitry identified a potential memory leak in the routine region_chg,
where a region descriptor is not free'ed on an error path.
However, the root cause for the above memory leak resides in region_del.
In this specific case, a "placeholder" entry is created in region_chg.
The associated page allocation fails, and the placeholder entry is left
in the reserve map. This is "by design" as the entry should be deleted
when the map is released. The bug is in the region_del routine which is
used to delete entries within a specific range (and when the map is
released). region_del did not handle the case where a placeholder entry
exactly matched the start of the range range to be deleted. In this
case, the entry would not be deleted and leaked. The fix is to take
these special placeholder entries into account in region_del.
The region_chg error path leak is also fixed.
Fixes: feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 11 Dec 2015 21:40:49 +0000 (13:40 -0800)]
mm: hugetlb: call huge_pte_alloc() only if ptep is null
Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
and check whether the obtained *ptep is a migration/hwpoison entry or
not. And if not, then we get to call huge_pte_alloc(). This is racy
because the *ptep could turn into migration/hwpoison entry after the
huge_pte_offset() check. This race results in BUG_ON in
huge_pte_alloc().
We don't have to call huge_pte_alloc() when the huge_pte_offset()
returns non-NULL, so let's fix this bug with moving the code into else
block.
Note that the *ptep could turn into a migration/hwpoison entry after
this block, but that's not a problem because we have another
!pte_present check later (we never go into hugetlb_no_page() in that
case.)
Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Wilson [Fri, 11 Dec 2015 21:40:46 +0000 (13:40 -0800)]
kernel: remove stop_machine() Kconfig dependency
Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable. This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.
For example, this breaks MTRR setup on x86 in certain configs since ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and
text_poke_smp_batch() functions") as the MTRR is only established on the
boot CPU.
This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.
Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Iooss [Fri, 11 Dec 2015 21:40:43 +0000 (13:40 -0800)]
mm: kmemleak: mark kmemleak_init prototype as __init
The kmemleak_init() definition in mm/kmemleak.c is marked __init but its
prototype in include/linux/kmemleak.h is marked __ref since commit a6186d89c913 ("kmemleak: Mark the early log buffer as __initdata").
This causes a section mismatch which is reported as a warning when
building with clang -Wsection, because kmemleak_init() is declared in
section .ref.text but defined in .init.text.
Fix this by marking kmemleak_init() prototype __init.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 11 Dec 2015 21:40:38 +0000 (13:40 -0800)]
osd fs: __r4w_get_page rely on PageUptodate for uptodate
Commit 42cb14b110a5 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.
It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate. This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate. What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.
When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?
Johannes Weiner [Fri, 11 Dec 2015 21:40:35 +0000 (13:40 -0800)]
MAINTAINERS: make Vladimir co-maintainer of the memory controller
Vladimir architected and authored much of the current state of the
memcg's slab memory accounting and tracking. Make sure he gets CC'd on
bug reports ;-)
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 11 Dec 2015 21:40:32 +0000 (13:40 -0800)]
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.
The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context. If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much. WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation. This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.
In order to fix this issue we need to do two things. First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep. In order to prevent from issues handled by 0e093d99763e
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.
The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.
Vlastimil Babka [Fri, 11 Dec 2015 21:40:29 +0000 (13:40 -0800)]
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
Commit 016c13daa5c9 ("mm, page_alloc: use masks and shifts when
converting GFP flags to migrate types") has swapped MIGRATE_MOVABLE and
MIGRATE_RECLAIMABLE in the enum definition. However, migratetype_names
wasn't updated to reflect that.
As a result, the file /proc/pagetypeinfo shows the counts for Movable as
Reclaimable and vice versa.
Additionally, commit 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks
for high-order atomic allocations on demand") introduced
MIGRATE_HIGHATOMIC, but did not add a letter to distinguish it into
show_migration_types(), so it doesn't appear in the listing of free
areas during page alloc failures or oom kills.
This patch fixes both problems. The atomic reserves will show with a
letter 'H' in the free areas listings.
Fixes: 016c13daa5c9 ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types") Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 11 Dec 2015 21:40:24 +0000 (13:40 -0800)]
memcg: fix memory.high target
When the memory.high threshold is exceeded, try_charge() schedules a
task_work to reclaim the excess. The reclaim target is set to the
number of pages requested by try_charge().
This is wrong, because try_charge() usually charges more pages than
requested (batch > nr_pages) in order to refill per cpu stocks. As a
result, a process in a cgroup can easily exceed memory.high
significantly when doing a lot of charges w/o returning to userspace
(e.g. reading a file in big chunks).
Fix this issue by assuring that when exceeding memory.high a process
reclaims as many pages as were actually charged (i.e. batch).
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
alloc_buddy_huge_page() to directly create a hugepage from the buddy
allocator.
In that case, however, if alloc_buddy_huge_page() succeeds we don't
decrement h->resv_huge_pages, which means that successful
hugetlb_fault() returns without releasing the reserve count. As a
result, subsequent hugetlb_fault() might fail despite that there are
still free hugepages.
This patch simply adds decrementing code on that code path.
I reproduced this problem when testing v4.3 kernel in the following situation:
- the test machine/VM is a NUMA system,
- hugepage overcommiting is enabled,
- most of hugepages are allocated and there's only one free hugepage
which is on node 0 (for example),
- another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
node 1, tries to allocate a hugepage,
- the allocation should fail but the reserve count is still hold.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Mon, 30 Nov 2015 19:47:46 +0000 (14:47 -0500)]
parisc iommu: fix panic due to trying to allocate too large region
When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.
Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.
How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:
if (startsg->length + dma_len > max_seg_size)
break;
When it finishes coalescing adjacent entries, it allocates the mapping:
It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.
To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
if (startsg->length + dma_len > max_seg_size)
break;
to
if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
break;
This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).
Kevin Hilman [Sat, 12 Dec 2015 00:14:34 +0000 (16:14 -0800)]
Merge tag 'imx-fixes-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
Merge "ARM: imx: fixes for 4.4, 2nd round" from Shawn Guo:
The i.MX fixes for 4.4, 2nd round:
- Fix vf610 SAI clock configuration bug which is discovered by the newly
added master mode support in SAI audio driver.
- Fix buggy L2 cache latency values in vf610 device trees, which may
cause system hang when cpu runs at a higher frequency.
* tag 'imx-fixes-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: vf610: use reset values for L2 cache latencies
ARM: dts: vf610: fix clock definition for SAI2
ARM: imx: clk-vf610: fix SAI clock tree
Li Yang [Fri, 4 Dec 2015 22:55:04 +0000 (16:55 -0600)]
dt-bindings: define little-endian property for QorIQ GPIO
The GPIO block on different QorIQ chips could have registers in different
endianess. Define the property to specify which endian is used by the
hardware.
Signed-off-by: Liu Gang <Gang.Liu@freescale.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
yangbo lu [Fri, 4 Dec 2015 22:55:03 +0000 (16:55 -0600)]
ARM64: dts: ls2080a: fix eSDHC endianness
Add the "little-endian" property to fix the issue that eSDHC
is not working and dumping out "mmc0: Controller never released
inhibit bit(s)." error messages constantly.
Fixes: 5461597f6ce0 ("dts/ls2080a: Update DTSI to add support of various peripherals") Signed-off-by: Yangbo Lu <yangbo.lu@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
Alan Stern [Thu, 10 Dec 2015 20:27:21 +0000 (15:27 -0500)]
USB: add quirk for devices with broken LPM
Some USB device / host controller combinations seem to have problems
with Link Power Management. For example, Steinar found that his xHCI
controller wouldn't handle bandwidth calculations correctly for two
video cards simultaneously when LPM was enabled, even though the bus
had plenty of bandwidth available.
This patch introduces a new quirk flag for devices that should remain
disabled for LPM, and creates quirk entries for Steinar's devices.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Fri, 11 Dec 2015 12:38:06 +0000 (14:38 +0200)]
xhci: fix usb2 resume timing and races.
According to USB 2 specs ports need to signal resume for at least 20ms,
in practice even longer, before moving to U0 state.
Both host and devices can initiate resume.
On device initiated resume, a port status interrupt with the port in resume
state in issued. The interrupt handler tags a resume_done[port]
timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
Root hub timer requests for port status, finds the port in resume state,
checks if resume_done[port] timestamp passed, and set port to U0 state.
On host initiated resume, current code sets the port to resume state,
sleep 20ms, and finally sets the port to U0 state. This should also
be changed to work in a similar way as the device initiated resume, with
timestamp tagging, but that is not yet tested and will be a separate
fix later.
There are a few issues with this approach
1. A host initiated resume will also generate a resume event. The event
handler will find the port in resume state, believe it's a device
initiated resume, and act accordingly.
2. A port status request might cut the resume signalling short if a
get_port_status request is handled during the host resume signalling.
The port will be found in resume state. The timestamp is not set leading
to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
get_port_status will proceed with moving the port to U0.
3. If an error, or anything else happens to the port during device
initiated resume signalling it will leave all the device resume
parameters hanging uncleared, preventing further suspend, returning
-EBUSY, and cause the pm thread to busyloop trying to enter suspend.
Fix this by using the existing resuming_ports bitfield to indicate that
resume signalling timing is taken care of.
Check if the resume_done[port] is set before using it for timestamp
comparison, and also clear out any resume signalling related variables
if port is not in U0 or Resume state
This issue was discovered when a PM thread busylooped, trying to runtime
suspend the xhci USB 2 roothub on a Dell XPS
Linus Torvalds [Fri, 11 Dec 2015 19:00:30 +0000 (11:00 -0800)]
Merge tag 'dm-4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Five stable fixes:
- Two DM btree bufio buffer leak fixes that resolve reported BUG_ONs
during DM thinp metadata close's dm_bufio_client_destroy().
- A DM thinp range discard fix to handle discarding a partially
mapped range.
- A DM thinp metadata snapshot fix to make sure the btree roots saved
in the metadata snapshot are the most current.
- A DM space map metadata refcounting fix that improves both DM thinp
and DM cache metadata"
* tag 'dm-4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm btree: fix bufio buffer leaks in dm_btree_del() error path
dm space map metadata: fix ref counting bug when bootstrapping a new space map
dm thin metadata: fix bug when taking a metadata snapshot
dm thin metadata: fix bug in dm_thin_remove_range()
dm btree: fix leak of bufio-backed block in btree_split_sibling error path
Linus Torvalds [Fri, 11 Dec 2015 18:51:02 +0000 (10:51 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Not too much this time.
- One nouveau workaround extended to a few more GPUs
- Some amdgpu big endian fixes, and a regression fixer
- Some vmwgfx fixes
- One ttm locking fix
- One vgaarb fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vgaarb: fix signal handling in vga_get()
radeon: Fix VCE IB test on Big-Endian systems
radeon: Fix VCE ring test for Big-Endian systems
radeon/cik: Fix GFX IB test on Big-Endian
drm/amdgpu: fix the lost duplicates checking
drm/nouveau/pmu: remove whitelist for PGOB-exit WAR, enable by default
drm/vmwgfx: Implement the cursor_set2 callback v2
drm/vmwgfx: fix a warning message
drm/ttm: Fixed a read/write lock imbalance