We have some code in udbg_uart_getc_poll() that tries to protect
against a NULL udbg_uart_in, but gets it all wrong.
Found with the LLVM static analyzer (scan-build).
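A sketch of the corrected shape - test the hook for NULL before the first
call through it (the data-ready logic is shown as one would expect it;
hedged against the exact upstream diff):

    static int udbg_uart_getc_poll(void)
    {
        if (!udbg_uart_in)
            return -1;

        /* only read RBR when the UART actually has data ready */
        if (udbg_uart_in(UART_LSR) & LSR_DR)
            return udbg_uart_in(UART_RBR);

        return -1;
    }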
Fixes: 309257484cc1 ("powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs") Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Add some newlines for readability while we're here] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In ocfs2_link(), the parent directory inode passed to
ocfs2_lookup_ino_from_name() is wrong: the dir parameter is the parent of
new_dentry, not of old_dentry. We should get old_dir from old_dentry and
look up old_dentry in old_dir, in case another node has removed the old
dentry.
With this change, hard linking works again when paths are relative with
at least one subdirectory. This is how the problem was reproducible:
# mkdir a
# mkdir b
# touch a/test
# ln a/test b/test
ln: failed to create hard link `b/test' => `a/test': No such file or directory
However, when creating links in the same directory, it worked fine.
Now the link gets created.
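A sketch of the corrected lookup, hedged against the exact 3.16 code: the
inode number of the old name must be resolved in old_dentry's own parent:

    /* resolve the old name in its own parent, not in new_dentry's dir */
    old_dir = old_dentry->d_parent->d_inode;
    err = ocfs2_lookup_ino_from_name(old_dir, old_dentry->d_name.name,
                                     old_dentry->d_name.len, &old_de_ino);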
Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()") Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Szabo Aron - UBIT <aron@ubit.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Tested-by: Aron Szabo <aron@ubit.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
3430LDP has NAND flash with 32 bytes OOB size which is sufficient to hold
BCH8 codes but the small page check introduced in
commit b491da7233d5 ("mtd: nand: omap: clean-up ecc layout for BCH ecc schemes")
considers anything below 64 bytes unsuitable for BCH4/8/16. There is another
bug in that code where it doesn't skip the check for OMAP_ECC_HAM1_CODE_SW.
Get rid of that small page check code as it is insufficient and redundant,
because we already check the available OOB bytes against the ecc layout
before calling nand_scan_tail().
Fixes: b491da7233d5 ("mtd: nand: omap: clean-up ecc layout for BCH ecc schemes") Reported-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When a hid driver that uses i2c-hid as transport is unloaded, the hid core
will call i2c_hid_stop(), which releases all the buffers associated with
the device, including the command buffer.
Now, when the i2c-hid driver itself is unloaded, it tries to power down
the device by sending it the PWR_SLEEP command. Since the command buffer
is already released, we get the following crash:
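The fix direction was to keep the buffers alive past i2c_hid_stop(); a
minimal sketch, assuming the 3.16-era driver layout where i2c_hid_remove()
sends PWR_SLEEP and then frees the buffers:

    static void i2c_hid_stop(struct hid_device *hid)
    {
        hid->claimed = 0;
        /* buffer freeing removed here: the buffers (including the
         * command buffer needed for PWR_SLEEP) now live until
         * i2c_hid_remove() */
    }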
The of_node_put() call in eukrea_tlv320_probe() may take an
uninitialized pointer, as the compiler points out:
sound/soc/fsl/eukrea-tlv320.c:221:14: warning: 'ssi_np' may be used uninitialized in this function [-Wuninitialized]
This patch adds the proper NULL initializations as a fix.
(codec_np is also NULL initialized just for consistency.)
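A sketch of the fix:

    /* initialize both so the error path's of_node_put() is always safe */
    struct device_node *ssi_np = NULL, *codec_np = NULL;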
Fixes: 66f232908de2 ('ASoC: eukrea-tlv320: Add DT support') Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When Intersil ISL12057 support was added by commit 70e123373c05 ("rtc: Add
support for Intersil ISL12057 I2C RTC chip"), two masks for time register
values imported from the device were either wrong or omitted, allowing
additional bits from those registers to leak into the values read:
- mask for hour register value when reading it in AM/PM mode. As
AM/PM mode is not the usual mode used by the driver, this error
would only have an impact on an externally configured RTC hour
later read by the driver.
- mask for month value. The lack of masking would provide an
erroneous value if century bit is set.
This patch fixes those two masks.
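An illustrative sketch of the two masks (register and flag names are
approximations, not taken verbatim from the driver):

    /* AM/PM mode: hour lives in the low bits; the PM flag is separate */
    tm->tm_hour = bcd2bin(regs[ISL12057_REG_RTC_HR] & 0x1f);
    if (regs[ISL12057_REG_RTC_HR] & ISL12057_REG_RTC_HR_PM) /* hypothetical name */
        tm->tm_hour += 12;

    /* month: mask off the century bit before BCD conversion */
    tm->tm_mon = bcd2bin(regs[ISL12057_REG_RTC_MO] & 0x1f) - 1;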
Fixes: 70e123373c05 ("rtc: Add support for Intersil ISL12057 I2C RTC chip") Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Peter Huewe <peter.huewe@infineon.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Grant Likely <grant.likely@linaro.org> Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.
Multiple crash dumps showed:
inode->i_state was 0x60, i_count was 0, and i_sb_list pointed back at
itself. As this is not the state of the list upon entry to the function,
the kernel never exits the loop.
To help narrow down the problem, the call to list_del_init in
inode_sb_list_del was changed to list_del. This poisons the pointers in
the i_sb_list and causes the kernel to panic if it traverses a freed
inode.
Subsequent stress testing panicked in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop, showing that next_i had been
freed.
We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.
The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget. However, the code doesn't do the
__iget call on next_i
if i_count == 0 or
if i_state & (I_FREEING | I_WILL_FREE)
The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list. This makes the handling of next_i more closely match the
handling of the variable "inode."
The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.
During list_for_each_entry_safe, next_i can be freed, causing the loop to
never terminate. Advance next_i in those cases where __iget is not
called.
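A sketch of the reworked advance, close to the upstream fix:

    /* advance next_i past entries we cannot pin with __iget() */
    while (&next_i->i_sb_list != list) {
        spin_lock(&next_i->i_lock);
        if (!(next_i->i_state & (I_FREEING | I_WILL_FREE)) &&
            atomic_read(&next_i->i_count)) {
            __iget(next_i);
            need_iput = next_i;
            spin_unlock(&next_i->i_lock);
            break;
        }
        spin_unlock(&next_i->i_lock);
        next_i = list_entry(next_i->i_sb_list.next,
                            struct inode, i_sb_list);
    }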
Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ken Helias <kenhelias@firemail.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Linus reported the perf report command being interrupted due to processing
of an 'out of order' event, with the following error:
Timestamp below last timeslice flush
0x5733a8 [0x28]: failed to process type: 3
I could reproduce the issue, and in my case it was caused by one CPU
(mmap) being behind during record, with the userspace mmap reader seeing
the data after other CPUs' data were already stored.
This is expected under some circumstances because we need to limit the
number of events that we queue for reordering when we receive a
PERF_RECORD_FINISHED_ROUND or when we force flush due to memory
pressure.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1417016371-30249-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[zhangzhiqiang: backport to 3.10:
- adjust context
- in commit f61ff6c06d, struct events_stats was defined in
tools/perf/util/event.h, while in 3.10 stable it is defined in
tools/perf/util/hist.h.
- 3.10 stable has no pr_oe_time(), which is used for debugging.
- After the above adjustments, this becomes the same as the original patch:
https://github.com/torvalds/linux/commit/f61ff6c06dc8f32c7036013ad802c899ec590607
] Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com>
[ luis: backported to 3.16: used zhangzhiqiang backport to 3.10 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Introduce an arch specific function to find out whether a particular dma
mapping operation needs to bounce on the swiotlb buffer.
On ARM and ARM64, if the page involved is a foreign page and the device
is not coherent, we need to bounce because at unmap time we cannot
execute any required cache maintenance operations (we don't know how to
find the pfn from the mfn).
No change of behaviour for x86.
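A sketch of the ARM side as backported (per the note below, the
is_device_dma_coherent() test is dropped and the device is assumed
non-coherent):

    int xen_arch_need_swiotlb(struct device *dev,
                              unsigned long pfn,
                              unsigned long mfn)
    {
        /* foreign page (pfn != mfn): bounce, since we cannot do the
         * required cache maintenance at unmap time */
        return (pfn != mfn);
    }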
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ stefano: The commit needs to be slightly modified because
is_device_dma_coherent is not available on kernels < 3.19, so I just
removed the call, thus assuming that the device is not coherent on arm
(slower but safe) ] Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[ luis: backported to 3.16: used backport by stefano ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Fix for BUG_ON(anon_vma->degree) splashes in unlink_anon_vmas() ("kernel
BUG at mm/rmap.c:399!") caused by commit 7a3ef208e662 ("mm: prevent
endless growth of anon_vma hierarchy")
anon_vma_clone() is usually called for a copy of the source vma in the
destination argument. If the source vma has an anon_vma, it should
already be in dst->anon_vma. NULL in dst->anon_vma is used as a sign that
the function is called from anon_vma_fork(). In that case anon_vma_clone()
finds an anon_vma for reuse.
vma_adjust() calls it differently and this breaks the anon_vma reuse
logic: anon_vma_clone() links the vma to the old anon_vma and updates the
degree counters, but vma_adjust() overrides vma->anon_vma right after
that. As a result the final unlink_anon_vmas() decrements the degree of
the wrong anon_vma.
This patch assigns ->anon_vma before calling anon_vma_clone().
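A sketch of the fix in vma_adjust():

    if (exporter && exporter->anon_vma && !importer->anon_vma) {
        /* set anon_vma before cloning so the clone's degree
         * accounting and the final vma->anon_vma agree */
        importer->anon_vma = exporter->anon_vma;
        error = anon_vma_clone(importer, exporter);
        if (error)
            return error;
    }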
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com> Reported-and-tested-by: Oded Gabbay <oded.gabbay@amd.com> Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Daniel Forrest <dan.forrest@ssec.wisc.edu> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit fee7e49d4514 ("mm: propagate error from stack expansion even for
guard page") made sure that we return the error properly for stack
growth conditions. It also theorized that counting the guard page
towards the stack limit might break something, but also said "Let's see
if anybody notices".
Somebody did notice. Apparently android-x86 sets the stack limit very
close to the limit indeed, and including the guard page in the rlimit
check causes the android 'zygote' process problems.
So this adds the (fairly trivial) code to make the stack rlimit check be
against the actual real stack size, rather than the size of the vma that
includes the guard page.
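A sketch of the check in acct_stack_growth():

    /* Stack limit test: compare the size excluding the guard page */
    actual_size = size;
    if (!(vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN)))
        actual_size -= PAGE_SIZE;
    if (actual_size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur))
        return -ENOMEM;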
Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org> Cc: Jay Foad <jay.foad@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This is a static checker fix. We write some binary settings to the
sysfs file. One of the settings is the "->startup_profile". There
isn't any checking to make sure it fits into the
pyra->profile_settings[] array in the profile_activated() function.
I added a check to pyra_sysfs_write_settings() in both places because
I wasn't positive that the other callers were correct.
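A sketch of the added check in pyra_sysfs_write_settings():

    if (off != 0 || count != sizeof(struct pyra_settings))
        return -EINVAL;

    settings = (struct pyra_settings const *)buf;
    if (settings->startup_profile >= ARRAY_SIZE(pyra->profile_settings))
        return -EINVAL;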
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The dl_runtime_exceeded() function is supposed to check if
a SCHED_DEADLINE task must be throttled, by checking if its
current runtime is <= 0. However, it also checks if the
scheduling deadline has been missed (the current time is
larger than the current scheduling deadline), further
decreasing the runtime if this happens.
This "double accounting" is wrong:
- In case of partitioned scheduling (or single CPU), this
happens if task_tick_dl() has been called later than expected
(due to small HZ values). In this case, the current runtime is
also negative, and replenish_dl_entity() can take care of the
deadline miss by recharging the current runtime to a value smaller
than dl_runtime
- In case of global scheduling on multiple CPUs, scheduling
deadlines can be missed even if the task did not consume more
runtime than expected, hence penalizing the task is wrong
This patch fixes the problem by throttling a SCHED_DEADLINE task only
when its runtime becomes negative, without otherwise modifying the
runtime.
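After the change, the check reduces to (sketch):

    static int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
    {
        return (dl_se->runtime <= 0);
    }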
According to global EDF, tasks should be migrated between runqueues
without checking if their scheduling deadlines and runtimes are valid.
However, SCHED_DEADLINE currently performs such a check: a migration
ends up calling dequeue_task_dl(), setting the new CPU, and then calling
enqueue_task_dl().
enqueue_task_dl() then calls enqueue_dl_entity(), which calls
update_dl_entity(), which can modify scheduling deadline and runtime,
breaking global EDF scheduling.
As a result, some of the properties of global EDF are not respected:
for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
two cores can have unbounded response times for the third task even
if 30/80+40/80+120/170 = 1.5809 < 2
This can be fixed by invoking update_dl_entity() only in case of
wakeup, or if this is a new SCHED_DEADLINE task.
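A sketch of the fixed enqueue path:

    /* update parameters only for wakeups and new instances; a
     * migration must not touch the deadline or the runtime */
    if (dl_se->dl_new || flags & ENQUEUE_WAKEUP)
        update_dl_entity(dl_se, pi_se);
    else if (flags & ENQUEUE_REPLENISH)
        replenish_dl_entity(dl_se, pi_se);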
Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
stuck in a busy loop with nothing left to balance, but
kswapd_try_to_sleep() failing to sleep. Their analysis found the cause
to be a combination of several factors:
1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
2. The process has been killed (by OOM in this case), but has not yet been
scheduled to remove itself from the waitqueue and die.
3. kswapd checks for throttled processes in prepare_kswapd_sleep():
if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
wake_up(&pgdat->pfmemalloc_wait);
return false; // kswapd will not go to sleep
}
However, for a process that was already killed, wake_up() does not remove
the process from the waitqueue, since try_to_wake_up() checks its state
first and returns false when the process is no longer waiting.
4. kswapd is running on the same CPU as the only CPU that the process is
allowed to run on (through cpus_allowed, or possibly single-cpu system).
5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
encounters no voluntary preemption points and repeatedly fails
prepare_kswapd_sleep(), blocking the process from running and removing
itself from the waitqueue, which would let kswapd sleep.
So, the source of the problem is that we prevent kswapd from going to
sleep while there are processes waiting on the pfmemalloc_wait queue,
and a process waiting on a queue is guaranteed to be removed from the
queue only when it gets scheduled. This was done to make sure that no
process is left sleeping on pfmemalloc_wait when kswapd itself goes to
sleep.
However, it isn't necessary to postpone kswapd sleep until the
pfmemalloc_wait queue actually empties. To prevent processes from being
left sleeping, it's actually enough to guarantee that all processes
waiting on pfmemalloc_wait queue have been woken up by the time we put
kswapd to sleep.
This patch therefore fixes this issue by substituting 'wake_up' with
'wake_up_all' and removing 'return false' in the code snippet from
prepare_kswapd_sleep() above. Note that if any process puts itself in
the queue after this waitqueue_active() check, or after the wake up
itself, it means that the process will also wake up kswapd - and since
we are under prepare_to_wait(), the wake up won't be missed. Also, we
update the comment in prepare_kswapd_sleep() to hopefully more clearly
describe the races it is preventing.
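The relevant part of prepare_kswapd_sleep() then becomes (sketch):

    /* wake all throttled processes, but still go to sleep: any
     * process enqueueing itself after this check will also wake
     * kswapd, and we are under prepare_to_wait() */
    if (waitqueue_active(&pgdat->pfmemalloc_wait))
        wake_up_all(&pgdat->pfmemalloc_wait);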
Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:
__set_page_dirty_nobuffers()        __delete_from_page_cache()
  if (TestSetPageDirty(page))
                                      page->mapping = NULL
                                      if (PageDirty())
                                        dec_zone_page_state(page, NR_FILE_DIRTY);
                                        dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
    if (page->mapping)
      account_page_dirtied(page)
        __inc_zone_page_state(page, NR_FILE_DIRTY);
        __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held. The notable exception to this rule, though,
is do_wp_page(), for which this race exists. However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes. Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.
Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed. Remove it.
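A simplified sketch of the upgraded dirtying in the do_wp_page() reuse
path (placement hedged against the exact 3.16 code):

    /* was: wait_on_page_locked() followed by an unlocked dirty */
    lock_page(dirty_page);       /* locks out truncation */
    set_page_dirty(dirty_page);  /* page->mapping is stable here */
    unlock_page(dirty_page);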
Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
A constantly forking task causes unlimited growth of the anon_vma chain.
Each new child allocates a new level of anon_vmas and links its vma to all
previous levels because pages might be inherited from any level.
This patch adds a heuristic which decides to reuse an existing anon_vma
instead of forking a new one. It adds a counter, anon_vma->degree, which
counts linked vmas and directly descending anon_vmas, and reuses the
anon_vma if the counter is lower than two. As a result each anon_vma has
either a vma or at least two descending anon_vmas. In such trees half of
the nodes are leaves with live vmas, so the count of anon_vmas is no more
than twice the count of vmas.
This heuristic reuses anon_vmas as little as possible, because each reuse
adds false aliasing among vmas and the rmap walker has to scan more ptes
when it searches for where a page might be mapped.
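The reuse check in anon_vma_clone() (sketch):

    /* reuse a source anon_vma that has no vma and at most one
     * descendant: attaching dst here stops the hierarchy growing
     * a new level on every fork */
    if (!dst->anon_vma && anon_vma != src->anon_vma &&
        anon_vma->degree < 2)
        dst->anon_vma = anon_vma;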
Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu Fixes: 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue")
[akpm@linux-foundation.org: fix typo, per Rik] Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu> Tested-by: Michal Hocko <mhocko@suse.cz> Tested-by: Jerome Marchand <jmarchan@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE, and
both checks can fail if we race with an EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE
change in between and gcc reloads p->exit_state after
security_task_wait(). In this case ->notask_error will be wrongly
cleared and do_wait() can hang forever if it was the last eligible
child.
Many thanks to Arne who carefully investigated the problem.
Note: this bug is very old but it was purely theoretical until commit b3ab03160dfa ("wait: completely ignore the EXIT_DEAD tasks"). Before
this commit "-O2" was probably enough to guarantee that the compiler
wouldn't read ->exit_state twice.
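A sketch of the fix in wait_consider_task(): snapshot ->exit_state once
and test the snapshot (helper per that era's kernel):

    int exit_state = ACCESS_ONCE(p->exit_state);

    if (unlikely(exit_state == EXIT_DEAD))
        return 0;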
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Arne Goedeke <el@laramies.com> Tested-by: Arne Goedeke <el@laramies.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The early ioremap support introduced by patch bf4b558eba92
("arm64: add early_ioremap support") failed to add a call to
early_ioremap_reset() at an appropriate time. Without this call,
invocations of early_ioremap etc. that are done too late will go
unnoticed and may cause corruption.
This is exactly what happened when the first user of this feature
was added in patch f84d02755f5a ("arm64: add EFI runtime services").
The early mapping of the EFI memory map is unmapped during an early
initcall, at which time the early ioremap support is long gone.
Fix by adding the missing call to early_ioremap_reset() to
setup_arch(), and move the offending early_memunmap() to right after
the point where the early mapping of the EFI memory map is last used.
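A sketch of the resulting ordering in arm64's setup_arch(), hedged on the
exact call sites:

    paging_init();
    request_standard_resources();

    /* early_ioremap/early_memunmap must not be used past this point */
    early_ioremap_reset();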
A struct xdr_stream at a page boundary might point to the end of one
page or the beginning of the next, but xdr_truncate_encode isn't
prepared to handle the former.
This can cause corruption of NFSv4 READDIR replies in the case that a
readdir entry that would have exceeded the client's dircount/maxcount
limit would have ended exactly on a 4k page boundary. You're more
likely to hit this case on large directories.
Other xdr_truncate_encode callers are probably also affected.
Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Fixes: 3e19ce762b53 ("rpc: xdr_truncate_encode") Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Current vfio-pci just supports normal pci devices, so vfio_pci_probe()
returns if the pci device is not a normal device. However, the current
code makes a mistake: PCI_HEADER_TYPE is the offset of the header type
byte in configuration space, but we use this value to mask the type
value. This patch fixes this by doing the check directly on
pci_dev->hdr_type.
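A sketch of the corrected check:

    /* before: pdev->hdr_type & PCI_HEADER_TYPE -- PCI_HEADER_TYPE is a
     * config-space offset (0x0e), not a mask, so the test was bogus */
    if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
        return -EINVAL;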
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Jay Foad reports that the address sanitizer test (asan) sometimes gets
confused by a stack pointer that ends up being outside the stack vma
that is reported by /proc/maps.
This happens due to an interaction between RLIMIT_STACK and the guard
page: when we do the guard page check, we ignore the potential error
from the stack expansion, which effectively results in a missing guard
page, since the expected stack expansion won't have been done.
And since /proc/maps explicitly ignores the guard page (commit d7824370e263: "mm: fix up some user-visible effects of the stack guard
page"), the stack pointer ends up being outside the reported stack area.
This is the minimal patch: it just propagates the error. It also
effectively makes the guard page part of the stack limit, which in turn
means that the actual real stack is one page less than the stack limit.
Let's see if anybody notices. We could teach acct_stack_growth() to
allow an extra page for a grow-up/grow-down stack in the rlimit test,
but I don't want to add more complexity if it isn't needed.
Reported-and-tested-by: Jay Foad <jay.foad@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If an ACPI device object whose _STA returns 0 (not present and not
functional) has _PR0 or _PS0, its power_manageable flag will be set
and acpi_bus_init_power() will return 0 for it. Consequently, if
such a device object is passed to the ACPI device PM functions, they
will attempt to carry out the requested operation on the device,
although they should not do that for devices that are not present.
To fix that problem make acpi_bus_init_power() return an error code
for devices that are not present which will cause power_manageable to
be cleared for them as appropriate in acpi_bus_get_power_flags().
However, the lists of power resources should not be freed for the
device in that case, so modify acpi_bus_get_power_flags() to keep
those lists even if acpi_bus_init_power() returns an error.
Accordingly, when deciding whether or not the lists of power
resources need to be freed, acpi_free_power_resources_lists()
should check the power.flags.power_resources flag instead of
flags.power_manageable, so make that change too.
Furthermore, if acpi_bus_attach() sees that flags.initialized is
unset for the given device, it should reset the power management
settings of the device and re-initialize them from scratch instead
of relying on the previous settings (the device may have appeared
after being not present previously, for example), so make it use
the 'valid' flag of the D0 power state as the initial value of
flags.power_manageable for it and call acpi_bus_init_power() to
discover its current power state.
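A sketch of the acpi_bus_init_power() change (exact placement hedged):

    /* device not present: report an error so power_manageable gets
     * cleared, while the power-resource lists are kept by the caller */
    if (!acpi_device_is_present(device))
        return -ENXIO;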
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The check was already in place in the dp mode_valid check, but
radeon_dp_get_dp_link_clock() never returned the high clock
mode_valid was checking for because that function clipped the
clock based on the hw capabilities. Add an explicit check
in the mode_valid function.
stac_store_hints() gets the masking of the gpio_dir and gpio_data values
utterly wrong, likely due to copy&paste errors. Fortunately, this feature
is used very rarely, so the impact should be really small.
The CPSW IP implements pulse-signaled interrupts. Due to
that we must write a correct, pre-defined value to the
CPDMA_MACEOIVECTOR register so the controller generates
a pulse on the correct IRQ line to signal the End Of
Interrupt.
The way the driver is written today, all four IRQ lines
are requested using the same IRQ handler and, because of
that, we could fall into situations where a TX IRQ fires
but we tell the controller that we ended an RX IRQ (or
vice-versa). This situation triggers an IRQ storm on the
reserved IRQ 127 of INTC which will in turn call ack_bad_irq()
which will, then, print a ton of:
unexpected IRQ trap at vector 00
In order to fix the problem, we are moving all calls to
cpdma_ctlr_eoi() inside the IRQ handler and making sure
we *always* write the correct value to the CPDMA_MACEOIVECTOR
register. Note that the algorithm assumes that IRQ numbers and
value-to-be-written-to-EOI are proportional, meaning that a
write of value 0 would trigger an EOI pulse for the RX_THRESHOLD
Interrupt and that's the IRQ number sitting in the 0-th index
of our irqs_table array.
This, however, is safe at least for current implementations of
CPSW so we will refrain from making the check smarter (and, as
a side-effect, slower) until we actually have a platform where
IRQ lines are swapped.
This patch has been tested for several days with AM335x- and
AM437x-based platforms. AM57x was left out because there are
still pending patches to enable ethernet in mainline for that
platform. A read of the TRM confirms the statement on previous
paragraph.
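A simplified sketch of the handler; the EOI value is derived from which
line actually fired:

    static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
    {
        struct cpsw_priv *priv = dev_id;
        /* irqs_table[0] is RX_THRESHOLD, which maps to EOI value 0;
         * the lines are consecutive, so this offset is the EOI value */
        u32 value = irq - priv->irqs_table[0];

        cpdma_ctlr_eoi(priv->dma, value);

        cpsw_intr_disable(priv);
        napi_schedule(&priv->napi);
        return IRQ_HANDLED;
    }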
Reported-by: Yegor Yefremov <yegorslists@googlemail.com> Fixes: 510a1e7 ("drivers: net: davinci_cpdma: acknowledge interrupt properly") Signed-off-by: Felipe Balbi <balbi@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 1d52c78afbb ("Btrfs: try not to ENOSPC on log replay") added a
check to skip delayed inode updates during log replay because it
confuses the enospc code. But the delayed processing will end up
ignoring delayed refs from log replay because the inode itself wasn't
put through the delayed code.
This can end up triggering a warning at commit time:
WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
Which is repeated for each commit because we never process the delayed
inode ref update.
The fix used here is to change btrfs_delayed_delete_inode_ref to return
an error if we're currently in log replay. The caller will do the ref
deletion immediately and everything will work properly.
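A sketch of the change at the top of btrfs_delayed_delete_inode_ref():

    /* no delayed inode updates during log replay: tell the caller
     * to do the ref deletion inline */
    if (BTRFS_I(inode)->root->fs_info->log_root_recovering)
        return -EAGAIN;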
Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit ac61d1955934 ("scsi: set correct completion code in
scsi_send_eh_cmnd()") introduced a bug. It changed the stored return
value from a queuecommand call, but it didn't take into account that
the return value was used again later on. This patch fixes the bug by
changing the later usage.
There is a big comment in the middle of scsi_send_eh_cmnd() which
does a good job of explaining how the routine works. But it mentions
a "rtn = FAILURE" value that doesn't exist in the code. This patch
adjusts the code to match the comment (I assume the comment is right
and the code is wrong).
This fixes Bugzilla #88341.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Андрей Аладьев <aladjev.andrew@gmail.com> Tested-by: Андрей Аладьев <aladjev.andrew@gmail.com> Fixes: ac61d19559349e205dad7b5122b281419aa74a82 Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently we enable Exynos devices in the multi v7 defconfig; however,
when testing on my ODROID-U3, I noticed that USB was not working. Enabling
this option makes USB work, which enables networking support as well,
since the ODROID-U3 has networking on the USB bus.
[arnd] Support for odroid-u3 was added in 3.10, so it would be nice to
backport this fix at least that far.
fb_deferred_io_fsync() returns the value of schedule_delayed_work() as
an error code, but schedule_delayed_work() does not return an error. It
returns true/false depending on whether the work was already queued.
Fix this by ignoring the return value of schedule_delayed_work().
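A sketch of the fixed tail of fb_deferred_io_fsync():

    /* schedule_delayed_work() returns a bool, not an error code */
    schedule_delayed_work(&info->deferred_work, fbdefio->delay);
    return 0;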
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If the probe of an fb driver has been deferred due to missing
dependencies, and the probe is later ran when a module is loaded, the
fbdev framework will try to find a logo to use.
However, the logos are __initdata, and have already been freed. This
causes sometimes page faults, if the logo memory is not mapped,
sometimes other random crashes as the logo data is invalid, and
sometimes nothing, if the fbdev decides to reject the logo (e.g. the
random value depicting the logo's height is too big).
This patch adds a late_initcall function to mark the logos as freed. In
reality the logos are freed later, and fbdev probe may be ran between
this late_initcall and the freeing of the logos. In that case we will
miss drawing the logo, even if it would be possible.
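A sketch of the mechanism:

    static bool logos_freed;

    static int __init fb_logo_late_init(void)
    {
        logos_freed = true;
        return 0;
    }

    late_initcall(fb_logo_late_init);

    /* in fb_find_logo(): refuse the __initdata logos once they may
     * have been freed */
    if (nologo || logos_freed)
        return NULL;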
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In Linux 3.18 and below, GCC hoists the lsl instructions in the
pvclock code all the way to the beginning of __vdso_clock_gettime,
slowing the non-paravirt case significantly. For unknown reasons,
presumably related to the removal of a branch, the performance issue
is gone as of
e76b027e6408 ("x86,vdso: Use LSL unconditionally for vgetcpu")
but I don't trust GCC enough to expect the problem to stay fixed.
There should be no correctness issue, because the __getcpu calls in
__vdso_clock_gettime were never necessary in the first place.
Note to stable maintainers: In 3.18 and below, depending on
configuration, gcc 4.9.2 generates code like this:
This patch won't apply as is to any released kernel, but I'll send a
trivial backported version if needed.
[
Backported by Andy Lutomirski. Should apply to all affected
versions. This fixes a functionality bug as well as a performance
bug: buggy kernels can infinite-loop in __vdso_clock_gettime on
affected compilers. See, for example:
Fixes: 51c19b4f5927 ("x86: vdso: pvclock gettime support") Cc: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[ luis: backported to 3.16: used Andy's backport for stable kernels ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Flush the FIFOs when the stream is prepared for use. This avoids
an inadvertent swapping of the left/right channels if the FIFOs are
not empty at startup.
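A sketch of the prepare hook, assuming the designware I2S driver's
register helpers and names:

    static int dw_i2s_prepare(struct snd_pcm_substream *substream,
                              struct snd_soc_dai *dai)
    {
        struct dw_i2s_dev *dev = snd_soc_dai_get_drvdata(dai);

        if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
            i2s_write_reg(dev->i2s_base, TXFFR, 1); /* flush TX FIFO */
        else
            i2s_write_reg(dev->i2s_base, RXFFR, 1); /* flush RX FIFO */

        return 0;
    }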
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Several users have, over time, reported issues with MSI on these IGPs.
They're old, rarely available, and MSI doesn't provide such huge
advantages on them. Just disable.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87361
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74492 Fixes: fa8c9ac72fe ("drm/nv4c/mc: nv4x igp's have a different msi rearm register") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack. To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit. Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.
The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD. The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.
This fixes the implementation. The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits. (Disclaimer: I
don't know what the paxtest code is actually calculating.)
It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning. Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.
In the mean time, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.
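A sketch of the fixed placement logic, close to the upstream patch: pick
uniformly among all page-aligned slots that keep the vdso inside one PMD:

    end = (start + PMD_SIZE - 1) & PMD_MASK;    /* end of the PMD */
    if (end >= TASK_SIZE_MAX)
        end = TASK_SIZE_MAX;
    end -= len;

    if (end > start) {
        offset = get_random_int() % (((end - start) >> PAGE_SHIFT) + 1);
        addr = start + (offset << PAGE_SHIFT);
    } else {
        addr = start;
    }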
As an easy test, doing this:
for i in `seq 10000`
do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d
used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:
The flip stall detector kicks in when pending>=INTEL_FLIP_COMPLETE. That
means if we first call intel_prepare_page_flip() but don't call
intel_finish_page_flip(), the next stall check will erroneously think
the page flip was somehow stuck.
With enough debug spew emitted from the interrupt handler my 830 hangs
when this happens. My theory is that the previous vblank interrupt gets
sufficiently delayed that the handler will see the pending bit set in
IIR, but ISR still has the bit set as well (ie. the flip was processed
by CS but didn't complete yet). In this case the handler will proceed
to call intel_check_page_flip() immediately after
intel_prepare_page_flip(). It then tries to print a backtrace for the
stuck flip WARN, which apparently results in way too much debug spew
delaying interrupt processing further. That then seems to cause an
endless loop in the interrupt handler, and the machine is dead until
the watchdog kicks in and reboots. At least limiting the number of
iterations of the loop in the interrupt handler also prevented the
hang.
So it seems better to not call intel_prepare_page_flip() without
immediately calling intel_finish_page_flip(). The IIR/ISR trickery
avoids races here so this is a perfectly safe thing to do.
v2: Fix typo in commit message (checkpatch)
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88381
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85888 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The SH-MSIOF driver enables the SPI framework's autosuspend API, but the
autosuspend framework doesn't work while the driver is initializing.
So take a runtime PM reference across SH-MSIOF driver initialization.
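A sketch of the change: hold a runtime PM reference around the
initialization that touches registers:

    pm_runtime_get_sync(&p->pdev->dev);
    /* ... hardware setup that must not race with autosuspend ... */
    pm_runtime_put(&p->pdev->dev);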
Fixes: e2a0ba547ba31c (spi: sh-msiof: Convert to spi core auto_runtime_pm framework) Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When a key is being garbage collected, its key->user would get put before
the ->destroy() callback is called, which is where the key is removed from
its respective tracking structures.
This leaves the key hanging in a semi-invalid state, which opens a window
for a different task to try to access key->user. An example is
find_keyring_by_name(), which would dereference key->user for a key that
is in the process of being garbage collected (where key->user was freed
but ->destroy() wasn't called yet - so it's still present in the linked
list).
This would cause either a panic, or corrupt memory.
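A sketch of the reordering in the key garbage collector:

    /* run ->destroy() first so the key is unlinked from its
     * tracking structures (e.g. the keyring name list)... */
    if (key->type->destroy)
        key->type->destroy(key);

    /* ...and only then drop the user reference */
    key_user_put(key->user);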
Fixes CVE-2014-9529.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Moritz Muehlenhoff <jmm@inutil.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
... by not hitting rename_retry for reasons other than rename having
happened. In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill(). Skip the killed siblings instead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Moritz Muehlenhoff <jmm@inutil.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.16:
- Apply name changes in all the different places we use d_alias and d_child
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Moritz Muehlenhoff <jmm@inutil.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The fragmentation code was replaced in 610bfc6bc99bc83680d190ebc69359a05fc7f605
("batman-adv: Receive fragmented packets and merge"). The new code provided a
mostly unused parameter skb for the merging function. It is used inside the
function to calculate the additionally needed skb tailroom. But instead of
increasing its own tailroom, it is only increasing the tailroom of the first
queued skb. This is not correct in some situations because the first queued
entry can be a different one than the parameter.
An observed problem was:
1. packet with size 104, total_size 1464, fragno 1 was received
- packet is queued
2. packet with size 1400, total_size 1464, fragno 0 was received
- packet is queued at the end of the list
3. enough data was received and can be given to the merge function
(1464 == (1400 - 20) + (104 - 20))
- merge functions gets 1400 byte large packet as skb argument
4. merge function gets first entry in queue (104 byte)
- stored as skb_out
5. merge function calculates the required extra tail as total_size - skb->len
- pskb_expand_head tail of skb_out with 64 bytes
6. merge function tries to squeeze the extra 1380 bytes from the second queued
skb (1400 byte aka skb parameter) in the 64 extra tail bytes of skb_out
Instead calculate the extra required tail bytes for skb_out also using skb_out
instead of using the parameter skb. The skb parameter is only used to get the
total_size from the last received packet. This is also the total_size used to
decide that all fragments were received.
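A sketch of the corrected tailroom calculation (identifier names
approximate):

    skb_out = entry->skb;    /* first queued fragment */
    size = ntohs(packet->total_size);

    /* grow skb_out by what *it* is missing, not by what the skb
     * parameter is missing */
    if (pskb_expand_head(skb_out, 0, size - skb_out->len, GFP_ATOMIC) < 0)
        goto free;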
Reported-by: Philipp Psurek <philipp.psurek@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Moritz Muehlenhoff <jmm@inutil.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Add support for Leon touch devices, which is the same as
falco/peppy/wolf on the same buses using the LynxPoint-LP I2C
via the i2c-designware-pci driver.
Based on these patches from the chromeos-3.8 kernel:
https://chromium-review.googlesource.com/168351
https://chromium-review.googlesource.com/173445
Signed-off-by: Gene Chen <gene.chen@intel.com> Signed-off-by: Benson Leung <bleung@chromium.org> Tested-by: Scot Doyle <lkml14@scotdoyle.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Add support for Dell Chromebook 11's touch device, which is the same
as falco/peppy on the same bus using the LynxPoint-LP I2C via the
i2c-designware-pci driver.
Based on these patches from the chromeos-3.8 kernel:
https://chromium-review.googlesource.com/#/c/65320/
https://chromium-review.googlesource.com/#/c/174664/
Signed-off-by: Mohammed Habibulla <moch@chromium.org> Signed-off-by: Benson Leung <bleung@chromium.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acer C720 has touchpad and light sensor connected to a separate I2C buses.
Since the designware I2C host controller driver has two instances on this
particular machine we need a way to match the correct instance. Add support
for this and then register both C720 touchpad and light sensor.
This code is based on the following patch from Benson Leung:
Check that the length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore the component length
for component types that do not use it. Otherwise we read memory past the
end of the buffer for a corrupted udf image.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.
Reported-by: Carl Henrik Lunde <chlunde@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Verify that the inode size is sane when loading an inode with data stored
in the ICB. Otherwise we may get confused later when working with the
inode, if the inode size is too big.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz>
[ luis: backported to 3.16:
- Adjusted exit paths as commit 6d3d5e860a11 ("udf: Make udf_read_inode()
and udf_iget() return error") is not present in 3.16 kernel ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Symlink reading code does not check whether the resulting path fits into
the page provided by the generic code. This isn't as easy as just
checking the symlink size because of various encoding conversions we
perform on path. So we have to check whether there is still enough space
in the buffer on the fly.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If a request is backlogged, its complete() handler will get called
twice: once with -EINPROGRESS, and once with the final error code.
af_alg's complete handler, unlike other users, does not handle the
-EINPROGRESS but instead always completes the completion that recvmsg()
is waiting on. This can lead to a return to user space while the
request is still pending in the driver. If userspace closes the sockets
before the requests are handled by the driver, this will lead to
use-after-frees (and potential crashes) in the kernel due to the tfm
having been freed.
The crashes can be easily reproduced (for example) by reducing the max
queue length in cryptd.c and running the following (from
http://www.chronox.de/libkcapi.html) on AES-NI capable hardware:
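A sketch of the fixed completion handler, matching the af_alg code of
that era:

    static void af_alg_complete(struct crypto_async_request *req, int err)
    {
        struct af_alg_completion *completion = req->data;

        /* a backlogged request reports -EINPROGRESS first; only
         * complete on the final error code */
        if (err == -EINPROGRESS)
            return;

        completion->err = err;
        complete(&completion->completion);
    }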
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We didn't check the length of rock ridge ER records before printing them.
Thus a corrupted isofs image can cause us to access and print some memory
beyond the buffer, with obvious consequences.
Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
With a significant number of deployed APs, enabling uAPSD leads to the
AP never using aggregation sessions (likely due to the complexities
involved in handling uAPSD in those.) This obviously results in a large
drop in throughput with such APs.
On the other hand, uAPSD can result in some power consumption benefits,
but for now just disable it to get performance with affected APs back
up.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
I've confirmed that monitoring the package power usage as well as setting
power limits appear to work as expected. Supports the package and dram
domains.
Tested against CPU:
Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
One of my tests shows that when we really don't have space to reclaim via
flush_space and have also run out of space, this async reclaim work loops
on adding itself into the workqueue and keeps writing something to disk,
according to iostat's results, and these writes mainly come from
commit_transaction, which writes the super_block. This is unacceptable as
it can be bad for disks, especially memory-based storage devices.
This adds a check to avoid the above situation.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Cc: Alexander E. Patrakov <patrakov@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since most virtual machines raise this message once, it is a bit annoying.
Make it KERN_DEBUG severity.
Fixes: 7a2e8aaf0f6873b47bc2347f216ea5b0e4c258ab Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The __ldcw macro has a problem when its argument needs to be reloaded from
memory. The output memory operand and the input register operand both need to
be reloaded using a register in class R1_REGS when generating 64-bit code.
This fails because there's only a single register in the class. Instead, use a
memory clobber. This also makes the __ldcw macro a compiler memory barrier.
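A sketch of the reworked macro, per the upstream fix:

    #define __ldcw(a) ({                                    \
        unsigned __ret;                                     \
        __asm__ __volatile__(__LDCW " 0(%1),%0"             \
            : "=r" (__ret) : "r" (a) : "memory");           \
        __ret;                                              \
    })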
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
A regression was caused by commit 780a7654cee8 ("audit: Make testing for
a valid loginuid explicit"), which in turn attempted to fix a regression
caused by e1760bd.
When audit_krule_to_data() fills in the rules to get a listing, there was a
missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.
This broke userspace by not returning the same information that was sent and
expected.
The rule:
auditctl -a exit,never -F auid=-1
gives:
auditctl -l
LIST_RULES: exit,never f24=0 syscall=all
when it should give:
LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all
Tag it so that it is reported the same way it was set. Create a new
private flags audit_krule field (pflags) to store it that won't interact with
the public one from the API.
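A sketch of the listing side, hedged on the exact field handling:

    case AUDIT_LOGINUID_SET:
        if (krule->pflags & AUDIT_LOGINUID_LEGACY && !f->val) {
            /* report the rule the way it was given */
            data->fields[i] = AUDIT_LOGINUID;
            data->values[i] = AUDIT_UID_UNSET;
            break;
        }
        /* otherwise report AUDIT_LOGINUID_SET as such */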
Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
On arm64 the TTBR0_EL1 register is set to either the reserved TTBR0
page tables on boot or to the active_mm mappings belonging to user space
processes, it must never be set to swapper_pg_dir page tables mappings.
When a CPU is booted its active_mm is set to init_mm even though its
TTBR0_EL1 points at the reserved TTBR0 page mappings. This implies
that when __cpu_suspend is triggered the active_mm can point at
init_mm even if the current TTBR0_EL1 register contains the reserved
TTBR0_EL1 mappings.
Therefore, the mm save and restore executed in __cpu_suspend might
turn out to be erroneous in that, if the current->active_mm corresponds
to init_mm, on resume from low power it ends up restoring in the
TTBR0_EL1 the init_mm mappings that are global and can cause speculation
of TLB entries which end up being propagated to user space.
This patch fixes the issue by checking the active_mm pointer before
restoring the TTBR0 mappings. If the current active_mm == &init_mm,
the code sets the TTBR0_EL1 to the reserved TTBR0 mapping instead of
switching back to the active_mm, which is the expected behaviour
corresponding to the TTBR0_EL1 settings when __cpu_suspend was entered.
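A sketch of the fix in the __cpu_suspend resume path:

    /*
     * we entered with the reserved TTBR0 if active_mm == &init_mm;
     * never load swapper_pg_dir's global mappings into TTBR0
     */
    if (mm == &init_mm)
        cpu_set_reserved_ttbr0();
    else
        cpu_switch_mm(mm->pgd, mm);

    flush_tlb_all();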
Fixes: 95322526ef62 ("arm64: kernel: cpu_{suspend/resume} implementation") Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The function cpu_resume currently lives in the .data section.
There's no reason for it to be there since we can use relative
instructions without a problem. Move a few cpu_resume data
structures out of the assembly file so the .data annotation
can be dropped completely and cpu_resume ends up in the read
only text section.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
[ luis: 3.16-stable prereq for f43c27188a49 "arm64: kernel: fix __cpu_suspend mm switch on warm-boot" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
CPU suspend is the standard kernel interface to be used to enter
low-power states on ARM64 systems. Current cpu_suspend implementation
by default assumes that all low power states are losing the CPU context,
so the CPU registers must be saved and cleaned to DRAM upon state
entry. Furthermore, the current cpu_suspend() implementation assumes
that if the CPU suspend back-end method returns when called, this has
to be considered an error regardless of the return code (which can be
successful) since the CPU was not expected to return from a code path that
is different from cpu_resume code path - eg returning from the reset vector.
All in all this means that the current API does not cope well with low-power
states that preserve the CPU context when entered (ie retention states),
since first of all the context is saved for nothing on state entry for
those states and a successful state entry can return as a normal function
return, which is considered an error by the current CPU suspend
implementation.
This patch refactors the cpu_suspend() API so that it can be split in
two separate functionalities. The arm64 cpu_suspend API just provides
a wrapper around CPU suspend operation hook. A new function is
introduced (for architecture code use only) for states that require
context saving upon entry:
__cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
__cpu_suspend() saves the context on function entry and calls the
so called suspend finisher (ie fn) to complete the suspend operation.
The finisher is not expected to return, unless it fails in which case
the error is propagated back to the __cpu_suspend caller.
The API refactoring results in the following pseudo code call sequence for a
suspending CPU, when triggered from a kernel subsystem:
/*
* int cpu_suspend(unsigned long idx)
* @idx: idle state index
*/
{
-> cpu_suspend(idx)
|---> CPU operations suspend hook called, if present
|--> if (retention_state)
|--> direct suspend back-end call (eg PSCI suspend)
else
|--> __cpu_suspend(idx, &back_end_finisher);
}
By refactoring the cpu_suspend API this way, the CPU operations back-end
has a chance to detect whether idle states require state saving or not
and can call the required suspend operations accordingly either through
simple function call or indirectly through __cpu_suspend() which carries out
state saving and suspend finisher dispatching to complete idle state entry.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ luis: 3.16-stable prereq for f43c27188a49 "arm64: kernel: fix __cpu_suspend mm switch on warm-boot" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Suspend init function must be marked as __init, since it is not needed
after the kernel has booted. This patch moves the cpu_suspend_init()
function to the __init section.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ luis: 3.16-stable prereq for f43c27188a49 "arm64: kernel: fix __cpu_suspend mm switch on warm-boot" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
audit_log_end(), which can come from any context (aka even a sleeping context)
GFP_KERNEL can't be used. Since the audit_buffer knows what context it should
use, pass that down and use that.
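A sketch of the plumbing:

    /* the audit_buffer already records its allocation context */
    static void kauditd_send_multicast_skb(struct sk_buff *skb, gfp_t gfp_mask)
    {
        struct sk_buff *copy = skb_copy(skb, gfp_mask); /* was GFP_KERNEL */
        /* ... */
    }

    /* audit_log_end() passes the buffer's own context: */
    kauditd_send_multicast_skb(ab->skb, ab->gfp_mask);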
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul Moore <pmoore@redhat.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit f1dc4867 ("audit: anchor all pid references in the initial pid
namespace") introduced a find_vpid() call when adding/removing audit
rules with PID/PPID filters; unfortunately this is problematic as
find_vpid() only works if there is a task with the associated PID
alive on the system. The following commands demonstrate a simple
reproducer.
This patch resolves the problem by simply using the PID provided by
the user without any additional validation, e.g. no calls to check to
see if the task/PID exists.
Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module
use" was merged via the thermal tree without an explicit ack from the
relevant maintainers.
The exports are abused by the intel powerclamp driver which implements
a fake idle state from a sched FIFO task. This causes all kinds of
wreckage in the NOHZ core code which rightfully assumes that
tick_nohz_idle_enter/exit() are only called from the idle task itself.
Recent changes in the NOHZ core lead to a failure of the powerclamp
driver and now people try to hack completely broken and backwards
workarounds into the NOHZ core code. This is completely unacceptable
and just papers over the real problem. There are way more subtle
issues lurking around the corner.
The real solution is to fix the powerclamp driver by rewriting it with
a sane concept, but that's beyond the scope of this.
So the only solution for now is to remove the calls into the core NOHZ
code from the powerclamp trainwreck along with the exports.
Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Pan Jacob jun <jacob.jun.pan@intel.com> Cc: LKP <lkp@01.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The Arcam rPAC seems to have the same problem - whenever anything
(alsamixer, udevd, 3.9+ kernel from 60af3d037eb8c, ..) attempts to
access the mixer / control interface of the card, the firmware "locks up"
the entire device, resulting in
SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error
from alsa-lib.
Other operating systems can somehow read the mixer (there seems to be
playback volume/mute), but any manipulation is ignored by the device
(which has hardware volume controls).
Originally, the I2C controller supported by the i2c-mv64xxx driver
required a lot of software support: an interrupt is generated at each
step of an I2C transaction (after the start bit, after sending the
address, etc.) and the driver is in charge of re-programming the I2C
controller to do the next step of the I2C transaction. This explains
the fairly complex state machine that the driver has.
On Marvell Armada XP and later processors (Armada 375, 38x, etc.), the
I2C controller was extended with a part called the "I2C Bridge", which
makes it possible to offload an I2C transaction completely to the
hardware. Initial support for this mechanism was added in commit 930ab3d403ae ("i2c: mv64xxx: Add I2C Transaction Generator support").
However, the implementation introduced by that commit has two related
issues, which this commit fixes by completely changing how the offload
implementation is done:
* SMBus read transfers, where one write to select the register is
immediately followed in the same transaction by one read, were making
the processor hang. This was most easily visible on the Marvell Armada
XP WRT1900AC platform using a driver for an I2C LED controller, or on
other Armada XP platforms by using a simple 'i2cget' command to read
an I2C EEPROM.
* The implementation was based on the fact that the offload engine
was re-programmed to transfer each message of an I2C xfer: this
meant that each message sent with the offload engine started
with a normal I2C start sequence. However, the I2C subsystem
assumes that all messages belonging to the same xfer will use the
so-called "repeated start" so that the entire I2C xfer is seen as
one transfer by the I2C devices and cannot be interrupted by other
I2C masters on the same bus.
In fact, the "I2C Bridge" can offload three types of xfer:
- xfer of one write message
- xfer of one read message
- xfer of one write message followed by one read message
For all other situations, we have to fall back to not using the "I2C
Bridge" in order to get proper I2C semantics.
Therefore, this commit reworks the offload implementation to put it
not at the message level, but at the xfer level: in the
mv64xxx_i2c_xfer() function, we decide if the transaction can be
offloaded (in which case it is handled by the
mv64xxx_i2c_offload_xfer() function); otherwise it is handled by
the slow path (implemented in the existing mv64xxx_i2c_execute_msg()).
This allows the state machine to be simplified, as it no longer needs
to have any state related to the offload implementation: the offload
implementation is now completely separated from the slow path (with
the exception of the interrupt handler, of course).
In summary:
- mv64xxx_i2c_can_offload() will analyze an I2C xfer and decide
whether the "I2C Bridge" can be used to offload it (see the sketch
after this list).
- mv64xxx_i2c_offload_xfer() will actually program the "I2C Bridge"
to offload one xfer (of either one or two messages), and block
using mv64xxx_i2c_wait_for_completion() until the xfer completes.
- The interrupt handler mv64xxx_i2c_intr() is modified to push the
offload related code to a separate function,
mv64xxx_i2c_intr_offload(). It will take care of reading the
received data if needed.
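A condensed sketch of the xfer-level decision (the 1..8 byte payload limit of the "I2C Bridge" data register is an assumption here):

static bool mv64xxx_i2c_can_offload(struct mv64xxx_i2c_data *drv_data)
{
	struct i2c_msg *msgs = drv_data->msgs;
	int num = drv_data->num_msgs;

	if (!drv_data->offload_enabled)
		return false;

	/* One read or one write message can be offloaded as-is. */
	if (num == 1 && msgs[0].len >= 1 && msgs[0].len <= 8)
		return true;

	/* A write immediately followed by a read (typical SMBus register
	 * read) maps to the third supported "I2C Bridge" operation. */
	if (num == 2 &&
	    msgs[0].len >= 1 && msgs[0].len <= 8 &&
	    msgs[1].len >= 1 && msgs[1].len <= 8 &&
	    !(msgs[0].flags & I2C_M_RD) && (msgs[1].flags & I2C_M_RD))
		return true;

	/* Everything else goes through the slow path. */
	return false;
}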
This commit was tested on:
- Armada XP OpenBlocks AX3-4 (EEPROM on I2C and RTC on I2C)
- Armada XP WRT1900AC (LED controller on I2C)
- Armada XP GP (EEPROM on I2C)
Fixes: 930ab3d403ae ("i2c: mv64xxx: Add I2C Transaction Generator support") Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
[wsa: fixed checkpatch warnings] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In bio-based DM's clone_endio(), when the target_type doesn't implement
.end_io (e.g. linear), r will always be initialized to 0. So if a
WRITE SAME bio fails, WRITE SAME will not be disabled as intended.
Fix this by initializing r to error, rather than 0, in clone_endio().
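A sketch of the fix (surrounding logic condensed; the 3.16-era signature is assumed):

static void clone_endio(struct bio *bio, int error)
{
	int r = error;	/* was "int r = 0;", which dropped the error for
			 * targets without an .end_io hook */
	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
	dm_endio_fn endio = tio->ti->type->end_io;

	if (endio)
		r = endio(tio->ti, bio, error);

	/* ... r now reflects the failure, so the WRITE SAME disable
	 * logic further down can trigger as intended ... */
}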
Signed-off-by: Alex Chen <alex.chen@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: 7eee4ae2db ("dm: disable WRITE SAME if it fails") Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Discard bios and thin device deletion have the potential to release data
blocks. If the thin-pool is in out-of-data-space mode, and blocks were
released, transition the thin-pool back to full write mode.
The correct time to do this is just after the thin-pool metadata commit.
It cannot be done before the commit because the space maps will not
allow immediate reuse of the data blocks in case there's a rollback
following power failure.
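A sketch of the check performed right after the commit (function shapes follow the changelog's description; exact details assumed):

static void check_for_space(struct pool *pool)
{
	int r;
	dm_block_t nr_free;

	if (get_pool_mode(pool) != PM_OUT_OF_DATA_SPACE)
		return;

	r = dm_pool_get_free_block_count(pool->pmd, &nr_free);
	if (r)
		return;

	/* Blocks were released: safe to accept writes again. */
	if (nr_free)
		set_pool_mode(pool, PM_WRITE);
}

static int commit(struct pool *pool)
{
	int r = dm_pool_commit_metadata(pool->pmd);

	if (r)
		metadata_operation_failed(pool, "dm_pool_commit_metadata", r);
	else
		check_for_space(pool);	/* space maps now allow reuse */

	return r;
}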
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When the pool was in PM_OUT_OF_DATA_SPACE mode, its
process_prepared_discard function pointer was incorrectly being set to
process_prepared_discard_passdown rather than process_prepared_discard.
This incorrect function pointer meant the discard was being passed down,
but not affecting the mapping. As such, any discard that was issued in
an attempt to reclaim blocks would not successfully free data space.
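The fix boils down to one pointer assignment in set_pool_mode() (a sketch; the other assignments in this case are condensed away):

	case PM_OUT_OF_DATA_SPACE:
		/* was: pool->process_prepared_discard =
		 *		process_prepared_discard_passdown; */
		pool->process_prepared_discard = process_prepared_discard;
		break;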
Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In order to act as a full command barrier by itself, we need to tell the
pipecontrol to actually stall the command streamer while the flush runs.
We require the full command barrier before operations like
MI_SET_CONTEXT, which currently rely on a prior invalidate flush.
References: https://bugs.freedesktop.org/show_bug.cgi?id=83677 Cc: Simon Farnsworth <simon@farnz.org.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In the gen7 pipe control there is an extra bit to flush the media
caches, so let's set it during cache invalidation flushes.
v2: Rename to MEDIA_STATE_CLEAR to be more inline with spec.
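A sketch of the gen7 invalidate path with both bits set (the full flag set is condensed; MEDIA_STATE_CLEAR comes from this patch, CS_STALL from the previous one):

	/* in gen7_render_ring_flush(): */
	if (invalidate_domains) {
		flags |= PIPE_CONTROL_TLB_INVALIDATE;
		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
		/* gen7: also flush the media caches */
		flags |= PIPE_CONTROL_MEDIA_STATE_CLEAR;
		/* stall the CS so the flush is a full command barrier */
		flags |= PIPE_CONTROL_CS_STALL;
	}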
Cc: Simon Farnsworth <simon@farnz.org.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch changes iscsit_do_tx_data() to fail on short writes,
when kernel_sendmsg() returns a value different from the requested
transfer length, returning -EPIPE and thus causing a connection
reset to occur.
This avoids a potential bug in the original code where a short
write would result in kernel_sendmsg() being called again with
the original iovec base + length.
In practice this has not been an issue because iscsit_do_tx_data()
is only used for transferring 48 byte headers + 4 byte digests,
along with seldom used control payloads from NOPIN + TEXT_RSP +
REJECT with less than 32k of data.
So following Al's audit of iovec consumers, go ahead and fail
the connection on short writes for now, and remove the bogus
logic ahead of his proper upstream fix.
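A condensed sketch of the resulting function (guard checks and logging trimmed):

static int iscsit_do_tx_data(struct iscsi_conn *conn,
			     struct iscsi_data_count *count)
{
	int ret;
	struct msghdr msg;

	if (!conn || !conn->sock || !conn->conn_ops)
		return -1;

	memset(&msg, 0, sizeof(struct msghdr));

	ret = kernel_sendmsg(conn->sock, &msg, count->iov,
			     count->iov_count, count->data_length);
	if (ret != count->data_length) {
		pr_err("kernel_sendmsg() returned %d instead of %d\n",
		       ret, count->data_length);
		return -EPIPE;	/* short write: reset the connection */
	}

	return ret;
}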
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When the ring buffer returns an error indicating retry, storvsc may not
return a proper error code to SCSI when the bounce buffer is not used.
This has introduced I/O freezes on RAID running atop storvsc devices.
This patch fixes it by always returning a proper error code.
Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The Microsoft iSCSI target does not support REPORT SUPPORTED OPERATION
CODES. Blacklist these devices so we don't attempt to send the command.
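The mechanism is a scsi_devinfo blacklist entry carrying BLIST_NO_RSOC; a purely illustrative sketch (the vendor/model strings below are placeholders, not the ones from the actual patch):

	/* drivers/scsi/scsi_devinfo.c style entry */
	{"VENDOR", "MODEL", NULL, BLIST_NO_RSOC},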
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Mike Christie <michaelc@cs.wisc.edu> Reported-by: jazz@deti74.ru Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on. This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context. If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.
This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.
/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremely hard to trigger, but possible.
The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.
For architectures which have CONFIG_SPARSE_IRQ=n this is a non-issue,
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.
Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.
Provide kstat_irqs_usr(), which protects the lookup and access
with sparse_irq_lock, and switch /proc/stat to use it.
Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care of protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.
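A sketch of the protected wrapper (following the description above):

/*
 * kstat_irqs_usr - /proc-safe variant of kstat_irqs()
 *
 * Takes sparse_irq_lock so the irq descriptor cannot be freed while
 * we look it up and sum the per-cpu counts.
 */
unsigned int kstat_irqs_usr(unsigned int irq)
{
	unsigned int sum;

	irq_lock_sparse();
	sum = kstat_irqs(irq);
	irq_unlock_sparse();

	return sum;
}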
Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
iSER will report supported protection operations based on
the tpg attribute t10_pi settings and the HCA PI offload capabilities.
If the HCA does not support PI offload or the tpg attribute t10_pi is
not set, we fall back to SW PI mode.
In order to do that, we move the iscsit_get_sup_prot_ops() call to
after the connection's tpg assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Fall back to software-mode DIF if the HCA does not support
PI (without crashing, obviously). It is still possible to
run with backend protection and an unprotected frontend,
so looking at the command prot_op is not enough. Check the
device PI capability on a per-IO basis (via the isert_prot_cmd
static inline) to determine if we need to handle protection
information.
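The per-IO check can be sketched as the static inline the changelog mentions (this matches its described behaviour; exact field names assumed):

static inline bool
isert_prot_cmd(struct isert_conn *conn, struct se_cmd *cmd)
{
	/* PI offload negotiated on this connection and the command
	 * actually carries protection information. */
	return conn->pi_support && cmd->prot_op != TARGET_PROT_NORMAL;
}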
This patch converts the code to allocate PI contexts dynamically in
order to avoid a potentially bogus np->tpg_np and associated NULL pointer
dereference in isert_connect_request() during iser-target endpoint
shutdown with multiple network portals.
Also, there is really no need to allocate these at connection
establishment since it is not guaranteed that all the IOs on
that connection will be to a PI formatted device.
We can do it in a lazy fashion, so the initial burst will see a
transient slowdown, but soon enough all IOs will have a PI
context allocated.
Squashed:
iser-target: Centralize PI context handling code
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In situations such as bond failover, the new session establishment
implicitly invokes the termination of the old connection.
So, we don't want to wait for the old connection's wait_conn to completely
terminate before we accept the new connection and post a login response.
The solution is to defer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).
We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.
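A sketch of the workqueue setup and the deferred release (the work-item field name is an assumption):

	/* Unbound and highly concurrent: release works must not
	 * serialize behind each other during mass teardown. */
	isert_release_wq = alloc_workqueue("isert_release_wq", WQ_UNBOUND,
					   WQ_UNBOUND_MAX_ACTIVE);
	if (!isert_release_wq)
		return -ENOMEM;

	/* ... in isert_wait_conn(): defer completion + conn_put ... */
	INIT_WORK(&isert_conn->release_work, isert_release_work);
	queue_work(isert_release_wq, &isert_conn->release_work);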
Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.
Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
separate routine.
Squashed:
iser-target: Move cma_id setup to a function
Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.
In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (which makes the
dereference safe). Session establishment/teardown sequences
can happen in parallel, so we should take into account that
connected_handler might race with the connection teardown flow.
Also, protect the isert_conn->conn_device->active_qps decrement
within the error path during QP creation failure and the
normal teardown path in isert_connect_release().
Squashed:
iser-target: Decrement completion context active_qps in error flow
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately; this will also cause the
CM connection to reach the connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let the iscsi layer pick it up when it
gets there.
This reduces the parallel login sequence by a factor of
more than 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.
In order to support a multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion instead
of letting isert_get_login_rx call it.
Squashed:
iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The ISER_CONN_UP state is not sufficient to know whether
we should wait for completion of flush errors and the
disconnected_handler event.
Instead, split it into two states (sketched below):
- ISER_CONN_UP: Got to the CM connected phase. This state
indicates that we need to wait for a CM disconnect
event before going to teardown.
- ISER_CONN_FULL_FEATURE: Got to the full feature phase
after we posted the login response. This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.
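The split can be sketched as the connection state enum (names from the list above; neighbouring states assumed):

enum iser_conn_state {
	ISER_CONN_INIT,
	ISER_CONN_UP,		/* CM connected; wait for CM disconnect */
	ISER_CONN_FULL_FEATURE,	/* login response posted; recv buffers
				 * up, wait for flush completions */
	ISER_CONN_TERMINATING,
	ISER_CONN_DOWN,
};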
Also avoid deferring the disconnected handler to a work,
and handle it within the disconnected handler itself.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).
Squashed:
iser-target: Don't deffer disconnected handler to a work
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since commit 0fc4ea701fcf ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.
Introduce a call to isert_conn_terminate() (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect(). If the state is already terminating, just bail
out (termination has already started).
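A sketch of the helper (invoked with the connection lock held; logging condensed):

static void isert_conn_terminate(struct isert_conn *isert_conn)
{
	int err;

	switch (isert_conn->state) {
	case ISER_CONN_TERMINATING:
		break;			/* termination already started */
	case ISER_CONN_UP:
	case ISER_CONN_FULL_FEATURE:	/* FALLTHRU */
		isert_conn->state = ISER_CONN_TERMINATING;
		err = rdma_disconnect(isert_conn->conn_cm_id);
		if (err)
			pr_warn("Failed rdma_disconnect isert_conn %p\n",
				isert_conn);
		break;
	default:
		pr_warn("conn %p terminating in state %d\n",
			isert_conn, isert_conn->state);
	}
}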
Also, make sure to destroy the connection when getting a connect
error event if we didn't get to the connected state (UP). The same
applies to the handling of the REJECTED and UNREACHABLE cma events.
Squashed:
iscsi-target: Add call to wait_conn in establishment error flow
Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
irq_mask should include all IRQ bits that we want to mask, but at the
moment we set it incorrectly to the inverse of this. If the mask is used
subsequently to enable/disable some IRQ bits, we may unintentionally
unmask unrelated IRQs. I can't see any way that this can lead to a real
problem in the current -nightly code, since the first place the mask
will be used next (after a suspend/resume cycle) is in
valleyview_irq_postinstall(), but the mask is reset there to its proper
value.
This causes a problem in the upstream kernel though, where - due to another
issue - the mask is used in the above way to disable only the display IRQs.
This other issue is fixed by:
Interestingly, even with the above two bugs, we shouldn't in theory have
any real problems (arguably famous last words :). That's because
even if we unmask something unintentionally via the VLV_IMR/VLV_IER
register the master IRQ masking bit in VLV_MASTER_IER is still set and
should prevent all i915 interrupts. According to my testing on an ASUS
T100 with DSI output this isn't the case at least with the
MIPIA_INTERRUPT. Leaving this one unmasked in IMR/IER, while having
VLV_MASTER_IER set to 0 may lead to a lockup during system suspend as
shown in the bugzilla ticket below. This fix should get rid of the
problem reported there in upstream and older kernels.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85920 Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
An earlier change to the group moving code forces group members to
change their event->cpu if the currently-opened-event's PMU changed
the cpu and there is a group move.
This behaviour causes problems for breakpoint events,
which use event->cpu to touch cpu-specific data for
breakpoint accounting. By changing event->cpu, some
breakpoint slots were wrongly accounted for a given
cpu.
Vince's perf fuzzer hit this issue and caused the following
WARN on my setup:
WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
Can't find any breakpoint slot
[...]
This patch changes the group moving code to keep the event's
original cpu.
The uncore_collect_events() function assumes that an event group
contains only uncore events, which is wrong because it
might contain any type of event.
This bug leads to the uncore framework touching non-uncore events,
which could end up causing all sorts of bugs.
One was triggered by Vince's perf fuzzer, when the uncore code
touched breakpoint event private event space as if it were an uncore
event and caused a BUG:
The code in the uncore_assign_events() function was looking for
event->hw.idx data while the event was initialized as a
breakpoint with different members in the event->hw union.
This patch forces uncore_collect_events() to collect only uncore
events.
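A sketch of the filter (the exact test for "is an uncore event" is an assumption here):

static bool is_uncore_event(struct perf_event *event)
{
	/* uncore PMUs register with an invalid task context */
	return event->pmu->task_ctx_nr == perf_invalid_context;
}

	/* in uncore_collect_events(): skip anything that isn't ours */
	list_for_each_entry(event, &leader->sibling_list, group_entry) {
		if (!is_uncore_event(event) ||
		    event->state <= PERF_EVENT_STATE_OFF)
			continue;
		/* ... collect the uncore event ... */
	}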
When we abort a transaction we iterate over all the ranges marked as dirty
in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
from those trees, add them back (unpin) to the free space caches and, if
the fs was mounted with "-o discard", perform a discard on those regions.
Also, after adding the regions to the free space caches, a fitrim ioctl call
can see those ranges in a block group's free space cache and perform a discard
on the ranges, so the same issue can happen without "-o discard" as well.
This causes corruption, affecting one or multiple btree nodes (in the worst
case leaving the fs unmountable) because some of those ranges (the ones in
the fs_info->pinned_extents tree) correspond to btree nodes/leaves that are
referred to by the last committed super block - breaking the rule that anything
that was committed by a transaction is untouched until the next transaction
commits successfully.
I ran into this while running in a loop (for several hours) the fstest that
I recently submitted:
[PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
The corruption always happened when a transaction aborted and then fsck complained
like this:
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
read block failed check_tree_block
Couldn't open file system
In this case 94945280 corresponded to the root of a tree.
Using ftrace, what I observed was that the following sequence of steps happened:
1) transaction N started, fs_info->pinned_extents pointed to
fs_info->freed_extents[0];
4) transaction N commit starts, fs_info->pinned_extents now points to
fs_info->freed_extents[1], and transaction N completes;
5) transaction N + 1 starts;
6) eb is COWed, and btrfs_free_tree_block() called for this eb;
7) eb range (94945280 to 94945280 + 16Kb) is added to
fs_info->pinned_extents (fs_info->freed_extents[1]);
8) Something goes wrong in transaction N + 1, like hitting ENOSPC
for example, and the transaction is aborted, turning the fs into
readonly mode. The stack trace I got for example:
[112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
[112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
[112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
[112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
[112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
[112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
[112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
[112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
[112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
(...)
[112065.264493] ---[ end trace dd7903a975a31a08 ]---
[112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
[112065.264997] BTRFS info (device sdc): forced readonly
9) The cleaner kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
turn calls btrfs_destroy_pinned_extent();
10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
marked as dirty in fs_info->freed_extents[], and for each one
it calls discard, if the fs was mounted with "-o discard", and
adds the range to the free space cache of the respective block
group;
11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
sees the free space entries and performs a discard;
12) After an umount and mount (or fsck), our eb's location on disk was full
of zeroes, and it should have been untouched, because it was marked as
dirty in the fs_info->pinned_extents tree, and therefore used by the
trees that the last committed superblock points to.
Fix this by not performing a discard and not adding the ranges to the free space
caches - it's useless from this point on, since the fs is now in readonly mode
and we won't write free space caches to disk anymore (otherwise we would leak
space), nor any new superblock. By not adding the ranges to the free space
caches, we also prevent other code paths from allocating that space and writing
to it, which is both safer and simpler.
When the codec is connected using i2c, it will only auto-increment
register addresses if the MSB (0x80) of the register address byte is set.
[Fixes cache sync if multiple adjacent registers are updated -- broonie]
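A hypothetical sketch of a bulk write honouring that bit (the register-address flag name and the helper are illustrative, not taken from the actual driver):

#define CODEC_REG_AUTO_INCR	0x80	/* msb: auto-increment addresses */

/* Write 'len' consecutive registers starting at 'reg' in one transfer. */
static int codec_bulk_write(struct i2c_client *client, u8 reg,
			    const u8 *val, size_t len)
{
	u8 buf[16];
	int ret;

	if (len + 1 > sizeof(buf))
		return -EINVAL;

	buf[0] = reg | CODEC_REG_AUTO_INCR;
	memcpy(&buf[1], val, len);

	ret = i2c_master_send(client, buf, len + 1);
	if (ret < 0)
		return ret;

	return ret == (int)(len + 1) ? 0 : -EIO;
}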
Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>