git.ipfire.org Git - thirdparty/kernel/stable.git/log

remove directory incorrectly tries to set delete on close on non-empty directories

commit 897fba1172d637d344f009d700f7eb8a1fa262f1 upstream.

Wrong return code was being returned on SMB3 rmdir of
non-empty directory.

For SMB3 (unlike for cifs), we attempt to delete a directory by
set of delete on close flag on the open. Windows clients set
this flag via a set info (SET_FILE_DISPOSITION to set this flag)
which properly checks if the directory is empty.

With this patch on smb3 mounts we correctly return
"DIRECTORY NOT EMPTY"
on attempts to remove a non-empty directory.

Signed-off-by: Steve French <steve.french@primarydata.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication

commit 1a967d6c9b39c226be1b45f13acd4d8a5ab3dc44 upstream.

Only server which map unknown users to guest will allow
access using a non-null NTLMv2_Response.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16:adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication

commit 777f69b8d26bf35ade4a76b08f203c11e048365d upstream.

Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

fs/cifs: correctly to anonymous authentication for the LANMAN authentication

commit fa8f3a354bb775ec586e4475bcb07f7dece97e0c upstream.

Only server which map unknown users to guest will allow
access using a non-null LMChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

fs/cifs: correctly to anonymous authentication via NTLMSSP

commit cfda35d98298131bf38fbad3ce4cd5ecb3cf18db upstream.

See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:

   ...
   Set NullSession to FALSE
   If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
      AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
      (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
       OR
       AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
       -- Special case: client requested anonymous authentication
       Set NullSession to TRUE
   ...

Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.

For Samba it's the "map to guest = bad user" option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

drm/fb_helper: Fix references to dev->mode_config.num_connector

commit 255f0e7c418ad95a4baeda017ae6182ba9b3c423 upstream.

During boot, MST hotplugs are generally expected (even if no physical
hotplugging occurs) and result in DRM's connector topology changing.
This means that using num_connector from the current mode configuration
can lead to the number of connectors changing under us. This can lead to
some nasty scenarios in fbcon:

- We allocate an array to the size of dev->mode_config.num_connectors.
- MST hotplug occurs, dev->mode_config.num_connectors gets incremented.
- We try to loop through each element in the array using the new value
  of dev->mode_config.num_connectors, and end up going out of bounds
  since dev->mode_config.num_connectors is now larger then the array we
  allocated.

fb_helper->connector_count however, will always remain consistent while
we do a modeset in fb_helper.

Note: This is just polish for 4.7, Dave Airlie's drm_connector
refcounting fixed these bugs for real. But it's good enough duct-tape
for stable kernel backporting, since backporting the refcounting
changes is way too invasive.

Signed-off-by: Lyude <cpaul@redhat.com>
[danvet: Clarify why we need this. Also remove the now unused "dev"
local variable to appease gcc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-3-git-send-email-cpaul@redhat.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config()

commit 14a3842a1d5945067d1dd0788f314e14d5b18e5b upstream.

During boot time, MST devices usually send a ton of hotplug events
irregardless of whether or not any physical hotplugs actually occurred.
Hotplugs mean connectors being created/destroyed, and the number of DRM
connectors changing under us. This isn't a problem if we use
fb_helper->connector_count since we only set it once in the code,
however if we use num_connector from struct drm_mode_config we risk it's
value changing under us. On top of that, there's even a chance that
dev->mode_config.num_connector != fb_helper->connector_count. If the
number of connectors happens to increase under us, we'll end up using
the wrong array size for memcpy and start writing beyond the actual
length of the array, occasionally resulting in kernel panics.

Note: This is just polish for 4.7, Dave Airlie's drm_connector
refcounting fixed these bugs for real. But it's good enough duct-tape
for stable kernel backporting, since backporting the refcounting
changes is way too invasive.

Signed-off-by: Lyude <cpaul@redhat.com>
[danvet: Clarify why we need this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-2-git-send-email-cpaul@redhat.com
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

sched/preempt: Fix preempt_count manipulations

commit 2e636d5e66c35dfcbaf617aa8fa963f6847478fe upstream.

Vikram reported that his ARM64 compiler managed to 'optimize' away the
preempt_count manipulations in code like:

preempt_enable_no_resched();
put_user();
preempt_disable();

Irrespective of that fact that that is horrible code that should be
fixed for many reasons, it does highlight a deficiency in the generic
preempt_count manipulators. As it is never right to combine/elide
preempt_count manipulations like this.

Therefore sprinkle some volatile in the two generic accessors to
ensure the compiler is aware of the fact that the preempt_count is
observed outside of the regular program-order view and thus cannot be
optimized away like this.

x86; the only arch not using the generic code is not affected as we
do all this in asm in order to use the segment base per-cpu stuff.

Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Tested-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a787870924db ("sched, arch: Create asm/preempt.h")
Link: http://lkml.kernel.org/r/20160516131751.GH3205@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: use ACCESS_ONCE() instead of READ_ONCE()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

netlink: Fix dump skb leak/double free

commit 92964c79b357efd980812c4de5c1fd2ec8bb5520 upstream.

When we free cb->skb after a dump, we do it after releasing the
lock. This means that a new dump could have started in the time
being and we'll end up freeing their skb instead of ours.

This patch saves the skb and module before we unlock so we free
the right memory.

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

PCI: Disable all BAR sizing for devices with non-compliant BARs

commit ad67b437f187ea818b2860524d10f878fadfdd99 upstream.

b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant
BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
the PCI spec.  But it didn't do anything for expansion ROM BARs, so we
still try to size them, resulting in warnings like this on Broadwell-EP:

  pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]

Move the non-compliant BAR check from __pci_read_base() up to
pci_read_bases() so it applies to the expansion ROM BAR as well as
to BARs 0-5.

Note that direct callers of __pci_read_base(), like sriov_init(), will now
bypass this check.  We haven't had reports of devices with broken SR-IOV
BARs yet.

[bhelgaas: changelog]
Fixes: b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs

commit da77b67195de1c65bef4908fa29967c4d0af2da2 upstream.

Commit b894157145e4 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
non-compliant BARs") marked Home Agent 0 & PCU has having non-compliant
BARs.  Home Agent 1 also has non-compliant BARs.

Mark Home Agent 1 as having non-compliant BARs so the PCI core doesn't
touch them.

The problem with these devices is documented in the Xeon v4 specification
update:

  BDF2          PCI BARs in the Home Agent Will Return Non-Zero Values
                During Enumeration

  Problem:      During system initialization the Operating System may access
                the standard PCI BARs (Base Address Registers).  Due to
                this erratum, accesses to the Home Agent BAR registers (Bus
                1; Device 18; Function 0,4; Offsets (0x14-0x24) will return
                non-zero values.

  Implication:  The operating system may issue a warning.  Intel has not
                observed any functional failures due to this erratum.

Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
Fixes: b894157145e4 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

net/mlx4_core: Fix access to uninitialized index

commit 2bb07e155bb3e0c722c806723f737cf8020961ef upstream.

Prevent using uninitialized or negative index when handling
steering entries.

Fixes: b12d93d63c32 ('mlx4: Add support for promiscuous mode in the new steering model.')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

blk-mq: fix undefined behaviour in order_to_size()

commit b3a834b1596ac668df206aa2bb1f191c31f5f5e4 upstream.

When this_order variable in blk_mq_init_rq_map() becomes zero
the code incorrectly decrements the variable and passes the result
to order_to_size() helper causing undefined behaviour:

UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22

Fix the code by checking this_order variable for not having the zero
value first.

Reported-by: Meelis Roos <mroos@linux.ee>
Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

mmc: mmc: Fix partition switch timeout for some eMMCs

commit 1c447116d017a98c90f8f71c8c5a611e0aa42178 upstream.

Some eMMCs set the partition switch timeout too low.

Now typically eMMCs are considered a critical component (e.g. because
they store the root file system) and consequently are expected to be
reliable. Thus we can neglect the use case where eMMCs can't switch
reliably and we might want a lower timeout to facilitate speedy
recovery.

Although we could employ a quirk for the cards that are affected (if
we could identify them all), as described above, there is little
benefit to having a low timeout, so instead simply set a minimum
timeout.

The minimum is set to 300ms somewhat arbitrarily - the examples that
have been seen had a timeout of 10ms but were sometimes taking 60-70ms.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

i40e: fix an uninitialized variable bug

commit 1c306f7f62a38ee5f05f0ee994dfe82d654cf47c upstream.

We removed this initialization but it is required. Let's put it back.

Fixes: 895106a577c4 ('i40e: trivial fixes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

IB/core: Fix a potential array overrun in CMA and SA agent

commit 2fa2d4fb1166d1ef35f0aacac6165d53ab1b89c7 upstream.

Fix array overrun when going over callback table.
In declaration of callback table, the max size isn't provided and
in registration phase, it is provided.

There is potential scenario where a new operation is added
and it is not supported by current client. The acceptance of
such operation by ib_netlink will cause to array overrun.

Fixes: 809d5fc9bf65 ("infiniband: pass rdma_cm module to netlink_dump_start")
Fixes: b493d91d333e ("iwcm: common code for port mapper")
Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.16:
- Only cma.c needs to be fixed
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

IB/IWPM: Fix a potential skb leak

commit 5ed935e861a4cbf2158ad3386d6d26edd60d2658 upstream.

In case ibnl_put_msg fails in send_nlmsg_done,
the function returns with -ENOMEM without freeing.

This patch fixes this behavior.

Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr()

commit 093108cb3640844cfdabb0f506fa6b592b64272d upstream.

Currently c4iw_peer_abort_intr() does not wake up the waiter if the
endpoint state indicates we're using MPAv2 and we're currently trying to
connect. This was introduced with commit 7c0a33d61187a ("RDMA/cxgb4:
Don't wakeup threads for MPAv2")

However, this original fix is flawed because it introduces a race that
can cause a deadlock of the iwarp stack.  Here is the race:

->local side sets up an active offload connection.

->local side sends MPA_START request.

->peer sends MPA_START response.

->local side ingress cpl thread begins processing the MPA_START response,
but before it changes the state from MPA_REQ_SENT to FPDU_MODE:

->peer sends a RST which results in a ABORT_REQ_RSS.  This triggers
peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev
is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the
stack will re-attempt the connection using MPAv1.

->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls
c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR
to firmware.  But since HW sent an abort, FW correctly drops the RI_WR/INIT
WR.

->So the cpl thread is stuck waiting for a reply and cannot process the
ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a
halt because no more ingress cpls are processed by the stack...

The correct fix for the issue is to always do the wake up in
c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect().

Fixes: 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ring-buffer: Prevent overflow of size in ring_buffer_resize()

commit 59643d1535eb220668692a5359de22545af579f6 upstream.

If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
then the DIV_ROUND_UP() will return zero.

Here's the details:

# echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb

tracing_entries_write() processes this and converts kb to bytes.

18014398509481980 << 10 = 18446744073709547520

and this is passed to ring_buffer_resize() as unsigned long size.

size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);

Where DIV_ROUND_UP(a, b) is (a + b - 1)/b

BUF_PAGE_SIZE is 4080 and here

18446744073709547520 + 4080 - 1 = 18446744073709551599

where 18446744073709551599 is still smaller than 2^64

2^64 - 18446744073709551599 = 17

But now 18446744073709551599 / 4080 = 4521260802379792

and size = size * 4080 = 18446744073709551360

This is checked to make sure its still greater than 2 * 4080,
which it is.

Then we convert to the number of buffer pages needed.

nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)

but this time size is 18446744073709551360 and

2^64 - (18446744073709551360 + 4080 - 1) = -3823

Thus it overflows and the resulting number is less than 4080, which makes

3823 / 4080 = 0

an nr_pages is set to this. As we already checked against the minimum that
nr_pages may be, this causes the logic to fail as well, and we crash the
kernel.

There's no reason to have the two DIV_ROUND_UP() (that's just result of
historical code changes), clean up the code and fix this bug.

Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ring-buffer: Use long for nr_pages to avoid overflow failures

commit 9b94a8fba501f38368aef6ac1b30e7335252a220 upstream.

The size variable to change the ring buffer in ftrace is a long. The
nr_pages used to update the ring buffer based on the size is int. On 64 bit
machines this can cause an overflow problem.

For example, the following will cause the ring buffer to crash:

# cd /sys/kernel/debug/tracing
# echo 10 > buffer_size_kb
# echo 8556384240 > buffer_size_kb

Then you get the warning of:

WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260

Which is:

  RB_WARN_ON(cpu_buffer, nr_removed);

Note each ring buffer page holds 4080 bytes.

This is because:

1) 10 causes the ring buffer to have 3 pages.
    (10kb requires 3 * 4080 pages to hold)

2) (2^31 / 2^10  + 1) * 4080 = 8556384240
    The value written into buffer_size_kb is shifted by 10 and then passed
    to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760

3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
    which is 4080. 8761737461760 / 4080 = 2147484672

4) nr_pages is subtracted from the current nr_pages (3) and we get:
    2147484669. This value is saved in a signed integer nr_pages_to_update

5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
    turns into the value of -2147482627

6) As the value is a negative number, in update_pages_handler() it is
    negated and passed to rb_remove_pages() and 2147482627 pages will
    be removed, which is much larger than 3 and it causes the warning
    because not all the pages asked to be removed were removed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer")
Reported-by: Hao Qin <QEver.cn@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: math-emu: Fix jalr emulation when rd == $0

commit ab4a92e66741b35ca12f8497896bafbe579c28a1 upstream.

When emulating a jalr instruction with rd == $0, the code in
isBranchInstr was incorrectly writing to GPR $0 which should actually
always remain zeroed. This would lead to any further instructions
emulated which use $0 operating on a bogus value until the task is next
context switched, at which point the value of $0 in the task context
would be restored to the correct zero by a store in SAVE_SOME. Fix this
by not writing to rd if it is $0.

Fixes: 102cedc32a6e ("MIPS: microMIPS: Floating point support.")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Maciej W. Rozycki <macro@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13160/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: Fix race condition in lazy cache flushing.

commit 4d46a67a3eb827ccf1125959936fd51ba318dabc upstream.

The lazy cache flushing implemented in the MIPS kernel suffers from a
race condition that is exposed by do_set_pte() in mm/memory.c.

A pre-condition is a file-system that writes to the page from the CPU
in its readpage method and then calls flush_dcache_page(). One example
is ubifs. Another pre-condition is that the dcache flush is postponed
in __flush_dcache_page().

Upon a page fault for an executable mapping not existing in the
page-cache, the following will happen:
1. Write to the page
2. flush_dcache_page
3. flush_icache_page
4. set_pte_at
5. update_mmu_cache (commits the flush of a dcache-dirty page)

Between steps 4 and 5 another thread can hit the same page and it will
encounter a valid pte. Because the data still is in the L1 dcache the CPU
will fetch stale data from L2 into the icache and execute garbage.

This fix moves the commit of the cache flush to step 3 to close the
race window. It also reduces the amount of flushes on non-executable
mappings because we never enter __flush_dcache_page() for non-aliasing
CPUs.

Regressions can occur in drivers that mistakenly relies on the
flush_dcache_page() in get_user_pages() for DMA operations.

[ralf@linux-mips.org: Folded in patch 9346 to fix highmem issue.]

Signed-off-by: Lars Persson <larper@axis.com>
Cc: linux-mips@linux-mips.org
Cc: paul.burton@imgtec.com
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/9346/
Patchwork: https://patchwork.linux-mips.org/patch/9738/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism

commit 8445a87f7092bc8336ea1305be9306f26b846d93 upstream.

Commit 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
changed the pci_dn struct by removing its EEH-related members.
As part of this clean-up, DDW mechanism was modified to read the device
configuration address from eeh_dev struct.

As a consequence, now if we disable EEH mechanism on kernel command-line
for example, the DDW mechanism will fail, generating a kernel oops by
dereferencing a NULL pointer (which turns to be the eeh_dev pointer).

This patch just changes the configuration address calculation on DDW
functions to a manual calculation based on pci_dn members instead of
using eeh_dev-based address.

No functional changes were made. This was tested on pSeries, both
in PHyp and qemu guest.

Fixes: 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems

commit 20878232c52329f92423d27a60e48b6a6389e0dd upstream.

Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.

Likewise but not as obviously noticeable, a fully loaded system with no
processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by number of cores).

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

By changing the way the internally kept value is rounded, that internal
value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
increasing load, the internally kept load value is rounded up, when the
load is decreasing, the load value is rounded down.

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
tested on single, dual, and octal cores system. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.

Tested-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

rtlwifi: Fix logic error in enter/exit power-save mode

commit 873ffe154ae074c46ed2d72dbd9a2a99f06f55b4 upstream.

In commit a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and
rtl_lps_enter() to use work queue"), the tests for enter/exit
power-save mode were inverted. With this change applied, the
wifi connection becomes much more stable.

Fixes: a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter() to use work queue")
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.16:
- We only set a flag here to be used later, but it was also set the wrong way
- Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

perf tools: Fix perf regs mask generation

commit f47822078dece7189cad0a5f472f148e5e916736 upstream.

On some architectures (powerpc in particular), the number of registers
exceeds what can be represented in an integer bitmask. Ensure we
generate the proper bitmask on such platforms.

Fixes: 71ad0f5e4 ("perf tools: Support for DWARF CFI unwinding on post processing")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

powerpc/mm/hash64: Fix subpage protection with 4K HPTE config

commit aac55d7573c5d46ed9a62818d5d3e69dd2060105 upstream.

With Linux page size of 64K and hardware only supporting 4K HPTE, if we
use subpage protection, we always fail for the subpage 0 as shown
below (using the selftest subpage_prot test):

  520175565:  (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass !
  4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass !
  4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass !
  4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass !

This is because hash preload wrongly inserts the HPTE entry for subpage
0 without looking at the subpage protection information.

Fix it by teaching should_hash_preload() not to preload if we have
subpage protection configured for that range.

It appears this has been broken since it was introduced in 2008.

Fixes: fa28237cfcc5 ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

powerpc/mm/hash64: Factor out hash preload psize check

commit 8bbc9b7b001eaab8abf7e9e24edf1bb285c8d825 upstream.

Currently we have a check in hash_preload() against the psize, which is
only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand
this check in a subsequent patch, so factor it out to allow that. As a
bonus it removes the #ifdef in the C code.

Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES
block because it would require a forward declaration.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

kbuild: move -Wunused-const-variable to W=1 warning level

commit c9c6837d39311b0cc14cdbe7c18e815ab44aefb1 upstream.

gcc-6 started warning by default about variables that are not
used anywhere and that are marked 'const', generating many
false positives in an allmodconfig build, e.g.:

arch/arm/mach-davinci/board-da830-evm.c:282:20: warning: 'da830_evm_emif25_pins' defined but not used [-Wunused-const-variable=]
arch/arm/plat-omap/dmtimer.c:958:34: warning: 'omap_timer_match' defined but not used [-Wunused-const-variable=]
drivers/bluetooth/hci_bcm.c:625:39: warning: 'acpi_bcm_default_gpios' defined but not used [-Wunused-const-variable=]
drivers/char/hw_random/omap-rng.c:92:18: warning: 'reg_map_omap4' defined but not used [-Wunused-const-variable=]
drivers/devfreq/exynos/exynos5_bus.c:381:32: warning: 'exynos5_busfreq_int_pm' defined but not used [-Wunused-const-variable=]
drivers/dma/mv_xor.c:1139:34: warning: 'mv_xor_dt_ids' defined but not used [-Wunused-const-variable=]

This is similar to the existing -Wunused-but-set-variable warning
that was added in an earlier release and that we disable by default
now and only enable when W=1 is set, so it makes sense to do
the same here. Once we have eliminated the majority of the
warnings for both, we can put them back into the default list.

We probably want this in backport kernels as well, to allow building
them with gcc-6 without introducing extra warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str

commit f228b494e56d949be8d8ea09d4f973d1979201bf upstream.

The loop that browses the array compat_hwcap_str will stop when a NULL
is encountered, however NULL is missing at the end of array. This will
lead to overrun until a NULL is found somewhere in the following memory.
In reality, this works out because the compat_hwcap2_str array tends to
follow immediately in memory, and that *is* terminated correctly.
Furthermore, the unsigned int compat_elf_hwcap is checked before
printing each capability, so we end up doing the right thing because
the size of the two arrays is less than 32. Still, this is an obvious
mistake and should be fixed.

Note for backporting: commit 12d11817eaafa414 ("arm64: Move
/proc/cpuinfo handling code") moved this code in v4.4. Prior to that
commit, the same change should be made in arch/arm64/kernel/setup.c.

Fixes: 44b82b7700d0 "arm64: Fix up /proc/cpuinfo"
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

irqchip/gic: Ensure ordering between read of INTACK and shared data

commit f86c4fbd930ff6fecf3d8a1c313182bd0f49f496 upstream.

When an IPI is generated by a CPU, the pattern looks roughly like:

  <write shared data>
  smp_wmb();
  <write to GIC to signal SGI>

On the receiving CPU we rely on the fact that, once we've taken the
interrupt, then the freshly written shared data must be visible to us.
Put another way, the CPU isn't going to speculate taking an interrupt.

Unfortunately, this assumption turns out to be broken.

Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
to read some shared_data. Before CPUx has done anything, a random
peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
CPUy then takes the IRQ and starts executing the entry code, heading
towards gic_handle_irq. Furthermore, let's assume that a bunch of the
previous interrupts handled by CPUy were SGIs, so the branch predictor
kicks in and speculates that irqnr will be <16 and we're likely to
head into handle_IPI. The prefetcher then grabs a speculative copy of
shared_data which contains a stale value.

Meanwhile, CPUx gets round to updating shared_data and asking the GIC
to send an SGI to CPUy. Internally, the GIC decides that the SGI is
more important than the peripheral interrupt (which hasn't yet been
ACKed) but doesn't need to do anything to CPUy, because the IRQ line
is already raised.

CPUy then reads the ACK register on the GIC, sees the SGI value which
confirms the branch prediction and we end up with a stale shared_data
value.

This patch fixes the problem by adding an smp_rmb() to the IPI entry
code in gic_handle_irq. As it turns out, the combination of a control
dependency and an ISB instruction from the EOI in the GICv3 driver is
enough to provide the ordering we need, so we add a comment there
justifying the absence of an explicit smp_rmb().

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[bwh: Backported to 3.16: drop changes to irq-gic-v3]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

s390/vmem: fix identity mapping

commit c34a69059d7876e0793eb410deedfb08ccb22b02 upstream.

The identity mapping is suboptimal for the last 2GB frame. The mapping
will be established with a mix of 4KB and 1MB mappings instead of a
single 2GB mapping.

This happens because of a off-by-one bug introduced with
commit 50be63450728 ("s390/mm: Convert bootmem to memblock").

Currently the identity mapping looks like this:

0x0000000080000000-0x0000000180000000        4G PUD RW
0x0000000180000000-0x00000001fff00000     2047M PMD RW
0x00000001fff00000-0x0000000200000000        1M PTE RW

With the bug fixed it looks like this:

0x0000000080000000-0x0000000200000000        6G PUD RW

Fixes: 50be63450728 ("s390/mm: Convert bootmem to memblock")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ata: sata_dwc_460ex: remove incorrect locking

commit 55e610cdd28c0ad3dce0652030c0296d549673f3 upstream.

This lock is already taken in ata_scsi_queuecmd() a few levels up the
call stack so attempting to take it here is an error.  Moreover, it is
pointless in the first place since it only protects a single, atomic
assignment.

Enabling lock debugging gives the following output:

=============================================
[ INFO: possible recursive locking detected ]
4.4.0-rc5+ #189 Not tainted
---------------------------------------------
kworker/u2:3/37 is trying to acquire lock:
(&(&host->lock)->rlock){-.-...}, at: [<90283294>] sata_dwc_exec_command_by_tag.constprop.14+0x44/0x8c

but task is already holding lock:
(&(&host->lock)->rlock){-.-...}, at: [<902761ac>] ata_scsi_queuecmd+0x2c/0x330

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&host->lock)->rlock);
  lock(&(&host->lock)->rlock);

*** DEADLOCK ***
May be due to missing lock nesting notation

4 locks held by kworker/u2:3/37:
#0:  ("events_unbound"){.+.+.+}, at: [<9003a0a4>] process_one_work+0x12c/0x430
#1:  ((&entry->work)){+.+.+.}, at: [<9003a0a4>] process_one_work+0x12c/0x430
#2:  (&bdev->bd_mutex){+.+.+.}, at: [<9011fd54>] __blkdev_get+0x50/0x380
#3:  (&(&host->lock)->rlock){-.-...}, at: [<902761ac>] ata_scsi_queuecmd+0x2c/0x330

stack backtrace:
CPU: 0 PID: 37 Comm: kworker/u2:3 Not tainted 4.4.0-rc5+ #189
Workqueue: events_unbound async_run_entry_fn
Stack : 90b38e30 00000021 00000003 9b2a6040 00000000 9005f3f0 904fc8dc 00000025
        906b96e4 00000000 90528648 9b3336c4 904fc8dc 9009bf18 00000002 00000004
        00000000 00000000 9b3336c4 9b3336e4 904fc8dc 9003d074 00000000 90500000
        9005e738 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        6e657665 755f7374 756f626e 0000646e 00000000 00000000 9b00ca00 9b025000
          ...
Call Trace:
[<90009d6c>] show_stack+0x88/0xa4
[<90057744>] __lock_acquire+0x1ce8/0x2154
[<900583e4>] lock_acquire+0x64/0x8c
[<9045ff10>] _raw_spin_lock_irqsave+0x54/0x78
[<90283294>] sata_dwc_exec_command_by_tag.constprop.14+0x44/0x8c
[<90283484>] sata_dwc_qc_issue+0x1a8/0x24c
[<9026b39c>] ata_qc_issue+0x1f0/0x410
[<90273c6c>] ata_scsi_translate+0xb4/0x200
[<90276234>] ata_scsi_queuecmd+0xb4/0x330
[<9025800c>] scsi_dispatch_cmd+0xd0/0x128
[<90259934>] scsi_request_fn+0x58c/0x638
[<901a3e50>] __blk_run_queue+0x40/0x5c
[<901a83d4>] blk_queue_bio+0x27c/0x28c
[<901a5914>] generic_make_request+0xf0/0x188
[<901a5a54>] submit_bio+0xa8/0x194
[<9011adcc>] submit_bh_wbc.isra.23+0x15c/0x17c
[<9011c908>] block_read_full_page+0x3e4/0x428
[<9009e2e0>] do_read_cache_page+0xac/0x210
[<9009fd90>] read_cache_page+0x18/0x24
[<901bbd18>] read_dev_sector+0x38/0xb0
[<901bd174>] msdos_partition+0xb4/0x5c0
[<901bcb8c>] check_partition+0x140/0x274
[<901bba60>] rescan_partitions+0xa0/0x2b0
[<9011ff68>] __blkdev_get+0x264/0x380
[<901201ac>] blkdev_get+0x128/0x36c
[<901b9378>] add_disk+0x3c0/0x4bc
[<90268268>] sd_probe_async+0x100/0x224
[<90043a44>] async_run_entry_fn+0x50/0x124
[<9003a11c>] process_one_work+0x1a4/0x430
[<9003a4f4>] worker_thread+0x14c/0x4fc
[<900408f4>] kthread+0xd0/0xe8
[<90004338>] ret_from_kernel_thread+0x14/0x1c

Fixes: 62936009f35a ("[libata] Add 460EX on-chip SATA driver, sata_dwc_460ex")
Tested-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

gcov: disable tree-loop-im to reduce stack usage

commit c87bf431448b404a6ef5fbabd74c0e3e42157a7f upstream.

Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like

lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]

After some investigation, I found that this behavior started with gcc-4.9,
and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
A suggested workaround for it is to use the -fno-tree-loop-im
flag that turns off one of the optimization stages in gcc, so the
code runs a little slower but does not use excessive amounts
of stack.

We could make this conditional on the gcc version, but I could not
find an easy way to do this in Kbuild and the benefit would be
fairly small, given that most of the gcc version in production are
affected now.

I'm marking this for 'stable' backports because it addresses a bug
with code generation in gcc that exists in all kernel versions
with the affected gcc releases.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: KVM: Fix timer IRQ race when writing CP0_Compare

commit b45bacd2d048f405c7760e5cc9b60dd67708734f upstream.

Writing CP0_Compare clears the timer interrupt pending bit
(CP0_Cause.TI), but this wasn't being done atomically. If a timer
interrupt raced with the write of the guest CP0_Compare, the timer
interrupt could end up being pending even though the new CP0_Compare is
nowhere near CP0_Count.

We were already updating the hrtimer expiry with
kvm_mips_update_hrtimer(), which used both kvm_mips_freeze_hrtimer() and
kvm_mips_resume_hrtimer(). Close the race window by expanding out
kvm_mips_update_hrtimer(), and clearing CP0_Cause.TI and setting
CP0_Compare between the freeze and resume. Since the pending timer
interrupt should not be cleared when CP0_Compare is written via the KVM
user API, an ack argument is added to distinguish the source of the
write.

Fixes: e30492bbe95a ("MIPS: KVM: Rewrite count/compare timer emulation")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄ\8dmÃ¡Å™" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.16: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: KVM: Fix timer IRQ race when freezing timer

commit 4355c44f063d3de4f072d796604c7f4ba4085cc3 upstream.

There's a particularly narrow and subtle race condition when the
software emulated guest timer is frozen which can allow a guest timer
interrupt to be missed.

This happens due to the hrtimer expiry being inexact, so very
occasionally the freeze time will be after the moment when the emulated
CP0_Count transitions to the same value as CP0_Compare (so an IRQ should
be generated), but before the moment when the hrtimer is due to expire
(so no IRQ is generated). The IRQ won't be generated when the timer is
resumed either, since the resume CP0_Count will already match CP0_Compare.

With VZ guests in particular this is far more likely to happen, since
the soft timer may be frozen frequently in order to restore the timer
state to the hardware guest timer. This happens after 5-10 hours of
guest soak testing, resulting in an overflow in guest kernel timekeeping
calculations, hanging the guest. A more focussed test case to
intentionally hit the race (with the help of a new hypcall to cause the
timer state to migrated between hardware & software) hits the condition
fairly reliably within around 30 seconds.

Instead of relying purely on the inexact hrtimer expiry to determine
whether an IRQ should be generated, read the guest CP0_Compare and
directly check whether the freeze time is before or after it. Only if
CP0_Count is on or after CP0_Compare do we check the hrtimer expiry to
determine whether the last IRQ has already been generated (which will
have pushed back the expiry by one timer period).

Fixes: e30492bbe95a ("MIPS: KVM: Rewrite count/compare timer emulation")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄ\8dmÃ¡Å™" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: quatech2: fix use-after-free in probe error path

commit 028c49f5e02a257c94129cd815f7c8485f51d4ef upstream.

The interface read URB is submitted in attach, but was only unlinked by
the driver at disconnect.

In case of a late probe error (e.g. due to failed minor allocation),
disconnect is never called and we would end up with active URBs for an
unbound interface. This in turn could lead to deallocated memory being
dereferenced in the completion callback.

Fixes: f7a33e608d9a ("USB: serial: add quatech2 usb to serial driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: mxuport: fix use-after-free in probe error path

commit 9e45284984096314994777f27e1446dfbfd2f0d7 upstream.

The interface read and event URBs are submitted in attach, but were
never explicitly unlinked by the driver. Instead the URBs would have
been killed by usb-serial core on disconnect.

In case of a late probe error (e.g. due to failed minor allocation),
disconnect is never called and we could end up with active URBs for an
unbound interface. This in turn could lead to deallocated memory being
dereferenced in the completion callbacks.

Fixes: ee467a1f2066 ("USB: serial: add Moxa UPORT 12XX/14XX/16XX
driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: keyspan: fix use-after-free in probe error path

commit 35be1a71d70775e7bd7e45fa6d2897342ff4c9d2 upstream.

The interface instat and indat URBs were submitted in attach, but never
unlinked in release before deallocating the corresponding transfer
buffers.

In the case of a late probe error (e.g. due to failed minor allocation),
disconnect would not have been called before release, causing the
buffers to be freed while the URBs are still in use. We'd also end up
with active URBs for an unbound interface.

Fixes: f9c99bb8b3a1 ("USB: usb-serial: replace shutdown with disconnect,
release")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: io_edgeport: fix memory leaks in probe error path

commit c8d62957d450cc1a22ce3242908709fe367ddc8e upstream.

URBs and buffers allocated in attach for Epic devices would never be
deallocated in case of a later probe error (e.g. failure to allocate
minor numbers) as disconnect is then never called.

Fix by moving deallocation to release and making sure that the
URBs are first unlinked.

Fixes: f9c99bb8b3a1 ("USB: usb-serial: replace shutdown with disconnect,
release")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: io_edgeport: fix memory leaks in attach error path

commit c5c0c55598cefc826d6cfb0a417eeaee3631715c upstream.

Private data, URBs and buffers allocated for Epic devices during
attach were never released on errors (e.g. missing endpoints).

Fixes: 6e8cf7751f9f ("USB: add EPIC support to the io_edgeport driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: Adjust set_pte() SMP fix to handle R10000_LLSC_WAR

commit 128639395b2ceacc6a56a0141d0261012bfe04d3 upstream.

Update the recent changes to set_pte() that were added in 46011e6ea392
to handle R10000_LLSC_WAR, and format the assembly to match other areas
of the MIPS tree using the same WAR.

This also incorporates a patch recently sent in my Markos Chandras,
"Remove local LL/SC preprocessor variants", so that patch doesn't need
to be applied if this one is accepted.

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Fixes: 46011e6ea392 ("MIPS: Make set_pte() SMP safe.)
Cc: David Daney <david.daney@cavium.com>
Cc: Linux/MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/11103/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2:
- Use {LL,SC}_INSN not __{LL,SC}
- Use literal arch=r4000 instead of MIPS_ISA_ARCH_LEVEL since R6 is not
supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: Avoid using unwind_stack() with usermode

commit 81a76d7119f63c359750e4adeff922a31ad1135f upstream.

When showing backtraces in response to traps, for example crashes and
address errors (usually unaligned accesses) when they are set in debugfs
to be reported, unwind_stack will be used if the PC was in the kernel
text address range. However since EVA it is possible for user and kernel
address ranges to overlap, and even without EVA userland can still
trigger an address error by jumping to a KSeg0 address.

Adjust the check to also ensure that it was running in kernel mode. I
don't believe any harm can come of this problem, since unwind_stack() is
sufficiently defensive, however it is only meant for unwinding kernel
code, so to be correct it should use the raw backtracing instead.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/11701/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
(cherry picked from commit d2941a975ac745c607dfb590e92bb30bc352dad9)
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: Don't unwind to user mode with EVA

commit a816b306c62195b7c43c92cb13330821a96bdc27 upstream.

When unwinding through IRQs and exceptions, the unwinding only continues
if the PC is a kernel text address, however since EVA it is possible for
user and kernel address ranges to overlap, potentially allowing
unwinding to continue to user mode if the user PC happens to be in the
kernel text address range.

Adjust the check to also ensure that the register state from before the
exception is actually running in kernel mode, i.e. !user_mode(regs).

I don't believe any harm can come of this problem, since the PC is only
output, the stack pointer is checked to ensure it resides within the
task's stack page before it is dereferenced in search of the return
address, and the return address register is similarly only output (if
the PC is in a leaf function or the beginning of a non-leaf function).

However unwind_stack() is only meant for unwinding kernel code, so to be
correct the unwind should stop there.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/11700/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: BMIPS: Fix PRID_IMP_BMIPS5000 masking for BMIPS5200

commit cbbda6e7c9c3e4532bd70a73ff9d5e6655c894dc upstream.

BMIPS5000 have a PrID value of 0x5A00 and BMIPS5200 have a PrID value of
0x5B00, which, masked with 0x5A00, returns 0x5A00. Update all conditionals on
the PrID to cover both variants since we are going to need this to enable
BMIPS5200 SMP. The existing check, masking with 0xFF00 would not cover
BMIPS5200 at all.

Fixes: 68e6a78373a6d ("MIPS: BMIPS: Add PRId for BMIPS5200 (Whirlwind)")
Fixes: 6465460c92a85 ("MIPS: BMIPS: change compile time checks to runtime checks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: john@phrozen.org
Cc: cernekee@gmail.com
Cc: jogo@openwrt.org
Cc: jaedon.shin@gmail.com
Cc: jfraser@broadcom.com
Cc: pgynther@google.com
Cc: dragan.stancevic@gmail.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12279/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: Fix siginfo.h to use strict posix types

commit 5daebc477da4dfeb31ae193d83084def58fd2697 upstream.

Commit 85efde6f4e0d ("make exported headers use strict posix types")
changed the asm-generic siginfo.h to use the __kernel_* types, and
commit 3a471cbc081b ("remove __KERNEL_STRICT_NAMES") make the internal
types accessible only to the kernel, but the MIPS implementation hasn't
been updated to match.

Switch to proper types now so that the exported asm/siginfo.h won't
produce quite so many compiler errors when included alone by a user
program.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Christopher Ferris <cferris@google.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12477/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats

commit 330d12764e15f6e3e94ff34cda29db96d2589c24 upstream.

MAX8997 PMIC requires interrupt and fails probing without it.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: d105f0b1215d ("ARM: dts: Add basic dts file for Samsung Trats board")
[k.kozlowski: Write commit message, add CC-stable]
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
[bwh: Backported to 3.16: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

arm64: Ensure pmd_present() returns false after pmd_mknotpresent()

commit 5bb1cc0ff9a6b68871970737e6c4c16919928d8b upstream.

Currently, pmd_present() only checks for a non-zero value, returning
true even after pmd_mknotpresent() (which only clears the type bits).
This patch converts pmd_present() to using pte_present(), similar to the
other pmd_*() checks. As a side effect, it will return true for
PROT_NONE mappings, though they are not yet used by the kernel with
transparent huge pages.

For consistency, also change pmd_mknotpresent() to only clear the
PMD_SECT_VALID bit, even though the PMD_TABLE_BIT is already 0 for block
mappings (no functional change). The unused PMD_SECT_PROT_NONE
definition is removed as transparent huge pages use the pte page prot
values.

Fixes: 9c7e535fcc17 ("arm64: mm: Route pmd thp functions through pte equivalents")
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ext4: fix oops on corrupted filesystem

commit 74177f55b70e2f2be770dd28684dd6d17106a4ba upstream.

When filesystem is corrupted in the right way, it can happen
ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we
subsequently remove inode from the in-memory orphan list. However this
deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we
leave i_orphan list_head with a stale content. Later we can look at this
content causing list corruption, oops, or other issues. The reported
trace looked like:

WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100()
list_del corruption, 0000000061c1d6e0->next is LIST_POISON1
0000000000100100)
CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250
Stack:
60462947 62219960 602ede24 62219960
602ede24 603ca293 622198f0 602f02eb
62219950 6002c12c 62219900 601b4d6b
Call Trace:
[<6005769c>] ? vprintk_emit+0x2dc/0x5c0
[<602ede24>] ? printk+0x0/0x94
[<600190bc>] show_stack+0xdc/0x1a0
[<602ede24>] ? printk+0x0/0x94
[<602ede24>] ? printk+0x0/0x94
[<602f02eb>] dump_stack+0x2a/0x2c
[<6002c12c>] warn_slowpath_common+0x9c/0xf0
[<601b4d6b>] ? __list_del_entry+0x6b/0x100
[<6002c254>] warn_slowpath_fmt+0x94/0xa0
[<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0
[<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0
[<60023ebf>] ? set_signals+0x3f/0x50
[<600a205a>] ? kmem_cache_free+0x10a/0x180
[<602f4e88>] ? mutex_lock+0x18/0x30
[<601b4d6b>] __list_del_entry+0x6b/0x100
[<601177ec>] ext4_orphan_del+0x22c/0x2f0
[<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0
[<6010b973>] ? ext4_truncate+0x383/0x390
[<6010bc8b>] ext4_write_begin+0x30b/0x4b0
[<6001bb50>] ? copy_from_user+0x0/0xb0
[<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0
[<60072c4f>] generic_perform_write+0xaf/0x1e0
[<600c4166>] ? file_update_time+0x46/0x110
[<60072f0f>] __generic_file_write_iter+0x18f/0x1b0
[<6010030f>] ext4_file_write_iter+0x15f/0x470
[<60094e10>] ? unlink_file_vma+0x0/0x70
[<6009b180>] ? unlink_anon_vmas+0x0/0x260
[<6008f169>] ? free_pgtables+0xb9/0x100
[<600a6030>] __vfs_write+0xb0/0x130
[<600a61d5>] vfs_write+0xa5/0x170
[<600a63d6>] SyS_write+0x56/0xe0
[<6029fcb0>] ? __libc_waitpid+0x0/0xa0
[<6001b698>] handle_syscall+0x68/0x90
[<6002633d>] userspace+0x4fd/0x600
[<6002274f>] ? save_registers+0x1f/0x40
[<60028bd7>] ? arch_prctl+0x177/0x1b0
[<60017bd5>] fork_handler+0x85/0x90

Fix the problem by using list_del_init() as we always should with
i_orphan list.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ACPI / sysfs: fix error code in get_status()

commit f18ebc211e259d4f591e39e74b2aa2de226c9a1d upstream.

The problem with ornamental, do-nothing gotos is that they lead to
"forgot to set the error code" bugs.  We should be returning -EINVAL
here but we don't.  It leads to an uninitalized variable in
counter_show():

    drivers/acpi/sysfs.c:603 counter_show()
    error: uninitialized symbol 'status'.

Fixes: 1c8fce27e275 (ACPI: introduce drivers/acpi/sysfs.c)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

cpufreq: Fix GOV_LIMITS handling for the userspace governor

commit e43e94c1eda76dabd686ddf6f7825f54d747b310 upstream.

Currently, the userspace governor only updates frequency on GOV_LIMITS
if policy->cur falls outside policy->{min/max}. However, it is also
necessary to update current frequency on GOV_LIMITS to match the user
requested value if it can be achieved within the new policy->{max/min}.

This was previously the behaviour in the governor until commit d1922f0
("cpufreq: Simplify userspace governor") which incorrectly assumed that
policy->cur == user requested frequency via scaling_setspeed. This won't
be true if the user requested frequency falls outside policy->{min/max}.
Ex: a temporary thermal cap throttled the user requested frequency.

Fix this by storing the user requested frequency in a seperate variable.
The governor will then try to achieve this request on every GOV_LIMITS
change.

Fixes: d1922f02562f (cpufreq: Simplify userspace governor)
Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: option: add even more ZTE device ids

commit 74d2a91aec97ab832790c9398d320413ad185321 upstream.

Add even more ZTE device ids.

Signed-off-by: lei liu <liu.lei78@zte.com.cn>
[johan: rebase and replace commit message ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: option: add more ZTE device ids

commit f0d09463c59c2d764a6c6d492cbe6d2c77f27153 upstream.

More ZTE device ids.

Signed-off-by: lei liu <liu.lei78@zte.com.cn>
[properly sort them - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

tty: vt, return error when con_startup fails

commit 6798df4c5fe0a7e6d2065cf79649a794e5ba7114 upstream.

When csw->con_startup() fails in do_register_con_driver, we return no
error (i.e. 0). This was changed back in 2006 by commit 3e795de763.
Before that we used to return -ENODEV.

So fix the return value to be -ENODEV in that case again.

Fixes: 3e795de763 ("VT binding: Add binding/unbinding support for the VT console")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

mcb: Fixed bar number assignment for the gdd

commit f75564d343010b025301d9548f2304f48eb25f01 upstream.

The bar number is found in reg2 within the gdd. Therefore
we need to change the assigment from reg1 to reg2 which
is the correct location.

Signed-off-by: Andreas Werner <andreas.werner@men.de>
Fixes: '3764e82e5' drivers: Introduce MEN Chameleon Bus
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

USB: serial: option: add support for Cinterion PH8 and AHxx

commit 444f94e9e625f6ec6bbe2cb232a6451c637f35a3 upstream.

Added support for Gemalto's Cinterion PH8 and AHxx products
with 2 RmNet Interfaces and products with 1 RmNet + 1 USB Audio interface.

In addition some minor renaming and formatting.

Signed-off-by: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
[johan: sort current entries and trim trailing whitespace ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

driver-core: use 'dev' argument in dev_dbg_ratelimited stub

commit 1f62ff34a90471d1b735bac2c79e894afc7c59bc upstream.

dev_dbg_ratelimited() is a macro that ignores its first argument when DEBUG is
not set, which can lead to unused variable warnings:

ethernet/mellanox/mlxsw/pci.c: In function 'mlxsw_pci_cqe_sdq_handle':
ethernet/mellanox/mlxsw/pci.c:646:18: warning: unused variable 'pdev' [-Wunused-variable]
ethernet/mellanox/mlxsw/pci.c: In function 'mlxsw_pci_cqe_rdq_handle':
ethernet/mellanox/mlxsw/pci.c:671:18: warning: unused variable 'pdev' [-Wunused-variable]

The macro already ensures that all its other arguments are silently
ignored by the compiler without triggering a warning, through the
use of the no_printk() macro, but the dev argument is not passed into
that.

This changes the definition to use the same trick as no_printk() with
an if(0) that leads the compiler to not evaluate the side-effects but
still see that 'dev' might not be unused.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6f586e663e3b ("driver-core: Shut up dev_dbg_reatelimited() without DEBUG")
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

char: Drop bogus dependency of DEVPORT on !M68K

commit 309124e2648d668a0c23539c5078815660a4a850 upstream.

According to full-history-linux commit d3794f4fa7c3edc3 ("[PATCH] M68k
update (part 25)"), port operations are allowed on m68k if CONFIG_ISA is
defined.

However, commit 153dcc54df826d2f ("[PATCH] mem driver: fix conditional
on isa i/o support") accidentally changed an "||" into an "&&",
disabling it completely on m68k. This logic was retained when
introducing the DEVPORT symbol in commit 4f911d64e04a44c4 ("Make
/dev/port conditional on config symbol").

Drop the bogus dependency on !M68K to fix this.

Fixes: 153dcc54df826d2f ("[PATCH] mem driver: fix conditional on isa i/o support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Fix OpenSSH pty regression on close

commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa upstream.

OpenSSH expects the (non-blocking) read() of pty master to return
EAGAIN only if it has received all of the slave-side output after
it has received SIGCHLD. This used to work on pre-3.12 kernels.

This fix effectively forces non-blocking read() and poll() to
block for parallel i/o to complete for all ttys. It also unwinds
these changes:

1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12
   tty: Fix pty master read() after slave closes

2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73
   pty, n_tty: Simplify input processing on final close

3) 1a48632ffed61352a7810ce089dc5a8bcd505a60
   pty: Fix input race when closing

Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net>

Reported-by: Volth <openssh@volth.com>
Reported-by: Marc Aurele La France <tsi@tuyoix.net>
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492
Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- No need to unwind commits 2 and 3
- Keep using tty_flush_to_ldisc() rather than adding tty_buffer_flush_work()]]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Revert "tty: Fix pty master poll() after slave closes v2"

commit 2ce3c10c0c3e0d418c1a7a4c838319ba42c75388 upstream.

This reverts commit c4dc304677e8d566572c4738d95c48be150c6606.
This fix is superseded by commit 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73,
'pty, n_tty: Simplify input processing on final close'.

The final close now waits for input processing to complete before
destroying the pty, so poll() does not need to special case this
condition.

Cc: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

MIPS: ath79: make bootconsole wait for both THRE and TEMT

commit f5b556c94c8490d42fea79d7b4ae0ecbc291e69d upstream.

This makes the ath79 bootconsole behave the same way as the generic 8250
bootconsole.

Also waiting for TEMT (transmit buffer is empty) instead of just THRE
(transmit buffer is not full) ensures that all characters have been
transmitted before the real serial driver starts reconfiguring the serial
controller (which would sometimes result in garbage being transmitted.)
This change does not cause a visible performance loss.

In addition, this seems to fix a hang observed in certain configurations on
many AR7xxx/AR9xxx SoCs during autoconfig of the real serial driver.

A more complete follow-up patch will disable 8250 autoconfig for ath79
altogether (the serial controller is detected as a 16550A, which is not
fully compatible with the ath79 serial, and the autoconfig may lead to
undefined behavior on ath79.)

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ext4: clean up error handling when orphan list is corrupted

commit 7827a7f6ebfcb7f388dc47fddd48567a314701ba upstream.

Instead of just printing warning messages, if the orphan list is
corrupted, declare the file system is corrupted. If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16: leave error code as EIO]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ext4: fix hang when processing corrupted orphaned inode list

commit c9eb13a9105e2e418f72e46a2b6da3f49e696902 upstream.

If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly).  Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.

This can be reproduced via:

   mke2fs -t ext4 /tmp/foo.img 100
   debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
   mount -o loop /tmp/foo.img /mnt

(But don't do this if you are using an unpatched kernel if you care
about the system staying functional.  :-)

This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1].  (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)

[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf

Reported by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

aacraid: Fix for aac_command_thread hang

commit fc4bf75ea300a5e62a2419f89dd0e22189dd7ab7 upstream.

Typically under error conditions, it is possible for aac_command_thread()
to miss the wakeup from kthread_stop() and go back to sleep, causing it
to hang aac_shutdown.

In the observed scenario, the adapter is not functioning correctly and so
aac_fib_send() never completes (or time-outs depending on how it was
called). Shortly after aac_command_thread() starts it performs
aac_fib_send(SendHostTime) which hangs. When aac_probe_one
/aac_get_adapter_info send time outs, kthread_stop is called which breaks
the command thread out of it's hang.

The code will still go back to sleep in schedule_timeout() without
checking kthread_should_stop() so it causes aac_probe_one to hang until
the schedule_timeout() which is 30 minutes.

Fixed by: Adding another kthread_should_stop() before schedule_timeout()
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

aacraid: Relinquish CPU during timeout wait

commit 07beca2be24cc710461c0b131832524c9ee08910 upstream.

aac_fib_send has a special function case for initial commands during
driver initialization using wait < 0(pseudo sync mode). In this case,
the command does not sleep but rather spins checking for timeout.This
loop is calls cpu_relax() in an attempt to allow other processes/threads
to use the CPU, but this function does not relinquish the CPU and so the
command will hog the processor. This was observed in a KDUMP
"crashkernel" and that prevented the "command thread" (which is
responsible for completing the command from being timed out) from
starting because it could not get the CPU.

Fixed by replacing "cpu_relax()" call with "schedule()"
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tables

commit d4b9e0790aa764c0b01e18d4e8d33e93ba36d51f upstream.

The ARM architecture mandates that when changing a page table entry
from a valid entry to another valid entry, an invalid entry is first
written, TLB invalidated, and only then the new entry being written.

The current code doesn't respect this, directly writing the new
entry and only then invalidating TLBs. Let's fix it up.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl

commit 4c63c2454eff996c5e27991221106eb511f7db38 upstream.

32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
fail.

Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

crypto: s5p-sss - fix incorrect usage of scatterlists api

commit d1497977fecb9acce05988d6322ad415ef93bb39 upstream.

sg_dma_len() macro can be used only on scattelists which are mapped, so
all calls to it before dma_map_sg() are invalid. Replace them by proper
check for direct sg segment length read.

Fixes: a49e490c7a8a ("crypto: s5p-sss - add S5PV210 advanced crypto engine support")
Fixes: 9e4a1100a445 ("crypto: s5p-sss - Handle unaligned buffers")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.16: unaligned DMA is unsupported so there is a different
set of calls to replace]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

alpha/PCI: Call iomem_is_exclusive() for IORESOURCE_MEM, but not IORESOURCE_IO

commit c20e128030caf0537d5e906753eac1c28fefdb75 upstream.

The alpha pci_mmap_resource() is used for both IORESOURCE_MEM and
IORESOURCE_IO resources, but iomem_is_exclusive() is only applicable for
IORESOURCE_MEM.

Call iomem_is_exclusive() only for IORESOURCE_MEM resources, and do it
earlier to match the generic version of pci_mmap_resource().

Fixes: 10a0ef39fbd1 ("PCI/alpha: pci sysfs resources")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

PCI: Supply CPU physical address (not bus address) to iomem_is_exclusive()

commit ca620723d4ff9ea7ed484eab46264c3af871b9ae upstream.

iomem_is_exclusive() requires a CPU physical address, but on some arches we
supplied a PCI bus address instead.

On most arches, pci_resource_to_user(res) returns "res->start", which is a
CPU physical address. But on microblaze, mips, powerpc, and sparc, it
returns the PCI bus address corresponding to "res->start".

The result is that pci_mmap_resource() may fail when it shouldn't (if the
bus address happens to match an existing resource), or it may succeed when
it should fail (if the resource is exclusive but the bus address doesn't
match it).

Call iomem_is_exclusive() with "res->start", which is always a CPU physical
address, not the result of pci_resource_to_user().

Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers")
Suggested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

crypto: s5p-sss - Fix missed interrupts when working with 8 kB blocks

commit 79152e8d085fd64484afd473ef6830b45518acba upstream.

The tcrypt testing module on Exynos5422-based Odroid XU3/4 board failed on
testing 8 kB size blocks:

$ sudo modprobe tcrypt sec=1 mode=500
testing speed of async ecb(aes) (ecb-aes-s5p) encryption
test 0 (128 bit key, 16 byte blocks): 21971 operations in 1 seconds (351536 bytes)
test 1 (128 bit key, 64 byte blocks): 21731 operations in 1 seconds (1390784 bytes)
test 2 (128 bit key, 256 byte blocks): 21932 operations in 1 seconds (5614592 bytes)
test 3 (128 bit key, 1024 byte blocks): 21685 operations in 1 seconds (22205440 bytes)
test 4 (128 bit key, 8192 byte blocks):

This was caused by a race issue of missed BRDMA_DONE ("Block cipher
Receiving DMA") interrupt. Device starts processing the data in DMA mode
immediately after setting length of DMA block: receiving (FCBRDMAL) or
transmitting (FCBTDMAL). The driver sets these lengths from interrupt
handler through s5p_set_dma_indata() function (or xxx_setdata()).

However the interrupt handler was first dealing with receive buffer
(dma-unmap old, dma-map new, set receive block length which starts the
operation), then with transmit buffer and finally was clearing pending
interrupts (FCINTPEND). Because of the time window between setting
receive buffer length and clearing pending interrupts, the operation on
receive buffer could end already and driver would miss new interrupt.

User manual for Exynos5422 confirms in example code that setting DMA
block lengths should be the last operation.

The tcrypt hang could be also observed in following blocked-task dmesg:

INFO: task modprobe:258 blocked for more than 120 seconds.
Not tainted 4.6.0-rc4-next-20160419-00005-g9eac8b7b7753-dirty #42
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D c06b09d8 0 258 256 0x00000000
[<c06b09d8>] (__schedule) from [<c06b0f24>] (schedule+0x40/0xac)
[<c06b0f24>] (schedule) from [<c06b49f8>] (schedule_timeout+0x124/0x178)
[<c06b49f8>] (schedule_timeout) from [<c06b17fc>] (wait_for_common+0xb8/0x144)
[<c06b17fc>] (wait_for_common) from [<bf0013b8>] (test_acipher_speed+0x49c/0x740 [tcrypt])
[<bf0013b8>] (test_acipher_speed [tcrypt]) from [<bf003e8c>] (do_test+0x2240/0x30ec [tcrypt])
[<bf003e8c>] (do_test [tcrypt]) from [<bf008048>] (tcrypt_mod_init+0x48/0xa4 [tcrypt])
[<bf008048>] (tcrypt_mod_init [tcrypt]) from [<c010177c>] (do_one_initcall+0x3c/0x16c)
[<c010177c>] (do_one_initcall) from [<c0191ff0>] (do_init_module+0x5c/0x1ac)
[<c0191ff0>] (do_init_module) from [<c0185610>] (load_module+0x1a30/0x1d08)
[<c0185610>] (load_module) from [<c0185ab0>] (SyS_finit_module+0x8c/0x98)
[<c0185ab0>] (SyS_finit_module) from [<c01078c0>] (ret_fast_syscall+0x0/0x3c)

Fixes: a49e490c7a8a ("crypto: s5p-sss - add S5PV210 advanced crypto engine support")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ext4: fix data exposure after a crash

commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream.

Huang has reported that in his powerfail testing he is seeing stale
block contents in some of recently allocated blocks although he mounts
ext4 in data=ordered mode. After some investigation I have found out
that indeed when delayed allocation is used, we don't add inode to
transaction's list of inodes needing flushing before commit. Originally
we were doing that but commit f3b59291a69d removed the logic with a
flawed argument that it is not needed.

The problem is that although for delayed allocated blocks we write their
contents immediately after allocating them, there is no guarantee that
the IO scheduler or device doesn't reorder things and thus transaction
allocating blocks and attaching them to inode can reach stable storage
before actual block contents. Actually whenever we attach freshly
allocated blocks to inode using a written extent, we should add inode to
transaction's ordered inode list to make sure we properly wait for block
contents to be written before committing the transaction. So that is
what we do in this patch. This also handles other cases where stale data
exposure was possible - like filling hole via mmap in
data=ordered,nodelalloc mode.

The only exception to the above rule are extending direct IO writes where
blkdev_direct_IO() waits for IO to complete before increasing i_size and
thus stale data exposure is not possible. For now we don't complicate
the code with optimizing this special case since the overhead is pretty
low. In case this is observed to be a performance problem we can always
handle it using a special flag to ext4_map_blocks().

Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d
Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
- Drop check for EXT4_GET_BLOCKS_ZERO flag
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

EDAC: Increment correct counter in edac_inc_ue_error()

commit 993f88f1cc7f0879047ff353e824e5cc8f10adfc upstream.

Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
ce_noinfo_count.

Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 4275be635597 ("edac: Change internal representation to work with layers")
Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

PM / Runtime: Fix error path in pm_runtime_force_resume()

commit 0ae3aeefabbeef26294e7a349b51f1c761d46c9f upstream.

As pm_runtime_set_active() may fail because the device's parent isn't
active, we can end up executing the ->runtime_resume() callback for the
device when it isn't allowed.

Fix this by invoking pm_runtime_set_active() before running the callback
and let's also deal with the error code.

Fixes: 37f204164dfb (PM: Add pm_runtime_suspend|resume_force functions)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel

commit 8ed8ab40047a570fdd8043a40c104a57248dd3fd upstream.

Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.

However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:

  1. In almost all cases, production kernel (relocatable) is used for
     kdump as well, which would mean that crashed kernel's OOL handler
     would be at the same place where we end up branching to, from short
     interrupt vector of kdump kernel.
  2. Also, OOL handler was unlikely the reason for crash in almost all
     the kdump scenarios, which meant we had a sane OOL handler from
     crashed kernel that we branched to.
  3. On most 64-bit POWER server processors, page size is large enough
     that marking interrupt vector code as executable (see commit
     429d2e83) leads to marking OOL handler code from crashed kernel,
     that sits right below interrupt vector code from kdump kernel, as
     executable as well.

Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.

This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.

Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.

Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers")
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Bluetooth: vhci: Fix race at creating hci device

commit c7c999cb18da88a881e10e07f0724ad0bfaff770 upstream.

hci_vhci driver creates a hci device object dynamically upon each
HCI_VENDOR_PKT write. Although it checks the already created object
and returns an error, it's still racy and may build multiple hci_dev
objects concurrently when parallel writes are performed, as the device
tracks only a single hci_dev object.

This patch introduces a mutex to protect against the concurrent device
creations.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

serial: doc: Re-add paragraph documenting uart_console_write()

commit d124fd3bb36ceb40438f10c897ce642386b74b72 upstream.

Commit 834392a7d92677ff ("serial: doc: Un-document non-existing
uart_write_console()") removed a paragraph about a helper function that
seemed to never exist.

Peter Hurley pointed out that the function does exist, but is called
differently. Re-add the paragraph, with the function name corrected.

Fixes: 834392a7d92677ff ("serial: doc: Un-document non-existing uart_write_console()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"

commit 305c2e71b3d733ec065cb716c76af7d554bd5571 upstream.

Now that we've done a more comprehensive fix with the intermediate
target state we can remove the previous hack introduced with commit
90a88d6ef88e ("scsi: fix soft lockup in scsi_remove_target() on module
removal").

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

scsi: Add intermediate STARGET_REMOVE state to scsi_target_state

commit f05795d3d771f30a7bdc3a138bf714b06d42aa95 upstream.

Add intermediate STARGET_REMOVE state to scsi_target_state to avoid
running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE
state is only valid in the path from scsi_remove_target() to
scsi_target_destroy() indicating this target is going to be removed.

This re-fixes the problem introduced in commits bc3f02a795d3 ("[SCSI]
scsi_remove_target: fix softlockup regression on hot remove") and
40998193560d ("scsi: restart list search after unlock in
scsi_remove_target") in a more comprehensive way.

[mkp: Included James' fix for scsi_target_destroy()]

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

drm/i915: Prevent machine death on Ivybridge context switching

commit e9135c4f08d9acb0f3da3ad2643b669dee3217c2 upstream.

Two concurrent writes into the same register cacheline has the chance of
killing the machine on Ivybridge and other gen7. This includes LRI
emitted from the command parser.  The MI_SET_CONTEXT itself serves as
serialising barrier and prevents the pair of register writes in the first
packet from triggering the fault.  However, if a second switch-context
immediately occurs then we may have two adjacent blocks of LRI to the
same registers which may then trigger the hang. To counteract this we
need to insert a delay after the second register write using SRM.

This is easiest to reproduce with something like
igt/gem_ctx_switch/interruptible that triggers back-to-back context
switches (with no operations in between them in the command stream,
which requires the execbuf operation to be interrupted after the
MI_SET_CONTEXT) but can be observed sporadically elsewhere when running
interruptible igt. No reports from the wild though, so it must be of low
enough frequency that no one has correlated the random machine freezes
with i915.ko

The issue was introduced with
commit 2c550183476dfa25641309ae9a28d30feed14379 [v3.19]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 16 10:02:27 2014 +0000

    drm/i915: Disable PSMI sleep messages on all rings around context switches

Testcase: igt/gem_ctx_switch/render-interruptible #ivb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-11-git-send-email-chris@chris-wilson.co.uk
[bwh: Backported to 3.16:
- Pass ring, not engine, to intel_ring_emit()
- Register type is u32 not i915_reg_t
- MI_STORE_REGISTER_MEM is a function-macro]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ipv6, token: allow for clearing the current device token

commit 47e27d5e92c46a3a62d4dfd8895b1ddb8613f531 upstream.

The original tokenized iid support implemented via f53adae4eae5 ("net: ipv6:
add tokenized interface identifier support") didn't allow for clearing a
device token as it was intended that this addressing mode was the only one
active for globally scoped IPv6 addresses. Later we relaxed that restriction
via 617fe29d45bd ("net: ipv6: only invalidate previously tokenized addresses"),
and we should also allow for clearing tokens as there's no good reason why
it shouldn't be allowed.

Fixes: 617fe29d45bd ("net: ipv6: only invalidate previously tokenized addresses")
Reported-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

cx23885: uninitialized variable in cx23885_av_work_handler()

commit 60587bd0680507f48ae3a7360983228fd207de8a upstream.

The "handled" variable could be uninitialized if the
interrupt_service_routine() call back hasn't been implimented or if it
has been implemented but doesn't initialize "handled" to zero at the
start. For example, adv76xx_isr() only sets "handled" to true.

Fixes: 44b153ca639f ('[media] m5mols: Add ISO sensitivity controls')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

mfd: lp8788-irq: Uninitialized variable in irq handler

commit 22aab38e7b59fd79ce1045006be69a9abab58e5a upstream.

Instead to being true/false, the "handled" is true/uninitialized.
Presumably this doesn't cause that many problems in real life because
normally we handle the IRQ.

Fixes: eea6b7cc53aa ('mfd: Add lp8788 mfd driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Milo Kim <milo.kim@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ARM: OMAP2+: hwmod: fix _idle() hwmod state sanity check sequence

commit c20c8f750d9f8f8617f07ee2352d3ff560e66bc2 upstream.

The omap_hwmod _enable() function can return success without setting
the hwmod state to _HWMOD_STATE_ENABLED for IPs with reset lines when
all of the reset lines are asserted. The omap_hwmod _idle() function
also performs a similar check, but after checking for the hwmod state
first. This triggers the WARN when pm_runtime_get and pm_runtime_put
are invoked on IPs with all reset lines asserted. Reverse the checks
for hwmod state and reset lines status to fix this.

Issue found during a unbind operation on a device with reset lines
still asserted, example backtrace below

------------[ cut here ]------------
WARNING: CPU: 1 PID: 879 at arch/arm/mach-omap2/omap_hwmod.c:2207 _idle+0x1e4/0x240()
omap_hwmod: mmu_dsp: idle state can only be entered from enabled state
Modules linked in:
CPU: 1 PID: 879 Comm: sh Not tainted 4.4.0-00008-ga989d951331a #3
Hardware name: Generic OMAP5 (Flattened Device Tree)
[<c0018e60>] (unwind_backtrace) from [<c0014dc4>] (show_stack+0x10/0x14)
[<c0014dc4>] (show_stack) from [<c037ac28>] (dump_stack+0x90/0xc0)
[<c037ac28>] (dump_stack) from [<c003f420>] (warn_slowpath_common+0x78/0xb4)
[<c003f420>] (warn_slowpath_common) from [<c003f48c>] (warn_slowpath_fmt+0x30/0x40)
[<c003f48c>] (warn_slowpath_fmt) from [<c0028c20>] (_idle+0x1e4/0x240)
[<c0028c20>] (_idle) from [<c0029080>] (omap_hwmod_idle+0x28/0x48)
[<c0029080>] (omap_hwmod_idle) from [<c002a5a4>] (omap_device_idle+0x3c/0x90)
[<c002a5a4>] (omap_device_idle) from [<c0427a90>] (__rpm_callback+0x2c/0x60)
[<c0427a90>] (__rpm_callback) from [<c0427ae4>] (rpm_callback+0x20/0x80)
[<c0427ae4>] (rpm_callback) from [<c0427f84>] (rpm_suspend+0x138/0x74c)
[<c0427f84>] (rpm_suspend) from [<c0428b78>] (__pm_runtime_idle+0x78/0xa8)
[<c0428b78>] (__pm_runtime_idle) from [<c041f514>] (__device_release_driver+0x64/0x100)
[<c041f514>] (__device_release_driver) from [<c041f5d0>] (device_release_driver+0x20/0x2c)
[<c041f5d0>] (device_release_driver) from [<c041d85c>] (unbind_store+0x78/0xf8)
[<c041d85c>] (unbind_store) from [<c0206df8>] (kernfs_fop_write+0xc0/0x1c4)
[<c0206df8>] (kernfs_fop_write) from [<c018a120>] (__vfs_write+0x20/0xdc)
[<c018a120>] (__vfs_write) from [<c018a9cc>] (vfs_write+0x90/0x164)
[<c018a9cc>] (vfs_write) from [<c018b1f0>] (SyS_write+0x44/0x9c)
[<c018b1f0>] (SyS_write) from [<c0010420>] (ret_fast_syscall+0x0/0x1c)
---[ end trace a4182013c75a9f50 ]---

While at this, fix the sequence in _shutdown() as well, though there
is no easy reproducible scenario.

Fixes: 747834ab8347 ("ARM: OMAP2+: hwmod: revise hardreset behavior")
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

cpuidle: Indicate when a device has been unregistered

commit c998c07836f985b24361629dc98506ec7893e7a0 upstream.

Currently the 'registered' member of the cpuidle_device struct is set
to 1 during cpuidle_register_device. In this same function there are
checks to see if the device is already registered to prevent duplicate
calls to register the device, but this value is never set to 0 even on
unregister of the device. Because of this, any attempt to call
cpuidle_register_device after a call to cpuidle_unregister_device will
fail which shouldn't be the case.

To prevent this, set registered to 0 when the device is unregistered.

Fixes: c878a52d3c7c (cpuidle: Check if device is already registered)
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Bluetooth: vhci: purge unhandled skbs

commit 13407376b255325fa817798800117a839f3aa055 upstream.

The write handler allocates skbs and queues them into data->readq.
Read side should read them, if there is any. If there is none, skbs
should be dropped by hdev->flush. But this happens only if the device
is HCI_UP, i.e. hdev->power_on work was triggered already. When it was
not, skbs stay allocated in the queue when /dev/vhci is closed. So
purge the queue in ->release.

Program to reproduce:
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>

int main()
{
char buf[] = { 0xff, 0 };
struct iovec iov = {
.iov_base = buf,
.iov_len = sizeof(buf),
};
int fd;

while (1) {
fd = open("/dev/vhci", O_RDWR);
if (fd < 0)
err(1, "open");

usleep(50);

if (writev(fd, &iov, 1) < 0)
err(1, "writev");

usleep(50);

close(fd);
}

return 0;
}

Result:
kmemleak: 4609 new suspected memory leaks
unreferenced object 0xffff88059f4d5440 (size 232):
  comm "vhci", pid 1084, jiffies 4294912542 (age 37569.296s)
  hex dump (first 32 bytes):
    20 f0 23 87 05 88 ff ff 20 f0 23 87 05 88 ff ff   .#..... .#.....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
...
    [<ffffffff81ece010>] __alloc_skb+0x0/0x5a0
    [<ffffffffa021886c>] vhci_create_device+0x5c/0x580 [hci_vhci]
    [<ffffffffa0219436>] vhci_write+0x306/0x4c8 [hci_vhci]

Fixes: 23424c0d31 (Bluetooth: Add support creating virtual AMP controllers)
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Bluetooth: vhci: fix open_timeout vs. hdev race

commit 373a32c848ae3a1c03618517cce85f9211a6facf upstream.

Both vhci_get_user and vhci_release race with open_timeout work. They
both contain cancel_delayed_work_sync, but do not test whether the
work actually created hdev or not. Since the work can be in progress
and _sync will wait for finishing it, we can have data->hdev allocated
when cancel_delayed_work_sync returns. But the call sites do 'if
(data->hdev)' *before* cancel_delayed_work_sync.

As a result:
* vhci_get_user allocates a second hdev and puts it into
  data->hdev. The former is leaked.
* vhci_release does not release data->hdev properly as it thinks there
  is none.

Fix both cases by moving the actual test *after* the call to
cancel_delayed_work_sync.

This can be hit by this program:
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
int fd;

srand(time(NULL));

while (1) {
const int delta = (rand() % 200 - 100) * 100;

fd = open("/dev/vhci", O_RDWR);
if (fd < 0)
err(1, "open");

usleep(1000000 + delta);

close(fd);
}

return 0;
}

And the result is:
BUG: KASAN: use-after-free in skb_queue_tail+0x13e/0x150 at addr ffff88006b0c1228
Read of size 8 by task kworker/u13:1/32068
=============================================================================
BUG kmalloc-192 (Tainted: G            E     ): kasan: bad access detected
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: Allocated in vhci_open+0x50/0x330 [hci_vhci] age=260 cpu=3 pid=32040
...
kmem_cache_alloc_trace+0x150/0x190
vhci_open+0x50/0x330 [hci_vhci]
misc_open+0x35b/0x4e0
chrdev_open+0x23b/0x510
...
INFO: Freed in vhci_release+0xa4/0xd0 [hci_vhci] age=9 cpu=2 pid=32040
...
__slab_free+0x204/0x310
vhci_release+0xa4/0xd0 [hci_vhci]
...
INFO: Slab 0xffffea0001ac3000 objects=16 used=13 fp=0xffff88006b0c1e00 flags=0x5fffff80004080
INFO: Object 0xffff88006b0c1200 @offset=4608 fp=0xffff88006b0c0600
Bytes b4 ffff88006b0c11f0: 09 df 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff88006b0c1200: 00 06 0c 6b 00 88 ff ff 00 00 00 00 00 00 00 00  ...k............
Object ffff88006b0c1210: 10 12 0c 6b 00 88 ff ff 10 12 0c 6b 00 88 ff ff  ...k.......k....
Object ffff88006b0c1220: c0 46 c2 6b 00 88 ff ff c0 46 c2 6b 00 88 ff ff  .F.k.....F.k....
Object ffff88006b0c1230: 01 00 00 00 01 00 00 00 e0 ff ff ff 0f 00 00 00  ................
Object ffff88006b0c1240: 40 12 0c 6b 00 88 ff ff 40 12 0c 6b 00 88 ff ff  @..k....@..k....
Object ffff88006b0c1250: 50 0d 6e a0 ff ff ff ff 00 02 00 00 00 00 ad de  P.n.............
Object ffff88006b0c1260: 00 00 00 00 00 00 00 00 ab 62 02 00 01 00 00 00  .........b......
Object ffff88006b0c1270: 90 b9 19 81 ff ff ff ff 38 12 0c 6b 00 88 ff ff  ........8..k....
Object ffff88006b0c1280: 03 00 20 00 ff ff ff ff ff ff ff ff 00 00 00 00  .. .............
Object ffff88006b0c1290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff88006b0c12a0: 00 00 00 00 00 00 00 00 00 80 cd 3d 00 88 ff ff  ...........=....
Object ffff88006b0c12b0: 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00  . ..............
Redzone ffff88006b0c12c0: bb bb bb bb bb bb bb bb                          ........
Padding ffff88006b0c13f8: 00 00 00 00 00 00 00 00                          ........
CPU: 3 PID: 32068 Comm: kworker/u13:1 Tainted: G    B       E      4.4.6-0-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
Workqueue: hci0 hci_cmd_work [bluetooth]
00000000ffffffff ffffffff81926cfa ffff88006be37c68 ffff88006bc27180
ffff88006b0c1200 ffff88006b0c1234 ffffffff81577993 ffffffff82489320
ffff88006bc24240 0000000000000046 ffff88006a100000 000000026e51eb80
Call Trace:
...
[<ffffffff81ec8ebe>] ? skb_queue_tail+0x13e/0x150
[<ffffffffa06e027c>] ? vhci_send_frame+0xac/0x100 [hci_vhci]
[<ffffffffa0c61268>] ? hci_send_frame+0x188/0x320 [bluetooth]
[<ffffffffa0c61515>] ? hci_cmd_work+0x115/0x310 [bluetooth]
[<ffffffff811a1375>] ? process_one_work+0x815/0x1340
[<ffffffff811a1f85>] ? worker_thread+0xe5/0x11f0
[<ffffffff811a1ea0>] ? process_one_work+0x1340/0x1340
[<ffffffff811b3c68>] ? kthread+0x1c8/0x230
...
Memory state around the buggy address:
ffff88006b0c1100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88006b0c1180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88006b0c1200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                  ^
ffff88006b0c1280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff88006b0c1300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 23424c0d31 (Bluetooth: Add support creating virtual AMP controllers)
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

drm/gma500: Fix possible out of bounds read

commit 7ccca1d5bf69fdd1d3c5fcf84faf1659a6e0ad11 upstream.

Fix possible out of bounds read, by adding missing comma.
The code may read pass the end of the dsi_errors array
when the most significant bit (bit #31) in the intr_stat register
is set.
This bug has been detected using CppCheck (static analysis tool).

Signed-off-by: Itai Handler <itai_handler@hotmail.com>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

xfs: disallow rw remount on fs with unknown ro-compat features

commit d0a58e833931234c44e515b5b8bede32bd4e6eed upstream.

Today, a kernel which refuses to mount a filesystem read-write
due to unknown ro-compat features can still transition to read-write
via the remount path.  The old kernel is most likely none the wiser,
because it's unaware of the new feature, and isn't using it.  However,
writing to the filesystem may well corrupt metadata related to that
new feature, and moving to a newer kernel which understand the feature
will have problems.

Right now the only ro-compat feature we have is the free inode btree,
which showed up in v3.16.  It would be good to push this back to
all the active stable kernels, I think, so that if anyone is using
newer mkfs (which enables the finobt feature) with older kernel
releases, they'll be protected.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

iommu/vt-d: Improve fault handler error messages

commit a0fe14d7dcf5db2f101b4fe8689ecabb255ab7d3 upstream.

Remove new line in error logs, avoid duplicate and explicit pr_fmt.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fixes: 0ac2491f57af ('x86, dmar: move page fault handling code to dmar.c')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

iommu/vt-d: Ratelimit fault handler

commit c43fce4eebae257ca413733690e2076757282093 upstream.

Fault rates can easily overwhelm the console and make the system
unresponsive. Ratelimit to allow an opportunity for maintenance.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fixes: 0ac2491f57af ('x86, dmar: move page fault handling code to dmar.c')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ath5k: Change led pin configuration for compaq c700 laptop

commit 7b9bc799a445aea95f64f15e0083cb19b5789abe upstream.

BugLink: http://bugs.launchpad.net/bugs/972604
Commit 09c9bae26b0d3c9472cb6ae45010460a2cee8b8d ("ath5k: add led pin
configuration for compaq c700 laptop") added a pin configuration for the Compaq
c700 laptop. However, the polarity of the led pin is reversed. It should be
red for wifi off and blue for wifi on, but it is the opposite. This bug was
reported in the following bug report:
http://pad.lv/972604

Fixes: 09c9bae26b0d3c9472cb6ae45010460a2cee8b8d ("ath5k: add led pin configuration for compaq c700 laptop")
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

serial: doc: Un-document non-existing uart_write_console()

commit 834392a7d92677ff2bdc1c709b1171ee585b55c9 upstream.

uart_write_console() never existed, not even when the "new
uart_write_console function" was documented.

Fixes: 67ab7f596b6adbae ("[SERIAL] Update serial driver documentation")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ARM: dts: kirkwood: add kirkwood-nsa320.dtb to Makefile

commit 9ec423ed62b8278412400fae6c064edb6ce1bb51 upstream.

Commit be3d7d023b87 ("ARM: kirkwood: Add DTS file for NSA320")
created the new file kirkwood-nsa320.dts but did not
add it to the Makefile.

Fixes: be3d7d023b87 ("ARM: kirkwood: Add DTS file for NSA320")
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ARM: dts: kirkwood: add kirkwood-ds112.dtb to Makefile

commit fc5c796e12511a7c027b5a4438719dde2f796208 upstream.

Commit 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology
NAS devices") created the new file kirkwood-ds112.dts but did not
add it to the Makefile.

Fixes: 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS devices")
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

regmap: cache: Fix typo in cache_bypass parameter description

commit 267c85860308d36bc163c5573308cd024f659d7c upstream.

Setting the flag 'cache_bypass' will bypass the cache not the hardware.
Fix this comment here.

Fixes: 0eef6b0415f5 ("regmap: Fix doc comment")
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Linux 3.16.36

sched, dl: Convert switched_{from, to}_dl() / prio_changed_dl() to balance callbacks

commit 9916e214998a4a363b152b637245e5c958067350 upstream.

Remove the direct {push,pull} balancing operations from
switched_{from,to}_dl() / prio_changed_dl() and use the balance
callback queue.

Again, err on the side of too many reschedules; since too few is a
hard bug while too many is just annoying.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: oleg@redhat.com
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124742.968262663@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Conflicts: kernel/sched/deadline.c]
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[bwh: Backported to 3.16:
- The check_resched / !CONFIG_SMP case in switched_to_dl() is different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

sched,dl: Remove return value from pull_dl_task()

commit 0ea60c2054fc3b0c3eb68ac4f6884f3ee78d9925 upstream.

In order to be able to use pull_dl_task() from a callback, we need to
do away with the return value.

Since the return value indicates if we should reschedule, do this
inside the function. Since not all callers currently do this, this can
increase the number of reschedules due rt balancing.

Too many reschedules is not a correctness issues, too few are.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: oleg@redhat.com
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124742.859398977@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Conflicts: kernel/sched/deadline.c]
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[bwh: Backported to 3.16: use resched_task() instead of resched_curr()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks

commit 9916e214998a4a363b152b637245e5c958067350 upstream.

Remove the direct {push,pull} balancing operations from
switched_{from,to}_rt() / prio_changed_rt() and use the balance
callback queue.

Again, err on the side of too many reschedules; since too few is a
hard bug while too many is just annoying.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: oleg@redhat.com
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124742.766832367@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Conflicts: kernel/sched/rt.c]
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>