]> git.ipfire.org Git - people/ms/linux.git/log
people/ms/linux.git
10 years agoip_tunnel: fix ipv4 pmtu check to honor inner ip header df
Timo Teräs [Tue, 7 Jul 2015 05:34:13 +0000 (08:34 +0300)] 
ip_tunnel: fix ipv4 pmtu check to honor inner ip header df

[ Upstream commit fc24f2b2094366da8786f59f2606307e934cea17 ]

Frag needed should be sent only if the inner header asked
to not fragment. Currently fragmentation is broken if the
tunnel has df set, but df was not asked in the original
packet. The tunnel's df needs to be still checked to update
internally the pmtu cache.

Commit 23a3647bc4f93bac broke it, and this commit fixes
the ipv4 df check back to the way it was.

Fixes: 23a3647bc4f93bac ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agortnetlink: verify IFLA_VF_INFO attributes before passing them to driver
Daniel Borkmann [Mon, 6 Jul 2015 22:07:52 +0000 (00:07 +0200)] 
rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver

[ Upstream commit 4f7d2cdfdde71ffe962399b7020c674050329423 ]

Jason Gunthorpe reported that since commit c02db8c6290b ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].

Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.

Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.

Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).

Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet: graceful exit from netif_alloc_netdev_queues()
Eric Dumazet [Mon, 6 Jul 2015 15:13:26 +0000 (17:13 +0200)] 
net: graceful exit from netif_alloc_netdev_queues()

[ Upstream commit d339727c2b1a10f25e6636670ab6e1841170e328 ]

User space can crash kernel with

ip link add ifb10 numtxqueues 100000 type ifb

We must replace a BUG_ON() by proper test and return -EINVAL for
crazy values.

Fixes: 60877a32bce00 ("net: allow large number of tx queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv6: Make MLD packets to only be processed locally
Angga [Fri, 3 Jul 2015 02:40:52 +0000 (14:40 +1200)] 
ipv6: Make MLD packets to only be processed locally

[ Upstream commit 4c938d22c88a9ddccc8c55a85e0430e9c62b1ac5 ]

Before commit daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of MLD packet goes through ip6_mr_input, causing
MRT6MSG_NOCACHE message to be generated to user space.

Make MLD packet only processed locally.

Fixes: daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agohfs,hfsplus: cache pages correctly between bnode_create and bnode_free
Hin-Tak Leung [Wed, 9 Sep 2015 22:38:04 +0000 (15:38 -0700)] 
hfs,hfsplus: cache pages correctly between bnode_create and bnode_free

commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state".  Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:

$ df -i / /home /mnt
Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/fedora-root    3276800  551665    2725135   17% /
/dev/mapper/fedora-home   52879360  716221   52163139    2% /home
/dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years.  [1]

The unpatched code [2] has always been wrong since it entered the kernel
tree.  The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents.  When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging.  There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761

[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

[3]:
http://sourceforge.net/projects/linux-hfsplus/

http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5

Date:   Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.

[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agostmmac: troubleshoot unexpected bits in des0 & des1
Alexey Brodkin [Wed, 24 Jun 2015 08:47:41 +0000 (11:47 +0300)] 
stmmac: troubleshoot unexpected bits in des0 & des1

commit f1590670ce069eefeb93916391a67643e6ad1630 upstream.

Current implementation of descriptor init procedure only takes
care about setting/clearing ownership flag in "des0"/"des1"
fields while it is perfectly possible to get unexpected bits
set because of the following factors:

 [1] On driver probe underlying memory allocated with
     dma_alloc_coherent() might not be zeroed and so
     it will be filled with garbage.

 [2] During driver operation some bits could be set by SD/MMC
     controller (for example error flags etc).

And unexpected and/or randomly set flags in "des0"/"des1"
fields may lead to unpredictable behavior of GMAC DMA block.

This change addresses both items above with:

 [1] Use of dma_zalloc_coherent() instead of simple
     dma_alloc_coherent() to make sure allocated memory is
     zeroed. That shouldn't affect performance because
     this allocation only happens once on driver probe.

 [2] Do explicit zeroing of both "des0" and "des1" fields
     of all buffer descriptors during initialization of
     DMA transfer.

And while at it fixed identation of dma_free_coherent()
counterpart as well.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: arc-linux-dev@synopsys.com
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoIB/mlx4: Use correct SL on AH query under RoCE
Noa Osherovich [Thu, 30 Jul 2015 14:34:24 +0000 (17:34 +0300)] 
IB/mlx4: Use correct SL on AH query under RoCE

commit 5e99b139f1b68acd65e36515ca347b03856dfb5a upstream.

The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.

Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoIB/mlx4: Forbid using sysfs to change RoCE pkeys
Jack Morgenstein [Thu, 30 Jul 2015 14:34:23 +0000 (17:34 +0300)] 
IB/mlx4: Forbid using sysfs to change RoCE pkeys

commit 2b135db3e81301d0452e6aa107349abe67b097d6 upstream.

The pkey mapping for RoCE must remain the default mapping:
VFs:
  virtual index 0 = mapped to real index 0 (0xFFFF)
  All others indices: mapped to a real pkey index containing an
                      invalid pkey.
PF:
  virtual index i = real index i.

Don't allow users to change these mappings using files found in
sysfs.

Fixes: c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoIB/uverbs: Fix race between ib_uverbs_open and remove_one
Yishai Hadas [Thu, 13 Aug 2015 15:32:03 +0000 (18:32 +0300)] 
IB/uverbs: Fix race between ib_uverbs_open and remove_one

commit 35d4a0b63dc0c6d1177d4f532a9deae958f0662c upstream.

Fixes: 2a72f212263701b927559f6850446421d5906c41 ("IB/uverbs: Remove dev_table")
Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.

In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.

The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.

In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoIB/uverbs: reject invalid or unknown opcodes
Christoph Hellwig [Wed, 26 Aug 2015 09:00:37 +0000 (11:00 +0200)] 
IB/uverbs: reject invalid or unknown opcodes

commit b632ffa7cee439ba5dce3b3bc4a5cbe2b3e20133 upstream.

We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoIB/qib: Change lkey table allocation to support more MRs
Mike Marciniszyn [Tue, 21 Jul 2015 12:36:07 +0000 (08:36 -0400)] 
IB/qib: Change lkey table allocation to support more MRs

commit d6f1c17e162b2a11e708f28fa93f2f79c164b442 upstream.

The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.

The underlying kernel code cannot allocate that many contiguous pages.

There is no reason the underlying memory needs to be physically
contiguous.

This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
  o this matches the module parameter description

Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agohfs: fix B-tree corruption after insertion at position 0
Hin-Tak Leung [Wed, 9 Sep 2015 22:38:07 +0000 (15:38 -0700)] 
hfs: fix B-tree corruption after insertion at position 0

commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 upstream.

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().

This is an identical change to the corresponding hfs b-tree code to Sergei
Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
to keep similar code paths in the hfs and hfsplus drivers in sync, where
appropriate.

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxen/gntdev: convert priv->lock to a mutex
David Vrabel [Fri, 9 Jan 2015 18:06:12 +0000 (18:06 +0000)] 
xen/gntdev: convert priv->lock to a mutex

commit 1401c00e59ea021c575f74612fe2dbba36d6a4ee upstream.

Unmapping may require sleeping and we unmap while holding priv->lock, so
convert it to a mutex.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomd/raid10: always set reshape_safe when initializing reshape_position.
NeilBrown [Mon, 6 Jul 2015 07:37:49 +0000 (17:37 +1000)] 
md/raid10: always set reshape_safe when initializing reshape_position.

commit 299b0685e31c9f3dcc2d58ee3beca761a40b44b3 upstream.

'reshape_position' tracks where in the reshape we have reached.
'reshape_safe' tracks where in the reshape we have safely recorded
in the metadata.

These are compared to determine when to update the metadata.
So it is important that reshape_safe is initialised properly.
Currently it isn't.  When starting a reshape from the beginning
it usually has the correct value by luck.  But when reducing the
number of devices in a RAID10, it has the wrong value and this leads
to the metadata not being updated correctly.
This can lead to corruption if the reshape is not allowed to complete.

This patch is suitable for any -stable kernel which supports RAID10
reshape, which is 3.5 and later.

Fixes: 3ea7daa5d7fd ("md/raid10: add reshape support")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: core: fix race condition in mmc_wait_data_done
Jialing Fu [Fri, 28 Aug 2015 03:13:09 +0000 (11:13 +0800)] 
mmc: core: fix race condition in mmc_wait_data_done

commit 71f8a4b81d040b3d094424197ca2f1bf811b1245 upstream.

The following panic is captured in ker3.14, but the issue still exists
in latest kernel.
---------------------------------------------------------------------
[   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
at virtual address 00000578
......
[   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
[   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
[   20.740134] c0 3136 (Compiler) Call trace:
[   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
[   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
[   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
[   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
[   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
[   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
[   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
[   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
[   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
[   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
----------------------------------------------------------------------
Because in SMP, "mrq" has race condition between below two paths:
path1: CPU0: <tasklet context>
  static void mmc_wait_data_done(struct mmc_request *mrq)
  {
     mrq->host->context_info.is_done_rcv = true;
     //
     // If CPU0 has just finished "is_done_rcv = true" in path1, and at
     // this moment, IRQ or ICache line missing happens in CPU0.
     // What happens in CPU1 (path2)?
     //
     // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
     // path2 would have chance to break from wait_event_interruptible
     // in mmc_wait_for_data_req_done and continue to run for next
     // mmc_request (mmc_blk_rw_rq_prep).
     //
     // Within mmc_blk_rq_prep, mrq is cleared to 0.
     // If below line still gets host from "mrq" as the result of
     // compiler, the panic happens as we traced.
     wake_up_interruptible(&mrq->host->context_info.wait);
  }

path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
  static int mmc_wait_for_data_req_done(...
  {
     ...
     while (1) {
           wait_event_interruptible(context_info->wait,
                   (context_info->is_done_rcv ||
                    context_info->is_new_req));
         static void mmc_blk_rw_rq_prep(...
           {
           ...
           memset(brq, 0, sizeof(struct mmc_blk_request));

This issue happens very coincidentally; however adding mdelay(1) in
mmc_wait_data_done as below could duplicate it easily.

   static void mmc_wait_data_done(struct mmc_request *mrq)
   {
     mrq->host->context_info.is_done_rcv = true;
+    mdelay(1);
     wake_up_interruptible(&mrq->host->context_info.wait);
    }

At runtime, IRQ or ICache line missing may just happen at the same place
of the mdelay(1).

This patch gets the mmc_context_info at the beginning of function, it can
avoid this race condition.

Signed-off-by: Jialing Fu <jlfu@marvell.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: 2220eedfd7ae ("mmc: fix async request mechanism ....")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofs: if a coredump already exists, unlink and recreate with O_EXCL
Jann Horn [Wed, 9 Sep 2015 22:38:28 +0000 (15:38 -0700)] 
fs: if a coredump already exists, unlink and recreate with O_EXCL

commit fbb1816942c04429e85dbf4c1a080accc534299e upstream.

It was possible for an attacking user to trick root (or another user) into
writing his coredumps into an attacker-readable, pre-existing file using
rename() or link(), causing the disclosure of secret data from the victim
process' virtual memory.  Depending on the configuration, it was also
possible to trick root into overwriting system files with coredumps.  Fix
that issue by never writing coredumps into existing files.

Requirements for the attack:
 - The attack only applies if the victim's process has a nonzero
   RLIMIT_CORE and is dumpable.
 - The attacker can trick the victim into coredumping into an
   attacker-writable directory D, either because the core_pattern is
   relative and the victim's cwd is attacker-writable or because an
   absolute core_pattern pointing to a world-writable directory is used.
 - The attacker has one of these:
  A: on a system with protected_hardlinks=0:
     execute access to a folder containing a victim-owned,
     attacker-readable file on the same partition as D, and the
     victim-owned file will be deleted before the main part of the attack
     takes place. (In practice, there are lots of files that fulfill
     this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
     This does not apply to most Linux systems because most distros set
     protected_hardlinks=1.
  B: on a system with protected_hardlinks=1:
     execute access to a folder containing a victim-owned,
     attacker-readable and attacker-writable file on the same partition
     as D, and the victim-owned file will be deleted before the main part
     of the attack takes place.
     (This seems to be uncommon.)
  C: on any system, independent of protected_hardlinks:
     write access to a non-sticky folder containing a victim-owned,
     attacker-readable file on the same partition as D
     (This seems to be uncommon.)

The basic idea is that the attacker moves the victim-owned file to where
he expects the victim process to dump its core.  The victim process dumps
its core into the existing file, and the attacker reads the coredump from
it.

If the attacker can't move the file because he does not have write access
to the containing directory, he can instead link the file to a directory
he controls, then wait for the original link to the file to be deleted
(because the kernel checks that the link count of the corefile is 1).

A less reliable variant that requires D to be non-sticky works with link()
and does not require deletion of the original link: link() the file into
D, but then unlink() it directly before the kernel performs the link count
check.

On systems with protected_hardlinks=0, this variant allows an attacker to
not only gain information from coredumps, but also clobber existing,
victim-writable files with coredumps.  (This could theoretically lead to a
privilege escalation.)

Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovmscan: fix increasing nr_isolated incurred by putback unevictable pages
Jaewon Kim [Tue, 8 Sep 2015 22:02:21 +0000 (15:02 -0700)] 
vmscan: fix increasing nr_isolated incurred by putback unevictable pages

commit c54839a722a02818677bcabe57e957f0ce4f841d upstream.

reclaim_clean_pages_from_list() assumes that shrink_page_list() returns
number of pages removed from the candidate list.  But shrink_page_list()
puts back mlocked pages without passing it to caller and without
counting as nr_reclaimed.  This increases nr_isolated.

To fix this, this patch changes shrink_page_list() to pass unevictable
pages back to caller.  Caller will take care those pages.

Minchan said:

It fixes two issues.

1. With unevictable page, cma_alloc will be successful.

Exactly speaking, cma_alloc of current kernel will fail due to
unevictable pages.

2. fix leaking of NR_ISOLATED counter of vmstat

With it, too_many_isolated works.  Otherwise, it could make hang until
the process get SIGKILL.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoparisc: Filter out spurious interrupts in PA-RISC irq handler
Helge Deller [Thu, 3 Sep 2015 20:45:21 +0000 (22:45 +0200)] 
parisc: Filter out spurious interrupts in PA-RISC irq handler

commit b1b4e435e4ef7de77f07bf2a42c8380b960c2d44 upstream.

When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.

So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).

This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.

This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware.  But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.

For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoparisc: Use double word condition in 64bit CAS operation
John David Anglin [Tue, 8 Sep 2015 00:13:28 +0000 (20:13 -0400)] 
parisc: Use double word condition in 64bit CAS operation

commit 1b59ddfcf1678de38a1f8ca9fb8ea5eebeff1843 upstream.

The attached change fixes the condition used in the "sub" instruction.
A double word comparison is needed.  This fixes the 64-bit LWS CAS
operation on 64-bit kernels.

I can now enable 64-bit atomic support in GCC.

Signed-off-by: John David Anglin <dave.anglin>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoNFS: nfs_set_pgio_error sometimes misses errors
Trond Myklebust [Mon, 17 Aug 2015 17:57:07 +0000 (12:57 -0500)] 
NFS: nfs_set_pgio_error sometimes misses errors

commit e9ae58aeee8842a50f7e199d602a5ccb2e41a95f upstream.

We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoNFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client
Kinglong Mee [Sat, 15 Aug 2015 13:52:10 +0000 (21:52 +0800)] 
NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client

commit 18e3b739fdc826481c6a1335ce0c5b19b3d415da upstream.

---Steps to Reproduce--
<nfs-server>
# cat /etc/exports
/nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
/nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)

<nfs-client>
# mount -t nfs nfs-server:/nfs/ /mnt/
# ll /mnt/*/

<nfs-server>
# cat /etc/exports
/nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
/nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
# service nfs restart

<nfs-client>
# ll /mnt/*/    --->>>>> oops here

[ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
[ 5123.104131] Oops: 0000 [#1]
[ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
[ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
[ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
[ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
[ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
[ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
[ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
[ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
[ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
[ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
[ 5123.112888] Stack:
[ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
[ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
[ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
[ 5123.115264] Call Trace:
[ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
[ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
[ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
[ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
[ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
[ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
[ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.122308]  RSP <ffff88005877fdb8>
[ 5123.122942] CR2: 0000000000000000

Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoNFSv4: don't set SETATTR for O_RDONLY|O_EXCL
NeilBrown [Thu, 30 Jul 2015 03:00:56 +0000 (13:00 +1000)] 
NFSv4: don't set SETATTR for O_RDONLY|O_EXCL

commit efcbc04e16dfa95fef76309f89710dd1d99a5453 upstream.

It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.

[pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>

NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.

When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).

If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.

So only treat O_EXCL specially if O_CREAT was also given.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoBtrfs: check if previous transaction aborted to avoid fs corruption
Filipe Manana [Wed, 12 Aug 2015 10:54:35 +0000 (11:54 +0100)] 
Btrfs: check if previous transaction aborted to avoid fs corruption

commit 1f9b8c8fbc9a4d029760b16f477b9d15500e3a34 upstream.

While we are committing a transaction, it's possible the previous one is
still finishing its commit and therefore we wait for it to finish first.
However we were not checking if that previous transaction ended up getting
aborted after we waited for it to commit, so we ended up committing the
current transaction which can lead to fs corruption because the new
superblock can point to trees that have had one or more nodes/leafs that
were never durably persisted.
The following sequence diagram exemplifies how this is possible:

          CPU 0                                                        CPU 1

  transaction N starts

  (...)

  btrfs_commit_transaction(N)

    cur_trans->state = TRANS_STATE_COMMIT_START;
    (...)
    cur_trans->state = TRANS_STATE_COMMIT_DOING;
    (...)

    cur_trans->state = TRANS_STATE_UNBLOCKED;
    root->fs_info->running_transaction = NULL;

                                                              btrfs_start_transaction()
                                                                 --> starts transaction N + 1

    btrfs_write_and_wait_transaction(trans, root);
      --> starts writing all new or COWed ebs created
          at transaction N

                                                              creates some new ebs, COWs some
                                                              existing ebs but doesn't COW or
                                                              deletes eb X

                                                              btrfs_commit_transaction(N + 1)
                                                                (...)
                                                                cur_trans->state = TRANS_STATE_COMMIT_START;
                                                                (...)
                                                                wait_for_commit(root, prev_trans);
                                                                  --> prev_trans == transaction N

    btrfs_write_and_wait_transaction() continues
    writing ebs
       --> fails writing eb X, we abort transaction N
           and set bit BTRFS_FS_STATE_ERROR on
           fs_info->fs_state, so no new transactions
           can start after setting that bit

       cleanup_transaction()
         btrfs_cleanup_one_transaction()
           wakes up task at CPU 1

                                                                continues, doesn't abort because
                                                                cur_trans->aborted (transaction N + 1)
                                                                is zero, and no checks for bit
                                                                BTRFS_FS_STATE_ERROR in fs_info->fs_state
                                                                are made

                                                                btrfs_write_and_wait_transaction(trans, root);
                                                                  --> succeeds, no errors during writeback

                                                                write_ctree_super(trans, root, 0);
                                                                  --> succeeds
                                                                  --> we have now a superblock that points us
                                                                      to some root that uses eb X, which was
                                                                      never written to disk

In this scenario future attempts to read eb X from disk results in an
error message like "parent transid verify failed on X wanted Y found Z".

So fix this by aborting the current transaction if after waiting for the
previous transaction we verify that it was aborted.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agov4l: omap3isp: Fix sub-device power management code
Sakari Ailus [Fri, 12 Jun 2015 23:06:23 +0000 (20:06 -0300)] 
v4l: omap3isp: Fix sub-device power management code

commit 9d39f05490115bf145e5ea03c0b7ec9d3d015b01 upstream.

Commit 813f5c0ac5cc ("media: Change media device link_notify behaviour")
modified the media controller link setup notification API and updated the
OMAP3 ISP driver accordingly. As a side effect it introduced a bug by
turning power on after setting the link instead of before. This results in
sub-devices not being powered down in some cases when they should be. Fix
it.

Fixes: 813f5c0ac5cc [media] media: Change media device link_notify behaviour
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agorc-core: fix remove uevent generation
David Härdeman [Tue, 19 May 2015 22:03:12 +0000 (19:03 -0300)] 
rc-core: fix remove uevent generation

commit a66b0c41ad277ae62a3ae6ac430a71882f899557 upstream.

The input_dev is already gone when the rc device is being unregistered
so checking for its presence only means that no remove uevent will be
generated.

Signed-off-by: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86/mm: Initialize pmd_idx in page_table_range_init_count()
Minfei Huang [Sun, 12 Jul 2015 12:18:42 +0000 (20:18 +0800)] 
x86/mm: Initialize pmd_idx in page_table_range_init_count()

commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream.

The variable pmd_idx is not initialized for the first iteration of the
for loop.

Assign the proper value which indexes the start address.

Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once'
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: tony.luck@intel.com
Cc: wangnan0@huawei.com
Cc: david.vrabel@citrix.com
Reviewed-by: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomm: check if section present during memory block registering
Yinghai Lu [Fri, 4 Sep 2015 22:42:39 +0000 (15:42 -0700)] 
mm: check if section present during memory block registering

commit 04697858d89e4bf2650364f8d6956e2554e8ef88 upstream.

Tony Luck found on his setup, if memory block size 512M will cause crash
during booting.

  BUG: unable to handle kernel paging request at ffffea0074000020
  IP: get_nid_for_pfn+0x17/0x40
  PGD 128ffcb067 PUD 128ffc9067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
  ...
  Call Trace:
     ? register_mem_sect_under_node+0x66/0xe0
     register_one_node+0x17b/0x240
     ? pci_iommu_alloc+0x6e/0x6e
     topology_init+0x3c/0x95
     do_one_initcall+0xcd/0x1f0

The system has non continuous RAM address:
 BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
 BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
 BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
 BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
 BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable

So there are start sections in memory block not present.  For example:

    memory block : [0x2c18000000, 0x2c20000000) 512M

first three sections are not present.

The current register_mem_sect_under_node() assume first section is
present, but memory block section number range [start_section_nr,
end_section_nr] would include not present section.

For arch that support vmemmap, we don't setup memmap for struct page
area within not present sections area.

So skip the pfn range that belong to absent section.

[akpm@linux-foundation.org: simplification]
[rientjes@google.com: more simplification]
Fixes: bdee237c0343 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoAdd radeon suspend/resume quirk for HP Compaq dc5750.
Jeffery Miller [Tue, 1 Sep 2015 15:23:02 +0000 (11:23 -0400)] 
Add radeon suspend/resume quirk for HP Compaq dc5750.

commit 09bfda10e6efd7b65bcc29237bee1765ed779657 upstream.

With the radeon driver loaded the HP Compaq dc5750
Small Form Factor machine fails to resume from suspend.
Adding a quirk similar to other devices avoids
the problem and the system resumes properly.

Signed-off-by: Jeffery Miller <jmiller@neverware.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: fix type confusion in copy offload ioctl
Jann Horn [Fri, 11 Sep 2015 14:27:27 +0000 (16:27 +0200)] 
CIFS: fix type confusion in copy offload ioctl

commit 4c17a6d56bb0cad3066a714e94f7185a24b40f49 upstream.

This might lead to local privilege escalation (code execution as
kernel) for systems where the following conditions are met:

 - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
 - a cifs filesystem is mounted where:
  - the mount option "vers" was used and set to a value >=2.0
  - the attacker has write access to at least one file on the filesystem

To attack this, an attacker would have to guess the target_tcon
pointer (but guessing wrong doesn't cause a crash, it just returns an
error code) and win a narrow race.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc/mm: Recompute hash value after a failed update
Aneesh Kumar K.V [Tue, 15 Sep 2015 07:00:08 +0000 (12:30 +0530)] 
powerpc/mm: Recompute hash value after a failed update

commit 36b35d5d807b7e57aff7d08e63de8b17731ee211 upstream.

If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.

Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.

Fixes: 6d492ecc6489 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers
Thomas Huth [Fri, 17 Jul 2015 10:46:58 +0000 (12:46 +0200)] 
powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream.

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

 BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
 Call Trace:
 [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
 [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.

The EPOW sensor is defined to be "fast" in sPAPR - mpe.

Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash
Michael Ellerman [Fri, 7 Aug 2015 06:19:43 +0000 (16:19 +1000)] 
powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash

commit 74b5037baa2011a2799e2c43adde7d171b072f9e upstream.

The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437
Takashi Iwai [Thu, 13 Aug 2015 16:05:06 +0000 (18:05 +0200)] 
ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437

commit a161574e200ae63a5042120e0d8c36830e81bde3 upstream.

It turned out that the machine has a bass speaker, so take a correct
fixup entry.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: hda - Enable headphone jack detect on old Fujitsu laptops
Takashi Iwai [Thu, 13 Aug 2015 16:02:39 +0000 (18:02 +0200)] 
ALSA: hda - Enable headphone jack detect on old Fujitsu laptops

commit bb148bdeb0ab16fc0ae8009799471e4d7180073b upstream.

According to the bug report, FSC Amilo laptops with ALC880 can detect
the headphone jack but currently the driver disables it.  It's partly
intentionally, as non-working jack detect was reported in the past.
Let's enable now.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoInput: evdev - do not report errors form flush()
Takashi Iwai [Fri, 4 Sep 2015 05:20:00 +0000 (22:20 -0700)] 
Input: evdev - do not report errors form flush()

commit eb38f3a4f6e86f8bb10a3217ebd85ecc5d763aae upstream.

We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices.  close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV.  logind was overreacting to it and decided to kill
itself when an unexpected error code was received.  What a tragedy.

The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().

Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.

Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoarm64: KVM: Disable virtual timer even if the guest is not using it
Marc Zyngier [Wed, 16 Sep 2015 15:18:59 +0000 (16:18 +0100)] 
arm64: KVM: Disable virtual timer even if the guest is not using it

commit c4cbba9fa078f55d9f6d081dbb4aec7cf969e7c7 upstream.

When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.

The fix is to unconditionally turn off the virtual timer on guest
exit.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoarm64: errata: add module build workaround for erratum #843419
Will Deacon [Tue, 17 Mar 2015 12:15:02 +0000 (12:15 +0000)] 
arm64: errata: add module build workaround for erratum #843419

commit df057cc7b4fa59e9b55f07ffdb6c62bf02e99a00 upstream.

Cortex-A53 processors <= r0p4 are affected by erratum #843419 which can
lead to a memory access using an incorrect address in certain sequences
headed by an ADRP instruction.

There is a linker fix to generate veneers for ADRP instructions, but
this doesn't work for kernel modules which are built as unlinked ELF
objects.

This patch adds a new config option for the erratum which, when enabled,
builds kernel modules with the mcmodel=large flag. This uses absolute
addressing for all kernel symbols, thereby removing the use of ADRP as
a PC-relative form of addressing. The ADRP relocs are removed from the
module loader so that we fail to load any potentially affected modules.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoarm64: head.S: initialise mdcr_el2 in el2_setup
Will Deacon [Wed, 2 Sep 2015 17:49:28 +0000 (18:49 +0100)] 
arm64: head.S: initialise mdcr_el2 in el2_setup

commit d10bcd473301888f957ec4b6b12aa3621be78d59 upstream.

When entering the kernel at EL2, we fail to initialise the MDCR_EL2
register which controls debug access and PMU capabilities at EL1.

This patch ensures that the register is initialised so that all traps
are disabled and all the PMU counters are available to the host. When a
guest is scheduled, KVM takes care to configure trapping appropriately.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoarm64: compat: fix vfp save/restore across signal handlers in big-endian
Will Deacon [Tue, 15 Sep 2015 11:07:06 +0000 (12:07 +0100)] 
arm64: compat: fix vfp save/restore across signal handlers in big-endian

commit bdec97a855ef1e239f130f7a11584721c9a1bf04 upstream.

When saving/restoring the VFP registers from a compat (AArch32)
signal frame, we rely on the compat registers forming a prefix of the
native register file and therefore make use of copy_{to,from}_user to
transfer between the native fpsimd_state and the compat_vfp_sigframe.

Unfortunately, this doesn't work so well in a big-endian environment.
Our fpsimd save/restore code operates directly on 128-bit quantities
(Q registers) whereas the compat_vfp_sigframe represents the registers
as an array of 64-bit (D) registers. The architecture packs the compat D
registers into the Q registers, with the least significant bytes holding
the lower register. Consequently, we need to swap the 64-bit halves when
converting between these two representations on a big-endian machine.

This patch replaces the __copy_{to,from}_user invocations in our
compat VFP signal handling code with explicit __put_user loops that
operate on 64-bit values and swap them accordingly.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoarm64: kconfig: Move LIST_POISON to a safe value
Jeff Vander Stoep [Tue, 18 Aug 2015 19:50:10 +0000 (20:50 +0100)] 
arm64: kconfig: Move LIST_POISON to a safe value

commit bf0c4e04732479f650ff59d1ee82de761c0071f0 upstream.

Move the poison pointer offset to 0xdead000000000000, a
recognized value that is not mappable by user-space exploits.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Thierry Strudel <tstrudel@google.com>
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomac80211: enable assoc check for mesh interfaces
Bob Copeland [Sat, 13 Jun 2015 14:16:31 +0000 (10:16 -0400)] 
mac80211: enable assoc check for mesh interfaces

commit 3633ebebab2bbe88124388b7620442315c968e8f upstream.

We already set a station to be associated when peering completes, both
in user space and in the kernel.  Thus we should always have an
associated sta before sending data frames to that station.

Failure to check assoc state can cause crashes in the lower-level driver
due to transmitting unicast data frames before driver sta structures
(e.g. ampdu state in ath9k) are initialized.  This occurred when
forwarding in the presence of fixed mesh paths: frames were transmitted
to stations with whom we hadn't yet completed peering.

Reported-by: Alexis Green <agreen@cococorp.com>
Tested-by: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotg3: Fix temperature reporting
Jean Delvare [Tue, 1 Sep 2015 16:07:41 +0000 (18:07 +0200)] 
tg3: Fix temperature reporting

commit d3d11fe08ccc9bff174fc958722b5661f0932486 upstream.

The temperature registers appear to report values in degrees Celsius
while the hwmon API mandates values to be exposed in millidegrees
Celsius. Do the conversion so that the values reported by "sensors"
are correct.

Fixes: aed93e0bf493 ("tg3: Add hwmon support for temperature")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agortlwifi: rtl8192cu: Add new device ID
Adrien Schildknecht [Wed, 19 Aug 2015 15:33:12 +0000 (17:33 +0200)] 
rtlwifi: rtl8192cu: Add new device ID

commit 1642d09fb9b128e8e538b2a4179962a34f38dff9 upstream.

The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agounshare: Unsharing a thread does not require unsharing a vm
Eric W. Biederman [Mon, 10 Aug 2015 22:35:07 +0000 (17:35 -0500)] 
unshare: Unsharing a thread does not require unsharing a vm

commit 12c641ab8270f787dfcce08b5f20ce8b65008096 upstream.

In the logic in the initial commit of unshare made creating a new
thread group for a process, contingent upon creating a new memory
address space for that process.  That is wrong.  Two separate
processes in different thread groups can share a memory address space
and clone allows creation of such proceses.

This is significant because it was observed that mm_users > 1 does not
mean that a process is multi-threaded, as reading /proc/PID/maps
temporarily increments mm_users, which allows other processes to
(accidentally) interfere with unshare() calls.

Correct the check in check_unshare_flags() to test for
!thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM.
For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM.
For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM.

By using the correct checks in unshare this removes the possibility of
an accidental denial of service attack.

Additionally using the correct checks in unshare ensures that only an
explicit unshare(CLONE_VM) can possibly trigger the slow path of
current_is_single_threaded().  As an explict unshare(CLONE_VM) is
pointless it is not expected there are many applications that make
that call.

Fixes: b2e0d98705e60e45bbb3c0032c48824ad7ae0704 userns: Implement unshare of the user namespace
Reported-by: Ricky Zhou <rickyz@chromium.org>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoblk-mq: fix buffer overflow when reading sysfs file of 'pending'
Ming Lei [Sun, 9 Aug 2015 07:41:50 +0000 (03:41 -0400)] 
blk-mq: fix buffer overflow when reading sysfs file of 'pending'

commit 596f5aad2a704b72934e5abec1b1b4114c16f45b upstream.

There may be lots of pending requests so that the buffer of PAGE_SIZE
can't hold them at all.

One typical example is scsi-mq, the queue depth(.can_queue) of
scsi_host and blk-mq is quite big but scsi_device's queue_depth
is a bit small(.cmd_per_lun), then it is quite easy to have lots
of pending requests in hw queue.

This patch fixes the following warning and the related memory
destruction.

[  359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
[  359.055595] irq event stamp: 15537^M
[  359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[  359.055614] Dumping ftrace buffer:^M
[  359.055660]    (ftrace buffer empty)^M
[  359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[  359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
[  359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[  359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
[  359.055693] RIP: 0010:[<ffffffff811541c5>]  [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.14.53
Greg Kroah-Hartman [Mon, 21 Sep 2015 17:02:35 +0000 (10:02 -0700)] 
Linux 3.14.53

10 years agotty/vt: don't set font mappings on vc not supporting this
Imre Deak [Thu, 2 Oct 2014 13:34:31 +0000 (16:34 +0300)] 
tty/vt: don't set font mappings on vc not supporting this

commit 9e326f78713a4421fe11afc2ddeac07698fac131 upstream.

We can call this function for a dummy console that doesn't support
setting the font mapping, which will result in a null ptr BUG. So check
for this case and return error for consoles w/o font mapping support.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=59321
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agohpfs: update ctime and mtime on directory modification
Mikulas Patocka [Wed, 2 Sep 2015 20:51:53 +0000 (22:51 +0200)] 
hpfs: update ctime and mtime on directory modification

commit f49a26e7718dd30b49e3541e3e25aecf5e7294e2 upstream.

Update ctime and mtime when a directory is modified. (though OS/2 doesn't
update them anyway)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrivercore: Fix unregistration path of platform devices
Grant Likely [Sun, 7 Jun 2015 14:20:11 +0000 (15:20 +0100)] 
drivercore: Fix unregistration path of platform devices

commit 7f5dcaf1fdf289767a126a0a5cc3ef39b5254b06 upstream.

The unregister path of platform_device is broken. On registration, it
will register all resources with either a parent already set, or
type==IORESOURCE_{IO,MEM}. However, on unregister it will release
everything with type==IORESOURCE_{IO,MEM}, but ignore the others. There
are also cases where resources don't get registered in the first place,
like with devices created by of_platform_populate()*.

Fix the unregister path to be symmetrical with the register path by
checking the parent pointer instead of the type field to decide which
resources to unregister. This is safe because the upshot of the
registration path algorithm is that registered resources have a parent
pointer, and non-registered resources do not.

* It can be argued that of_platform_populate() should be registering
  it's resources, and they argument has some merit. However, there are
  quite a few platforms that end up broken if we try to do that due to
  overlapping resources in the device tree. Until that is fixed, we need
  to solve the immediate problem.

Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoARM: OMAP2+: DRA7: clockdomain: change l4per2_7xx_clkdm to SW_WKUP
Vignesh R [Wed, 3 Jun 2015 11:51:20 +0000 (17:21 +0530)] 
ARM: OMAP2+: DRA7: clockdomain: change l4per2_7xx_clkdm to SW_WKUP

commit b9e23f321940d2db2c9def8ff723b8464fb86343 upstream.

Legacy IPs like PWMSS, present under l4per2_7xx_clkdm, cannot support
smart-idle when its clock domain is in HW_AUTO on DRA7 SoCs. Hence,
program clock domain to SW_WKUP.

Signed-off-by: Vignesh R <vigneshr@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoof/address: Don't loop forever in of_find_matching_node_by_address().
David Daney [Wed, 19 Aug 2015 20:17:47 +0000 (13:17 -0700)] 
of/address: Don't loop forever in of_find_matching_node_by_address().

commit 3a496b00b6f90c41bd21a410871dfc97d4f3c7ab upstream.

If the internal call to of_address_to_resource() fails, we end up
looping forever in of_find_matching_node_by_address().  This can be
caused by a defective device tree, or calling with an incorrect
matches argument.

Fix by calling of_find_matching_node() unconditionally at the end of
the loop.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoauxdisplay: ks0108: fix refcount
Sudip Mukherjee [Mon, 20 Jul 2015 11:57:21 +0000 (17:27 +0530)] 
auxdisplay: ks0108: fix refcount

commit bab383de3b84e584b0f09227151020b2a43dc34c upstream.

parport_find_base() will implicitly do parport_get_port() which
increases the refcount. Then parport_register_device() will again
increment the refcount. But while unloading the module we are only
doing parport_unregister_device() decrementing the refcount only once.
We add an parport_put_port() to neutralize the effect of
parport_get_port().

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoDoc: ABI: testing: configfs-usb-gadget-sourcesink
Peter Chen [Fri, 31 Jul 2015 08:36:29 +0000 (16:36 +0800)] 
Doc: ABI: testing: configfs-usb-gadget-sourcesink

commit 4bc58eb16bb2352854b9c664cc36c1c68d2bfbb7 upstream.

Fix the name of attribute

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoDoc: ABI: testing: configfs-usb-gadget-loopback
Peter Chen [Fri, 31 Jul 2015 08:36:28 +0000 (16:36 +0800)] 
Doc: ABI: testing: configfs-usb-gadget-loopback

commit 8cd50626823c00ca7472b2f61cb8c0eb9798ddc0 upstream.

Fix the name of attribute

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodevres: fix devres_get()
Masahiro Yamada [Wed, 15 Jul 2015 01:29:00 +0000 (10:29 +0900)] 
devres: fix devres_get()

commit 64526370d11ce8868ca495723d595b61e8697fbf upstream.

Currently, devres_get() passes devres_free() the pointer to devres,
but devres_free() should be given with the pointer to resource data.

Fixes: 9ac7849e35f7 ("devres: device resource management")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxtensa: fix kernel register spilling
Max Filippov [Thu, 16 Jul 2015 07:41:02 +0000 (10:41 +0300)] 
xtensa: fix kernel register spilling

commit 77d6273e79e3a86552fcf10cdd31a69b46ed2ce6 upstream.

call12 can't be safely used as the first call in the inline function,
because the compiler does not extend the stack frame of the bounding
function accordingly, which may result in corruption of local variables.

If a call needs to be done, do call8 first followed by call12.

For pure assembly code in _switch_to increase stack frame size of the
bounding function.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxtensa: fix threadptr reload on return to userspace
Max Filippov [Sat, 4 Jul 2015 12:27:39 +0000 (15:27 +0300)] 
xtensa: fix threadptr reload on return to userspace

commit 4229fb12a03e5da5882b420b0aa4a02e77447b86 upstream.

Userspace return code may skip restoring THREADPTR register if there are
no registers that need to be zeroed. This leads to spurious failures in
libc NPTL tests.

Always restore THREADPTR on return to userspace.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoKVM: MMU: fix validation of mmio page fault
Xiao Guangrong [Wed, 5 Aug 2015 04:04:19 +0000 (12:04 +0800)] 
KVM: MMU: fix validation of mmio page fault

commit 6f691251c0350ac52a007c54bf3ef62e9d8cdc5e upstream.

We got the bug that qemu complained with "KVM: unknown exit, hardware
reason 31" and KVM shown these info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1

This is because we got a mmio #PF and the handler see the mmio spte becomes
normal (points to the ram page)

However, this is valid after introducing fast mmio spte invalidation which
increases the generation-number instead of zapping mmio sptes, a example
is as follows:
1. QEMU drops mmio region by adding a new memslot
2. invalidate all mmio sptes
3.

        VCPU 0                        VCPU 1
    access the invalid mmio spte
                            access the region originally was MMIO before
                            set the spte to the normal ram map

    mmio #PF
    check the spte and see it becomes normal ram mapping !!!

This patch fixes the bug just by dropping the check in mmio handler, it's
good for backport. Full check will be introduced in later patches

Reported-by: Pavel Shirshov <ru.pchel@gmail.com>
Tested-by: Pavel Shirshov <ru.pchel@gmail.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoHID: usbhid: Fix the check for HID_RESET_PENDING in hid_io_error
Don Zickus [Mon, 10 Aug 2015 16:06:53 +0000 (12:06 -0400)] 
HID: usbhid: Fix the check for HID_RESET_PENDING in hid_io_error

commit 3af4e5a95184d6d3c1c6a065f163faa174a96a1d upstream.

It was reported that after 10-20 reboots, a usb keyboard plugged
into a docking station would not work unless it was replugged in.

Using usbmon, it turns out the interrupt URBs were streaming with
callback errors of -71 for some reason.  The hid-core.c::hid_io_error was
supposed to retry and then reset, but the reset wasn't really happening.

The check for HID_NO_BANDWIDTH was inverted.  Fix was simple.

Tested by reporter and locally by me by unplugging a keyboard halfway until I
could recreate a stream of errors but no disconnect.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocrypto: ghash-clmulni: specify context size for ghash async algorithm
Andrey Ryabinin [Thu, 3 Sep 2015 11:32:01 +0000 (14:32 +0300)] 
crypto: ghash-clmulni: specify context size for ghash async algorithm

commit 71c6da846be478a61556717ef1ee1cea91f5d6a8 upstream.

Currently context size (cra_ctxsize) doesn't specified for
ghash_async_alg. Which means it's zero. Thus crypto_create_tfm()
doesn't allocate needed space for ghash_async_ctx, so any
read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid.

Signed-off-by: Andrey Ryabinin <aryabinin@odin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoserial: 8250: don't bind to SMSC IrCC IR port
Maciej S. Szmigiero [Sun, 2 Aug 2015 21:11:52 +0000 (23:11 +0200)] 
serial: 8250: don't bind to SMSC IrCC IR port

commit ffa34de03bcfbfa88d8352942bc238bb48e94e2d upstream.

SMSC IrCC SIR/FIR port should not be bound to by
(legacy) serial driver so its own driver (smsc-ircc2)
can bind to it.

Signed-off-by: Maciej Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agousb: host: ehci-sys: delete useless bus_to_hcd conversion
Peter Chen [Mon, 17 Aug 2015 02:23:03 +0000 (10:23 +0800)] 
usb: host: ehci-sys: delete useless bus_to_hcd conversion

commit 0521cfd06e1ebcd575e7ae36aab068b38df23850 upstream.

The ehci platform device's drvdata is the pointer of struct usb_hcd
already, so we doesn't need to call bus_to_hcd conversion again.

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agousb: dwc3: ep0: Fix mem corruption on OUT transfers of more than 512 bytes
Kishon Vijay Abraham I [Mon, 27 Jul 2015 06:55:27 +0000 (12:25 +0530)] 
usb: dwc3: ep0: Fix mem corruption on OUT transfers of more than 512 bytes

commit b2fb5b1a0f50d3ebc12342c8d8dead245e9c9d4e upstream.

DWC3 uses bounce buffer to handle non max packet aligned OUT transfers and
the size of bounce buffer is 512 bytes. However if the host initiates OUT
transfers of size more than 512 bytes (and non max packet aligned), the
driver throws a WARN dump but still programs the TRB to receive more than
512 bytes. This will cause bounce buffer to overflow and corrupt the
adjacent memory locations which can be fatal.

Fix it by programming the TRB to receive a maximum of DWC3_EP0_BOUNCE_SIZE
(512) bytes.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUSB: ftdi_sio: Added custom PID for CustomWare products
Matthijs Kooijman [Tue, 18 Aug 2015 08:33:56 +0000 (10:33 +0200)] 
USB: ftdi_sio: Added custom PID for CustomWare products

commit 1fb8dc36384ae1140ee6ccc470de74397606a9d5 upstream.

CustomWare uses the FTDI VID with custom PIDs for their ShipModul MiniPlex
products.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUSB: symbolserial: Use usb_get_serial_port_data
Philipp Hachtmann [Mon, 17 Aug 2015 15:31:46 +0000 (17:31 +0200)] 
USB: symbolserial: Use usb_get_serial_port_data

commit 951d3793bbfc0a441d791d820183aa3085c83ea9 upstream.

The driver used usb_get_serial_data(port->serial) which compiled but resulted
in a NULL pointer being returned (and subsequently used). I did not go deeper
into this but I guess this is a regression.

Signed-off-by: Philipp Hachtmann <hachti@hachti.de>
Fixes: a85796ee5149 ("USB: symbolserial: move private-data allocation to
port_probe")
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoPCI: Fix TI816X class code quirk
Bjorn Helgaas [Fri, 19 Jun 2015 20:58:24 +0000 (15:58 -0500)] 
PCI: Fix TI816X class code quirk

commit d1541dc977d376406f4584d8eb055488655c98ec upstream.

In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO".
But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class
and needs to be shifted to make space for the low-order interface byte.

Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code.

Fixes: 63c4408074cb ("PCI: Add quirk for setting valid class for TI816X Endpoint")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Hemant Pedanekar <hemantp@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoclk: versatile: off by one in clk_sp810_timerclken_of_get()
Dan Carpenter [Wed, 29 Jul 2015 10:17:06 +0000 (13:17 +0300)] 
clk: versatile: off by one in clk_sp810_timerclken_of_get()

commit 3294bee87091be5f179474f6c39d1d87769635e2 upstream.

The ">" should be ">=" or we end up reading beyond the end of the array.

Fixes: 6e973d2c4385 ('clk: vexpress: Add separate SP810 driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agostaging: comedi: adl_pci7x3x: fix digital output on PCI-7230
Ian Abbott [Tue, 11 Aug 2015 12:05:10 +0000 (13:05 +0100)] 
staging: comedi: adl_pci7x3x: fix digital output on PCI-7230

commit ad83dbd974feb2e2a8cc071a1d28782bd4d2c70e upstream.

The "adl_pci7x3x" driver replaced the "adl_pci7230" and "adl_pci7432"
drivers in commits 8f567c373c4b ("staging: comedi: new adl_pci7x3x
driver") and 657f77d173d3 ("staging: comedi: remove adl_pci7230 and
adl_pci7432 drivers").  Although the new driver code agrees with the
user manuals for the respective boards, digital outputs stopped working
on the PCI-7230.  This has 16 digital output channels and the previous
adl_pci7230 driver shifted the 16 bit output state left by 16 bits
before writing to the hardware register.  The new adl_pci7x3x driver
doesn't do that.  Fix it in `adl_pci7x3x_do_insn_bits()` by checking
for the special case of the subdevice having only 16 channels and
duplicating the 16 bit output state into both halves of the 32-bit
register.  That should work both for what the board actually does and
for what the user manual says it should do.

Fixes: 8f567c373c4b ("staging: comedi: new adl_pci7x3x driver")
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiio: adis16480: Fix scale factors
Lars-Peter Clausen [Wed, 5 Aug 2015 13:38:15 +0000 (15:38 +0200)] 
iio: adis16480: Fix scale factors

commit 7abad1063deb0f77d275c61f58863ec319c58c5c upstream.

The different devices support by the adis16480 driver have slightly
different scales for the gyroscope and accelerometer channels.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiio: Add inverse unit conversion macros
Lars-Peter Clausen [Wed, 5 Aug 2015 13:38:14 +0000 (15:38 +0200)] 
iio: Add inverse unit conversion macros

commit c689a923c867eac40ed3826c1d9328edea8b6bc7 upstream.

Add inverse unit conversion macro to convert from standard IIO units to
units that might be used by some devices.

Those are useful in combination with scale factors that are specified as
IIO_VAL_FRACTIONAL. Typically the denominator for those specifications will
contain the maximum raw value the sensor will generate and the numerator
the value it maps to in a specific unit. Sometimes datasheets specify those
in different units than the standard IIO units (e.g. degree/s instead of
rad/s) and so we need to do a unit conversion.

From a mathematical point of view it does not make a difference whether we
apply the unit conversion to the numerator or the inverse unit conversion
to the denominator since (x / y) / z = x / (y * z). But as the denominator
is typically a larger value and we are rounding both the numerator and
denominator to integer values using the later method gives us a better
precision (E.g. the relative error is smaller if we round 8000.3 to 8000
rather than rounding 8.3 to 8).

This is where in inverse unit conversion macros will be used.

Marked for stable as used by some upcoming fixes.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiio: industrialio-buffer: Fix iio_buffer_poll return value
Cristina Opriceana [Mon, 3 Aug 2015 10:37:40 +0000 (13:37 +0300)] 
iio: industrialio-buffer: Fix iio_buffer_poll return value

commit 1bdc0293901cbea23c6dc29432e81919d4719844 upstream.

Change return value to 0 if no device is bound since
unsigned int cannot support negative error codes.

Fixes: f18e7a068 ("iio: Return -ENODEV for file operations if the
device has been unregistered")

Signed-off-by: Cristina Opriceana <cristina.opriceana@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiio: event: Remove negative error code from iio_event_poll
Cristina Opriceana [Mon, 3 Aug 2015 10:00:47 +0000 (13:00 +0300)] 
iio: event: Remove negative error code from iio_event_poll

commit 41d903c00051d8f31c98a8136edbac67e6f8688f upstream.

Negative return values are not supported by iio_event_poll since
its return type is unsigned int.

Fixes: f18e7a068a0a3 ("iio: Return -ENODEV for file operations if the device has been unregistered")
Signed-off-by: Cristina Opriceana <cristina.opriceana@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiio: bmg160: IIO_BUFFER and IIO_TRIGGERED_BUFFER are required
Markus Pargmann [Wed, 29 Jul 2015 13:46:03 +0000 (15:46 +0200)] 
iio: bmg160: IIO_BUFFER and IIO_TRIGGERED_BUFFER are required

commit 06d2f6ca5a38abe92f1f3a132b331eee773868c3 upstream.

This patch adds selects for IIO_BUFFER and IIO_TRIGGERED_BUFFER. Without
IIO_BUFFER, the driver does not compile.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agos390/sclp: fix compile error
Sebastian Ott [Thu, 25 Jun 2015 07:32:22 +0000 (09:32 +0200)] 
s390/sclp: fix compile error

commit a313bdc5310dd807655d3ca3eb2219cd65dfe45a upstream.

Fix this error when compiling with CONFIG_SMP=n and
CONFIG_DYNAMIC_DEBUG=y:

drivers/s390/char/sclp_early.c: In function 'sclp_read_info_early':
drivers/s390/char/sclp_early.c:87:19: error: 'EBUSY' undeclared (first use in this function)
   } while (rc == -EBUSY);
                   ^

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/qxl: validate monitors config modes
Jonathon Jongsma [Thu, 20 Aug 2015 19:04:32 +0000 (14:04 -0500)] 
drm/qxl: validate monitors config modes

commit bd3e1c7c6de9f5f70d97cdb6c817151c0477c5e3 upstream.

Due to some recent changes in
drm_helper_probe_single_connector_modes_merge_bits(), old custom modes
were not being pruned properly. In current kernels,
drm_mode_validate_basic() is called to sanity-check each mode in the
list. If the sanity-check passes, the mode's status gets set to to
MODE_OK. In older kernels this check was not done, so old custom modes
would still have a status of MODE_UNVERIFIED at this point, and would
therefore be pruned later in the function.

As a result of this new behavior, the list of modes for a device always
includes every custom mode ever configured for the device, with the
largest one listed first. Since desktop environments usually choose the
first preferred mode when a hotplug event is emitted, this had the
result of making it very difficult for the user to reduce the size of
the display.

The qxl driver did implement the mode_valid connector function, but it
was empty. In order to restore the old behavior where old custom modes
are pruned, we implement a proper mode_valid function for the qxl
driver. This function now checks each mode against the last configured
custom mode and the list of standard modes. If the mode doesn't match
any of these, its status is set to MODE_BAD so that it will be pruned as
expected.

Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoDRM - radeon: Don't link train DisplayPort on HPD until we get the dpcd
Stephen Chandler Paul [Fri, 21 Aug 2015 18:16:12 +0000 (14:16 -0400)] 
DRM - radeon: Don't link train DisplayPort on HPD until we get the dpcd

commit 924f92bf12bfbef3662619e3ed24a1cea7c1cbcd upstream.

Most of the time this isn't an issue since hotplugging an adaptor will
trigger a crtc mode change which in turn, causes the driver to probe
every DisplayPort for a dpcd. However, in cases where hotplugging
doesn't cause a mode change (specifically when one unplugs a monitor
from a DisplayPort connector, then plugs that same monitor back in
seconds later on the same port without any other monitors connected), we
never probe for the dpcd before starting the initial link training. What
happens from there looks like this:

- GPU has only one monitor connected. It's connected via
  DisplayPort, and does not go through an adaptor of any sort.

- User unplugs DisplayPort connector from GPU.

- Change in HPD is detected by the driver, we probe every
  DisplayPort for a possible connection.

- Probe the port the user originally had the monitor connected
  on for it's dpcd. This fails, and we clear the first (and only
  the first) byte of the dpcd to indicate we no longer have a
  dpcd for this port.

- User plugs the previously disconnected monitor back into the
  same DisplayPort.

- radeon_connector_hotplug() is called before everyone else,
  and tries to handle the link training. Since only the first
  byte of the dpcd is zeroed, the driver is able to complete
  link training but does so against the wrong dpcd, causing it
  to initialize the link with the wrong settings.

- Display stays blank (usually), dpcd is probed after the
  initial link training, and the driver prints no obvious
  messages to the log.

In theory, since only one byte of the dpcd is chopped off (specifically,
the byte that contains the revision information for DisplayPort), it's
not entirely impossible that this bug may not show on certain monitors.
For instance, the only reason this bug was visible on my ASUS PB238
monitor was due to the fact that this monitor using the enhanced framing
symbol sequence, the flag for which is ignored if the radeon driver
thinks that the DisplayPort version is below 1.1.

Signed-off-by: Stephen Chandler Paul <cpaul@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.14.52
Greg Kroah-Hartman [Sun, 13 Sep 2015 16:11:03 +0000 (09:11 -0700)] 
Linux 3.14.52

10 years agoarm64: KVM: Fix host crash when injecting a fault into a 32bit guest
Marc Zyngier [Thu, 27 Aug 2015 15:10:01 +0000 (16:10 +0100)] 
arm64: KVM: Fix host crash when injecting a fault into a 32bit guest

commit 126c69a0bd0e441bf6766a5d9bf20de011be9f68 upstream.

When injecting a fault into a misbehaving 32bit guest, it seems
rather idiotic to also inject a 64bit fault that is only going
to corrupt the guest state. This leads to a situation where we
perform an illegal exception return at EL2 causing the host
to crash instead of killing the guest.

Just fix the stupid bug that has been there from day 1.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoSCSI: Fix NULL pointer dereference in runtime PM
Alan Stern [Mon, 17 Aug 2015 15:02:42 +0000 (11:02 -0400)] 
SCSI: Fix NULL pointer dereference in runtime PM

commit 49718f0fb8c9af192b33d8af3a2826db04025371 upstream.

The routines in scsi_rpm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q->dev).

However, this assumption turns out to be wrong for things like the ses
driver.  Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting.  If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q->dev pointer.

This patch fixes the problem by calling the block layer's runtime-PM
routines only if the device's driver really does have a runtime-PM
callback routine.  Since ses doesn't define any such callbacks, the
crash won't occur.

This fixes Bugzilla #101371.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Stanisław Pitucha <viraptor@gmail.com>
Reported-by: Ilan Cohen <ilanco@gmail.com>
Tested-by: Ilan Cohen <ilanco@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoarm64/mm: Remove hack in mmap randomize layout
Yann Droneaud [Mon, 17 Nov 2014 23:02:19 +0000 (23:02 +0000)] 
arm64/mm: Remove hack in mmap randomize layout

commit d6c763afab142a85e4770b4bc2a5f40f256d5c5d upstream.

Since commit 8a0a9bd4db63 ('random: make get_random_int() more
random'), get_random_int() returns a random value for each call,
so comment and hack introduced in mmap_rnd() as part of commit
1d18c47c735e ('arm64: MMU fault handling and page table management')
are incorrects.

Commit 1d18c47c735e seems to use the same hack introduced by
commit a5adc91a4b44 ('powerpc: Ensure random space between stack
and mmaps'), latter copied in commit 5a0efea09f42 ('sparc64: Sharpen
address space randomization calculations.').

But both architectures were cleaned up as part of commit
fa8cbaaf5a68 ('powerpc+sparc64/mm: Remove hack in mmap randomize
layout') as hack is no more needed since commit 8a0a9bd4db63.

So the present patch removes the comment and the hack around
get_random_int() on AArch64's mmap_rnd().

Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocrypto: caam - fix memory corruption in ahash_final_ctx
Horia Geant? [Tue, 11 Aug 2015 17:19:20 +0000 (20:19 +0300)] 
crypto: caam - fix memory corruption in ahash_final_ctx

commit b310c178e6d897f82abb9da3af1cd7c02b09f592 upstream.

When doing pointer operation for accessing the HW S/G table,
a value representing number of entries (and not number of bytes)
must be used.

Fixes: 045e36780f115 ("crypto: caam - ahash hmac support")
Signed-off-by: Horia Geant? <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoregmap: regcache-rbtree: Clean new present bits on present bitmap resize
Guenter Roeck [Mon, 27 Jul 2015 04:34:50 +0000 (21:34 -0700)] 
regmap: regcache-rbtree: Clean new present bits on present bitmap resize

commit 8ef9724bf9718af81cfc5132253372f79c71b7e2 upstream.

When inserting a new register into a block, the present bit map size is
increased using krealloc. krealloc does not clear the additionally
allocated memory, leaving it filled with random values. Result is that
some registers are considered cached even though this is not the case.

Fix the problem by clearing the additionally allocated memory. Also, if
the bitmap size does not increase, do not reallocate the bitmap at all
to reduce overhead.

Fixes: 3f4ff561bc88 ("regmap: rbtree: Make cache_present bitmap per node")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibfc: Fix fc_fcp_cleanup_each_cmd()
Bart Van Assche [Fri, 5 Jun 2015 21:20:51 +0000 (14:20 -0700)] 
libfc: Fix fc_fcp_cleanup_each_cmd()

commit 8f2777f53e3d5ad8ef2a176a4463a5c8e1a16431 upstream.

Since fc_fcp_cleanup_cmd() can sleep this function must not
be called while holding a spinlock. This patch avoids that
fc_fcp_cleanup_each_cmd() triggers the following bug:

BUG: scheduling while atomic: sg_reset/1512/0x00000202
1 lock held by sg_reset/1512:
 #0:  (&(&fsp->scsi_pkt_lock)->rlock){+.-...}, at: [<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Preemption disabled at:[<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Call Trace:
 [<ffffffff816c612c>] dump_stack+0x4f/0x7b
 [<ffffffff810828bc>] __schedule_bug+0x6c/0xd0
 [<ffffffff816c87aa>] __schedule+0x71a/0xa10
 [<ffffffff816c8ad2>] schedule+0x32/0x80
 [<ffffffffc0217eac>] fc_seq_set_resp+0xac/0x100 [libfc]
 [<ffffffffc0218b11>] fc_exch_done+0x41/0x60 [libfc]
 [<ffffffffc0225cff>] fc_fcp_cleanup_each_cmd.isra.21+0xcf/0x150 [libfc]
 [<ffffffffc0225f43>] fc_eh_device_reset+0x1c3/0x270 [libfc]
 [<ffffffff814a2cc9>] scsi_try_bus_device_reset+0x29/0x60
 [<ffffffff814a3908>] scsi_ioctl_reset+0x258/0x2d0
 [<ffffffff814a2650>] scsi_ioctl+0x150/0x440
 [<ffffffff814b3a9d>] sd_ioctl+0xad/0x120
 [<ffffffff8132f266>] blkdev_ioctl+0x1b6/0x810
 [<ffffffff811da608>] block_ioctl+0x38/0x40
 [<ffffffff811b4e08>] do_vfs_ioctl+0x2f8/0x530
 [<ffffffff811b50c1>] SyS_ioctl+0x81/0xa0
 [<ffffffff816cf8b2>] system_call_fastpath+0x16/0x7a

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibfc: Fix fc_exch_recv_req() error path
Bart Van Assche [Fri, 5 Jun 2015 21:20:46 +0000 (14:20 -0700)] 
libfc: Fix fc_exch_recv_req() error path

commit f6979adeaab578f8ca14fdd32b06ddee0d9d3314 upstream.

Due to patch "libfc: Do not invoke the response handler after
fc_exch_done()" (commit ID 7030fd62) the lport_recv() call
in fc_exch_recv_req() is passed a dangling pointer. Avoid this
by moving the fc_frame_free() call from fc_invoke_resp() to its
callers. This patch fixes the following crash:

general protection fault: 0000 [#3] PREEMPT SMP
RIP: fc_lport_recv_req+0x72/0x280 [libfc]
Call Trace:
 fc_exch_recv+0x642/0xde0 [libfc]
 fcoe_percpu_receive_thread+0x46a/0x5ed [fcoe]
 kthread+0x10a/0x120
 ret_from_fork+0x42/0x70

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/vmwgfx: Fix execbuf locking issues
Thomas Hellstrom [Wed, 12 Aug 2015 05:31:17 +0000 (22:31 -0700)] 
drm/vmwgfx: Fix execbuf locking issues

commit 3e04e2fe6d87807d27521ad6ebb9e7919d628f25 upstream.

This addresses two issues that cause problems with viewperf maya-03 in
situation with memory pressure.

The first issue causes attempts to unreserve buffers if batched
reservation fails due to, for example, a signal pending. While previously
the ttm_eu api was resistant against this type of error, it is no longer
and the lockdep code will complain about attempting to unreserve buffers
that are not reserved. The issue is resolved by avoid calling
ttm_eu_backoff_reservation in the buffer reserve error path.

The second issue is that the binding_mutex may be held when user-space
fence objects are created and hence during memory reclaims. This may cause
recursive attempts to grab the binding mutex. The issue is resolved by not
holding the binding mutex across fence creation and submission.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/radeon: add new OLAND pci id
Alex Deucher [Mon, 10 Aug 2015 19:28:49 +0000 (15:28 -0400)] 
drm/radeon: add new OLAND pci id

commit e037239e5e7b61007763984aa35a8329596d8c88 upstream.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoEDAC, ppc4xx: Access mci->csrows array elements properly
Michael Walle [Tue, 21 Jul 2015 09:00:53 +0000 (11:00 +0200)] 
EDAC, ppc4xx: Access mci->csrows array elements properly

commit 5c16179b550b9fd8114637a56b153c9768ea06a5 upstream.

The commit

  de3910eb79ac ("edac: change the mem allocation scheme to
 make Documentation/kobject.txt happy")

changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.

Signed-off-by: Michael Walle <michael@walle.cc>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolocalmodconfig: Use Kbuild files too
Richard Weinberger [Sun, 26 Jul 2015 22:06:55 +0000 (00:06 +0200)] 
localmodconfig: Use Kbuild files too

commit c0ddc8c745b7f89c50385fd7aa03c78dc543fa7a upstream.

In kbuild it is allowed to define objects in files named "Makefile"
and "Kbuild".
Currently localmodconfig reads objects only from "Makefile"s and misses
modules like nouveau.

Link: http://lkml.kernel.org/r/1437948415-16290-1-git-send-email-richard@nod.at
Reported-and-tested-by: Leonidas Spyropoulos <artafinde@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodm thin metadata: delete btrees when releasing metadata snapshot
Joe Thornber [Wed, 12 Aug 2015 14:10:21 +0000 (15:10 +0100)] 
dm thin metadata: delete btrees when releasing metadata snapshot

commit 7f518ad0a212e2a6fd68630e176af1de395070a7 upstream.

The device details and mapping trees were just being decremented
before.  Now btree_del() is called to do a deep delete.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoperf: Fix PERF_EVENT_IOC_PERIOD migration race
Peter Zijlstra [Tue, 4 Aug 2015 17:22:49 +0000 (19:22 +0200)] 
perf: Fix PERF_EVENT_IOC_PERIOD migration race

commit c7999c6f3fed9e383d3131474588f282ae6d56b9 upstream.

I ran the perf fuzzer, which triggered some WARN()s which are due to
trying to stop/restart an event on the wrong CPU.

Use the normal IPI pattern to ensure we run the code on the correct CPU.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: bad7192b842c ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoperf: Fix fasync handling on inherited events
Peter Zijlstra [Thu, 11 Jun 2015 08:32:01 +0000 (10:32 +0200)] 
perf: Fix fasync handling on inherited events

commit fed66e2cdd4f127a43fd11b8d92a99bdd429528c upstream.

Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.

Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.

Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxen-blkfront: don't add indirect pages to list when !feature_persistent
Bob Liu [Wed, 22 Jul 2015 06:40:09 +0000 (14:40 +0800)] 
xen-blkfront: don't add indirect pages to list when !feature_persistent

commit 7b0767502b5db11cb1f0daef2d01f6d71b1192dc upstream.

We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.

When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomm/hwpoison: fix page refcount of unknown non LRU page
Wanpeng Li [Fri, 14 Aug 2015 22:34:56 +0000 (15:34 -0700)] 
mm/hwpoison: fix page refcount of unknown non LRU page

commit 4f32be677b124a49459e2603321c7a5605ceb9f8 upstream.

After trying to drain pages from pagevec/pageset, we try to get reference
count of the page again, however, the reference count of the page is not
reduced if the page is still not on LRU list.

Fix it by adding the put_page() to drop the page reference which is from
__get_any_page().

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipc/sem.c: update/correct memory barriers
Manfred Spraul [Fri, 14 Aug 2015 22:35:10 +0000 (15:35 -0700)] 
ipc/sem.c: update/correct memory barriers

commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability).  The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
Herton R. Krzesinski [Fri, 14 Aug 2015 22:35:02 +0000 (15:35 -0700)] 
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoLinux 3.14.51
Greg Kroah-Hartman [Mon, 17 Aug 2015 03:53:12 +0000 (20:53 -0700)] 
Linux 3.14.51

10 years agomm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
Michal Hocko [Tue, 4 Aug 2015 21:36:58 +0000 (14:36 -0700)] 
mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations

commit ecf5fc6e9654cd7a268c782a523f072b2f1959f9 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384e9da8 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f44fcb0 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomd/bitmap: return an error when bitmap superblock is corrupt.
NeilBrown [Fri, 14 Aug 2015 07:04:21 +0000 (17:04 +1000)] 
md/bitmap: return an error when bitmap superblock is corrupt.

commit b97e92574c0bf335db1cd2ec491d8ff5cd5d0b49 upstream
    Use separate bitmaps for each nodes in the cluster

bitmap_read_sb() validates the bitmap superblock that it reads in.
If it finds an inconsistency like a bad magic number or out-of-range
version number, it prints an error and returns, but it incorrectly
returns zero, so the array is still assembled with the (invalid) bitmap.

This means it could try to use a bitmap with a new version number which
it therefore does not understand.

This bug was introduced in 3.5 and fix as part of a larger patch in 4.1.
So the patch is suitable for any -stable kernel in that range.

Fixes: 27581e5ae01f ("md/bitmap: centralise allocation of bitmap file pages.")
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: GuoQing Jiang <gqjiang@suse.com>
10 years agopath_openat(): fix double fput()
Al Viro [Sat, 9 May 2015 02:53:15 +0000 (22:53 -0400)] 
path_openat(): fix double fput()

commit f15133df088ecadd141ea1907f2c96df67c729f0 upstream.

path_openat() jumps to the wrong place after do_tmpfile() - it has
already done path_cleanup() (as part of path_lookupat() called by
do_tmpfile()), so doing that again can lead to double fput().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agokvm: x86: fix kvm_apic_has_events to check for NULL pointer
Paolo Bonzini [Sat, 30 May 2015 12:31:24 +0000 (14:31 +0200)] 
kvm: x86: fix kvm_apic_has_events to check for NULL pointer

commit ce40cd3fc7fa40a6119e5fe6c0f2bc0eb4541009 upstream.

Malicious (or egregiously buggy) userspace can trigger it, but it
should never happen in normal operation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wang Kai <morgan.wang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>