]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
10 years agodm bufio: update last_accessed when relinking a buffer
Joe Thornber [Tue, 30 Sep 2014 08:32:46 +0000 (09:32 +0100)] 
dm bufio: update last_accessed when relinking a buffer

commit eb76faf53b1ff7a77ce3f78cc98ad392ac70c2a0 upstream.

The 'last_accessed' member of the dm_buffer structure was only set when
the the buffer was created.  This led to each buffer being discarded
after dm_bufio_max_age time even if it was used recently.  In practice
this resulted in all thinp metadata being evicted soon after being read
-- this is particularly problematic for metadata intensive workloads
like multithreaded small random IO.

'last_accessed' is now updated each time the buffer is moved to the head
of the LRU list, so the buffer is now properly discarded if it was not
used in dm_bufio_max_age time.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoblk-mq: fix potential hang if rolling wakeup depth is too high
Jens Axboe [Tue, 7 Oct 2014 14:39:20 +0000 (08:39 -0600)] 
blk-mq: fix potential hang if rolling wakeup depth is too high

commit abab13b5c4fd1fec4f9a61622548012d93dc2831 upstream.

We currently divide the queue depth by 4 as our batch wakeup
count, but we split the wakeups over BT_WAIT_QUEUES number of
wait queues. This defaults to 8. If the product of the resulting
batch wake count and BT_WAIT_QUEUES is higher than the device
queue depth, we can get into a situation where a task goes to
sleep waiting for a request, but never gets woken up.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 4bb659b156996
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/cirrus: bind also to qemu-xen-traditional
Olaf Hering [Fri, 11 Apr 2014 06:56:23 +0000 (08:56 +0200)] 
drm/cirrus: bind also to qemu-xen-traditional

commit c0c3e735fa7bae29c6623511127fd021b2d6d849 upstream.

qemu as used by xend/xm toolstack uses a different subvendor id.
Bind the drm driver also to this emulated card.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxen-blkback: fix leak on grant map error path
Roger Pau Monné [Mon, 15 Sep 2014 09:55:27 +0000 (11:55 +0200)] 
xen-blkback: fix leak on grant map error path

commit 61cecca865280bef4f8a9748d0a9afa5df351ac2 upstream.

Fix leaking a page when a grant mapping has failed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-and-Tested-by: Tao Chen <boby.chen@huawei.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxen/blkback: unmap all persistent grants when frontend gets disconnected
Vitaly Kuznetsov [Mon, 8 Sep 2014 13:21:33 +0000 (15:21 +0200)] 
xen/blkback: unmap all persistent grants when frontend gets disconnected

commit 12ea729645ace01e08f9654df155622898d3aae6 upstream.

blkback does not unmap persistent grants when frontend goes to Closed
state (e.g. when blkfront module is being removed). This leads to the
following in guest's dmesg:

[  343.243825] xen:grant_table: WARNING: g.e. 0x445 still in use!
[  343.243825] xen:grant_table: WARNING: g.e. 0x42a still in use!
...

When load module -> use device -> unload module sequence is performed multiple times
it is possible to hit BUG() condition in blkfront module:

[  343.243825] kernel BUG at drivers/block/xen-blkfront.c:954!
[  343.243825] invalid opcode: 0000 [#1] SMP
[  343.243825] Modules linked in: xen_blkfront(-) ata_generic pata_acpi [last unloaded: xen_blkfront]
...
[  343.243825] Call Trace:
[  343.243825]  [<ffffffff814111ef>] ? unregister_xenbus_watch+0x16f/0x1e0
[  343.243825]  [<ffffffffa0016fbf>] blkfront_remove+0x3f/0x140 [xen_blkfront]
...
[  343.243825] RIP  [<ffffffffa0016aae>] blkif_free+0x34e/0x360 [xen_blkfront]
[  343.243825]  RSP <ffff88001eb8fdc0>

We don't need to keep these grants if we're disconnecting as frontend might already
forgot about them. Solve the issue by moving xen_blkbk_free_caches() call from
xen_blkif_free() to xen_blkif_disconnect().

Now we can see the following:
[  928.590893] xen:grant_table: WARNING: g.e. 0x587 still in use!
[  928.591861] xen:grant_table: WARNING: g.e. 0x372 still in use!
...
[  929.592146] xen:grant_table: freeing g.e. 0x587
[  929.597174] xen:grant_table: freeing g.e. 0x372
...

Backend does not keep persistent grants any more, reconnect works fine.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovirtio_pci: fix virtio spec compliance on restore
Michael S. Tsirkin [Tue, 14 Oct 2014 00:10:29 +0000 (10:40 +1030)] 
virtio_pci: fix virtio spec compliance on restore

commit 6fbc198cf623944ab60a1db6d306a4d55cdd820d upstream.

On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits

This is in violation of the virtio spec, which
requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK

This behaviour will break with hypervisors that assume spec compliant
behaviour.  It seems like a good idea to have this patch applied to
stable branches to reduce the support butden for the hypervisors.

Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopower: charger-manager: Fix NULL pointer exception with missing cm-fuel-gauge
Krzysztof Kozlowski [Fri, 26 Sep 2014 11:27:03 +0000 (13:27 +0200)] 
power: charger-manager: Fix NULL pointer exception with missing cm-fuel-gauge

commit 661a88860274e059fdb744dfaa98c045db7b5d1d upstream.

NULL pointer exception happens during charger-manager probe if
'cm-fuel-gauge' property is not present.

[    2.448536] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    2.456572] pgd = c0004000
[    2.459217] [00000000] *pgd=00000000
[    2.462759] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    2.468047] Modules linked in:
[    2.471089] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc6-00251-ge44cf96cd525-dirty #969
[    2.479765] task: ea890000 ti: ea87a000 task.ti: ea87a000
[    2.485161] PC is at strcmp+0x4/0x30
[    2.488719] LR is at power_supply_match_device_by_name+0x10/0x1c
[    2.494695] pc : [<c01f4220>]    lr : [<c030fe38>]    psr: a0000113
[    2.494695] sp : ea87bde0  ip : 00000000  fp : eaa97010
[    2.506150] r10: 00000004  r9 : ea97269c  r8 : ea3bbfd0
[    2.511360] r7 : eaa97000  r6 : c030fe28  r5 : 00000000  r4 : ea3b0000
[    2.517869] r3 : 0000006d  r2 : 00000000  r1 : 00000000  r0 : c057c195
[    2.524381] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    2.531671] Control: 10c5387d  Table: 4000404a  DAC: 00000015
[    2.537399] Process swapper/0 (pid: 1, stack limit = 0xea87a240)
[    2.543388] Stack: (0xea87bde0 to 0xea87c000)
[    2.547733] bde0: ea3b0210 c026b1c8 eaa97010 eaa97000 eaa97010 eabb60a8 ea3b0210 00000000
[    2.555891] be00: 00000008 ea2db210 ea1a3410 c030fee0 ea3bbf90 c03138fc c068969c c013526c
[    2.564050] be20: eaa040c0 00000000 c068969c 00000000 eaa040c0 ea2da300 00000002 00000000
[    2.572208] be40: 00000001 ea2da3c0 00000000 00000001 00000000 eaa97010 c068969c 00000000
[    2.580367] be60: 00000000 c068969c 00000000 00000002 00000000 c026b71c c026b6f0 eaa97010
[    2.588527] be80: c0e82530 c026a330 00000000 eaa97010 c068969c eaa97044 00000000 c061df50
[    2.596686] bea0: ea87a000 c026a4dc 00000000 c068969c c026a448 c0268b5c ea8054a8 eaa8fd50
[    2.604845] bec0: c068969c ea2db180 c06801f8 c0269b18 c0590f68 c068969c c0656c98 c068969c
[    2.613004] bee0: c0656c98 ea3bbe40 c06988c0 c026aaf0 00000000 c0656c98 c0656c98 c00088a4
[    2.621163] bf00: 00000000 c0055f48 00000000 00000004 00000000 ea890000 c05dbc54 c062c178
[    2.629323] bf20: c0603518 c005f674 00000001 ea87a000 eb7ff83b c0476440 00000091 c003d41c
[    2.637482] bf40: c05db344 00000007 eb7ff858 00000007 c065a76c c0647d24 00000007 c062c170
[    2.645642] bf60: c06988c0 00000091 c062c178 c0603518 00000000 c0603cc4 00000007 00000007
[    2.653801] bf80: c0603518 c0c0c0c0 00000000 c0453948 00000000 00000000 00000000 00000000
[    2.661959] bfa0: 00000000 c0453950 00000000 c000e728 00000000 00000000 00000000 00000000
[    2.670118] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.678277] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 c0c0c0c0 c0c0c0c0
[    2.686454] [<c01f4220>] (strcmp) from [<c030fe38>] (power_supply_match_device_by_name+0x10/0x1c)
[    2.695303] [<c030fe38>] (power_supply_match_device_by_name) from [<c026b1c8>] (class_find_device+0x54/0xac)
[    2.705106] [<c026b1c8>] (class_find_device) from [<c030fee0>] (power_supply_get_by_name+0x1c/0x30)
[    2.714137] [<c030fee0>] (power_supply_get_by_name) from [<c03138fc>] (charger_manager_probe+0x3d8/0xe58)
[    2.723683] [<c03138fc>] (charger_manager_probe) from [<c026b71c>] (platform_drv_probe+0x2c/0x5c)
[    2.732532] [<c026b71c>] (platform_drv_probe) from [<c026a330>] (driver_probe_device+0x10c/0x224)
[    2.741384] [<c026a330>] (driver_probe_device) from [<c026a4dc>] (__driver_attach+0x94/0x98)
[    2.749813] [<c026a4dc>] (__driver_attach) from [<c0268b5c>] (bus_for_each_dev+0x54/0x88)
[    2.757969] [<c0268b5c>] (bus_for_each_dev) from [<c0269b18>] (bus_add_driver+0xd4/0x1d0)
[    2.766123] [<c0269b18>] (bus_add_driver) from [<c026aaf0>] (driver_register+0x78/0xf4)
[    2.774110] [<c026aaf0>] (driver_register) from [<c00088a4>] (do_one_initcall+0x80/0x1bc)
[    2.782276] [<c00088a4>] (do_one_initcall) from [<c0603cc4>] (kernel_init_freeable+0x100/0x1cc)
[    2.790952] [<c0603cc4>] (kernel_init_freeable) from [<c0453950>] (kernel_init+0x8/0xec)
[    2.799029] [<c0453950>] (kernel_init) from [<c000e728>] (ret_from_fork+0x14/0x2c)
[    2.806572] Code: e12fff1e e1a03000 eafffff7 e4d03001 (e4d12001)
[    2.812832] ---[ end trace 7f12556111b9e7ef ]---

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 856ee6115e2d ("charger-manager: Support deivce tree in charger manager driver")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoselinux: fix inode security list corruption
Stephen Smalley [Mon, 6 Oct 2014 20:32:52 +0000 (16:32 -0400)] 
selinux: fix inode security list corruption

commit 923190d32de4428afbea5e5773be86bea60a9925 upstream.

sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during ->mount().   This appears to have always been
a possible race, but commit 3dc91d4 ("SELinux:  Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element  of the inode security structure for call_rcu()
upon an inode_free_security().  But the underlying issue
was already present before that commit as a possible use-after-free
of isec.

Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element.  However,
this would merely hide the issue and not truly fix the code.

This patch instead moves up the deletion of the list entry
prior to dropping the sbsec->isec_lock initially.  Then,
if the inode is dropped subsequently, there will be no further
references to the isec.

Reported-by: Shivnandan Kumar <shivnandan.k@samsung.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopstore: Fix duplicate {console,ftrace}-efi entries
Valdis Kletnieks [Mon, 13 Oct 2014 03:09:08 +0000 (23:09 -0400)] 
pstore: Fix duplicate {console,ftrace}-efi entries

commit d4bf205da618bbd0b038e404d646f14e76915718 upstream.

The pstore filesystem still creates duplicate filename/inode pairs for
some pstore types.  Add the id to the filename to prevent that.

Before patch:

[/sys/fs/pstore] ls -li
total 0
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi

After:

[/sys/fs/pstore] ls -li
total 0
1232 -r--r--r--. 1 root root 148 Sep 29 17:09 console-efi-141202499100000
1231 -r--r--r--. 1 root root  67 Sep 29 17:09 console-efi-141202499200000
1230 -r--r--r--. 1 root root 148 Sep 29 17:44 console-efi-141202705400000
1229 -r--r--r--. 1 root root  67 Sep 29 17:44 console-efi-141202705500000
1228 -r--r--r--. 1 root root  67 Sep 29 20:42 console-efi-141203772600000
1227 -r--r--r--. 1 root root 148 Sep 29 23:42 console-efi-141204854900000
1226 -r--r--r--. 1 root root  67 Sep 29 23:42 console-efi-141204855000000
1225 -r--r--r--. 1 root root 148 Sep 29 23:59 console-efi-141204954200000
1224 -r--r--r--. 1 root root  67 Sep 29 23:59 console-efi-141204954400000

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiommu/amd: Split init_iommu_group() from iommu_init_device()
Alex Williamson [Fri, 19 Sep 2014 16:03:13 +0000 (10:03 -0600)] 
iommu/amd: Split init_iommu_group() from iommu_init_device()

commit 25b11ce2a3607d7c39a2ca121eea0c67c722b34e upstream.

For a PCI device, aliases from the IVRS table won't be populated
into dma_alias_devfn until after iommu_init_device() is called on
each device.  We therefore want to split init_iommu_group() to
be called from a separate loop immediately following.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiommu: Rework iommu_group_get_for_pci_dev()
Alex Williamson [Fri, 19 Sep 2014 16:03:06 +0000 (10:03 -0600)] 
iommu: Rework iommu_group_get_for_pci_dev()

commit f096c061f5525d1b35a65b793057b52061dcb486 upstream.

It turns out that our assumption that aliases are always to the same
slot isn't true.  One particular platform reports an IVRS alias of the
SATA controller (00:11.0) for the legacy IDE controller (00:14.1).
When we hit this, we attempt to use a single IOMMU group for
everything on the same bus, which in this case is the root complex.
We already have multiple groups defined for the root complex by this
point, resulting in multiple WARN_ON hits.

This patch makes these sorts of aliases work again with IOMMU groups
by reworking how we search through the PCI address space to find
existing groups.  This should also now handle looped dependencies and
all sorts of crazy inter-dependencies that we'll likely never see.

The recursion used here should never be very deep.  It's unlikely to
have individual aliases and only theoretical that we'd ever see a
chain where one alias causes us to search through to yet another
alias.  We're also only dealing with PCIe device on a single bus,
which means we'll typically only see multiple slots in use on the root
complex.  Loops are also a theoretically possibility, which I've
tested using fake DMA alias quirks and prevent from causing problems
using a bitmap of the devfn space that's been visited.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomfd: rtsx_pcr: Fix MSI enable error handling
Chris Ball [Thu, 4 Sep 2014 16:11:53 +0000 (17:11 +0100)] 
mfd: rtsx_pcr: Fix MSI enable error handling

commit 5152970538a5e16c03bbcb9f1c780489a795ed40 upstream.

pci_enable_msi() can return failure with both positive and negative
integers -- it returns 0 for success -- but is only tested here for
"if (ret < 0)".  This causes us to try to use MSI on the RTS5249 SD
reader in the Dell XPS 11 when enabling MSI failed, causing:

[    1.737110] rtsx_pci: probe of 0000:05:00.0 failed with error -110

Reported-by: D. Jared Dominguez <Jared_Dominguez@Dell.com>
Tested-by: D. Jared Dominguez <Jared_Dominguez@Dell.com>
Signed-off-by: Chris Ball <chris@printf.net>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomfd: ti_am335x_tscadc: Fix TSC resume
Sebastian Andrzej Siewior [Mon, 8 Sep 2014 13:28:42 +0000 (15:28 +0200)] 
mfd: ti_am335x_tscadc: Fix TSC resume

commit 6a71f38dd87f255a0586104ce2a14d5a3ddf3401 upstream.

In the resume path, the ADC invokes am335x_tsc_se_set_cache() with 0 as
the steps argument if continous mode is not in use. This in turn disables
all steps and so the TSC is not working until one ADC sampling is
performed.

This patch fixes it by writing the current cached mask instead of the
passed steps.

Fixes: 7ca6740cd1cd ("mfd: input: iio: ti_amm335x: Rework TSC/ADCA
synchronization")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomfd: ti_am335x_tscadc: Fix TSC operation after ADC continouous mode
Vignesh R [Mon, 1 Sep 2014 06:31:06 +0000 (12:01 +0530)] 
mfd: ti_am335x_tscadc: Fix TSC operation after ADC continouous mode

commit 6ac734d2242949f41eb1346ca0fd4ed010c937aa upstream.

After enabling and disabling ADC continuous mode via sysfs, ts_print_raw
fails to return any data. This is because when ADC is configured for
continuous mode, it disables touch screen steps.These steps are not
re-enabled when ADC continuous mode is disabled. Therefore existing values
of REG_SE needs to be cached before enabling continuous mode and
disabling touch screen steps and enabling ADC steps. The cached value
are to be restored to REG_SE once ADC is disabled.

Fixes: 7ca6740cd1cd ("mfd: input: iio: ti_amm335x: Rework TSC/ADC synchronization")
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomnt: Prevent pivot_root from creating a loop in the mount tree
Eric W. Biederman [Wed, 8 Oct 2014 17:42:27 +0000 (10:42 -0700)] 
mnt: Prevent pivot_root from creating a loop in the mount tree

commit 0d0826019e529f21c84687521d03f60cd241ca7d upstream.

Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.

In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another.  Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.

Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.

[Added stable cc.  Fixes CVE-2014-7970.  --Andy]

Reported-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUBI: add missing kmem_cache_free() in process_pool_aeb error path
Richard Genoud [Tue, 9 Sep 2014 12:25:01 +0000 (14:25 +0200)] 
UBI: add missing kmem_cache_free() in process_pool_aeb error path

commit 1bf1890e86869032099b539bc83b098be12fc5a7 upstream.

I ran into this error after a ubiupdatevol, because I forgot to backport
e9110361a9a4 UBI: fix the volumes tree sorting criteria.

UBI error: process_pool_aeb: orphaned volume in fastmap pool
UBI error: ubi_scan_fastmap: Attach by fastmap failed, doing a full scan!
kmem_cache_destroy ubi_ainf_peb_slab: Slab cache still has objects
CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.18-00053-gf05cac8dbf85 #1
[<c000d298>] (unwind_backtrace) from [<c000baa8>] (show_stack+0x10/0x14)
[<c000baa8>] (show_stack) from [<c01b7a68>] (destroy_ai+0x230/0x244)
[<c01b7a68>] (destroy_ai) from [<c01b8fd4>] (ubi_attach+0x98/0x1ec)
[<c01b8fd4>] (ubi_attach) from [<c01ade90>] (ubi_attach_mtd_dev+0x2b8/0x868)
[<c01ade90>] (ubi_attach_mtd_dev) from [<c038b510>] (ubi_init+0x1dc/0x2ac)
[<c038b510>] (ubi_init) from [<c0008860>] (do_one_initcall+0x94/0x140)
[<c0008860>] (do_one_initcall) from [<c037aadc>] (kernel_init_freeable+0xe8/0x1b0)
[<c037aadc>] (kernel_init_freeable) from [<c02730ac>] (kernel_init+0x8/0xe4)
[<c02730ac>] (kernel_init) from [<c00093f0>] (ret_from_fork+0x14/0x24)
UBI: scanning is finished

Freeing the cache in the error path fixes the Slab error.

Tested on at91sam9g35 (3.14.18+fastmap backports)

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUBI: Dispatch update notification if the volume is updated
Ezequiel Garcia [Fri, 29 Aug 2014 21:42:30 +0000 (18:42 -0300)] 
UBI: Dispatch update notification if the volume is updated

commit fda322a1b3b9e8ee231913c500f73c6988b1aff5 upstream.

The UBI_IOCVOLUP ioctl is used to start an update and also to
truncate a volume. In the first case, a "volume updated" notification
is dispatched when the update is done.

This commit adds the "volume updated" notification to be also sent when
the volume is truncated. This is required for UBI block and gluebi to get
notified about the new volume size.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUBI: block: Add support for the UBI_VOLUME_UPDATED notification
Ezequiel Garcia [Fri, 29 Aug 2014 21:42:29 +0000 (18:42 -0300)] 
UBI: block: Add support for the UBI_VOLUME_UPDATED notification

commit 06d9c2905f745c8b1920a335cbb366ba6b0fc754 upstream.

Static volumes can change its 'used_bytes' when they get updated,
and so the block interface must listen to the UBI_VOLUME_UPDATED
notification to resize the block device accordingly.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUBI: block: Fix block device size setting
Ezequiel Garcia [Fri, 29 Aug 2014 21:42:28 +0000 (18:42 -0300)] 
UBI: block: Fix block device size setting

commit 978d6496758d19de2431ebf163337fc7b92f8c45 upstream.

We are currently taking the block device size from the ubi_volume_info.size
field. However, this is not the amount of data in the volume, but the
number of reserved physical eraseblocks, and hence leads to an incorrect
representation of the volume.

In particular, this produces I/O errors on static volumes as the block
interface may attempt to read unmapped PEBs:

$ cat /dev/ubiblock0_0 > /dev/null
UBI error: ubiblock_read_to_buf: ubiblock0_0 ubi_read error -22
end_request: I/O error, dev ubiblock0_0, sector 9536
Buffer I/O error on device ubiblock0_0, logical block 2384
[snip]

Fix this by using the ubi_volume_info.used_bytes field which is set to the
actual number of data bytes for both static and dynamic volumes.

While here, improve the error message to be less stupid and more useful:
UBI error: ubiblock_read_to_buf: ubiblock0_1 ubi_read error -9 on LEB=0, off=15872, len=512

It's worth noticing that the 512-byte sector representation of the volume
is only correct if the volume size is multiple of 512-bytes. This is true for
virtually any NAND device, given eraseblocks and pages are 512-byte multiple
and hence so is the LEB size.

Artem: tweak the error message and make it look more like other UBI error
messages.

Fixes: 9d54c8a33eec ("UBI: R/O block driver on top of UBI volumes")
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agos390/topology: call set_sched_topology early
Martin Schwidefsky [Wed, 24 Sep 2014 14:37:20 +0000 (16:37 +0200)] 
s390/topology: call set_sched_topology early

commit 48e9a6c1f54695609b709bf674aac133794ada00 upstream.

The call to topology_init is too late for the set_sched_topology call.
The initial scheduling domain structure has already been established
with default topology array. Use the smp_cpus_done() call to get the
s390 specific topology array registered early enough.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agorandom: add and use memzero_explicit() for clearing data
Daniel Borkmann [Wed, 27 Aug 2014 03:16:35 +0000 (23:16 -0400)] 
random: add and use memzero_explicit() for clearing data

commit d4c5efdb97773f59a2b711754ca0953f24516739 upstream.

zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
memset() calls which clear out sensitive data in extract_{buf,entropy,
entropy_user}() in random driver are being optimized away by gcc.

Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
that can be used in such cases where a variable with sensitive data is
being cleared out in the end. Other use cases might also be in crypto
code. [ I have put this into lib/string.c though, as it's always built-in
and doesn't need any dependencies then. ]

Fixes kernel bugzilla: 82041

Reported-by: zatimend@hotmail.co.uk
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoum: ubd: Fix for processes stuck in D state forever
Thorsten Knabe [Sat, 23 Aug 2014 13:47:38 +0000 (15:47 +0200)] 
um: ubd: Fix for processes stuck in D state forever

commit 2a2361228c5e6d8c1733f00653481de918598e50 upstream.

Starting with Linux 3.12 processes get stuck in D state forever in
UserModeLinux under sync heavy workloads. This bug was introduced by
commit 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport).
Fix bug by adding a check if FLUSH request was successfully submitted to
the I/O thread and keeping the FLUSH request on the request queue on
submission failures.

Fixes: 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport)
Signed-off-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosched: Use dl_bw_of() under RCU read lock
Kirill Tkhai [Mon, 22 Sep 2014 18:36:24 +0000 (22:36 +0400)] 
sched: Use dl_bw_of() under RCU read lock

commit 66339c31bc3978d5fff9c4b4cb590a861def4db2 upstream.

dl_bw_of() dereferences rq->rd which has to have RCU read lock held.
Probability of use-after-free isn't zero here.

Also add lockdep assert into dl_bw_cpus().

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183624.11015.71558.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibceph: ceph-msgr workqueue needs a resque worker
Ilya Dryomov [Fri, 10 Oct 2014 12:39:05 +0000 (16:39 +0400)] 
libceph: ceph-msgr workqueue needs a resque worker

commit f9865f06f7f18c6661c88d0511f05c48612319cc upstream.

Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq.  This is
wrong - libceph is very much a memory reclaim path, so restore it.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agorbd: rbd workqueues need a resque worker
Ilya Dryomov [Fri, 10 Oct 2014 14:36:07 +0000 (18:36 +0400)] 
rbd: rbd workqueues need a resque worker

commit 792c3a914910bd34302c5345578f85cfcb5e2c01 upstream.

Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofix misuses of f_count() in ppp and netlink
Al Viro [Thu, 9 Oct 2014 03:44:00 +0000 (23:44 -0400)] 
fix misuses of f_count() in ppp and netlink

commit 24dff96a37a2ca319e75a74d3929b2de22447ca6 upstream.

we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are.  That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared.  The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.

It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
well before that, though, and the same fix used to apply in old
location of file.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agokill wbuf_queued/wbuf_dwork_lock
Al Viro [Fri, 1 Aug 2014 19:13:40 +0000 (20:13 +0100)] 
kill wbuf_queued/wbuf_dwork_lock

commit 99358a1ca53e8e6ce09423500191396f0e6584d2 upstream.

schedule_delayed_work() happening when the work is already pending is
a cheap no-op.  Don't bother with ->wbuf_queued logics - it's both
broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris)
and pointless.  It's cheaper to let schedule_delayed_work() handle that
case.

Reported-by: Jeff Harris <jefftharris@gmail.com>
Tested-by: Jeff Harris <jefftharris@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomissing data dependency barrier in prepend_name()
Al Viro [Mon, 29 Sep 2014 18:46:30 +0000 (14:46 -0400)] 
missing data dependency barrier in prepend_name()

commit 6d13f69444bd3d4888e43f7756449748f5a98bad upstream.

AFAICS, prepend_name() is broken on SMP alpha.  Disclaimer: I don't have
SMP alpha boxen to reproduce it on.  However, it really looks like the race
is real.

CPU1: d_path() on /mnt/ramfs/<255-character>/foo
CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character>

CPU2 does d_alloc(), which allocates an external name, stores the name there
including terminating NUL, does smp_wmb() and stores its address in
dentry->d_name.name.  It proceeds to d_add(dentry, NULL) and d_move()
old dentry over to that.  ->d_name.name value ends up in that dentry.

In the meanwhile, CPU1 gets to prepend_name() for that dentry.  It fetches
->d_name.name and ->d_name.len; the former ends up pointing to new name
(64-byte kmalloc'ed array), the latter - 255 (length of the old name).
Nothing to force the ordering there, and normally that would be OK, since we'd
run into the terminating NUL and stop.  Except that it's alpha, and we'd need
a data dependency barrier to guarantee that we see that store of NUL
__d_alloc() has done.  In a similar situation dentry_cmp() would survive; it
does explicit smp_read_barrier_depends() after fetching ->d_name.name.
prepend_name() doesn't and it risks walking past the end of kmalloc'ed object
and possibly oops due to taking a page fault in kernel mode.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
Takashi Iwai [Tue, 28 Oct 2014 11:42:19 +0000 (12:42 +0100)] 
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode

commit 317168d0c766defd14b3d0e9c2c4a9a258b803ee upstream.

In compat mode, we copy each field of snd_pcm_status struct but don't
touch the reserved fields, and this leaves uninitialized values
there.  Meanwhile the native ioctl does zero-clear the whole
structure, so we should follow the same rule in compat mode, too.

Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: bebob: Uninitialized id returned by saffirepro_both_clk_src_get
Christian Vogel [Sat, 25 Oct 2014 11:40:41 +0000 (13:40 +0200)] 
ALSA: bebob: Uninitialized id returned by saffirepro_both_clk_src_get

commit d1d0b6b668818571122d30d68a0b3f768bd83a52 upstream.

snd_bebob_stream_check_internal_clock() may get an id from
saffirepro_both_clk_src_get (via clk_src->get()) that was uninitialized.

a) make logic in saffirepro_both_clk_src_get explicit
b) test if id used in snd_bebob_stream_check_internal_clock matches array size

[fixed missing signed prefix to *_maps[] by tiwai]

Signed-off-by: Christian Vogel <vogelchr@vogel.cx>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: hda - Add workaround for CMI8888 snoop behavior
Takashi Iwai [Wed, 29 Oct 2014 15:13:05 +0000 (16:13 +0100)] 
ALSA: hda - Add workaround for CMI8888 snoop behavior

commit 3b70bdba2fcb374a2235a56ab73334348d819579 upstream.

CMI8888 shows the stuttering playback when the snooping is disabled
on the audio buffer.  Meanwhile, we've got reports that CORB/RIRB
doesn't work in the snooped mode.  So, as a compromise, disable the
snoop only for CORB/RIRB and enable the snoop for the stream buffers.

The resultant patch became a bit ugly, unfortunately, but we still can
live with it.

Reported-and-tested-by: Geoffrey McRae <geoff@spacevs.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoevm: check xattr value length and type in evm_inode_setxattr()
Dmitry Kasatkin [Tue, 28 Oct 2014 12:28:49 +0000 (14:28 +0200)] 
evm: check xattr value length and type in evm_inode_setxattr()

commit 3b1deef6b1289a99505858a3b212c5b50adf0c2f upstream.

evm_inode_setxattr() can be called with no value. The function does not
check the length so that following command can be used to produce the
kernel oops: setfattr -n security.evm FOO. This patch fixes it.

Changes in v3:
* there is no reason to return different error codes for EVM_XATTR_HMAC
  and non EVM_XATTR_HMAC. Remove unnecessary test then.

Changes in v2:
* testing for validity of xattr type

[ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1106.398192] IP: [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.399244] PGD 29048067 PUD 290d7067 PMD 0
[ 1106.399953] Oops: 0000 [#1] SMP
[ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936
[ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1106.400020] task: ffff8800291a0000 ti: ffff88002917c000 task.ti: ffff88002917c000
[ 1106.400020] RIP: 0010:[<ffffffff812af7b8>]  [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP: 0018:ffff88002917fd50  EFLAGS: 00010246
[ 1106.400020] RAX: 0000000000000000 RBX: ffff88002917fdf8 RCX: 0000000000000000
[ 1106.400020] RDX: 0000000000000000 RSI: ffffffff818136d3 RDI: ffff88002917fdf8
[ 1106.400020] RBP: ffff88002917fd68 R08: 0000000000000000 R09: 00000000003ec1df
[ 1106.400020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800438a0a00
[ 1106.400020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1106.400020] FS:  00007f7dfa7d7740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
[ 1106.400020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1106.400020] CR2: 0000000000000000 CR3: 000000003763e000 CR4: 00000000000006f0
[ 1106.400020] Stack:
[ 1106.400020]  ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98
[ 1106.400020]  ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000
[ 1106.400020]  0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8
[ 1106.400020] Call Trace:
[ 1106.400020]  [<ffffffff812a1030>] security_inode_setxattr+0x5d/0x6a
[ 1106.400020]  [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f
[ 1106.400020]  [<ffffffff8116d1e0>] setxattr+0x122/0x16c
[ 1106.400020]  [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020]  [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143
[ 1106.400020]  [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020]  [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f
[ 1106.400020]  [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0
[ 1106.400020]  [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b
[ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 <41> 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83
[ 1106.400020] RIP  [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020]  RSP <ffff88002917fd50>
[ 1106.400020] CR2: 0000000000000000
[ 1106.428061] ---[ end trace ae08331628ba3050 ]---

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoevm: properly handle INTEGRITY_NOXATTRS EVM status
Dmitry Kasatkin [Tue, 2 Sep 2014 13:31:43 +0000 (16:31 +0300)] 
evm: properly handle INTEGRITY_NOXATTRS EVM status

commit 3dcbad52cf18c3c379e96b992d22815439ebbe53 upstream.

Unless an LSM labels a file during d_instantiate(), newly created
files are not labeled with an initial security.evm xattr, until
the file closes.  EVM, before allowing a protected, security xattr
to be written, verifies the existing 'security.evm' value is good.
For newly created files without a security.evm label, this
verification prevents writing any protected, security xattrs,
until the file closes.

Following is the example when this happens:
fd = open("foo", O_CREAT | O_WRONLY, 0644);
setxattr("foo", "security.SMACK64", value, sizeof(value), 0);
close(fd);

While INTEGRITY_NOXATTRS status is handled in other places, such
as evm_inode_setattr(), it does not handle it in all cases in
evm_protect_xattr().  By limiting the use of INTEGRITY_NOXATTRS to
newly created files, we can now allow setting "protected" xattrs.

Changelog:
- limit the use of INTEGRITY_NOXATTRS to IMA identified new files

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoperf: Fix unclone_ctx() vs. locking
Peter Zijlstra [Tue, 30 Sep 2014 17:23:08 +0000 (19:23 +0200)] 
perf: Fix unclone_ctx() vs. locking

commit 211de6eba8960521e2be450a7d07db85fba4604c upstream.

The idiot who did 4a1c0f262f88 ("perf: Fix lockdep warning on process exit")
forgot to pay attention and fix all similar cases. Do so now.

In particular, unclone_ctx() must be called while holding ctx->lock,
therefore all such sites are broken for the same reason. Pull the
put_ctx() call out from under ctx->lock.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Probably-also-reported-by: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 4a1c0f262f88 ("perf: Fix lockdep warning on process exit")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140930172308.GI4241@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
Dexuan Cui [Wed, 29 Oct 2014 10:53:37 +0000 (03:53 -0700)] 
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE

commit d1cd1210834649ce1ca6bafe5ac25d2f40331343 upstream.

pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long <<
PAGE_SHIFT" will overflow for PFNs above 4GB.

Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V,
with 5GB memory assigned, can't load the netvsc driver successfully and
hence the synthetic network device can't work (we can use the kernel parameter
mem=3000M to work around the issue).

Cast pte_pfn() to phys_addr_t before shifting.

Fixes: "commit d76565344512: x86, mm: Create slow_virt_to_phys()"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: linux-mm@kvack.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: dave.hansen@intel.com
Cc: riel@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86_64, entry: Fix out of bounds read on sysenter
Andy Lutomirski [Sat, 1 Nov 2014 01:08:45 +0000 (18:08 -0700)] 
x86_64, entry: Fix out of bounds read on sysenter

commit 653bc77af60911ead1f423e588f54fc2547c4957 upstream.

Rusty noticed a Really Bad Bug (tm) in my NT fix.  The entry code
reads out of bounds, causing the NT fix to be unreliable.  But, and
this is much, much worse, if your stack is somehow just below the
top of the direct map (or a hole), you read out of bounds and crash.

Excerpt from the crash:

[    1.129513] RSP: 0018:ffff88001da4bf88  EFLAGS: 00010296

  2b:*    f7 84 24 90 00 00 00     testl  $0x4000,0x90(%rsp)

That read is deterministically above the top of the stack.  I
thought I even single-stepped through this code when I wrote it to
check the offset, but I clearly screwed it up.

Fixes: 8c7aa698baca ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell <rusty@ozlabs.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86_64, entry: Filter RFLAGS.NT on entry from userspace
Andy Lutomirski [Wed, 1 Oct 2014 18:49:04 +0000 (11:49 -0700)] 
x86_64, entry: Filter RFLAGS.NT on entry from userspace

commit 8c7aa698baca5e8f1ba9edb68081f1e7a1abf455 upstream.

The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
Oleg Nesterov [Tue, 2 Sep 2014 17:57:13 +0000 (19:57 +0200)] 
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()

commit 66463db4fc5605d51c7bb81d009d5bf30a783a2c upstream.

save_xstate_sig()->drop_init_fpu() doesn't look right. setup_rt_frame()
can fail after that, in this case the next setup_rt_frame() triggered
by SIGSEGV won't save fpu simply because the old state was lost. This
obviously mean that fpu won't be restored after sys_rt_sigreturn() from
SIGSEGV handler.

Shift drop_init_fpu() into !failed branch in handle_signal().

Test-case (needs -O2):

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <pthread.h>
#include <assert.h>

volatile double D;

void test(double d)
{
int pid = getpid();

for (D = d; D == d; ) {
/* sys_tkill(pid, SIGHUP); asm to avoid save/reload
 * fp regs around "C" call */
asm ("" : : "a"(200), "D"(pid), "S"(1));
asm ("syscall" : : : "ax");
}

printf("ERR!!\n");
}

void sigh(int sig)
{
}

char altstack[4096 * 10] __attribute__((aligned(4096)));

void *tfunc(void *arg)
{
for (;;) {
mprotect(altstack, sizeof(altstack), PROT_READ);
mprotect(altstack, sizeof(altstack), PROT_READ|PROT_WRITE);
}
}

int main(void)
{
stack_t st = {
.ss_sp = altstack,
.ss_size = sizeof(altstack),
.ss_flags = SS_ONSTACK,
};

struct sigaction sa = {
.sa_handler = sigh,
};

pthread_t pt;

sigaction(SIGSEGV, &sa, NULL);
sigaltstack(&st, NULL);
sa.sa_flags = SA_ONSTACK;
sigaction(SIGHUP, &sa, NULL);

pthread_create(&pt, NULL, tfunc, NULL);

test(123.456);
return 0;
}

Reported-by: Bean Anderson <bean@azulsystems.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175713.GA21646@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
Oleg Nesterov [Tue, 2 Sep 2014 17:57:17 +0000 (19:57 +0200)] 
x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()

commit df24fb859a4e200d9324e2974229fbb7adf00aef upstream.

Add preempt_disable() + preempt_enable() around math_state_restore() in
__restore_xstate_sig(). Otherwise __switch_to() after __thread_fpu_begin()
can overwrite fpu->state we are going to restore.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175717.GA21649@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86: Reject x32 executables if x32 ABI not supported
Ben Hutchings [Sun, 7 Sep 2014 20:05:05 +0000 (21:05 +0100)] 
x86: Reject x32 executables if x32 ABI not supported

commit 0e6d3112a4e95d55cf6dca88f298d5f4b8f29bd1 upstream.

It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled.  However all its syscalls
will fail, even _exit().  This usually causes it to segfault.

Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovfs: fix data corruption when blocksize < pagesize for mmaped data
Jan Kara [Thu, 2 Oct 2014 01:49:18 +0000 (21:49 -0400)] 
vfs: fix data corruption when blocksize < pagesize for mmaped data

commit 90a8020278c1598fafd071736a0846b38510309c upstream.

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';       ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';    ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUBIFS: fix free log space calculation
Artem Bityutskiy [Wed, 16 Jul 2014 12:22:29 +0000 (15:22 +0300)] 
UBIFS: fix free log space calculation

commit ba29e721eb2df6df8f33c1f248388bb037a47914 upstream.

Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the
'empty_log_bytes()' function, which calculates how many bytes are left in the
log:

"
If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h'
would equalent to 't' and 'empty_log_bytes()' would return 'c->log_bytes'
instead of 0.
"

At this point it is not clear what would be the consequences of this, and
whether this may lead to any problems, but this patch addresses the issue just
in case.

Tested-by: hujianyang <hujianyang@huawei.com>
Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoUBIFS: fix a race condition
Artem Bityutskiy [Sun, 29 Jun 2014 14:00:45 +0000 (17:00 +0300)] 
UBIFS: fix a race condition

commit 052c28073ff26f771d44ef33952a41d18dadd255 upstream.

Hu (hujianyang@huawei.com) discovered a race condition which may lead to a
situation when UBIFS is unable to mount the file-system after an unclean
reboot. The problem is theoretical, though.

In UBIFS, we have the log, which basically a set of LEBs in a certain area. The
log has the tail and the head.

Every time user writes data to the file-system, the UBIFS journal grows, and
the log grows as well, because we append new reference nodes to the head of the
log. So the head moves forward all the time, while the log tail stays at the
same position.

At any time, the UBIFS master node points to the tail of the log. When we mount
the file-system, we scan the log, and we always start from its tail, because
this is where the master node points to. The only occasion when the tail of the
log changes is the commit operation.

The commit operation has 2 phases - "commit start" and "commit end". The former
is relatively short, and does not involve much I/O. During this phase we mostly
just build various in-memory lists of the things which have to be written to
the flash media during "commit end" phase.

During the commit start phase, what we do is we "clean" the log. Indeed, the
commit operation will index all the data in the journal, so the entire journal
"disappears", and therefore the data in the log become unneeded. So we just
move the head of the log to the next LEB, and write the CS node there. This LEB
will be the tail of the new log when the commit operation finishes.

When the "commit start" phase finishes, users may write more data to the
file-system, in parallel with the ongoing "commit end" operation. At this point
the log tail was not changed yet, it is the same as it had been before we
started the commit. The log head keeps moving forward, though.

The commit operation now needs to write the new master node, and the new master
node should point to the new log tail. After this the LEBs between the old log
tail and the new log tail can be unmapped and re-used again.

And here is the possible problem. We do 2 operations: (a) We first update the
log tail position in memory (see 'ubifs_log_end_commit()'). (b) And then we
write the master node (see the big lock of code in 'do_commit()').

But nothing prevents the log head from moving forward between (a) and (b), and
the log head may "wrap" now to the old log tail. And when the "wrap" happens,
the contends of the log tail gets erased. Now a power cut happens and we are in
trouble. We end up with the old master node pointing to the old tail, which was
erased. And replay fails because it expects the master node to point to the
correct log tail at all times.

This patch merges the abovementioned (a) and (b) operations by moving the master
node change code to the 'ubifs_log_end_commit()' function, so that it runs with
the log mutex locked, which will prevent the log from being changed benween
operations (a) and (b).

Reported-by: hujianyang <hujianyang@huawei.com>
Tested-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofs: allow open(dir, O_TMPFILE|..., 0) with mode 0
Eric Rannaud [Thu, 30 Oct 2014 08:51:01 +0000 (01:51 -0700)] 
fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

commit 69a91c237ab0ebe4e9fdeaf6d0090c85275594ec upstream.

The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:

Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.

The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.

O_TMPFILE, however, behaves differently:

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);

For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:

if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}

But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().

This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().

A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523

The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.

On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.

But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.

Signed-off-by: Eric Rannaud <e@nanocritical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofs: Fix theoretical division by 0 in super_cache_scan().
Tetsuo Handa [Sat, 17 May 2014 11:56:38 +0000 (20:56 +0900)] 
fs: Fix theoretical division by 0 in super_cache_scan().

commit 475d0db742e3755c6b267f48577ff7cbb7dfda0d upstream.

total_objects could be 0 and is used as a denom.

While total_objects is a "long", total_objects == 0 unlikely happens for
3.12 and later kernels because 32-bit architectures would not be able to
hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
between 3.1 and 3.11 because total_objects in prune_super() was an "int"
and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofs: make cont_expand_zero interruptible
Mikulas Patocka [Sun, 27 Jul 2014 17:00:41 +0000 (13:00 -0400)] 
fs: make cont_expand_zero interruptible

commit c2ca0fcd202863b14bd041a7fece2e789926c225 upstream.

This patch makes it possible to kill a process looping in
cont_expand_zero. A process may spend a lot of time in this function, so
it is desirable to be able to kill it.

It happened to me that I wanted to copy a piece data from the disk to a
file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
to the "seek" parameter, dd attempted to extend the file and became stuck
doing so - the only possibility was to reset the machine or wait many
hours until the filesystem runs out of space and cont_expand_zero fails.
We need this patch to be able to terminate the process.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: sdhci-s3c: fix runtime PM handling on sdhci_add_host() failure
Bartlomiej Zolnierkiewicz [Thu, 7 Aug 2014 16:07:07 +0000 (18:07 +0200)] 
mmc: sdhci-s3c: fix runtime PM handling on sdhci_add_host() failure

commit 221414db1934c1c883501998f510bb75acfbaa51 upstream.

Runtime Power Management handling for the sdhci_add_host() failure
case in sdhci_s3c_probe() should match the code in sdhci_s3c_remove()
(which uses pm_runtime_disable() call which matches the earlier
pm_runtime_enable() one).  Fix it.

This patch fixes "BUG: spinlock bad magic on CPU#0, swapper/0/1" and
"Unbalanced pm_runtime_enable!" warnings.

>From the kernel log:
...
[    1.659631] s3c-sdhci 12530000.sdhci: sdhci_add_host() failed
[    1.665096] BUG: spinlock bad magic on CPU#0, swapper/0/1
[    1.670433]  lock: 0xea01e484, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[    1.677895] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-next-20140804-00008-ga59480f-dirty #707
[    1.687037] [<c0013ae4>] (unwind_backtrace) from [<c0010d70>] (show_stack+0x10/0x14)
[    1.694740] [<c0010d70>] (show_stack) from [<c04050c8>] (dump_stack+0x68/0xb8)
[    1.701948] [<c04050c8>] (dump_stack) from [<c0052558>] (do_raw_spin_lock+0x15c/0x1a4)
[    1.709848] [<c0052558>] (do_raw_spin_lock) from [<c040a630>] (_raw_spin_lock_irqsave+0x20/0x28)
[    1.718619] [<c040a630>] (_raw_spin_lock_irqsave) from [<c030d7d0>] (sdhci_do_set_ios+0x1c/0x5cc)
[    1.727464] [<c030d7d0>] (sdhci_do_set_ios) from [<c030ddfc>] (sdhci_runtime_resume_host+0x50/0x104)
[    1.736574] [<c030ddfc>] (sdhci_runtime_resume_host) from [<c02462dc>] (pm_generic_runtime_resume+0x2c/0x40)
[    1.746383] [<c02462dc>] (pm_generic_runtime_resume) from [<c0247898>] (__rpm_callback+0x34/0x70)
[    1.755233] [<c0247898>] (__rpm_callback) from [<c02478fc>] (rpm_callback+0x28/0x88)
[    1.762958] [<c02478fc>] (rpm_callback) from [<c02486f0>] (rpm_resume+0x384/0x4ec)
[    1.770511] [<c02486f0>] (rpm_resume) from [<c02488b0>] (pm_runtime_forbid+0x58/0x64)
[    1.778325] [<c02488b0>] (pm_runtime_forbid) from [<c030ea70>] (sdhci_s3c_probe+0x4a4/0x540)
[    1.786749] [<c030ea70>] (sdhci_s3c_probe) from [<c02429cc>] (platform_drv_probe+0x2c/0x5c)
[    1.795076] [<c02429cc>] (platform_drv_probe) from [<c02415f0>] (driver_probe_device+0x114/0x234)
[    1.803929] [<c02415f0>] (driver_probe_device) from [<c024179c>] (__driver_attach+0x8c/0x90)
[    1.812347] [<c024179c>] (__driver_attach) from [<c023ffb4>] (bus_for_each_dev+0x54/0x88)
[    1.820506] [<c023ffb4>] (bus_for_each_dev) from [<c0240df8>] (bus_add_driver+0xd8/0x1cc)
[    1.828665] [<c0240df8>] (bus_add_driver) from [<c0241db8>] (driver_register+0x78/0xf4)
[    1.836652] [<c0241db8>] (driver_register) from [<c00088a4>] (do_one_initcall+0x80/0x1d0)
[    1.844816] [<c00088a4>] (do_one_initcall) from [<c059ac94>] (kernel_init_freeable+0x108/0x1d4)
[    1.853503] [<c059ac94>] (kernel_init_freeable) from [<c0401300>] (kernel_init+0x8/0xe4)
[    1.861568] [<c0401300>] (kernel_init) from [<c000e538>] (ret_from_fork+0x14/0x3c)
[    1.869582] platform 12530000.sdhci: Driver s3c-sdhci requests probe deferral
...
[    1.997047] s3c-sdhci 12530000.sdhci: Unbalanced pm_runtime_enable!
...
[    2.027235] s3c-sdhci 12530000.sdhci: sdhci_add_host() failed
[    2.032884] platform 12530000.sdhci: Driver s3c-sdhci requests probe deferral
...

Tested on Hardkernel's Exynos4412 based ODROID-U3 board.

Fixes: 9f4e8151dbbc ("mmc: sdhci-s3c: Enable runtime power management")
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Jaehoon Chung <jh80.chung@samsung.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response
Roger Tseng [Fri, 15 Aug 2014 06:06:00 +0000 (14:06 +0800)] 
mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response

commit d1419d50c1bf711e9fd27b516a739c86b23f7cf9 upstream.

Current code erroneously fill the last byte of R2 response with an undefined
value. In addition, the controller actually 'offloads' the last byte
(CRC7, end bit) while receiving R2 response and thus it's impossible to get the
actual value. This could cause mmc stack to obtain inconsistent CID from the
same card after resume and misidentify it as a different card.

Fix by assigning dummy CRC and end bit: {7'b0, 1} = 0x1 to the last byte of R2.

Fixes: ff984e57d36e ("mmc: Add realtek pcie sdmmc host driver")
Signed-off-by: Roger Tseng <rogerable@realtek.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: don't request CD IRQ until mmc_start_host()
Stephen Warren [Mon, 22 Sep 2014 15:57:42 +0000 (09:57 -0600)] 
mmc: don't request CD IRQ until mmc_start_host()

commit d4d11449088ee9aca16fd1884b852b8b73a4bda1 upstream.

As soon as the CD IRQ is requested, it can trigger, since it's an
externally controlled event. If it does, delayed_work host->detect will
be scheduled.

Many host controller probe()s are roughly structured as:

*_probe() {
    host = sdhci_pltfm_init();
    mmc_of_parse(host->mmc);
    rc = sdhci_add_host(host);
    if (rc) {
        sdhci_pltfm_free();
        return rc;
    }

In 3.17, CD IRQs can are enabled quite early via *_probe() ->
mmc_of_parse() -> mmc_gpio_request_cd() -> mmc_gpiod_request_cd_irq().

Note that in linux-next, mmc_of_parse() calls mmc_gpio*d*_request_cd()
rather than mmc_gpio_request_cd(), and mmc_gpio*d*_request_cd() doesn't
call mmc_gpiod_request_cd_irq(). However, this issue still exists if
mmc_gpio_request_cd() is called directly before mmc_start_host().

sdhci_add_host() may fail part way through (e.g. due to deferred
probe for a vmmc regulator), and sdhci_pltfm_free() does nothing to
unrequest the CD IRQ nor cancel the delayed_work. sdhci_pltfm_free() is
coded to assume that if sdhci_add_host() failed, then the delayed_work
cannot (or should not) have been triggered.

This can lead to the following with CONFIG_DEBUG_OBJECTS_* enabled, when
kfree(host) is eventually called inside sdhci_pltfm_free():

WARNING: CPU: 2 PID: 6 at lib/debugobjects.c:263 debug_print_object+0x8c/0xb4()
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x18

The object being complained about is host->detect.

There's no need to request the CD IRQ so early; mmc_start_host() already
requests it. For most SDHCI hosts at least, the typical call path that
does this is: *_probe() -> sdhci_add_host() -> mmc_add_host() ->
mmc_start_host(). Therefore, remove the call to mmc_gpiod_request_cd_irq()
from mmc_gpio_request_cd(). This also matches mmc_gpio*d*_request_cd(),
which already doesn't call mmc_gpiod_request_cd_irq().

However, some host controller drivers call mmc_gpio_request_cd() after
mmc_start_host() has already been called, and assume that this will also
call mmc_gpiod_request_cd_irq(). Update those drivers to explicitly call
mmc_gpiod_request_cd_irq() themselves. Ideally, these drivers should be
modified to move their call to mmc_gpio_request_cd() before their call
to mmc_add_host(). However that's too large a change for stable.

This solves the problem (eliminates the kernel error message above),
since it guarantees that the IRQ can't trigger before mmc_start_host()
is called.

The critical point here is that once sdhci_add_host() calls
mmc_add_host() -> mmc_start_host(), sdhci_add_host() is coded not to
fail. In other words, if there's a chance that mmc_start_host() may have
been called, and CD IRQs triggered, and the delayed_work scheduled,
sdhci_add_host() won't fail, and so cleanup is no longer via
sdhci_pltfm_free() (which doesn't free the IRQ or cancel the work queue)
but instead must be via sdhci_remove_host(), which calls mmc_remove_host()
-> mmc_stop_host(), which does free the IRQ and cancel the work queue.

CC: Russell King <linux@arm.linux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: rtsx_usb_sdmmc: fix incorrect last byte in R2 response
Roger Tseng [Fri, 15 Aug 2014 06:06:01 +0000 (14:06 +0800)] 
mmc: rtsx_usb_sdmmc: fix incorrect last byte in R2 response

commit 6f67cc6fd1cf339a0f19b9d4a998ec3c0123b1b6 upstream.

Current code erroneously fill the last byte of R2 response with an undefined
value. In addition, the controller actually 'offloads' the last byte
(CRC7, end bit) while receiving R2 response and thus it's impossible to get the
actual value. This could cause mmc stack to obtain inconsistent CID from the
same card after resume and misidentify it as a different card.

Fix by assigning dummy CRC and end bit: {7'b0, 1} = 0x1 to the last byte of R2.

Fixes: c7f6558d84af ("mmc: Add realtek USB sdmmc host driver")
Signed-off-by: Roger Tseng <rogerable@realtek.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: sdhci-pxav3: set_uhs_signaling is initialized twice differently
Peter Griffin [Fri, 15 Aug 2014 13:02:15 +0000 (14:02 +0100)] 
mmc: sdhci-pxav3: set_uhs_signaling is initialized twice differently

commit b315376573778b195e640a163675fb9f5937ddca upstream.

.set_uhs_signaling field is currently initialised twice once to the
arch specific callback pxav3_set_uhs_signaling, and also to the generic
sdhci_set_uhs_signaling callback.

This means that uhs is currently broken for this platform currently, as pxav3
has some special constriants which means it can't use the generic callback.

This happened in
commit 96d7b78cfc2f ("mmc: sdhci: convert sdhci_set_uhs_signaling() into a library function")
commit a702c8abb2a9 ("mmc: host: split up sdhci-pxa, create sdhci-pxav3.c")'

Fix this and hopefully prevent it happening in the future by ensuring named
initialisers always follow the declaration order in the structure definition.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agommc: core: sdio: Fix unconditional wake_up_process() on sdio thread
Fu Zhonghui [Mon, 18 Aug 2014 02:48:14 +0000 (10:48 +0800)] 
mmc: core: sdio: Fix unconditional wake_up_process() on sdio thread

commit dea67c4ec8218b301d7cac7ee6e63dac0bc566cb upstream.

781e989cf59 ("mmc: sdhci: convert to new SDIO IRQ handling") and
bf3b5ec66bd ("mmc: sdio_irq: rework sdio irq handling") disabled
the use of our own custom threaded IRQ handler, but left in an
unconditional wake_up_process() on that handler at resume-time.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=80151
In addition, the check for MMC_CAP_SDIO_IRQ capability is added
before enable sdio IRQ.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <chris@printf.net>
Signed-off-by: Fu Zhonghui <zhonghui.fu@linux.intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: adau1761: Fix input PGA volume
Lars-Peter Clausen [Wed, 22 Oct 2014 08:51:18 +0000 (10:51 +0200)] 
ASoC: adau1761: Fix input PGA volume

commit 3b283f0893f55cb79e4507e5ec34e49c17d0a787 upstream.

For the input PGA to work correctly the ALC clock needs to be active.
Otherwise volume changes are not applied.

Fixes: dab464b60b2 ("ASoC: Add ADAU1361/ADAU1761 audio CODEC support")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: Intel: HSW/BDW only support S16 and S24 formats.
Liam Girdwood [Thu, 16 Oct 2014 14:29:14 +0000 (15:29 +0100)] 
ASoC: Intel: HSW/BDW only support S16 and S24 formats.

commit 2ccf3bd4f8b120936cdfc796baf40c5d3dfab68d upstream.

Fix driver with correct formats.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: tlv320aic3x: fix PLL D configuration
Dmitry Lavnikevich [Fri, 3 Oct 2014 13:18:56 +0000 (16:18 +0300)] 
ASoC: tlv320aic3x: fix PLL D configuration

commit 31d9f8faf9a54c851e835af489c82f45105a442f upstream.

Current caching implementation during regcache_sync() call bypasses
all register writes of values that are already known as default
(regmap reg_defaults). Same time in TLV320AIC3x codecs register 5
(AIC3X_PLL_PROGC_REG) write should be immediately followed by register
6 write (AIC3X_PLL_PROGD_REG) even if it was not changed. Otherwise
both registers will not be written.

This brings to issue that appears particulary in case of 44.1kHz
playback with 19.2MHz master clock. In this case AIC3X_PLL_PROGC_REG
is 0x6e while AIC3X_PLL_PROGD_REG is 0x0 (same as register
default). Thus AIC3X_PLL_PROGC_REG also remains not written and we get
wrong playback speed.

In this patch snd_soc_read() is used to get cached pll values and
snd_soc_write() (unlike regcache_sync() this function doesn't bypasses
hardware default values) to write them to registers.

Signed-off-by: Dmitry Lavnikevich <d.lavnikevich@sam-solutions.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: soc-pcm: fix sig_bits determination in soc_pcm_apply_msb()
Daniel Mack [Tue, 7 Oct 2014 12:33:46 +0000 (14:33 +0200)] 
ASoC: soc-pcm: fix sig_bits determination in soc_pcm_apply_msb()

commit 5e63dfccf34d4dbf21429c4919f33c028ff49991 upstream.

In the SNDRV_PCM_STREAM_CAPTURE branch in soc_pcm_apply_msb(), look at
sig_bits of the capture stream, not the playback one.

Spotted by coverity.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: soc-dapm: fix use after free
Daniel Mack [Tue, 7 Oct 2014 11:41:24 +0000 (13:41 +0200)] 
ASoC: soc-dapm: fix use after free

commit e5092c96c9c28f4d12811edcd02ca8eec16e748e upstream.

Coverity spotted the following possible use-after-free condition in
dapm_create_or_share_mixmux_kcontrol():

If kcontrol is NULL, and (wname_in_long_name && kcname_in_long_name)
validates to true, 'name' will be set to an allocated string, and be
freed a few lines later via the 'long_name' alias. 'name', however,
is used by dev_err() in case snd_ctl_add() fails.

Fix this by adding a jump label that frees 'long_name' at the end of
the function.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: core: fix use after free in snd_soc_remove_platform()
Daniel Mack [Tue, 7 Oct 2014 11:41:23 +0000 (13:41 +0200)] 
ASoC: core: fix use after free in snd_soc_remove_platform()

commit decc27b01d584c985c231e73d3b493de6ec07af8 upstream.

Coverity spotted an use-after-free condition in snd_soc_remove_platform().
Fix this by moving snd_soc_component_cleanup() after the debug print
statement which uses the component's string.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibata-sff: Fix controllers with no ctl port
Ondrej Zary [Fri, 26 Sep 2014 22:04:46 +0000 (00:04 +0200)] 
libata-sff: Fix controllers with no ctl port

commit 6d8ca28fa688a9354bc9fbc935bdaeb3651b6677 upstream.

Currently, ata_sff_softreset is skipped for controllers with no ctl port.
But that also skips ata_sff_dev_classify required for device detection.
This means that libata is currently broken on controllers with no ctl port.

No device connected:
[    1.872480] pata_isapnp 01:01.02: activated
[    1.889823] scsi2 : pata_isapnp
[    1.890109] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    6.888110] ata3.01: qc timeout (cmd 0xec)
[    6.888179] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   16.888085] ata3.01: qc timeout (cmd 0xec)
[   16.888147] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   46.888086] ata3.01: qc timeout (cmd 0xec)
[   46.888148] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   51.888100] ata3.00: qc timeout (cmd 0xec)
[   51.888160] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   61.888079] ata3.00: qc timeout (cmd 0xec)
[   61.888141] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   91.888089] ata3.00: qc timeout (cmd 0xec)
[   91.888152] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)

ATAPI device connected:
[    1.882061] pata_isapnp 01:01.02: activated
[    1.893430] scsi2 : pata_isapnp
[    1.893719] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    6.892107] ata3.01: qc timeout (cmd 0xec)
[    6.892171] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   16.892079] ata3.01: qc timeout (cmd 0xec)
[   16.892138] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   46.892079] ata3.01: qc timeout (cmd 0xec)
[   46.892138] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   46.908586] ata3.00: ATAPI: ACER CD-767E/O, V1.5X, max PIO2, CDB intr
[   46.924570] ata3.00: configured for PIO0 (device error ignored)
[   46.926295] scsi 2:0:0:0: CD-ROM            ACER     CD-767E/O        1.5X PQ: 0 ANSI: 5
[   46.984519] sr0: scsi3-mmc drive: 6x/6x xa/form2 tray
[   46.984592] cdrom: Uniform CD-ROM driver Revision: 3.20

So don't skip ata_sff_softreset, just skip the reset part of ata_bus_softreset
if the ctl port is not available.

This makes IDE port on ES968 behave correctly:

No device connected:
[    4.670888] pata_isapnp 01:01.02: activated
[    4.673207] scsi host2: pata_isapnp
[    4.673675] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    7.081840] Adding 2541652k swap on /dev/sda2.  Priority:-1 extents:1 across:2541652k

ATAPI device connected:
[    4.704362] pata_isapnp 01:01.02: activated
[    4.706620] scsi host2: pata_isapnp
[    4.706877] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    4.872782] ata3.00: ATAPI: ACER CD-767E/O, V1.5X, max PIO2, CDB intr
[    4.888673] ata3.00: configured for PIO0 (device error ignored)
[    4.893984] scsi 2:0:0:0: CD-ROM            ACER     CD-767E/O        1.5X PQ: 0 ANSI: 5
[    7.015578] Adding 2541652k swap on /dev/sda2.  Priority:-1 extents:1 across:2541652k

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller
Scott Carter [Thu, 25 Sep 2014 01:13:09 +0000 (18:13 -0700)] 
pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller

commit 37017ac6849e772e67dd187ba2fbd056c4afa533 upstream.

The Broadcom OSB4 IDE Controller (vendor and device IDs: 1166:0211)
does not support 64-KB DMA transfers.
Whenever a 64-KB DMA transfer is attempted,
the transfer fails and messages similar to the following
are written to the console log:

   [ 2431.851125] sr 0:0:0:0: [sr0] Unhandled sense code
   [ 2431.851139] sr 0:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
   [ 2431.851152] sr 0:0:0:0: [sr0]  Sense Key : Hardware Error [current]
   [ 2431.851166] sr 0:0:0:0: [sr0]  Add. Sense: Logical unit communication time-out
   [ 2431.851182] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 76 f4 00 00 40 00
   [ 2431.851210] end_request: I/O error, dev sr0, sector 121808

When the libata and pata_serverworks modules
are recompiled with ATA_DEBUG and ATA_VERBOSE_DEBUG defined in libata.h,
the 64-KB transfer size in the scatter-gather list can be seen
in the console log:

   [ 2664.897267] sr 9:0:0:0: [sr0] Send:
   [ 2664.897274] 0xf63d85e0
   [ 2664.897283] sr 9:0:0:0: [sr0] CDB:
   [ 2664.897288] Read(10): 28 00 00 00 7f b4 00 00 40 00
   [ 2664.897319] buffer = 0xf6d6fbc0, bufflen = 131072, queuecommand 0xf81b7700
   [ 2664.897331] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 7f b4 00 00 40
   [ 2664.897338] ata_scsi_translate: ENTER
   [ 2664.897345] ata_sg_setup: ENTER, ata1
   [ 2664.897356] ata_sg_setup: 3 sg elements mapped
   [ 2664.897364] ata_bmdma_fill_sg: PRD[0] = (0x66FD2000, 0xE000)
   [ 2664.897371] ata_bmdma_fill_sg: PRD[1] = (0x65000000, 0x10000)
   ------------------------------------------------------> =======
   [ 2664.897378] ata_bmdma_fill_sg: PRD[2] = (0x66A10000, 0x2000)
   [ 2664.897386] ata1: ata_dev_select: ENTER, device 0, wait 1
   [ 2664.897422] ata_sff_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0xFC
   [ 2664.897428] ata_sff_tf_load: device 0xA0
   [ 2664.897448] ata_sff_exec_command: ata1: cmd 0xA0
   [ 2664.897457] ata_scsi_translate: EXIT
   [ 2664.897462] leaving scsi_dispatch_cmnd()
   [ 2664.897497] Doing sr request, dev = sr0, block = 0
   [ 2664.897507] sr0 : reading 64/256 512 byte blocks.
   [ 2664.897553] ata_sff_hsm_move: ata1: protocol 7 task_state 1 (dev_stat 0x58)
   [ 2664.897560] atapi_send_cdb: send cdb
   [ 2666.910058] ata_bmdma_port_intr: ata1: host_stat 0x64
   [ 2666.910079] __ata_sff_port_intr: ata1: protocol 7 task_state 3
   [ 2666.910093] ata_sff_hsm_move: ata1: protocol 7 task_state 3 (dev_stat 0x51)
   [ 2666.910101] ata_sff_hsm_move: ata1: protocol 7 task_state 4 (dev_stat 0x51)
   [ 2666.910129] sr 9:0:0:0: [sr0] Done:
   [ 2666.910136] 0xf63d85e0 TIMEOUT

lspci shows that the driver used for the Broadcom OSB4 IDE Controller is
pata_serverworks:

   00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8e [Master SecP SecO PriP])
           Flags: bus master, medium devsel, latency 64
           [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
           [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
           I/O ports at 0170 [size=8]
           I/O ports at 0374 [size=4]
           I/O ports at 1440 [size=16]
           Kernel driver in use: pata_serverworks

The pata_serverworks driver supports five distinct device IDs,
one being the OSB4 and the other four belonging to the CSB series.
The CSB series appears to support 64-KB DMA transfers,
as tests on a machine with an SAI2 motherboard
containing a Broadcom CSB5 IDE Controller (vendor and device IDs: 1166:0212)
showed no problems with 64-KB DMA transfers.

This problem was first discovered when attempting to install openSUSE
from a DVD on a machine with an STL2 motherboard.
Using the pata_serverworks module,
older releases of openSUSE will not install at all due to the timeouts.
Releases of openSUSE prior to 11.3 can be installed by disabling
the pata_serverworks module using the brokenmodules boot parameter,
which causes the serverworks module to be used instead.
Recent releases of openSUSE (12.2 and later) include better error recovery and
will install, though very slowly.
On all openSUSE releases, the problem can be recreated
on a machine containing a Broadcom OSB4 IDE Controller
by mounting an install DVD and running a command similar to the following:

   find /mnt -type f -print | xargs cat > /dev/null

The patch below corrects the problem.
Similar to the other ATA drivers that do not support 64-KB DMA transfers,
the patch changes the ata_port_operations qc_prep vector to point to a routine
that breaks any 64-KB segment into two 32-KB segments and
changes the scsi_host_template sg_tablesize element to reduce by half
the number of scatter/gather elements allowed.
These two changes affect only the OSB4.

Signed-off-by: Scott Carter <ccscott@funsoft.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoRevert "percpu: free percpu allocation info for uniprocessor system"
Guenter Roeck [Sun, 21 Sep 2014 22:04:53 +0000 (15:04 -0700)] 
Revert "percpu: free percpu allocation info for uniprocessor system"

commit bb2e226b3bef596dd56be97df655d857b4603923 upstream.

This reverts commit 3189eddbcafc ("percpu: free percpu allocation info for
uniprocessor system").

The commit causes a hang with a crisv32 image. This may be an architecture
problem, but at least for now the revert is necessary to be able to boot a
crisv32 image.

Cc: Tejun Heo <tj@kernel.org>
Cc: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 3189eddbcafc ("percpu: free percpu allocation info for uniprocessor system")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoSUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
Trond Myklebust [Thu, 25 Sep 2014 02:35:58 +0000 (22:35 -0400)] 
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT

commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e upstream.

The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was intended introduced in
order to allow NFSv4 clients to disable resend timeouts. Since those
cause the RPC layer to break the connection, they mess up the duplicate
reply caches that remain indexed on the port number in NFSv4..

This patch includes the code that was missing in the original to
set the appropriate flag in struct rpc_clnt, when the caller of
rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.

Fixes: 8a19a0b6cb2e (SUNRPC: Add RPC task and client level options to...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoSUNRPC: Don't wake tasks during connection abort
Benjamin Coddington [Tue, 23 Sep 2014 16:26:19 +0000 (12:26 -0400)] 
SUNRPC: Don't wake tasks during connection abort

commit a743419f420a64d442280845c0377a915b76644f upstream.

When aborting a connection to preserve source ports, don't wake the task in
xs_error_report.  This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolockd: Try to reconnect if statd has moved
Benjamin Coddington [Tue, 23 Sep 2014 16:26:20 +0000 (12:26 -0400)] 
lockd: Try to reconnect if statd has moved

commit 173b3afceebe76fa2205b2c8808682d5b541fe3c upstream.

If rpc.statd is restarted, upcalls to monitor hosts can fail with
ECONNREFUSED.  In that case force a lookup of statd's new port and retry the
upcall.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agostmmac: pci: set default of the filter bins
Andy Shevchenko [Fri, 31 Oct 2014 16:28:03 +0000 (18:28 +0200)] 
stmmac: pci: set default of the filter bins

[ Upstream commit 1e19e084eae727654052339757ab7f1eaff58bad ]

The commit 3b57de958e2a brought the support for a different amount of the
filter bins, but didn't update the PCI driver accordingly. This patch appends
the default values when the device is enumerated via PCI bus.

Fixes: 3b57de958e2a (net: stmmac: Support devicetree configs for mcast and ucast filter entries)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrivers/net: macvtap and tun depend on INET
Ben Hutchings [Fri, 31 Oct 2014 03:10:31 +0000 (03:10 +0000)] 
drivers/net: macvtap and tun depend on INET

[ Upstream commit de11b0e8c569b96c2cf6a811e3805b7aeef498a3 ]

These drivers now call ipv6_proxy_select_ident(), which is defined
only if CONFIG_INET is enabled.  However, they have really depended
on CONFIG_INET for as long as they have allowed sending GSO packets
from userland.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr")
Fixes: b9fb9ee07e67 ("macvtap: add GSO/csum offload support")
Fixes: 5188cd44c55d ("drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
Ben Hutchings [Thu, 30 Oct 2014 18:27:17 +0000 (18:27 +0000)] 
drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets

[ Upstream commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7 ]

UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agogre: Use inner mac length when computing tunnel length
Tom Herbert [Thu, 30 Oct 2014 15:40:56 +0000 (08:40 -0700)] 
gre: Use inner mac length when computing tunnel length

[ Upstream commit 14051f0452a2c26a3f4791e6ad6a435e8f1945ff ]

Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomlx4: Avoid leaking steering rules on flow creation error flow
Or Gerlitz [Thu, 30 Oct 2014 13:59:28 +0000 (15:59 +0200)] 
mlx4: Avoid leaking steering rules on flow creation error flow

[ Upstream commit 571e1b2c7a4c2fd5faa1648462a6b65fa26530d7 ]

If mlx4_ib_create_flow() attempts to create > 1 rules with the
firmware, and one of these registrations fail, we leaked the
already created flow rules.

One example of the leak is when the registration of the VXLAN ghost
steering rule fails, we didn't unregister the original rule requested
by the user, introduced in commit d2fce8a9060d "mlx4: Set
user-space raw Ethernet QPs to properly handle VXLAN traffic".

While here, add dump of the VXLAN portion of steering rules
so it can actually be seen when flow creation fails.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
Or Gerlitz [Thu, 30 Oct 2014 13:59:27 +0000 (15:59 +0200)] 
net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN

[ Upstream commit a4f2dacbf2a5045e34b98a35d9a3857800f25a7b ]

For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum.

The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.

Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP
     offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv4: Do not cache routing failures due to disabled forwarding.
Nicolas Cavallari [Thu, 30 Oct 2014 09:09:53 +0000 (10:09 +0100)] 
ipv4: Do not cache routing failures due to disabled forwarding.

[ Upstream commit fa19c2b050ab5254326f5fc07096dd3c6a8d5d58 ]

If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not.  Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them.  The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocxgb4 : Fix missing initialization of win0_lock
Anish Bhatt [Thu, 30 Oct 2014 00:54:03 +0000 (17:54 -0700)] 
cxgb4 : Fix missing initialization of win0_lock

[ Upstream commit e327c225c911529898ec300cb96d2088893de3df ]

win0_lock was being used un-initialized, resulting in warning traces
being seen when lock debugging is enabled (and just wrong)

Fixes : fc5ab0209650 ('cxgb4: Replaced the backdoor mechanism to access the HW
 memory with PCIe Window method')

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomacvlan: fix a race on port dismantle and possible skb leaks
Eric Dumazet [Thu, 23 Oct 2014 02:43:46 +0000 (19:43 -0700)] 
macvlan: fix a race on port dismantle and possible skb leaks

[ Upstream commit fe0ca7328d03d36aafecebb3af650e1bb2841c20 ]

We need to cancel the work queue after rcu grace period,
otherwise it can be rescheduled by incoming packets.

We need to purge queue if some skbs are still in it.

We can use __skb_queue_head_init() variant in
macvlan_process_broadcast()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 412ca1550cbec ("macvlan: Move broadcasts into a work queue")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotcp: md5: do not use alloc_percpu()
Eric Dumazet [Thu, 23 Oct 2014 19:58:58 +0000 (12:58 -0700)] 
tcp: md5: do not use alloc_percpu()

[ Upstream commit 349ce993ac706869d553a1816426d3a4bfda02b1 ]

percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().

-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().

Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.

Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
is small anyway, there is no gain to dynamically allocate it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf9976e93 ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agohyperv: Fix the total_data_buflen in send path
Haiyang Zhang [Wed, 22 Oct 2014 20:47:18 +0000 (13:47 -0700)] 
hyperv: Fix the total_data_buflen in send path

[ Upstream commit 942396b01989d54977120f3625e5ba31afe7a75c ]

total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a messge with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.

[Request to include this patch to the Stable branches]

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet: fix saving TX flow hash in sock for outgoing connections
Sathya Perla [Wed, 22 Oct 2014 16:12:01 +0000 (21:42 +0530)] 
net: fix saving TX flow hash in sock for outgoing connections

[ Upstream commit 9e7ceb060754f134231f68cb29d5db31419fe1ed ]

The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.

But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.

Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonet: tso: fix unaligned access to crafted TCP header in helper API
Karl Beldan [Tue, 21 Oct 2014 14:06:05 +0000 (16:06 +0200)] 
net: tso: fix unaligned access to crafted TCP header in helper API

[ Upstream commit a63ba13eec092b70d4e5522d692eaeb2f9747387 ]

The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-bytes boundary.
However ATM the TSO helper API is only used by ethernet drivers and
the tcp header will then be aligned to a 2-bytes only boundary from the
header start address.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agonetlink: Re-add locking to netlink_lookup() and seq walker
Thomas Graf [Tue, 21 Oct 2014 20:05:38 +0000 (22:05 +0200)] 
netlink: Re-add locking to netlink_lookup() and seq walker

[ Upstream commit 78fd1d0ab072d4d9b5f0b7c14a1516665170b565 ]

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoax88179_178a: fix bonding failure
Ian Morgan [Sun, 19 Oct 2014 12:05:13 +0000 (08:05 -0400)] 
ax88179_178a: fix bonding failure

[ Upstream commit 95ff88688781db2f64042e69bd499e518bbb36e5 ]

The following patch fixes a bug which causes the ax88179_178a driver to be
incapable of being added to a bond.

When I brought up the issue with the bonding maintainers, they indicated
that the real problem was with the NIC driver which must return zero for
success (of setting the MAC address). I see that several other NIC drivers
follow that pattern by either simply always returing zero, or by passing
through a negative (error) result while rewriting any positive return code
to zero. With that same philisophy applied to the ax88179_178a driver, it
allows it to work correctly with the bonding driver.

I believe this is suitable for queuing in -stable, as it's a small, simple,
and obvious fix that corrects a defect with no other known workaround.

This patch is against vanilla 3.17(.0).

Signed-off-by: Ian Morgan <imorgan@primordial.ca>
 drivers/net/usb/ax88179_178a.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotipc: fix bug in bundled buffer reception
Jon Paul Maloy [Fri, 17 Oct 2014 19:25:28 +0000 (15:25 -0400)] 
tipc: fix bug in bundled buffer reception

[ Upstream commit 643566d4b47e2956110e79c0e6f65db9b9ea42c6 ]

In commit ec8a2e5621db2da24badb3969eda7fd359e1869f ("tipc: same receive
code path for connection protocol and data messages") we omitted the
the possiblilty that an arriving message extracted from a bundle buffer
may be a multicast message. Such messages need to be to be delivered to
the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
small multicast messages arriving as members of a bundle buffer will be
silently dropped.

This commit corrects the error by considering this case in the function
tipc_link_bundle_rcv().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv4: fix a potential use after free in ip_tunnel_core.c
Li RongQing [Fri, 17 Oct 2014 08:53:23 +0000 (16:53 +0800)] 
ipv4: fix a potential use after free in ip_tunnel_core.c

[ Upstream commit 1245dfc8cadb258386fcd27df38215a0eccb1f17 ]

pskb_may_pull() maybe change skb->data and make eth pointer oboslete,
so set eth after pskb_may_pull()

Fixes:3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv4: dst_entry leak in ip_send_unicast_reply()
Vasily Averin [Wed, 15 Oct 2014 12:24:02 +0000 (16:24 +0400)] 
ipv4: dst_entry leak in ip_send_unicast_reply()

[ Upstream commit 4062090e3e5caaf55bed4523a69f26c3265cc1d2 ]

ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry

Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovxlan: fix a free after use
Li RongQing [Fri, 17 Oct 2014 06:06:16 +0000 (14:06 +0800)] 
vxlan: fix a free after use

[ Upstream commit 7a9f526fc3ee49b6034af2f243676ee0a27dcaa8 ]

pskb_may_pull maybe change skb->data and make eth pointer oboslete,
so eth needs to reload

Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovxlan: using pskb_may_pull as early as possible
Li RongQing [Thu, 16 Oct 2014 01:17:18 +0000 (09:17 +0800)] 
vxlan: using pskb_may_pull as early as possible

[ Upstream commit 91269e390d062b526432f2ef1352b8df82e0e0bc ]

pskb_may_pull should be used to check if skb->data has enough space,
skb->len can not ensure that.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovxlan: fix a use after free in vxlan_encap_bypass
Li RongQing [Thu, 16 Oct 2014 00:49:41 +0000 (08:49 +0800)] 
vxlan: fix a use after free in vxlan_encap_bypass

[ Upstream commit ce6502a8f9572179f044a4d62667c4645256d6e4 ]

when netif_rx() is done, the netif_rx handled skb maybe be freed,
and should not be used.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoipv4: fix nexthop attlen check in fib_nh_match
Jiri Pirko [Mon, 13 Oct 2014 14:34:10 +0000 (16:34 +0200)] 
ipv4: fix nexthop attlen check in fib_nh_match

[ Upstream commit f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7 ]

fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agox86: bpf_jit: fix two bugs in eBPF JIT compiler
Alexei Starovoitov [Sat, 11 Oct 2014 03:30:23 +0000 (20:30 -0700)] 
x86: bpf_jit: fix two bugs in eBPF JIT compiler

[ Upstream commit e0ee9c12157dc74e49e4731e0d07512e7d1ceb95 ]

1.
JIT compiler using multi-pass approach to converge to final image size,
since x86 instructions are variable length. It starts with large
gaps between instructions (so some jumps may use imm32 instead of imm8)
and iterates until total program size is the same as in previous pass.
This algorithm works only if program size is strictly decreasing.
Programs that use LD_ABS insn need additional code in prologue, but it
was not emitted during 1st pass, so there was a chance that 2nd pass would
adjust imm32->imm8 jump offsets to the same number of bytes as increase in
prologue, which may cause algorithm to erroneously decide that size converged.
Fix it by always emitting largest prologue in the first pass which
is detected by oldproglen==0 check.
Also change error check condition 'proglen != oldproglen' to fail gracefully.

2.
while staring at the code realized that 64-byte buffer may not be enough
when 1st insn is large, so increase it to 128 to avoid buffer overflow
(theoretical maximum size of prologue+div is 109) and add runtime check.

Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoKVM: emulator: fix execution close to the segment limit
Paolo Bonzini [Mon, 27 Oct 2014 13:40:39 +0000 (14:40 +0100)] 
KVM: emulator: fix execution close to the segment limit

commit fd56e1546a5f734290cbedd2b81c518850736511 upstream.

Emulation of code that is 14 bytes to the segment limit or closer
(e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as
many as 15 bytes from the beginning of the instruction, and __linearize
fails when the passed (address, size) pair reaches out of the segment.

To fix this, let __linearize return the maximum accessible size (clamped
to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check
by passing zero for the desired size.

For expand-down segments, __linearize is performing a redundant check.
(u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close
to 4GB; in this case, addr.ea + size - 1 will also fail the check against
the upper bound of the segment (which is provided by the D/B bit).
After eliminating the redundant check, it is simple to compute
the *max_size for expand-down segments too.

Now that the limit check is done in __do_insn_fetch_bytes, we want
to inject a general protection fault there if size < op_size (like
__linearize would have done), instead of just aborting.

This fixes booting Tiano Core from emulated flash with EPT disabled.

Fixes: 719d5a9b2487e0562f178f61e323c3dc18a8b200
Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotracing/syscalls: Ignore numbers outside NR_syscalls' range
Rabin Vincent [Wed, 29 Oct 2014 22:06:58 +0000 (23:06 +0100)] 
tracing/syscalls: Ignore numbers outside NR_syscalls' range

commit 086ba77a6db00ed858ff07451bedee197df868c9 upstream.

ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls.  If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

 # trace-cmd record -e raw_syscalls:* true && trace-cmd report
 ...
 true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022ffffffff, 0)
 true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
 true-653   [000]   384.675971: sys_enter:            NR 983045 (76f7448076f7400076f74b2876f7448076f76f74, 1)
 true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
 ...

 # trace-cmd record -e syscalls:* true
 [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
 [   17.289590] pgd = 9e71c000
 [   17.289696] [aaaaaace] *pgd=00000000
 [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
 [   17.290169] Modules linked in:
 [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
 [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
 [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
 [   17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.

Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
Fixes: cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoLinux 3.17.2 v3.17.2
Greg Kroah-Hartman [Thu, 30 Oct 2014 16:43:25 +0000 (09:43 -0700)] 
Linux 3.17.2

10 years agosparc64: Implement __get_user_pages_fast().
David S. Miller [Fri, 24 Oct 2014 16:59:02 +0000 (09:59 -0700)] 
sparc64: Implement __get_user_pages_fast().

[ Upstream commit 06090e8ed89ea2113a236befb41f71d51f100e60 ]

It is not sufficient to only implement get_user_pages_fast(), you
must also implement the atomic version __get_user_pages_fast()
otherwise you end up using the weak symbol fallback implementation
which simply returns zero.

This is dangerous, because it causes the futex code to loop forever
if transparent hugepages are supported (see get_futex_key()).

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Fix register corruption in top-most kernel stack frame during boot.
David S. Miller [Thu, 23 Oct 2014 19:58:13 +0000 (12:58 -0700)] 
sparc64: Fix register corruption in top-most kernel stack frame during boot.

[ Upstream commit ef3e035c3a9b81da8a778bc333d10637acf6c199 ]

Meelis Roos reported that kernels built with gcc-4.9 do not boot, we
eventually narrowed this down to only impacting machines using
UltraSPARC-III and derivitive cpus.

The crash happens right when the first user process is spawned:

[   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[   54.451346]
[   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96
[   54.666431] Call Trace:
[   54.698453]  [0000000000762f8c] panic+0xb0/0x224
[   54.759071]  [000000000045cf68] do_exit+0x948/0x960
[   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
[   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
[   54.978662] Press Stop-A (L1-A) to return to the boot prom
[   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

Further investigation showed that compiling only per_cpu_patch() with
an older compiler fixes the boot.

Detailed analysis showed that the function is not being miscompiled by
gcc-4.9, but it is using a different register allocation ordering.

With the gcc-4.9 compiled function, something during the code patching
causes some of the %i* input registers to get corrupted.  Perhaps
we have a TLB miss path into the firmware that is deep enough to
cause a register window spill and subsequent restore when we get
back from the TLB miss trap.

Let's plug this up by doing two things:

1) Stop using the firmware stack for client interface calls into
   the firmware.  Just use the kernel's stack.

2) As soon as we can, call into a new function "start_early_boot()"
   to put a one-register-window buffer between the firmware's
   deepest stack frame and the top-most initial kernel one.

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Increase size of boot string to 1024 bytes
Dave Kleikamp [Tue, 7 Oct 2014 13:12:37 +0000 (08:12 -0500)] 
sparc64: Increase size of boot string to 1024 bytes

[ Upstream commit 1cef94c36bd4d79b5ae3a3df99ee0d76d6a4a6dc ]

This is the longest boot string that silo supports.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Kill unnecessary tables and increase MAX_BANKS.
David S. Miller [Sun, 28 Sep 2014 04:30:57 +0000 (21:30 -0700)] 
sparc64: Kill unnecessary tables and increase MAX_BANKS.

[ Upstream commit d195b71bad4347d2df51072a537f922546a904f1 ]

swapper_low_pmd_dir and swapper_pud_dir are actually completely
useless and unnecessary.

We just need swapper_pg_dir[].  Naturally the other page table chunks
will be allocated on an as-needed basis.  Since the kernel actually
accesses these tables in the PAGE_OFFSET view, there is not even a TLB
locality advantage of placing them in the kernel image.

Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
naturally page aligned.

Increase MAX_BANKS to 1024 in order to handle heavily fragmented
virtual guests.

Even with this MAX_BANKS increase, the kernel is 20K+ smaller.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: sparse irq
bob picco [Thu, 25 Sep 2014 19:25:03 +0000 (12:25 -0700)] 
sparc64: sparse irq

[ Upstream commit ee6a9333fa58e11577c1b531b8e0f5ffc0fd6f50 ]

This patch attempts to do a few things. The highlights are: 1) enable
SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
ivector_table at boot time and 4) default to cookie only VIRQ mechanism
for supported firmware. The first firmware with cookie only support for
me appears on T5. You can optionally force the HV firmware to not cookie
only mode which is the sysino support.

The sysino is a deprecated HV mechanism according to the most recent
SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
cookie/sysino firmware versioning.

The history of this interface is:

1) Major version 1.0 only supported sysino based interrupt interfaces.

2) Major version 2.0 added cookie based VIRQs, however due to the fact
   that OSs were using the VIRQs without negoatiating major version
   2.0 (Linux and Solaris are both guilty), the VIRQs calls were
   allowed even with major version 1.0

   To complicate things even further, the VIRQ interfaces were only
   actually hooked up in the hypervisor for LDC interrupt sources.
   VIRQ calls on other device types would result in HV_EINVAL errors.

   So effectively, major version 2.0 is unusable.

3) Major version 3.0 was created to signal use of VIRQs and the fact
   that the hypervisor has these calls hooked up for all interrupt
   sources, not just those for LDC devices.

A new boot option is provided should cookie only HV support have issues.
hvirq - this is the version for HV_GRP_INTR. This is related to HV API
versioning.  The code attempts major=3 first by default. The option can
be used to override this default.

I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Adjust vmalloc region size based upon available virtual address bits.
David S. Miller [Sat, 27 Sep 2014 18:05:21 +0000 (11:05 -0700)] 
sparc64: Adjust vmalloc region size based upon available virtual address bits.

[ Upstream commit bb4e6e85daa52a9f6210fa06a5ec6269598a202b ]

In order to accomodate embedded per-cpu allocation with large numbers
of cpus and numa nodes, we have to use as much virtual address space
as possible for the vmalloc region.  Otherwise we can get things like:

PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000

So, once we select a value for PAGE_OFFSET, derive the size of the
vmalloc region based upon that.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Increase MAX_PHYS_ADDRESS_BITS to 53.
David S. Miller [Thu, 25 Sep 2014 04:49:29 +0000 (21:49 -0700)] 
sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53.

Make sure, at compile time, that the kernel can properly support
whatever MAX_PHYS_ADDRESS_BITS is defined to.

On M7 chips, use a max_phys_bits value of 49.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Use kernel page tables for vmemmap.
David S. Miller [Thu, 25 Sep 2014 04:20:14 +0000 (21:20 -0700)] 
sparc64: Use kernel page tables for vmemmap.

[ Upstream commit c06240c7f5c39c83dfd7849c0770775562441b96 ]

For sparse memory configurations, the vmemmap array behaves terribly
and it takes up an inordinate amount of space in the BSS section of
the kernel image unconditionally.

Just build huge PMDs and look them up just like we do for TLB misses
in the vmalloc area.

Kernel BSS shrinks by about 2MB.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Fix physical memory management regressions with large max_phys_bits.
David S. Miller [Thu, 25 Sep 2014 03:56:11 +0000 (20:56 -0700)] 
sparc64: Fix physical memory management regressions with large max_phys_bits.

[ Upstream commit 0dd5b7b09e13dae32869371e08e1048349fd040c ]

If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables only
can cover up to 43 bits.

Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.

Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.

The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.

The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.

The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.

These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image.  Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.

So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.

We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.

On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agosparc64: Adjust KTSB assembler to support larger physical addresses.
David S. Miller [Wed, 17 Sep 2014 17:14:56 +0000 (10:14 -0700)] 
sparc64: Adjust KTSB assembler to support larger physical addresses.

[ Upstream commit 8c82dc0e883821c098c8b0b130ffebabf9aab5df ]

As currently coded the KTSB accesses in the kernel only support up to
47 bits of physical addressing.

Adjust the instruction and patching sequence in order to support
arbitrary 64 bits addresses.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>