Accordingly to discussion [1] and followed up documentation the DMA controller
driver shouldn't start any DMA operations when dmaengine_submit() is called.
This patch fixes the workflow in dw_dmac driver to follow the documentation.
Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
when using the ARM model of the PL181 and running QEMU.
The bug was reported for the following QEMU version:
$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
(c) 2003-2008 Fabrice Bellard
To resolve the problem, let's restore the old behavior were the DATA
irq is handled prior the CMD irq, but only for the arm_variant, which
the problem was reported for.
Reported-by: John Stultz <john.stultz@linaro.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Russell King <linux@arm.linux.org.uk> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: John Stultz <john.stultz@linaro.org> Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[kees: backported to 3.16] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc604767613b6d2036cdc35b660bc39451040a47
("ipvs: changes for local real server") from 2.6.37
introduced DNAT support to local real server but the
IPv6 LOCAL_OUT handler ip_vs_local_reply6() is
registered incorrectly as IPv4 hook causing any outgoing
IPv4 traffic to be dropped depending on the IP header values.
Chris tracked down the problem to CONFIG_IP_VS_IPV6=y
Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768
Reported-by: Chris J Arges <chris.j.arges@canonical.com> Tested-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, only the four high bits of the tclass were maintained in the
ipv6 case. This matches the behavior of ipv4, though whether or not we
should reflect ECN bits may be up for debate.
Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xt_hashlimit cannot be used with large hash tables, because garbage
collector is run from a timer. If table is really big, its possible
to hold cpu for more than 500 msec, which is unacceptable.
Switch to a work queue, and use proper scheduling points to remove
latencies spikes.
Later, we also could switch to a smoother garbage collection done
at lookup time, one bucket at a time...
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
r1_bio->start_next_window is not initialised in the READ
case, so allow_barrier may incorrectly decrement
conf->current_window_requests
which can cause raise_barrier() to block forever.
If a devices is being recovered it is not InSync and is not Faulty.
If a read error is experienced on that device, fix_read_error()
will be called, but it ignores non-InSync devices. So it will
neither fix the error nor fail the device.
It is incorrect that fix_read_error() ignores non-InSync devices.
It should only ignore Faulty devices. So fix it.
This became a bug when we allowed reading from a device that was being
recovered. It is suitable for any subsequent -stable kernel.
Fixes: da8840a747c0dbf49506ec906757a6b87b9741e9 Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both normal IO and resync IO can be retried with reschedule_retry()
and so be counted into ->nr_queued, but only normal IO gets counted in
->nr_pending.
Before the recent improvement to RAID1 resync there could only
possibly have been one or the other on the queue. When handling a
read failure it could only be normal IO. So when handle_read_error()
called freeze_array() the fact that freeze_array only compares
->nr_queued against ->nr_pending was safe.
But now that these two types can interleave, we can have both normal
and resync IO requests queued, so we need to count them both in
nr_pending.
This error can lead to freeze_array() hanging if there is a read
error, so it is suitable for -stable.
next_resync is (approximately) the location for the next resync request.
However it does *not* reliably determine the earliest location
at which resync might be happening.
This is because resync requests can complete out of order, and
we only limit the number of current requests, not the distance
from the earliest pending request to the latest.
mddev->curr_resync_completed is a reliable indicator of the earliest
position at which resync could be happening. It is updated less
frequently, but is actually reliable which is more important.
So use it to determine if a write request is before the region
being resynced and so safe from conflict.
This error can allow resync IO to interfere with normal IO which
could lead to data corruption. Hence: stable.
The resync/recovery process for raid1 was recently changed
so that writes could happen in parallel with resync providing
they were in different regions of the device.
There is a problem though: While a write request will always
wait for conflicting resync to complete, a resync request
will *not* always wait for conflicting writes to complete.
Two changes are needed to fix this:
1/ raise_barrier (which waits until it is safe to do resync)
must wait until current_window_requests is zero
2/ wait_battier (which waits at the start of a new write request)
must update current_window_requests if the request could
possible conflict with a concurrent resync.
As concurrent writes and resync can lead to data loss,
this patch is suitable for -stable.
commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761 made
it possible for reads to happen concurrently with resync.
This means that we need to be more careful where read_balancing
is allowed during resync - we can no longer be sure that any
resync that has already started will definitely finish.
So keep read_balancing to before recovery_cp, which is conservative
but safe.
This bug makes it possible to read from a device that doesn't
have up-to-date data, so it can cause data corruption.
So it is suitable for any kernel since 3.11.
If there are outstanding writes when close_sync is called,
the change to ->start_next_window might cause them to
decrement the wrong counter when they complete. Fix this
by merging the two counters into the one that will be decremented.
Having an incorrect value in a counter can cause raise_barrier()
to hangs, so this is suitable for -stable.
So accept all three possible states there since I can no longer tell the
difference between vb2_buffer_done called from start_streaming or from
elsewhere.
Instead add a WARN_ON at the end of start_streaming that will check whether
any buffers were added to the done list, since that implies that the wrong
state was used as well.
sg_alloc_table_from_pages() only allocates a sg_table, so it should just use
GFP_KERNEL, not gfp_flags. If gfp_flags contains __GFP_DMA32 then mm/sl[au]b.c
will call BUG_ON:
IT9135 RF tuner clock is coming from demodulator. We need enable it
early in demod init, before any tuner I/O. Currently it is enabled
by tuner driver itself, but it is too late and performance will be
reduced as some registers are not updated correctly. Clock is
disabled automatically when demod is put onto sleep.
Cpufreq core introduces cpufreq_suspended flag to let cpufreq sysfs nodes
across S2RAM/S2DISK. But the flag is only set in the cpufreq_suspend()
for cpufreq drivers which have target or target_index callback. This
skips intel_pstate driver. This patch is to set the flag before checking
target or target_index callback.
Fixes: 2f0aea936360 (cpufreq: suspend governors on system suspend/hibernate) Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
[rjw: Subject] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While debugging a cpufreq-related hardware failure on a system I saw the
following lockdep warning:
=========================
[ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G E
-------------------------
insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there!
(&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
3 locks held by insmod/2247:
#0: (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120
#1: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80
#2: (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
The warning occurs in the __cpufreq_add_dev() code which does
down_write(&policy->rwsem);
...
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
policy->cur = cpufreq_driver->get(policy->cpu);
if (!policy->cur) {
pr_err("%s: ->get() failed\n", __func__);
goto err_get_freq;
}
If cpufreq_driver->get(policy->cpu) returns an error we execute the
code at err_get_freq, which does not up the policy->rwsem. This causes
the lockdep warning.
Trivial patch to up the policy->rwsem in the error path.
After the patch has been applied, and an error occurs in the
cpufreq_driver->get(policy->cpu) call we will now see
cpufreq: __cpufreq_add_dev: ->get() failed
cpufreq: __cpufreq_add_dev: ->get() failed
modprobe: ERROR: could not insert 'pcc_cpufreq': No such device
Fixes: 4e97b631f24c (cpufreq: Initialize governor for a new policy under policy->rwsem) Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay
within their parents bounds") because it breaks some systems with LSI Logic
FC949ES Fibre Channel Adapters, apparently by exposing a defect in those
adapters.
Dirk tested a Tyan VX50 (B4985) with this device that worked like this
prior to 1820ffdccb9b:
bus: [bus 00-7f] on node 0 link 1
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07])
pci 0000:00:0e.0: PCI bridge to [bus 0a]
pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07])
pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter)
Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS
defect in the PCI0 _CRS method. But prior to 1820ffdccb9b, we didn't
enforce that aperture, and the FC adapter worked fine at 0a:00.0.
After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in
the root bridge's aperture, so we reconfigure it so it *is* contained:
This effectively moves the FC device from 0a:00.0 to 07:00.0, which should
be legal. But when we enumerate bus 06, the FC device doesn't respond, so
we don't find anything. This is probably a defect in the FC device.
Possible fixes (due to Yinghai):
1) Add a quirk to fix the _CRS information based on what amd_bus.c read
from the hardware
2) Reset the FC device after we change its bus number
Fix 1 would be relatively easy, but it does sweep the LSI FC issue under
the rug. We might want to reconfigure bus numbers in the future for some
other reason, e.g., hotplug, and then we could trip over this again.
For that reason, I like fix 2, but we don't know whether it actually works,
and we don't have a patch for it yet.
This revert is fix 3, which also sweeps the LSI FC issue under the rug.
In testmode and vendor command reply/event SKBs we use the
skb cb data to store nl80211 parameters between allocation
and sending. This causes the code for CONFIG_NETLINK_MMAP
to get confused, because it takes ownership of the skb cb
data when the SKB is handed off to netlink, and it doesn't
explicitly clear it.
Clear the skb cb explicitly when we're done and before it
gets passed to netlink to avoid this issue.
Reported-by: Assaf Azulay <assaf.azulay@intel.com> Reported-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the ccp is built as a built-in module, then ccp-crypto (whether
built as a module or a built-in module) will be able to load and
it will register its crypto algorithms. If the system does not have
a CCP this will result in -ENODEV being returned whenever a command
is attempted to be queued by the registered crypto algorithms.
Add an API, ccp_present(), that checks for the presence of a CCP
on the system. The ccp-crypto module can use this to determine if it
should register it's crypto alogorithms.
Reported-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch should fix the bug reported in
https://lkml.org/lkml/2014/9/11/249.
We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.
Otherwise blk_mq_timeout_check() might dereference uninitialized
pointers when racing with the creation of a request.
Also move the reset of cmd_flags for the initializing code to the point
where a request is freed. So we will never end up with pending flush
request indicators that might trigger dereferences of invalid pointers
in blk_mq_timeout_check().
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reported-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com> Tested-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On 32-bit architectures, the legacy buffer_head functions are not always
handling the sector number with the proper 64-bit types, and will thus
fail on 4TB+ disks.
Any code that uses __getblk() (and thus bread(), breadahead(),
sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop
in __getblk_slow() with an infinite stream of errors logged to dmesg
like this:
Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
top 32-bits are missing (in this case the 0x1 at the top).
This is because grow_dev_page() is broken and has a 32-bit overflow due
to shifting the page index value (a pgoff_t - which is just 32 bits on
32-bit architectures) left-shifted as the block number. But the top
bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit
before the shift.
This patch fixes this issue by type casting "index" to sector_t before
doing the left shift.
Note this is not a theoretical bug but has been seen in the field on a
4TiB hard drive with logical sector size 512 bytes.
This patch has been verified to fix the infinite loop problem on 3.17-rc5
kernel using a 4TB disk image mounted using "-o loop". Without this patch
doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
100% reproducibly whilst with the patch it works fine as expected.
This reverts commit fc1b253141b3 ("PCI: Don't scan random busses in
pci_scan_bridge()") because it breaks CardBus on some machines.
David tested a Dell Latitude D505 that worked like this prior to fc1b253141b3:
pci 0000:00:1e.0: PCI bridge to [bus 01]
pci 0000:01:01.0: CardBus bridge to [bus 02-05]
Note that the 01:01.0 CardBus bridge has a bus number aperture of
[bus 02-05], but those buses are all outside the 00:1e.0 PCI bridge bus
number aperture, so accesses to buses 02-05 never reach CardBus. This is
later patched up by yenta_fixup_parent_bridge(), which changes the
subordinate bus number of the 00:1e.0 PCI bridge:
pci_bus 0000:01: Raising subordinate bus# of parent bus (#01) from #01 to #05
With fc1b253141b3, pci_scan_bridge() fails immediately when it notices that
we can't allocate a valid secondary bus number for the CardBus bridge, and
CardBus doesn't work at all:
pci 0000:01:01.0: can't allocate child bus 01 from [bus 01]
I'd prefer to fix this by integrating the yenta_fixup_parent_bridge() logic
into pci_scan_bridge() so we fix the bus number apertures up front. But
I don't think we can do that before v3.17, so I'm going to revert this to
avoid the problem while we're working on the long-term fix.
Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
normally generates a hot-remove event that unbinds the driver.
Some drivers expect to remain bound to a device even while they power it
off and back on again. This can be dangerous, because if the device is
removed or replaced while it is powered off, the driver doesn't know that
anything changed. But some drivers accept that risk.
Add pci_ignore_hotplug() for use by drivers that know their device cannot
be removed. Using pci_ignore_hotplug() tells the PCI core that hot-plug
events for the device should be ignored.
The radeon and nouveau drivers use this to switch between a low-power,
integrated GPU and a higher-power, higher-performance discrete GPU. They
power off the unused GPU, but they want to remain bound to it.
This is a reimplementation of f244d8b623da ("ACPIPHP / radeon / nouveau:
Fix VGA switcheroo problem related to hotplug") but extends it to work with
both acpiphp and pciehp.
This fixes a problem where systems with dual GPUs using the radeon drivers
become unusable, freezing every few seconds (see bugzillas below). The
resume of the radeon device may also fail, e.g.,
This fixes problems on dual GPU systems where the radeon driver becomes
unusable because of problems while suspending the device, as in bug 79701:
[drm] radeon: finishing device.
radeon 0000:01:00.0: Userspace still has active objects !
radeon 0000:01:00.0: ffff8800cb4ec288ffff8800cb4ec000 16384 4294967297 force free
...
WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
trying to unbind memory from uninitialized GART !
or while resuming it, as in bug 77261:
radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
radeon 0000:01:00.0: GPU lockup ...
radeon 0000:01:00.0: GPU pci config reset
pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
*ERROR* radeon: dpm resume failed
radeon 0000:01:00.0: Wait for MC idle timedout !
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261 Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701 Reported-by: Shawn Starr <shawn.starr@rogers.com> Reported-by: Jose P. <lbdkmjdf@sharklasers.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajat Jain <rajatxjain@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ref-cycles event is specially to Intel core, but can still used in arm
architecture with the wrong return value with 3.10 stable. this patch fix the
bug and make it return NOT SUPPORTED distinctly.
In upstream this bug has been fixed by other way, which changes more than one
file and more than 1000 lines. the primary commit is 6b7658ec8a100b608e59e3cde353434db51f5be0. besides we can not simply
cherry-pick.
Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com Cc: Will Deacon <will.deacon@arm.com> Cc: Christopher Covington <cov@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We saw a kernel soft lockup in perf_remove_from_context(),
it looks like the `perf` process, when exiting, could not go
out of the retry loop. Meanwhile, the target process was forking
a child. So either the target process should execute the smp
function call to deactive the event (if it was running) or it should
do a context switch which deactives the event.
It seems we optimize out a context switch in perf_event_context_sched_out(),
and what's more important, we still test an obsolete task pointer when
retrying, so no one actually would deactive that event in this situation.
Fix it directly by reloading the task pointer in perf_remove_from_context().
This should cure the above soft lockup.
Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dma_pool_create() needs to unlock the mutex in error case. The bug was
introduced in the 3.16 by commit cc6b664aa26d ("mm/dmapool.c: remove
redundant NULL check for dev in dma_pool_create()")/
in spi interrupt handler, we need check RX_IO_DMA status to ensure
rx fifo have received the specify count data.
if not set, the while statement in spi isr function will keep loop,
at last, make the kernel hang.
[The code is actually there in the interrupt handler but apparently it
needs the interrupt unmasking so the handler sees the status -- broonie]
Signed-off-by: Qipan Li <Qipan.Li@csr.com> Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
device_add() expects that any memory allocated via devm_* API is only
done in the device's probe function.
Fix below boot warning:
WARNING: CPU: 1 PID: 1 at drivers/base/dd.c:286 driver_probe_device+0x2b4/0x2f4()
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.16.0-10474-g835c90b-dirty #160
[<c0016364>] (unwind_backtrace) from [<c001251c>] (show_stack+0x20/0x24)
[<c001251c>] (show_stack) from [<c04eaefc>] (dump_stack+0x7c/0x98)
[<c04eaefc>] (dump_stack) from [<c0023d4c>] (warn_slowpath_common+0x78/0x9c)
[<c0023d4c>] (warn_slowpath_common) from [<c0023d9c>] (warn_slowpath_null+0x2c/0x34)
[<c0023d9c>] (warn_slowpath_null) from [<c0302c60>] (driver_probe_device+0x2b4/0x2f4)
[<c0302c60>] (driver_probe_device) from [<c0302d90>] (__device_attach+0x50/0x54)
[<c0302d90>] (__device_attach) from [<c0300e60>] (bus_for_each_drv+0x54/0x9c)
[<c0300e60>] (bus_for_each_drv) from [<c0302958>] (device_attach+0x84/0x90)
[<c0302958>] (device_attach) from [<c0301f10>] (bus_probe_device+0x94/0xb8)
[<c0301f10>] (bus_probe_device) from [<c03000c0>] (device_add+0x434/0x4fc)
[<c03000c0>] (device_add) from [<c0342dd4>] (spi_add_device+0x98/0x164)
[<c0342dd4>] (spi_add_device) from [<c03444a4>] (spi_register_master+0x598/0x768)
[<c03444a4>] (spi_register_master) from [<c03446b4>] (devm_spi_register_master+0x40/0x80)
[<c03446b4>] (devm_spi_register_master) from [<c0346214>] (dw_spi_add_host+0x1a8/0x258)
[<c0346214>] (dw_spi_add_host) from [<c0346920>] (dw_spi_mmio_probe+0x1d4/0x294)
[<c0346920>] (dw_spi_mmio_probe) from [<c0304560>] (platform_drv_probe+0x3c/0x6c)
[<c0304560>] (platform_drv_probe) from [<c0302a98>] (driver_probe_device+0xec/0x2f4)
[<c0302a98>] (driver_probe_device) from [<c0302d3c>] (__driver_attach+0x9c/0xa0)
[<c0302d3c>] (__driver_attach) from [<c0300f0c>] (bus_for_each_dev+0x64/0x98)
[<c0300f0c>] (bus_for_each_dev) from [<c0302518>] (driver_attach+0x2c/0x30)
[<c0302518>] (driver_attach) from [<c0302134>] (bus_add_driver+0xdc/0x1f4)
[<c0302134>] (bus_add_driver) from [<c03035c8>] (driver_register+0x88/0x104)
[<c03035c8>] (driver_register) from [<c030445c>] (__platform_driver_register+0x58/0x6c)
[<c030445c>] (__platform_driver_register) from [<c0700f00>] (dw_spi_mmio_driver_init+0x18/0x20)
[<c0700f00>] (dw_spi_mmio_driver_init) from [<c0008914>] (do_one_initcall+0x90/0x1d4)
[<c0008914>] (do_one_initcall) from [<c06d7d90>] (kernel_init_freeable+0x178/0x248)
[<c06d7d90>] (kernel_init_freeable) from [<c04e687c>] (kernel_init+0x18/0xfc)
[<c04e687c>] (kernel_init) from [<c000ecd8>] (ret_from_fork+0x14/0x20)
Reported-by: Thor Thayer <tthayer@opensource.altera.com> Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: leroy christophe <christophe.leroy@c-s.fr> Signed-off-by: Axel Lin <axel.lin@ingics.com> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When reading the IPv6 addresses from the net-device, make sure to
avoid adding a duplicate entry to the GID table because of equality
between the default GID we generate and the default IPv6 link-local
address of the device.
Fixes: acc4fccf4eff ("IB/mlx4: Make sure GID index 0 is always occupied") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When Ethernet netdev is not present for a port (e.g. when the link
layer type of the port is InfiniBand) it's possible to dereference a
null pointer when we do netdevice scanning.
To fix that, we move a section of code that needs to run only when
netdev is present to a proper if () statement.
Fixes: ad4885d279b6 ("IB/mlx4: Build the port IBoE GID table properly under bonding") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This particular reference count is not needed with the rcu protection,
and the current code leaks a reference count, causing a hang in
qib_qp_destroy().
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Callers of d_splice_alias(dentry, inode) don't need iput(), neither
on success nor on failure. Either the reference to inode is stored
in a previously negative dentry, or it's dropped. In either case
inode reference the caller used to hold is consumed.
__gfs2_lookup() does iput() in case when d_splice_alias() has failed.
Double iput() if we ever hit that. And gfs2_create_inode() ends up
not only with double iput(), but with link count dropped to zero - on
an inode it has just found in directory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that we use the virtio ->scan() function to register with the hwrng
core, we will not get read requests till probe is successfully finished.
So revert the workaround we had in place to refuse read requests while
we were not yet setup completely.
Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of calling hwrng_register() in the probe routing, call it in the
scan routine. This ensures that when hwrng_register() is successful,
and it requests a few random bytes to seed the kernel's pool at init,
we're ready to service that request.
This will also enable us to remove the workaround added previously to
check whether probe was completed, and only then ask for data from the
host. The revert follows in the next commit.
There's a slight behaviour change here on unsuccessful hwrng_register().
Previously, when hwrng_register() failed, the probe() routine would
fail, and the vqs would be torn down, and driver would be marked not
initialized. Now, the vqs will remain initialized, driver would be
marked initialized as well, but won't be available in the list of RNGs
available to hwrng core. To fix the failures, the procedure remains the
same, i.e. unload and re-load the module, and hope things succeed the
next time around.
Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Locks the k_itimer's it_lock member when handling the alarm timer's
expiry callback.
The regular posix timers defined in posix-timers.c have this lock held
during timout processing because their callbacks are routed through
posix_timer_fn(). The alarm timers follow a different path, so they
ought to grab the lock somewhere else.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: Richard Larocque <rlarocque@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoids sending a signal to alarm timers created with sigev_notify set to
SIGEV_NONE by checking for that special case in the timeout callback.
The regular posix timers avoid sending signals to SIGEV_NONE timers by
not scheduling any callbacks for them in the first place. Although it
would be possible to do something similar for alarm timers, it's simpler
to handle this as a special case in the timeout.
Prior to this patch, the alarm timer would ignore the sigev_notify value
and try to deliver signals to the process anyway. Even worse, the
sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
specified, so the signal number could be bogus. If sigev_signo was an
unitialized value (as it often would be if SIGEV_NONE is used), then
it's hard to predict which signal will be sent.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: Richard Larocque <rlarocque@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Returns the time remaining for an alarm timer, rather than the time at
which it is scheduled to expire. If the timer has already expired or it
is not currently scheduled, the it_value's members are set to zero.
This new behavior matches that of the other posix-timers and the POSIX
specifications.
This is a change in user-visible behavior, and may break existing
applications. Hopefully, few users rely on the old incorrect behavior.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: Richard Larocque <rlarocque@google.com>
[jstultz: minor style tweak] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In spite of what the GCC manual says, the -mfast-indirect-calls has
never been supported in the 64-bit parisc compiler. Indirect calls have
always been done using function descriptors irrespective of the
-mfast-indirect-calls option.
Recently, it was noticed that a function descriptor was always requested
when the -mfast-indirect-calls option was specified. This caused
problems when the option was used in application code and doesn't make
any sense because the whole point of the option is to avoid using a
function descriptor for indirect calls.
Fixing this broke 64-bit kernel builds.
I will fix GCC but for now we need the attached change. This results in
the same kernel code as before.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
return the value instead, and have path_init() do the assignment. Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination. This one should go where
it went.
To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers. And do the same to set_root(), to keep them
in sync.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Similar to the previous commit which described why we need to add a
barrier to arch_spin_is_locked(), we have a similar problem with
spin_unlock_wait().
We need a barrier on entry to ensure any spinlock we have previously
taken is visibly locked prior to the load of lock->slock.
It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
semantics. For now be conservative and add a barrier on exit to give it
ACQUIRE semantics.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.
Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.
There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.
Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:
The first CPU does:
spin_lock(*A)
if spin_is_locked(*B)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r
spin_unlock(*A)
And the second CPU does:
spin_lock(*B)
if spin_is_locked(*A)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r
spin_unlock(*B)
Although this is a strange locking construct, it should work.
It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().
For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:
bool spin_is_locked(*LOCK) {
LOAD l = *LOCK
return l.locked
}
Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:
CPU 0 CPU 1
==================================================
spin_lock(*A) spin_lock(*B)
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)
If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:
CPU 0 CPU 1
==================================================
spin_lock(*A)
LOAD b = *B
spin_lock(*B)
if b.locked # false LOAD a = *A
else if a.locked # true
smp_mb() # nothing
LOAD r1 = *COUNTER spin_unlock(*B)
r1++
STORE *COUNTER = r1
spin_unlock(*A)
However in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve and store-conditional.
Ignoring the retry logic for the lost reservation case, it boils down to:
spin_lock(*LOCK) {
LOAD l = *LOCK
l.locked = true
STORE *LOCK = l
ACQUIRE_BARRIER
}
The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:
This acts as a one-way permeable barrier. It guarantees that all
memory operations after the ACQUIRE operation will appear to happen
after the ACQUIRE operation with respect to the other components of
the system.
On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also know as "lightweight sync", or "sync 1".
As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.
Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion, we may switch to
a different barrier in future.
What this means in practice is that the following can occur:
CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
LOAD b = *B LOAD a = *A
STORE *A = a STORE *B = b
if b.locked # false if a.locked # false
else else
smp_mb() smp_mb()
LOAD r1 = *COUNTER LOAD r2 = *COUNTER
r1++ r2++
STORE *COUNTER = r1
STORE *COUNTER = r2 # Lost update
spin_unlock(*A) spin_unlock(*B)
That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.
The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:
The new barrier orders the store to the lock we are locking vs the load
of the other lock:
CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
STORE *A = a STORE *B = b
smp_mb() smp_mb()
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)
Although the above example is theoretical, there is code similar to this
example in sem_lock() in ipc/sem.c. This commit in addition to the next
commit appears to be a fix for crashes we are seeing in that code where
we believe this race happens in practice.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is in valid_next_sp() where we check that the new stack
pointer is at least STACK_FRAME_OVERHEAD below the previous one.
ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
with no paramter save area.
STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
but we over 240 uses of it, some of which assume that it includes
space for the parameter area.
We need to work through all our stack defines and rationalise them
but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using
in valid_next_sp(). This fixes the issue:
In v3.15 the driver stopped to accept network packets after successful
authentification, which could be worked around by passing the
nohwcrypt=1 module parameter. This was not reproducible by
everyone, and showed random behaviour in some tests.
It was caused by an uninitialized variable introduced
in 4ed1a8d4a257 ("ath9k_htc: use ath9k_cmn_rx_accept") and
used in 341b29b9cd2f ("ath9k_htc: use ath9k_cmn_rx_skb_postprocess").
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=78581 Fixes: 341b29b9cd2f ("ath9k_htc: use ath9k_cmn_rx_skb_postprocess") Signed-off-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The firmware notifies about interface changes through the IF event
which has a NO_IF flag that means host can ignore the event. This
behaviour was introduced in the driver by:
brcmfmac: ignore IF event if firmware indicates it
It turns out that the IF event for the P2P_DEVICE also has this
flag set, but the event should not be ignored in this scenario.
The mentioned commit caused a regression in 3.12 kernel in creation
of the P2P_DEVICE interface.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com> Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com> Reviewed-by: Daniel (Deognyoun) Kim <dekim@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.
However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.
This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.
Yasuaki also reported that this can happen on real hardware:
https://lkml.org/lkml/2014/7/22/1018
His case is here:
==
Here is an example on my system.
My system has 4 sockets and each socket has 15 cores and HT is
enabled. In this case, each core of sockes is numbered as
follows:
Then llc_shared_mask of CPU#30 becomes
0x3fff8000fffffffc0000000. It means that last level cache of
Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
the wrong value.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Tested-by: Linn Crosetto <linn@hp.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the same bug as b43790eedd31 ("mm: softdirty: don't forget to
save file map softdiry bit on unmap") and 9aed8614af5a ("mm/memory.c:
don't forget to set softdirty on file mapped fault") where the return
value of pte_*mksoft_dirty was being ignored.
To be sure that no other pte/pmd "mk" function return values were being
ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
with __must_check and rebuilt.
The userspace effect of this bug is that the softdirty mark might be
lost if a file mapped pte get zapped.
Signed-off-by: Peter Feiner <pfeiner@google.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 4590685546a3 ("mm/sl[aou]b: Common alignment code"), the
"ralign" automatic variable in __kmem_cache_create() may be used as
uninitialized.
The proper alignment defaults to BYTES_PER_WORD and can be overridden by
SLAB_RED_ZONE or the alignment specified by the caller.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031
Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Andrei Elovikov <a.elovikov@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a deadlock case which reported by Guozhonghua:
https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.
It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
helpers"). Since lockres is new, it doesn't not require the
&res->spinlock. So remove it.
Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers") Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Guozhonghua <guozhonghua@h3c.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.
[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/] Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The C operator <= defines a perfectly fine total ordering on the set of
values representable in a long. However, unlike its namesake in the
integers, it is not translation invariant, meaning that we do not have
"b <= c" iff "a+b <= a+c" for all a,b,c.
This means that it is always wrong to try to boil down the relationship
between two longs to a question about the sign of their difference,
because the resulting relation [a LEQ b iff a-b <= 0] is neither
anti-symmetric or transitive. The former is due to -LONG_MIN==LONG_MIN
(take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a !=
b). The latter can either be seen observing that x LEQ x+1 for all x,
implying x LEQ x+1 LEQ x+2 ... LEQ x-1 LEQ x; or more directly with the
simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c >
0.
Note that it makes absolutely no difference that a transmogrying bijection
has been applied before the comparison is done. In fact, had the
obfuscation not been done, one could probably not observe the bug
(assuming all values being compared always lie in one half of the address
space, the mathematical value of a-b is always representable in a long).
As it stands, one can easily obtain three file descriptors exhibiting the
non-transitivity of kcmp().
Side note 1: I can't see that ensuring the MSB of the multiplier is
set serves any purpose other than obfuscating the obfuscating code.
When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
not initialized but ep_take_care_of_epollwakeup reads its event field.
When this unintialized field has EPOLLWAKEUP bit set, a capability check
is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This
produces unexpected messages in the audit log, such as (on a system
running SELinux):
We shouldn't set text_len in the code path that detects printk recursion
because text_len corresponds to the length of the string inside textbuf.
A few lines down from the line
text_len = strlen(recursion_msg);
is the line
text_len += vscnprintf(text + text_len, ...);
So if printk detects recursion, it sets text_len to 29 (the length of
recursion_msg) and logs an error. Then the message supplied by the
caller of printk is stored inside textbuf but offset by 29 bytes. This
means that the output of the recursive call to printk will contain 29
bytes of garbage in front of it.
This defect is caused by commit 458df9fd4815 ("printk: remove separate
printk_sched buffers and use printk buf instead") which turned the line
text_len = vscnprintf(text, ...);
into
text_len += vscnprintf(text + text_len, ...);
To fix this, this patch avoids setting text_len when logging the printk
recursion error. This patch also marks unlikely() the branch leading up
to this code.
Fixes: 458df9fd4815b478 ("printk: remove separate printk_sched buffers and use printk buf instead") Signed-off-by: Patrick Palka <patrick@parcs.ath.cx> Reviewed-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
That commit was wrong since it uses data that hasn't even been set
up yet, but might be a hold-over from a previous connection.
Additionally, it seems like a driver-specific workaround that
shouldn't have been in mac80211 to start with.
Fixes: 24aa11ab8ae0 ("mac80211: disable uAPSD if all ACs are under ACM") Reviewed-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When updating what an ftrace_ops traces, if it is registered (that is,
actively tracing), and that ftrace_ops uses the shared global_ops
local_hash, then we need to update all tracers that are active and
also share the global_ops' ftrace_hash_ops.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The latest rewrite of ftrace removed the separate ftrace_ops of
the function tracer and the function graph tracer and had them
share the same ftrace_ops. This simplified the accounting by removing
the multiple layers of functions called, where the global_ops func
would call a special list that would iterate over the other ops that
were registered within it (like function and function graph), which
itself was registered to the ftrace ops list of all functions
currently active. If that sounds confusing, the code that implemented
it was also confusing and its removal is a good thing.
The problem with this change was that it assumed that the function
and function graph tracer can never be used at the same time.
This is mostly true, but there is an exception. That is when the
function profiler uses the function graph tracer to profile.
The function profiler can be activated the same time as the function
tracer, and this breaks the assumption and the result is that ftrace
will crash (it detects the error and shuts itself down, it does not
cause a kernel oops).
To solve this issue, a previous change allowed the hash tables
for the functions traced by a ftrace_ops to be a pointer and let
multiple ftrace_ops share the same hash. This allows the function
and function_graph tracer to have separate ftrace_ops, but still
share the hash, which is what is done.
Now the function and function graph tracers have separate ftrace_ops
again, and the function tracer can be run while the function_profile
is active.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the top level debug file system function tracer shares its
ftrace_ops with the function graph tracer. This was thought to be fine
because the tracers are not used together, as one can only enable
function or function_graph tracer in the current_tracer file.
But that assumption proved to be incorrect. The function profiler
can use the function graph tracer when function tracing is enabled.
Since all function graph users uses the function tracing ftrace_ops
this causes a conflict and when a user enables both function profiling
as well as the function tracer it will crash ftrace and disable it.
The quick solution so far is to move them as separate ftrace_ops like
it was earlier. The problem though is to synchronize the functions that
are traced because both function and function_graph tracer are limited
by the selections made in the set_ftrace_filter and set_ftrace_notrace
files.
To handle this, a new structure is made called ftrace_ops_hash. This
structure will now hold the filter_hash and notrace_hash, and the
ftrace_ops will point to this structure. That will allow two ftrace_ops
to share the same hashes.
Since most ftrace_ops do not share the hashes, and to keep allocation
simple, the ftrace_ops structure will include both a pointer to the
ftrace_ops_hash called func_hash, as well as the structure itself,
called local_hash. When the ops are registered, the func_hash pointer
will be initialized to point to the local_hash within the ftrace_ops
structure. Some of the ftrace internal ftrace_ops will be initialized
statically. This will allow for the function and function_graph tracer
to have separate ops but still share the same hash tables that determine
what functions they trace.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 2ec2a8be (usb: dwc3: gadget:
always enable IOC on bulk/interrupt transfers)
we created a situation where it was possible to
hang a bulk/interrupt endpoint if we had more
than one pending request in our queue and they
were both started with a single Start Transfer
command.
The problems triggers because we had not enabled
Transfer In Progress event for those endpoints
and we were not able to process early giveback
of requests completed without LST bit set.
Fix the problem by finally enabling Xfer In Progress
event for all endpoint types, except control.
Fixes: 2ec2a8be (usb: dwc3: gadget: always
enable IOC on bulk/interrupt transfers) Reported-by: Pratyush Anand <pratyush.anand@st.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 2da78092 changed the locking from a mutex to a spinlock,
so we now longer sleep in this context. But there was a leftover
might_sleep() in there, which now triggers since we do the final
free from an RCU callback. Get rid of it.
This commit reverts the addition of lockdep checking to raw_seqcount_begin
for the following reasons:
1) It violates the naming convention that raw_* functions should not
do lockdep checks (a convention that is also followed by the other
raw_*_seqcount_begin functions).
2) raw_seqcount_begin does not spin, so it can only be part of an ABBA
deadlock in very special circumstances (for instance if a lock
is held across the entire raw_seqcount_begin()+read_seqcount_retry()
loop while also being taken inside the write_seqcount protected area).
3) It is causing false positives with some existing callers, and there
is no non-lockdep alternative for those callers to use.
None of the three existing callers (__d_lookup_rcu, netdev_get_name, and
the NFS state code) appear to use the function in a manner that is ABBA
deadlock prone.
Fixes: 1ca7d67cf5d5: seqcount: Add lockdep functionality to seqcount/seqlock Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CAHQdGtRR6SvEhXiqWo24hoUh9AU9cL82Z8Z-d8-7u951F_d+5g@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
then nfs mounting without portmap or rpcbind running using a busybox
mount resulted in:
# mount -t nfs 10.30.130.21:/opt /mnt
svc: failed to register lockdv1 RPC service (errno 111).
lockd_up: makesock failed, error=-111
Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0xc055e65c
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
task: cf29cea0 ti: cf35c000 task.ti: cf35c000
NIP: c055e65c LR: c0566490 CTR: c055e648
REGS: cf35dad0 TRAP: 0300 Not tainted (3.10.44.cge)
MSR: 00029000 <CE,EE,ME> CR: 22442488 XER: 20000000
DEAR: 00000030, ESR: 00000000
The addition of svc_shutdown_net() resulted in two calls to
svc_rpcb_cleanup(); the second is no longer necessary and crashes when
it calls rpcb_register_call with clnt=NULL.
Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru> Fixes: 679b033df484 "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up" Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
Currently, it doesn't flush tlb after the partial unmapping. This may
be okay in most cases as the established mapping hasn't been used at
that point but it can go wrong and when it goes wrong it'd be
extremely difficult to track down.
When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
free what has already been allocated. The invocation is across the
whole requested range and pcpu_free_pages() will try to free all
non-NULL pages; unfortunately, this is incorrect as
pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
clear the pages array and thus the array may have entries from the
previous invocations making the partial failure path free incorrect
pages.
Fix it by open-coding the partial freeing of the already allocated
pages.
Currently, only SMP system free the percpu allocation info.
Uniprocessor system should free it too. For example, one x86 UML
virtual machine with 256MB memory, UML kernel wastes one page memory.
There is possibility with misconfigured pins that interrupt occurs instantly
after setting irq_set_chained_handler() in gpiochip_set_chained_irqchip().
Now if handler gets called before irq_set_handler_data() the handler gets
NULL handler data.
Fix this by moving irq_set_handler_data() call before
irq_set_chained_handler() in gpiochip_set_chained_irqchip().
The sys_vendor / product_name are somewhat generic unfortunately, so this
may lead to some false positives. But nomux usually does no harm, where as
not having it clearly is causing problems on the Avatar AVIU-145A6.
https://bugzilla.kernel.org/show_bug.cgi?id=77391
Reported-by: Hugo P <saurosii@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are getting more and more reports about LG laptops not having
functioning keyboard if we try to deactivate keyboard during probe.
Given that having keyboard deactivated is merely "nice to have"
instead of a hard requirement for probing, let's disable it on all
LG boxes instead of trying to hunt down particular models.
This change is prompted by patches trying to add "LG Electronics"/"ROCKY"
and "LG Electronics"/"LW60-F27B" to the DMI list.
ForcePads are found on HP EliteBook 1040 laptops. They lack any kind of
physical buttons, instead they generate primary button click when user
presses somewhat hard on the surface of the touchpad. Unfortunately they
also report primary button click whenever there are 2 or more contacts
on the pad, messing up all multi-finger gestures (2-finger scrolling,
multi-finger tapping, etc). To cope with this behavior we introduce a
delay (currently 50 msecs) in reporting primary press in case more
contacts appear.
Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running a 32-bit inputattach utility in a 64-bit system, there will be
error code "inputattach: can't set device type". This is caused by the
serport device driver not supporting compat_ioctl, so that SPIOCSTYPE ioctl
fails.
The DM crypt target accesses memory beyond allocated space resulting in
a crash on 32 bit x86 systems.
This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm
crypt: use async crypto"). However, this bug was masked by the fact
that kmalloc rounds the size up to the next power of two. This bug
wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio
data"). By switching to using per-bio data there was no longer any
padding beyond the end of a dm-crypt allocated memory block.
To minimize allocation overhead dm-crypt puts several structures into one
block allocated with kmalloc. The block holds struct ablkcipher_request,
cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
struct dm_crypt_request and an initialization vector.
The variable dmreq_start is set to offset of struct dm_crypt_request
within this memory block. dm-crypt allocates the block with this size:
cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.
When accessing the initialization vector, dm-crypt uses the function
iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
+ 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).
dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
structure. However, when dm-crypt accesses the initialization vector, it
takes a pointer to the end of dm_crypt_request, aligns it, and then uses
it as the initialization vector. If the end of dm_crypt_request is not
aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
alignment causes the initialization vector to point beyond the allocated
space.
Fix this bug by calculating the variable iv_size_padding and adding it
to the allocated size.
Also correct the alignment of dm_crypt_request. struct dm_crypt_request
is specific to dm-crypt (it isn't used by the crypto subsystem at all),
so it is aligned on __alignof__(struct dm_crypt_request).
Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is
aligned as if the block was allocated with kmalloc.
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a writeback or a promotion of a block is completed, the cell of
that block is removed from the prison, the block is marked as clean, and
the clear_dirty() callback of the cache policy is called.
Unfortunately, performing those actions in this order allows an incoming
new write bio for that block to come in before clearing the dirty status
is completed and therefore possibly causing one of these two scenarios:
Scenario A:
Thread 1 Thread 2
cell_defer() .
- cell removed from prison .
- detained bios queued .
. incoming write bio
. remapped to cache
. set_dirty() called,
. but block already dirty
. => it does nothing
clear_dirty() .
- block marked clean .
- policy clear_dirty() called .
Result: Block is marked clean even though it is actually dirty. No
writeback will occur.
Scenario B:
Thread 1 Thread 2
cell_defer() .
- cell removed from prison .
- detained bios queued .
clear_dirty() .
- block marked clean .
. incoming write bio
. remapped to cache
. set_dirty() called
. - block marked dirty
. - policy set_dirty() called
- policy clear_dirty() called .
Result: Block is properly marked as dirty, but policy thinks it is clean
and therefore never asks us to writeback it.
This case is visible in "dmsetup status" dirty block count (which
normally decreases to 0 on a quiet device).
Fix these issues by calling clear_dirty() before calling cell_defer().
Incoming bios for that block will then be detained in the cell and
released only after clear_dirty() has completed, so the race will not
occur.
Found by inspecting the code after noticing spurious dirty counts
(scenario B).
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.
Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.