At dot clocks > approx. 250 Mhz, some of these calcs will overflow and
cause miscalculation of latency watermarks, and for some overflows also
divide-by-zero driver crash ("divide error: 0000 [#1] PREEMPT SMP" in
"dce_v10_0_latency_watermark+0x12d/0x190").
This zero-divide happened, e.g., on AMD Tonga Pro under DCE-10,
on a Displayport panel when trying to set a video mode of 2560x1440
at 165 Hz vrefresh with a dot clock of 635.540 Mhz.
Refine calculations to avoid the overflows.
Tested for DCE-10 with R9 380 Tonga + ASUS ROG PG279 panel.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoid big roundoff errors in scanline/hactive durations for
high pixel clocks, especially for >= 500 Mhz, and thereby
program more accurate display fifo watermarks.
Implemented here for DCE 6,8,10,11.
Successfully tested on DCE 10 with AMD R9 380 Tonga.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Fixes: 36bcce430657 ("ath9k_htc: Handle storage devices") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
git commit c5328901aa1db134 "[S390] entry[64].S improvements" removed
the update of the exit_timer lowcore field from the critical section
cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done
blocks. If the PSW is updated by the critical section cleanup to point to
user space again, the interrupt entry code will do a vtime calculation
after the cleanup completed with an exit_timer value which has *not* been
updated. Due to this incorrect system time deltas are calculated.
If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done
or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry
time of the interrupt.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since linux v3.14 with commit 38dfac843cb6d7be1 ("vmcore: prevent PT_NOTE
p_memsz overflow during header update") on s390 we get the following
message in the kdump kernel:
The reason for this is that we don't create a final zero note in
the ELF header which the proc/vmcore code uses to find out the end
of the notes section (see also kernel/kexec_core.c:final_note()).
It still worked on s390 by chance because we (most of the time?) have the
byte pattern 0x6b6b6b6b after the notes section which also makes the notes
parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
interpreded as note size:
if ((real_sz + sz) > max_sz) {
pr_warn("Warning: Exceeded p_memsz, dropping P ...);
break;
}
So fix this and add the missing final note to the ELF header.
We don't have to adjust the memory size for ELF header ("alloc_size")
because the new ELF note still fits into the 0x1000 base memory.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 43530b69d758328d3ffe6ab98fd640463e8e3667 ("regulator: Use
regmap_read/write(), regmap_update_bits functions directly") intended
to replace working inline helper functions with standard regmap
calls. However, it also inverted the set/clear logic of the "CORE ADJ
Allowed" bit. That patch was clearly never tested, since without that
bit cleared, the core VDCDC1 voltage output does not react to I2C
configuration changes.
This patch fixes the issue by clearing the bit as in the original,
correct implementation. Note for stable back porting that, due to
subsequent driver churn, this patch will not apply on every kernel
version.
Fixes: 43530b69d758 ("regulator: Use regmap_read/write(), regmap_update_bits functions directly") Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit b2f680380ddf ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").
Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.
The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.
There were two independent bugs in __get_user_asm_u64():
- it still did the STAC/CLAC user space access marking, even though
that is now done by the wrapper macros, see commit 11f1a4b9755f
("x86: reorganize SMAP handling in user space accesses").
This didn't result in a semantic error, it just means that the
inlined optimized version was hugely less efficient than the
allegedly slower standard version, since the CLAC/STAC overhead is
quite high on modern Intel CPU's.
- the double register %eax/%edx was marked as an output, but the %eax
part of it was touched early in the asm, and could thus clobber other
inputs to the asm that gcc didn't expect it to touch.
In particular, that meant that the generated code could look like
this:
mov (%eax),%eax
mov 0x4(%eax),%edx
where the load of %edx obviously was _supposed_ to be from the 32-bit
word that followed the source of %eax, but because %eax was
overwritten by the first instruction, the source of %edx was
basically random garbage.
The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.
Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.
- "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
a read should be ignored) in guest can get a random number.
- "rep insb" instruction to access PIT register port 0x43 can control memcpy()
in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
which will disclose the unimportant kernel memory in host but no crash.
The similar test program below can reproduce the read out-of-bounds vulnerability:
void hexdump(void *mem, unsigned int len)
{
unsigned int i, j;
The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
w/o clear after using which results in some random datas are left over in
the buffer. Guest reads port 0x43 will be ignored since it is write only,
however, the function kernel_pio() can't distigush this ignore from successfully
reads data from device's ioport. There is no new data fill the buffer from
port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
the buffer to the guest unconditionally. This patch fixes it by clearing the
buffer before in instruction emulation to avoid to grant guest the stale data
in the buffer.
In addition, string I/O is not supported for in kernel device. So there is no
iteration to read ioport %RCX times for string I/O. The function kernel_pio()
just reads one round, and then copy the io size * %RCX to the guest unconditionally,
actually it copies the one round ioport data w/ other random datas which are left
over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
introducing the string I/O support for in kernel device in order to grant the right
ioport datas to the guest.
Before the patch:
0x000000: fe 38 93 93 ff ff ab ab .8......
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
After the patch:
0x000000: 1e 02 f8 00 ff ff ab ab ........
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: d2 e2 d2 df d2 db d2 d7 ........
0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: 00 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 00 00 00 00 ........
0x000018: 00 00 00 00 00 00 00 00 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/
CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.
Safe access to per-CPU data requires a couple of constraints, though: the
thread working with the data cannot be preempted and it cannot be migrated
while it manipulates per-CPU variables. If the thread is preempted, the
thread that replaces it could try to work with the same variables; migration
to another CPU could also cause confusion. However there is no preemption
disable when reads host per-CPU tsc rate to calculate the current kvmclock
timestamp.
This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both
__this_cpu_read() and rdtsc() are not preempted.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
a 1 to any of these bits will cause a general-protection exception(#GP) to be
generated". The syzkaller forks' testcase overrides xsave area w/ random values
and steps on the reserved bits of MXCSR register. The damaged MXCSR register
values of guest will be restored to SSEx MXCSR register before vmentry. This
patch fixes it by catching userspace override MXCSR register reserved bits w/
random values and bails out immediately.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modifying the attributes of a file makes ima_inode_post_setattr reset
the IMA cache flags. So if the file, which has just been created,
is opened a second time before the first file descriptor is closed,
verification fails since the security.ima xattr has not been written
yet. We therefore have to look at the IMA_NEW_FILE even if the file
already existed.
With this patch there should no longer be an error when cat tries to
open testfile:
Command buffers (skb's) are allocated by the main driver, and freed upon
the last use. That last use is often in mwifiex_free_cmd_buffer(). In
the meantime, if the command buffer gets used by the PCI driver, we map
it as DMA-able, and store the mapping information in the 'cb' memory.
However, if a command was in-flight when resetting the device (and
therefore was still mapped), we don't get a chance to unmap this memory
until after the core has cleaned up its command handling.
Let's keep a refcount within the PCI driver, so we ensure the memory
only gets freed after we've finished unmapping it.
Noticed by KASAN when forcing a reset via:
echo 1 > /sys/bus/pci/.../reset
The same code path can presumably be exercised in remove() and
shutdown().
This bug has been around in different forms for a while. It was sort of
noticed in commit 955ab095c51a ("mwifiex: Do not kfree cmd buf while
unregistering PCIe"), but it just fixed the double-free, without
acknowledging the potential for use-after-free.
Fixes: fc3314609047 ("mwifiex: use pci_alloc/free_consistent APIs for PCIe") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nl80211 provides the NL80211_SCAN_FLAG_RANDOM_ADDR for every scan
request that should be randomized; the absence of such a flag means we
should not randomize. However, mwifiex was stashing the latest
randomization request and *always* using it for future scans, even those
that didn't set the flag.
Let's zero out the randomization info whenever we get a scan request
without NL80211_SCAN_FLAG_RANDOM_ADDR. I'd prefer to remove
priv->random_mac entirely (and plumb the randomization MAC properly
through the call sequence), but the spaghetti is a little difficult to
unravel here for me.
Fixes: c2a8f0ff9c6c ("mwifiex: support random MAC address for scanning") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
if called md_set_readonly and set MD_CLOSING bit, the mddev cannot
be opened any more due to the MD_CLOING bit wasn't cleared. Thus it
needs to be cleared in md_ioctl after any call to md_set_readonly()
or do_md_stop().
Signed-off-by: NeilBrown <neilb@suse.com> Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag") Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When growing raid5 device on machine with small memory, there is chance that
mdadm will be killed and the following bug report can be observed. The same
bug could also be reproduced in linux-4.10.6.
The problem is that resize_stripes() releases new stripe_heads before assigning new
slab cache to conf->slab_cache. If the shrinker function raid5_cache_scan() gets called
after resize_stripes() starting releasing new stripes but right before new slab cache
being assigned, it is possible that these new stripe_heads will be freed with the old
slab_cache which was already been destoryed and that triggers this bug.
Signed-off-by: Dennis Yang <dennisyang@qnap.com> Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the type of the parameter "retain_bytes" from unsigned to
unsigned long, so that on 64-bit machines the user can set more than
4GiB of data to be retained.
Also, change the type of the variable "count" in the function
"__evict_old_buffers" to unsigned long. The assignment
"count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
could result in unsigned long to unsigned overflow and that could result
in buffers not being freed when they should.
While at it, avoid division in get_retain_buffers(). Division is slow,
we can change it to shift because we have precalculated the log2 of
block size.
Otherwise it is possible to trigger crashes due to the metadata being
inaccessible yet these methods don't safely account for that possibility
without these checks.
Requeuing a request immediately while path initialization is ongoing
causes high CPU usage, something that is undesired. Hence delay
requeuing while path initialization is in progress.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If blk_get_request() fails, check whether the failure is due to a path
being removed. If that is the case, fail the path by triggering a call
to fail_path(). This avoids that the following scenario can be
encountered while removing paths:
* CPU usage of a kworker thread jumps to 100%.
* Removing the DM device becomes impossible.
Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK,
and the queue is not dying, because in these cases immediate requeuing
is inappropriate.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
activate_path() is renamed to activate_path_work() which now calls
activate_or_offline_path(). activate_or_offline_path() will be used
by the next commit.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If blk_get_request() returns ENODEV then multipath_clone_and_map()
causes a request to be requeued immediately. This can cause a kworker
thread to spend 100% of the CPU time of a single core in
__blk_mq_run_hw_queue() and also can cause device removal to never
finish.
Avoid this by only requeuing after a delay if blk_get_request() fails.
Additionally, reduce the requeue delay.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dm-bufio checks a watermark when it allocates a new buffer in
__bufio_new(). However, it doesn't check the watermark when the user
changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.
This may result in a problem - if the watermark is high enough so that
all possible buffers are allocated and if the user lowers the value of
"max_cache_size_bytes", the watermark will never be checked against the
new value because no new buffer would be allocated.
To fix this, change __evict_old_buffers() so that it checks the
watermark. __evict_old_buffers() is called every 30 seconds, so if the
user reduces "max_cache_size_bytes", dm-bufio will react to this change
within 30 seconds and decrease memory consumption.
Depends-on: 1b0fb5a5b2 ("dm bufio: avoid a possible ABBA deadlock") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__get_memory_limit() tests if dm_bufio_cache_size changed and calls
__cache_size_refresh() if it did. It takes dm_bufio_clients_lock while
it already holds the client lock. However, lock ordering is violated
because in cleanup_old_buffers() dm_bufio_clients_lock is taken before
the client lock.
This results in a possible deadlock and lockdep engine warning.
Fix this deadlock by changing mutex_lock() to mutex_trylock(). If the
lock can't be taken, it will be re-checked next time when a new buffer
is allocated.
Also add "unlikely" to the if condition, so that the optimizer assumes
that the condition is false.
Since the commit 0cf4503174c1 ("dm raid: add support for the MD RAID0
personality"), the dm-raid subsystem can activate a RAID-0 array.
Therefore, add MD_RAID0 to the dependencies of DM_RAID, so that MD_RAID0
will be selected when DM_RAID is selected.
Fixes: 0cf4503174c1 ("dm raid: add support for the MD RAID0 personality") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dm_btree_find_lowest_key() is giving incorrect results. find_key()
traverses the btree correctly for finding the highest key, but there is
an error in the way it traverses the btree for retrieving the lowest
key. dm_btree_find_lowest_key() fetches the first key of the rightmost
block of the btree instead of fetching the first key from the leftmost
block.
Fix this by conditionally passing the correct parameter to value64()
based on the @find_highest flag.
Signed-off-by: Erez Zadok <ezk@fsl.cs.sunysb.edu> Signed-off-by: Vinothkumar Raja <vinraja@cs.stonybrook.edu> Signed-off-by: Nidhi Panpalia <npanpalia@cs.stonybrook.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The infiniband address handle can be triggered to resolve an ipv6
address in response to MAD packets, regardless of the ipv6
module being disabled via the kernel command line argument.
That will cause a call into the ipv6 routing code, which is not
initialized, and a conseguent oops.
This commit addresses the above issue replacing the direct lookup
call with an indirect one via the ipv6 stub, which is properly
initialized according to the ipv6 status (e.g. if ipv6 is
disabled, the routing lookup fails gracefully)
When TPM2 log has entries with more than 3 digests, or with digests
not listed in the log header, log gets misparsed, eventually
leading to kernel complaint that code tried to vmalloc 512MB of
memory (I have no idea what would happen on bigger system).
So code should not parse only first 3 digests: both event header
and event itself are already in memory, so we can parse any number
of digests, as long as we do not try to parse whole memory when
given count of 0xFFFFFFFF.
So this change:
* Rejects event entry with more digests than log header describes.
Digest types should be unique, and all should be described in
log header, so there cannot be more digests in the event than in
the header.
* Reject event entry with digest that is not described in the
log header. In theory code could hardcode information about
digest IDs already assigned by TCG, but if firmware authors
cannot get event log format right, why should anyone believe
that they got event log content right.
Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log") Signed-off-by: Petr Vandrovec <petr@vmware.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code passes the address of tpm_chip as the argument to
dev_get_drvdata() without prior NULL check in
tpm_ibmvtpm_get_desired_dma. This resulted an oops during kernel
boot when vTPM is enabled in Power partition configured in active
memory sharing mode.
The vio_driver's get_desired_dma() is called before the probe(), which
for vtpm is tpm_ibmvtpm_probe, and it's this latter function that
initializes the driver and set data. Attempting to get data before
the probe() caused the problem.
This patch adds a NULL check to the tpm_ibmvtpm_get_desired_dma.
fixes: 9e0d39d8a6a0 ("tpm: Remove useless priv field in struct tpm_vendor_specific") Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkine <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure size of response buffer is at least 6 bytes, or
we will underflow and pass large size_t to memcpy_fromio().
This was encountered while testing earlier version of
locality patchset.
Fixes: 30fc8d138e912 ("tpm: TPM 2.0 CRB Interface") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, there is an unnecessary 1 msec delay added in
i2c_nuvoton_write_status() for the successful case. This
function is called multiple times during send() and recv(),
which implies adding multiple extra delays for every TPM
operation.
This patch calls usleep_range() only if retry is to be done.
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 500462a9de65 "timers: Switch to a non-cascading wheel" replaced
the 'classic' timer wheel, which aimed for near 'exact' expiry of the
timers. Their analysis was that the vast majority of timeout timers
are used as safeguards, not as real timers, and are cancelled or
rearmed before expiration. The only exception noted to this were
networking timers with a small expiry time.
Not included in the analysis was the TPM polling timer, which resulted
in a longer normal delay and, every so often, a very long delay. The
non-cascading wheel delay is based on CONFIG_HZ. For a description of
the different rings and their delays, refer to the comments in
kernel/time/timer.c.
Below are the delays given for rings 0 - 2, which explains the longer
"normal" delays and the very, long delays as seen on systems with
CONFIG_HZ 250.
* HZ 1000 steps
* Level Offset Granularity Range
* 0 0 1 ms 0 ms - 63 ms
* 1 64 8 ms 64 ms - 511 ms
* 2 128 64 ms 512 ms - 4095 ms (512ms - ~4s)
* HZ 250
* Level Offset Granularity Range
* 0 0 4 ms 0 ms - 255 ms
* 1 64 32 ms 256 ms - 2047 ms (256ms - ~2s)
* 2 128 256 ms 2048 ms - 16383 ms (~2s - ~16s)
Below is a comparison of extending the TPM with 1000 measurements,
using msleep() vs. usleep_delay() when configured for 1000 hz vs. 250
hz, before and after commit 500462a9de65.
Testing the implementation with a Raspberry Pi 2 showed that under some
circumstances its SPI master erroneously releases the CS line before the
transfer is complete, i.e. before the end of the last clock. In this case
the TPM ignores the transfer and misses for example the GO command. The
driver is unable to detect this communication problem and will wait for a
command response that is never going to arrive, timing out eventually.
As a workaround, the small delay ensures that the CS line is held long
enough, even with a faulty SPI master. Other SPI masters are not affected,
except for a negligible performance penalty.
Fixes: 0edbfea537d1 ("tpm/tpm_tis_spi: Add support for spi phy") Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Peter Huewe <peter.huewe@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Benoit Houyere <benoit.houyere@st.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Limiting transfers to MAX_SPI_FRAMESIZE was not expected by the upper
layers, as tpm_tis has no such limitation. Add a loop to hide that
limitation.
v2: Moved scope of spi_message to the top as requested by Jarkko Fixes: 0edbfea537d1 ("tpm/tpm_tis_spi: Add support for spi phy") Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Peter Huewe <peter.huewe@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Benoit Houyere <benoit.houyere@st.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Abort the transfer with ETIMEDOUT when the TPM signals more than
TPM_RETRY wait states. Continuing with the transfer in this state
will only lead to arbitrary failures in other parts of the code.
Fixes: 0edbfea537d1 ("tpm/tpm_tis_spi: Add support for spi phy") Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Peter Huewe <peter.huewe@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Benoit Houyere <benoit.houyere@st.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The algorithm for sending data to the TPM is mostly identical to the
algorithm for receiving data from the TPM, so a single function is
sufficient to handle both cases.
This is a prequisite for all the other fixes, so we don't have to fix
everything twice (send/receive)
v2: u16 instead of u8 for the length. Fixes: 0edbfea537d1 ("tpm/tpm_tis_spi: Add support for spi phy") Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Peter Huewe <peter.huewe@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Benoit Houyere <benoit.houyere@st.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When delivering an event to userspace for a file on an NFS share,
if the file is deleted on server side before user reads the event,
user will not get the event.
If the event queue contained several events, the stale event is
quietly dropped and read() returns to user with events read so far
in the buffer.
If the event queue contains a single stale event or if the stale
event is a permission event, read() returns to user with the kernel
internal error code 518 (EOPENSTALE), which is not a POSIX error code.
Check the internal return value -EOPENSTALE in fanotify_read(), just
the same as it is checked in path_openat() and drop the event in the
cases that it is not already dropped.
This is a reproducer from Marko Rauhamaa:
Just take the example program listed under "man fanotify" ("fantest")
and follow these steps:
==============================================================
NFS Server NFS Client(1) NFS Client(2)
==============================================================
# echo foo >/nfsshare/bar.txt
# cat /nfsshare/bar.txt
foo
# ./fantest /nfsshare
Press enter key to terminate.
Listening for events.
# rm -f /nfsshare/bar.txt
# cat /nfsshare/bar.txt
read: Unknown error 518
cat: /nfsshare/bar.txt: Operation not permitted
==============================================================
where NFS Client (1) and (2) are two terminal sessions on a single NFS
Client machine.
Reported-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com> Tested-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using jiffies in hdac_wait_for_cmd_dmas() to determine when to time out
when interrupts are off (snd_hdac_bus_stop_cmd_io()/spin_lock_irq())
causes hard lockup so unlock while waiting using jiffies.
TIS v1.3 for TPM 1.2 and PTP for TPM 2.0 disagree about which timeout
value applies to reading a valid burstcount. It is TIMEOUT_D according to
TIS, but TIMEOUT_A according to PTP, so choose the appropriate value
depending on whether we deal with a TPM 1.2 or a TPM 2.0.
This is important since according to the PTP TIMEOUT_D is much smaller
than TIMEOUT_A. So the previous implementation could run into timeouts
with a TPM 2.0, even though the TPM was behaving perfectly fine.
During tpm2_probe TIMEOUT_D will be used even with a TPM 2.0, because
TPM_CHIP_FLAG_TPM2 is not yet set. This is fine, since the timeout values
will only be changed afterwards by tpm_get_timeouts. Until then
TIS_TIMEOUT_D_MAX applies, which is large enough.
Fixes: aec04cbdf723 ("tpm: TPM 2.0 FIFO Interface") Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Peter Huewe <peter.huewe@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Format specifier %p can leak kernel addresses while not valuing the
kptr_restrict system settings. When kptr_restrict is set to (1), kernel
pointers printed using the %pK format specifier will be replaced with
Zeros. Debugging Note : &pK prints only Zeros as address. If you need
actual address information, write 0 to kptr_restrict.
echo 0 > /proc/sys/kernel/kptr_restrict
[Found by poking around in a random vendor kernel tree, it would be nice
if someone would actually send these types of patches upstream - gkh]
The lp_setup() code doesn't apply any bounds checking when passing
"lp=none", and only in this case, resulting in an overflow of the
parport_nr[] array. All versions in Git history are affected.
Reported-By: Roee Hay <roee.hay@hcl.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ene_usb6250 sub-driver in usb-storage does USB I/O to buffers on
the stack, which doesn't work with vmapped stacks. This patch fixes
the problem by allocating a separate 512-byte buffer at probe time and
using it for all of the offending I/O operations.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: Andreas Hartmann <andihartmann@01019freenet.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When built as a module and running with update_ms >= 0, pstore will Oops
during module unload since the work timer is still running. This makes sure
the worker is stopped before unloading.
The per-prz spinlock should be using the dynamic initializer so that
lockdep can correctly track it. Without this, under lockdep, we get a
warning at boot that the lock is in non-static memory.
Fixes: 109704492ef6 ("pstore: Make spinlock per zone instead of global") Fixes: 76d5692a5803 ("pstore: Correctly initialize spinlock and flags") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit c950fd6f201a kernel registers pstore write based on flag set.
Pstore write for powerpc is broken as flags(PSTORE_FLAGS_DMESG) is not set for
powerpc architecture. On panic, kernel doesn't write message to
/fs/pstore/dmesg*(Entry doesn't gets created at all).
This patch enables pstore write for powerpc architecture by setting
PSTORE_FLAGS_DMESG flag.
In commit 658922e57b84 "libnvdimm, pfn: fix memmap reservation sizing"
we arranged for the capacity to be allocated, but failed to also update
the 'npfns' parameter. This leads to cases where there is enough
capacity reserved to hold all the allocated sections, but
vmemmap_populate_hugepages() still encounters -ENOMEM from
altmap_alloc_block_buf().
This fix is a stop-gap until we can teach the core memory hotplug
implementation to permit sub-section hotplug.
Fixes: 658922e57b84 ("libnvdimm, pfn: fix memmap reservation sizing") Reported-by: Anisha Allada <anisha.allada@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.
Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
nd_{attach,detach}_ndns() operations.
Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices") Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the case where a dimm does not have any associated flush hints the
ndrd->flush_wpq array may be uninitialized leading to crashes with the
following signature:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: region_visible+0x10f/0x160 [libnvdimm]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Fixes: f284a4f23752 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msg_written_handler() may set ssif_info->multi_data to NULL
when using ipmitool to write fru.
Before setting ssif_info->multi_data to NULL, add new local
pointer "data_to_send" and store correct i2c data pointer to
it to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos.
The WRITE SAME to TRIM translation rewrites the DATA OUT buffer. While
the SCSI code accomodates for this by passing a read-writable buffer
userspace applications don't cater for this behavior. In fact it can
be used to rewrite e.g. a readonly file through mmap and should be
considered as a security fix.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cgroup_get() expected to be called only on live cgroups and triggers
warning on a dead cgroup; however, cgroup_sk_alloc() may be called
while cloning a socket which is left in an empty and removed cgroup
and thus may legitimately duplicate its reference on a dead cgroup.
This currently triggers the following warning spuriously.
This patch renames the existing cgroup_get() with the dead cgroup
warning to cgroup_get_live() after cgroup_kn_lock_live() and
introduces the new cgroup_get() which doesn't check whether the cgroup
is live or dead.
All existing cgroup_get() users except for cgroup_sk_alloc() are
converted to use cgroup_get_live().
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") Cc: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to check the tty-device pointer before looking up the sibling
platform device to avoid dereferencing a NULL-pointer when the tty is
one end of a Unix98 pty.
Make sure to check the tty-device pointer before looking up the sibling
platform device to avoid dereferencing a NULL-pointer when the tty is
one end of a Unix98 pty.
Running 32bit userspace on 64bit kernel results in MSG_CMSG_COMPAT being
defined as 0x80000000. This results in sendmsg failure if used from 32bit
userspace running on 64bit kernel. Fix this by accounting for MSG_CMSG_COMPAT
in flags check in hci_sock_sendmsg.
Define a new early console name for Qualcomm Datacenter Technologies
QDF2400 SOCs affected by erratum 44, instead of piggy-backing on "pl011".
Previously, to enable traditional (non-SPCR) earlycon, the documentation
said to specify "earlycon=pl011,<address>,qdf2400_e44", but the code was
broken and this didn't actually work.
So instead, the method for specifying the E44 work-around with traditional
earlycon is "earlycon=qdf2400_e44,<address>". Both methods of earlycon
are now enabled with the same function.
Fixes: e53e597fd4c4 ("tty: pl011: fix earlycon work-around for QDF2400 erratum 44") Signed-off-by: Timur Tabi <timur@codeaurora.org> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While using emacs, cat or others' commands in konsole with recent
kernels, I have met many times that CTRL-C freeze konsole. After
konsole freeze I can't type anything, then I have to open a new one,
it is very annoying.
See bug report:
https://bugs.kde.org/show_bug.cgi?id=175283
The platform in that bug report is Solaris, but now the pty in linux
has the same problem or the same behavior as Solaris :)
It has high possibility to trigger the problem follow steps below:
Note: In my test, BigFile is a text file whose size is bigger than 1G
1:open konsole
1:cat BigFile
2:CTRL-C
After some digging, I find out the reason is that commit 1d1d14da12e7
("pty: Fix buffer flush deadlock") changes the behavior of pty_flush_buffer.
Thread A Thread B
-------- --------
1:n_tty_poll return POLLIN
2:CTRL-C trigger pty_flush_buffer
tty_buffer_flush
n_tty_flush_buffer
3:attempt to check count of chars:
ioctl(fd, TIOCINQ, &available)
available is equal to 0
4:read(fd, buffer, avaiable)
return 0
5:konsole close fd
Yes, I know we could use the same patch included in the BUG report as
a workaround for linux platform too. But I think the data in ldisc is
belong to application of another side, we shouldn't clear it when we
want to flush write buffer of this side in pty_flush_buffer. So I think
it is better to disable ldisc flush in pty_flush_buffer, because its new
hehavior bring no benefit except that it mess up the behavior between
POLLIN, and TIOCINQ or FIONREAD.
Also I find no flush_buffer function in others' tty driver has the
same behavior as current pty_flush_buffer.
Make sure to actually suspend the device before returning after a failed
(or deferred) probe.
Note that autosuspend must be disabled before runtime pm is disabled in
order to balance the usage count due to a negative autosuspend delay as
well as to make the final put suspend the device synchronously.
Fixes: 388bc2622680 ("omap-serial: Fix the error handling in the omap_serial probe") Cc: Shubhrajyoti D <shubhrajyoti@ti.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An unbalanced and misplaced synchronous put was used to suspend the
device on driver unbind, something which with a likewise misplaced
pm_runtime_disable leads to external aborts when an open port is being
removed.
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa024010
...
[<c046e760>] (serial_omap_set_mctrl) from [<c046a064>] (uart_update_mctrl+0x50/0x60)
[<c046a064>] (uart_update_mctrl) from [<c046a400>] (uart_shutdown+0xbc/0x138)
[<c046a400>] (uart_shutdown) from [<c046bd2c>] (uart_hangup+0x94/0x190)
[<c046bd2c>] (uart_hangup) from [<c045b760>] (__tty_hangup+0x404/0x41c)
[<c045b760>] (__tty_hangup) from [<c045b794>] (tty_vhangup+0x1c/0x20)
[<c045b794>] (tty_vhangup) from [<c046ccc8>] (uart_remove_one_port+0xec/0x260)
[<c046ccc8>] (uart_remove_one_port) from [<c046ef4c>] (serial_omap_remove+0x40/0x60)
[<c046ef4c>] (serial_omap_remove) from [<c04845e8>] (platform_drv_remove+0x34/0x4c)
Fix this up by resuming the device before deregistering the port and by
suspending and disabling runtime pm only after the port has been
removed.
Also make sure to disable autosuspend before disabling runtime pm so
that the usage count is balanced and device actually suspended before
returning.
Note that due to a negative autosuspend delay being set in probe, the
unbalanced put would actually suspend the device on first driver unbind,
while rebinding and again unbinding would result in a negative
power.usage_count.
Fixes: 7e9c8e7dbf3b ("serial: omap: make sure to suspend device before remove") Cc: Felipe Balbi <balbi@kernel.org> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds missing checks for dma_map_single() failure and proper error
reporting. Although this issue was harmless on ARM architecture, it is always
good to use the DMA mapping API in a proper way. This patch fixes the following
DMA API debug warning:
WARNING: CPU: 1 PID: 3785 at lib/dma-debug.c:1171 check_unmap+0x8a0/0xf28
dma-pl330 121a0000.pdma: DMA-API: device driver failed to check map error[device address=0x000000006e0f9000] [size=4096 bytes] [mapped as single]
Modules linked in:
CPU: 1 PID: 3785 Comm: (agetty) Tainted: G W 4.11.0-rc1-00137-g07ca963-dirty #59
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
[<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
[<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
[<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
[<c01395a4>] (warn_slowpath_fmt) from [<c072a114>] (check_unmap+0x8a0/0xf28)
[<c072a114>] (check_unmap) from [<c072a834>] (debug_dma_unmap_page+0x98/0xc8)
[<c072a834>] (debug_dma_unmap_page) from [<c0803874>] (s3c24xx_serial_shutdown+0x314/0x52c)
[<c0803874>] (s3c24xx_serial_shutdown) from [<c07f5124>] (uart_port_shutdown+0x54/0x88)
[<c07f5124>] (uart_port_shutdown) from [<c07f522c>] (uart_shutdown+0xd4/0x110)
[<c07f522c>] (uart_shutdown) from [<c07f6a8c>] (uart_hangup+0x9c/0x208)
[<c07f6a8c>] (uart_hangup) from [<c07c426c>] (__tty_hangup+0x49c/0x634)
[<c07c426c>] (__tty_hangup) from [<c07c78ac>] (tty_ioctl+0xc88/0x16e4)
[<c07c78ac>] (tty_ioctl) from [<c03b5f2c>] (do_vfs_ioctl+0xc4/0xd10)
[<c03b5f2c>] (do_vfs_ioctl) from [<c03b6bf4>] (SyS_ioctl+0x7c/0x8c)
[<c03b6bf4>] (SyS_ioctl) from [<c010b4a0>] (ret_fast_syscall+0x0/0x3c)
Driver should provide its own struct device for all DMA-mapping calls instead
of extracting device pointer from DMA engine channel. Although this is harmless
from the driver operation perspective on ARM architecture, it is always good
to use the DMA mapping API in a proper way. This patch fixes following DMA API
debug warning:
WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1241 check_sync+0x520/0x9f4
samsung-uart 12c20000.serial: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000006df0f580] [size=64 bytes]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00137-g07ca963 #51
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
[<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
[<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
[<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
[<c01395a4>] (warn_slowpath_fmt) from [<c0729058>] (check_sync+0x520/0x9f4)
[<c0729058>] (check_sync) from [<c072967c>] (debug_dma_sync_single_for_device+0x88/0xc8)
[<c072967c>] (debug_dma_sync_single_for_device) from [<c0803c10>] (s3c24xx_serial_start_tx_dma+0x100/0x2f8)
[<c0803c10>] (s3c24xx_serial_start_tx_dma) from [<c0804338>] (s3c24xx_serial_tx_chars+0x198/0x33c)
When accessing an encrypted directory without the key, userspace must
operate on filenames derived from the ciphertext names, which contain
arbitrary bytes. Since we must support filenames as long as NAME_MAX,
we can't always just base64-encode the ciphertext, since that may make
it too long. Currently, this is solved by presenting long names in an
abbreviated form containing any needed filesystem-specific hashes (e.g.
to identify a directory block), then the last 16 bytes of ciphertext.
This needs to be sufficient to identify the actual name on lookup.
However, there is a bug. It seems to have been assumed that due to the
use of a CBC (ciphertext block chaining)-based encryption mode, the last
16 bytes (i.e. the AES block size) of ciphertext would depend on the
full plaintext, preventing collisions. However, we actually use CBC
with ciphertext stealing (CTS), which handles the last two blocks
specially, causing them to appear "flipped". Thus, it's actually the
second-to-last block which depends on the full plaintext.
This caused long filenames that differ only near the end of their
plaintexts to, when observed without the key, point to the wrong inode
and be undeletable. For example, with ext4:
To fix this, when presenting long encrypted filenames, encode the
second-to-last block of ciphertext rather than the last 16 bytes.
Although it would be nice to solve this without depending on a specific
encryption mode, that would mean doing a cryptographic hash like SHA-256
which would be much less efficient. This way is sufficient for now, and
it's still compatible with encryption modes like HEH which are strong
pseudorandom permutations. Also, changing the presented names is still
allowed at any time because they are only provided to allow applications
to do things like delete encrypted directories. They're not designed to
be used to persistently identify files --- which would be hard to do
anyway, given that they're encrypted after all.
For ease of backports, this patch only makes the minimal fix to both
ext4 and f2fs. It leaves ubifs as-is, since ubifs doesn't compare the
ciphertext block yet. Follow-on patches will clean things up properly
and make the filesystems use a shared helper function.
Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption") Reported-by: Gwendal Grignou <gwendal@chromium.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To mitigate some types of offline attacks, filesystem encryption is
designed to enforce that all files in an encrypted directory tree use
the same encryption policy (i.e. the same encryption context excluding
the nonce). However, the fscrypt_has_permitted_context() function which
enforces this relies on comparing struct fscrypt_info's, which are only
available when we have the encryption keys. This can cause two
incorrect behaviors:
1. If we have the parent directory's key but not the child's key, or
vice versa, then fscrypt_has_permitted_context() returned false,
causing applications to see EPERM or ENOKEY. This is incorrect if
the encryption contexts are in fact consistent. Although we'd
normally have either both keys or neither key in that case since the
master_key_descriptors would be the same, this is not guaranteed
because keys can be added or removed from keyrings at any time.
2. If we have neither the parent's key nor the child's key, then
fscrypt_has_permitted_context() returned true, causing applications
to see no error (or else an error for some other reason). This is
incorrect if the encryption contexts are in fact inconsistent, since
in that case we should deny access.
To fix this, retrieve and compare the fscrypt_contexts if we are unable
to set up both fscrypt_infos.
While this slightly hurts performance when accessing an encrypted
directory tree without the key, this isn't a case we really need to be
optimizing for; access *with* the key is much more important.
Furthermore, the performance hit is barely noticeable given that we are
already retrieving the fscrypt_context and doing two keyring searches in
fscrypt_get_encryption_info(). If we ever actually wanted to optimize
this case we might start by caching the fscrypt_contexts.
In OpenRISC we do not have a bootloader passed initrd, but the built in
initramfs does contain the /init and other binaries, including modules.
The previous commit 08865514805d2 ("initramfs: finish fput() before
accessing any binary from initramfs") made a change to only call fput()
if the bootloader initrd was available, this caused intermittent crashes
for OpenRISC.
This patch changes the fput() to happen unconditionally if any rootfs is
loaded. Also, I added some comments to make it a bit more clear why we
call unpack_to_rootfs() multiple times.
Fixes: 08865514805d2 ("initramfs: finish fput() before accessing any binary from initramfs") Cc: Lokesh Vutla <lokeshvutla@ti.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions. generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.
Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checks its hash value with
first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances to check entire dentry bytes likewise ext4.
Eric reported how to reproduce this issue by:
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999
Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(fixed f2fs_dentry_hash() to work even when the hash is 0) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having
same name) does not cover the scenario where inline dentry is enabled.
In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
lookup dentries one more time.
This patch fixes it by moving the assigment of current task to a upper
level to cover both normal and inline dentry.
Fixes: 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name) Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This makes a leak to register dirty segments. I reproduced the issue by
modified postmark which injects a lot of file create/delete/update and
finally triggers huge number of SSR allocations.
This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.
Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim" Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
grab_mapping_entry()
- we add huge zero page to the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault") Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().
Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956 Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX. That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:
- open an mmap over a 2MiB hole
- read from a 2MiB hole, faulting in a 2MiB zero page
- write to the hole with write(3p). The write succeeds but we
incorrectly leave the 2MiB zero page mapping intact.
- via the mmap, read the data that was just written. Since the zero
page mapping is still intact we read back zeroes instead of the new
data.
Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.
Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55 Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.
This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).
The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.
This patch (of 4):
dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked. This is done via:
However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped. This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().
For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry. This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.
We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().
Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate") Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Usage of device_lock() for dax_region attributes is unnecessary and
deadlock prone. It's unnecessary because the order of registration /
un-registration guarantees that drvdata is always valid. It's deadlock
prone because it sets up this situation:
ndctl/2170 is trying to acquire the device_lock() to read an attribute,
and ndctl/2186 is holding the device_lock() while trying to drain all
active attribute readers.
Thanks to Yi Zhang for the reproduction script.
Fixes: d7fe1a67f658 ("dax: add region 'id', 'size', and 'align' attributes") Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If device_add() fails, cleanup the cdev. Otherwise, we leak a kobj_map()
with a stale device number.
As Jason points out, there is a small possibility that userspace has
opened and mapped the device in the time between cdev_add() and the
device_add() failure. We need a new kill_dax_dev() helper to invalidate
any established mappings.
Fixes: ba09c01d2fa8 ("dax: convert to the cdev api") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fix_sync_read_error() modifies a bio on a newly faulty
device by setting bi_end_io to end_sync_write.
This ensure that put_buf() will still call rdev_dec_pending()
as required, but makes sure that subsequent code in
fix_sync_read_error() doesn't try to read from the device.
Unfortunately this interacts badly with sync_request_write()
which assumes that any bio with bi_end_io set to non-NULL
other than end_sync_read is safe to write to.
As the device is now faulty it doesn't make sense to write.
As the bio was recently used for a read, it is "dirty"
and not suitable for immediate submission.
In particular, ->bi_next might be non-NULL, which will cause
generic_make_request() to complain.
Break this interaction by refusing to write to devices
which are marked as Faulty.
Reported-and-tested-by: Michael Wang <yun.wang@profitbricks.com> Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The author meant to free the variable that was just allocated, instead
of the one that failed to be allocated, but made a simple typo. This
patch rectifies that.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An open directory may have a NULL private_data pointer prior to readdir.
Fixes: 0de1f4c6f6c0 ("Add way to query server fs info for smb3") Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a real warning. Since the oplock is queued on the same
workqueue this can deadlock if there is only one worker thread active
for the workqueue (which will be the case during memory pressure when
the rescuer thread is handling it).
Furthermore, there is at least one other kind of hang possible due to
the oplock break handling if there is only worker. (This can be
reproduced without introducing memory pressure by having passing 1 for
the max_active parameter of cifsiod.) cifs_oplock_break() can wait
indefintely in the filemap_fdatawait() while the cifs_writev_complete()
work is blocked:
Showing all locks held in the system:
2 locks held by kworker/0:1/16:
#0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
#1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
Fix these problems by creating a a new workqueue (with a rescuer) for
the oplock break work.
Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As with 618763958b22, an open directory may have a NULL private_data
pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this
before dereference.
Fixes: 834170c85978 ("Enable previous version support") Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The server may respond with success, and an output buffer less than
sizeof(struct smb_snapshot_array) in length. Do not leak the output
buffer in this case.
Fixes: 834170c85978 ("Enable previous version support") Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Macs send the maximum buffer size in response on ioctl to validate
negotiate security information, which causes us to fail the mount
as the response buffer is larger than the expected response.
Changed ioctl response processing to allow for padding of validate
negotiate ioctl response and limit the maximum response size to
maximum buffer size.
Signed-off-by: Steve French <steve.french@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>