This fixes a Spectre-v1/L1TF vulnerability in ioapic_write_indirect().
This function contains index computations based on the
(attacker-controlled) IOREGSEL register.
This patch depends on patch
"KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks".
Fixes: 70f93dae32ac ("KVM: Use temporary variable to shorten lines.") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes Spectre-v1/L1TF vulnerabilities in kvm_hv_msr_get_crash_data()
and kvm_hv_msr_set_crash_data().
These functions contain index computations that use the
(attacker-controlled) MSR number.
Fixes: e7d9513b60e8 ("kvm/x86: added hyper-v crash msrs into kvm hyperv context") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a Spectre-v1/L1TF vulnerability in kvm_lapic_reg_write().
This function contains index computations based on the
(attacker-controlled) MSR number.
Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a Spectre-v1/L1TF vulnerability in __kvm_set_dr() and
kvm_get_dr().
Both kvm_get_dr() and kvm_set_dr() (a wrapper of __kvm_set_dr()) are
exported symbols so KVM should tream them conservatively from a security
perspective.
Fixes: 020df0794f57 ("KVM: move DR register access handling into generic code") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes Spectre-v1/L1TF vulnerabilities in intel_find_fixed_event()
and intel_rdpmc_ecx_to_pmc().
kvm_rdpmc() (ancestor of intel_find_fixed_event()) and
reprogram_fixed_counter() (ancestor of intel_rdpmc_ecx_to_pmc()) are
exported symbols so KVM should treat them conservatively from a security
perspective.
Fixes: 25462f7f5295 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes Spectre-v1/L1TF vulnerabilities in
vmx_read_guest_seg_selector(), vmx_read_guest_seg_base(),
vmx_read_guest_seg_limit() and vmx_read_guest_seg_ar(). When
invoked from emulation, these functions contain index computations
based on the (attacker-influenced) segment value. Using constants
prevents the attack.
Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a Spectre-v1/L1TF vulnerability in picdev_write().
It replaces index computations based on the (attacked-controlled) port
number with constants through a minor refactoring.
Fixes: 85f455f7ddbe ("KVM: Add support for in-kernel PIC emulation") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we have nested or circular eventfd wakeups, then we can deadlock if
we run them inline from our poll waitqueue wakeup handler. It's also
possible to have very long chains of notifications, to the extent where
we could risk blowing the stack.
Check the eventfd recursion count before calling eventfd_signal(). If
it's non-zero, then punt the signaling to async context. This is always
safe, as it takes us out-of-line in terms of stack and locking context.
eventfd use cases from aio and io_uring can deadlock due to circular
or resursive calling, when eventfd_signal() tries to grab the waitqueue
lock. On top of that, it's also possible to construct notification
chains that are deep enough that we could blow the stack.
Add a percpu counter that tracks the percpu recursion depth, warn if we
exceed it. The counter is also exposed so that users of eventfd_signal()
can do the right thing if it's non-zero in the context where it is
called.
In year 2007 high performance SSD was still expensive, in order to
save more space for real workload or meta data, the readahead I/Os
for non-meta data was bypassed and not cached on SSD.
In now days, SSD price drops a lot and people can find larger size
SSD with more comfortable price. It is unncessary to alway bypass
normal readahead I/Os to save SSD space for now.
This patch adds options for readahead data cache policies via sysfs
file /sys/block/bcache<N>/readahead_cache_policy, the options are,
- "all": cache all readahead data I/Os.
- "meta-only": only cache meta data, and bypass other regular I/Os.
If users want to make bcache continue to only cache readahead request
for metadata and bypass regular data readahead, please set "meta-only"
to this sysfs file. By default, bcache will back to cache all read-
ahead requests now.
Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the commit 44ea39420fc9 ("drivers/watchdog: make use of
devm_register_reboot_notifier()") the struct notifier_block reboot_nb in
the struct watchdog_device is removed from the reboot notifiers chain at
the time watchdog's chardev is closed. But at least in i6300esb.c case
reboot_nb is embedded in the struct esb_dev which can be freed on its
device removal and before the chardev is closed, thus UAF at reboot:
[ 7.728581] esb_probe: esb_dev.watchdog_device ffff91316f91ab28
ts# uname -r note the address ^^^
5.5.0-rc5-ae6088-wdog
ts# ./openwdog0 &
[1] 696
ts# opened /dev/watchdog0, sleeping 10s...
ts# echo 1 > /sys/devices/pci0000\:00/0000\:00\:09.0/remove
[ 178.086079] devres:rel_nodes: dev ffff91317668a0b0 data ffff91316f91ab28
esb_dev.watchdog_device.reboot_nb memory is freed here ^^^
ts# ...woken up
[ 181.459010] devres:rel_nodes: dev ffff913171781000 data ffff913174a1dae8
[ 181.460195] devm_unreg_reboot_notifier: res ffff913174a1dae8 nb ffff91316f91ab78
attempt to use memory already freed ^^^
[ 181.461063] devm_unreg_reboot_notifier: nb->call 6b6b6b6b6b6b6b6b
[ 181.461243] devm_unreg_reboot_notifier: nb->next 6b6b6b6b6b6b6b6b
freed memory is filled with a slub poison ^^^
[1]+ Done ./openwdog0
ts# reboot
[ 229.921862] systemd-shutdown[1]: Rebooting.
[ 229.939265] notifier_call_chain: nb ffffffff9c6c2f20 nb->next ffffffff9c6d50c0
[ 229.943080] notifier_call_chain: nb ffffffff9c6d50c0 nb->next 6b6b6b6b6b6b6b6b
[ 229.946054] notifier_call_chain: nb 6b6b6b6b6b6b6b6b INVAL
[ 229.957584] general protection fault: 0000 [#1] SMP
[ 229.958770] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.5.0-rc5-ae6088-wdog
[ 229.960224] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 229.963288] RIP: 0010:notifier_call_chain+0x66/0xd0
[ 229.969082] RSP: 0018:ffffb20dc0013d88 EFLAGS: 00010246
[ 229.970812] RAX: 000000000000002e RBX: 6b6b6b6b6b6b6b6b RCX: 00000000000008b3
[ 229.972929] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff9ccc46ac
[ 229.975028] RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000000008b3
[ 229.977039] R10: 0000000000000001 R11: ffffffff9c26c740 R12: 0000000000000000
[ 229.979155] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000000 R15: 00000000fffffffa
... slub_debug=FZP poison ^^^
[ 229.989089] Call Trace:
[ 229.990157] blocking_notifier_call_chain+0x43/0x59
[ 229.991401] kernel_restart_prepare+0x14/0x30
[ 229.992607] kernel_restart+0x9/0x30
[ 229.993800] __do_sys_reboot+0x1d2/0x210
[ 230.000149] do_syscall_64+0x3d/0x130
[ 230.001277] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 230.002639] RIP: 0033:0x7f5461bdd177
[ 230.016402] Modules linked in: i6300esb
[ 230.050261] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Fix the crash by reverting 44ea39420fc9 so unregister_reboot_notifier()
is called when watchdog device is removed. This also makes handling of
the reboot notifier unified with the handling of the restart handler,
which is freed with unregister_restart_handler() in the same place.
Fixes: 44ea39420fc9 ("drivers/watchdog: make use of devm_register_reboot_notifier()") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Vladis Dronov <vdronov@redhat.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200108125347.6067-1-vdronov@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3aa6c19d2f38be ("xen/balloon: Support xend-based toolstack")
tried to fix a regression with running on rather ancient Xen versions.
Unfortunately the fix was based on the assumption that xend would
just use another Xenstore node, but in reality only some downstream
versions of xend are doing that. The upstream xend does not write
that Xenstore node at all, so the problem must be fixed in another
way.
The easiest way to achieve that is to fall back to the behavior
before commit 96edd61dcf4436 ("xen/balloon: don't online new memory
initially") in case the static memory maximum can't be read.
This is achieved by setting static_max to the current number of
memory pages known by the system resulting in target_diff becoming
zero.
Fixes: 3aa6c19d2f38be ("xen/balloon: Support xend-based toolstack") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # 4.13 Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The filter name is fixed to "exit_reason" for some kvm_exit events, no
matter what architect we have. Actually, the filter name ("exit_reason")
is only applicable to x86, meaning it's broken on other architects
including aarch64.
This fixes the issue by providing various kvm_exit filter names, depending
on architect we're on. Afterwards, the variable filter name is picked and
applied through ioctl(fd, SET_FILTER).
Reported-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org Tested-by: Nick French <nickfrench@gmail.com> Reported-by: Nick French <nickfrench@gmail.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In gfs2_file_write_iter, for direct writes, the error checking in the buffered
write fallback case is incomplete. This can cause inode write errors to go
undetected. Fix and clean up gfs2_file_write_iter along the way.
Based on a proposed fix by Christoph Hellwig <hch@lst.de>.
Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When upcalling gssproxy, cache_head.expiry_time is set as a
timeval, not seconds since boot. As such, RPC cache expiry
logic will not clean expired objects created under
auth.rpcsec.context cache.
This has proven to cause kernel memory leaks on field. Using
64 bit variants of getboottime/timespec
Expiration times have worked this way since 2010's c5b29f885afe "sunrpc:
use seconds since boot in expiry cache". The gssproxy code introduced
in 2012 added gss_proxy_save_rsc and introduced the bug. That's a while
for this to lurk, but it required a bit of an extreme case to make it
obvious.
Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Cc: stable@vger.kernel.org Fixes: 030d794bf498 "SUNRPC: Use gssproxy upcall for server..." Tested-By: Frank Sorenson <sorenson@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The IGTK keys are only removed by mac80211 after it has already
removed the AP station. This causes the driver to throw an error
because mac80211 is trying to remove the IGTK when the station doesn't
exist anymore.
The firmware is aware that the station has been removed and can deal
with it the next time we try to add an IGTK for a station, so we
shouldn't try to remove the key if the station ID is
IWL_MVM_INVALID_STA. Do this by removing the check for mvm_sta before
calling iwl_mvm_send_sta_igtk() and check return from that function
gracefully if the station ID is invalid.
Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For a little over a year, U-Boot has configured the flow controller to
perform automatic RAM re-repair on off->on power transitions of the CPU
rail[1]. This is mandatory for correct operation of Tegra124. However,
RAM re-repair relies on certain clocks, which the kernel must enable and
leave running. PLLP is one of those clocks. This clock is shut down
during LP1 in order to save power. Enable bypass (which I believe routes
osc_div_clk, essentially the crystal clock, to the PLL output) so that
this clock signal toggles even though the PLL is not active. This is
required so that LP1 power mode (system suspend) operates correctly.
The bypass configuration must then be undone when resuming from LP1, so
that all peripheral clocks run at the expected rate. Without this, many
peripherals won't work correctly; for example, the UART baud rate would
be incorrect.
NVIDIA's downstream kernel code only does this if not compiled for
Tegra30, so the added code is made conditional upon the chip ID.
NVIDIA's downstream code makes this change conditional upon the active
CPU cluster. The upstream kernel currently doesn't support cluster
switching, so this patch doesn't test the active CPU cluster ID.
There is a race between adding and removing elements to the tree mod log
list and rbtree that can lead to use-after-free problems.
Consider the following example that explains how/why the problems happens:
1) Task A has mod log element with sequence number 200. It currently is
the only element in the mod log list;
2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
access the tree mod log. When it enters the function, it initializes
'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
before checking if there are other elements in the mod seq list.
Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
unlocks the lock 'tree_mod_seq_lock';
3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
sequence number of 201;
4) Some other task, name it task C, modifies a btree and because there
elements in the mod seq list, it adds a tree mod elem to the tree
mod log rbtree. That node added to the mod log rbtree is assigned
a sequence number of 202;
5) Task B, which is doing fiemap and resolving indirect back references,
calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
calls tree_mod_log_search() - the search returns the mod log node
from the rbtree with sequence number 202, created by task C;
6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
the mod log rbtree and finds the node with sequence number 202. Since
202 is less than the previously computed 'min_seq', (u64)-1, it
removes the node and frees it;
7) Task B still has a pointer to the node with sequence number 202, and
it dereferences the pointer itself and through the call to
__tree_mod_log_rewind(), resulting in a use-after-free problem.
This issue can be triggered sporadically with the test case generic/561
from fstests, and it happens more frequently with a higher number of
duperemove processes. When it happens to me, it either freezes the VM or
it produces a trace like the following before crashing:
Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
sequence number and iterates the rbtree while holding the lock
'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
lock, since it is now redundant.
Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions") Fixes: 097b8a7c9e48e2 ("Btrfs: join tree mod log code with the code holding back delayed refs") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we abort a transaction we have the following sequence
if (!trans->dirty && list_empty(&trans->new_bgs))
return;
WRITE_ONCE(trans->transaction->aborted, err);
The idea being if we didn't modify anything with our trans handle then
we don't really need to abort the whole transaction, maybe the other
trans handles are fine and we can carry on.
However in the case of create_snapshot we add a pending_snapshot object
to our transaction and then commit the transaction. We don't actually
modify anything. sync() behaves the same way, attach to an existing
transaction and commit it. This means that if we have an IO error in
the right places we could abort the committing transaction with our
trans->dirty being not set and thus not set transaction->aborted.
This is a problem because in the create_snapshot() case we depend on
pending->error being set to something, or btrfs_commit_transaction
returning an error.
If we are not the trans handle that gets to commit the transaction, and
we're waiting on the commit to happen we get our return value from
cur_trans->aborted. If this was not set to anything because sync() hit
an error in the transaction commit before it could modify anything then
cur_trans->aborted would be 0. Thus we'd return 0 from
btrfs_commit_transaction() in create_snapshot.
This is a problem because we then try to do things with
pending_snapshot->snap, which will be NULL because we didn't create the
snapshot, and then we'll get a NULL pointer dereference like the
following
In order to fix this we need to make sure anybody who calls
commit_transaction has trans->dirty set so that they properly set the
trans->transaction->aborted value properly so any waiters know bad
things happened.
This was found while I was running generic/475 with my modified
fsstress, it reproduced within a few runs. I ran with this patch all
night and didn't see the problem again.
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using the NO_HOLES feature, if we punch a hole into a file and then
fsync it, there are cases where a subsequent fsync will miss the fact that
a hole was punched, resulting in the holes not existing after replaying
the log tree.
Essentially these cases all imply that, tree-log.c:copy_items(), is not
invoked for the leafs that delimit holes, because nothing changed those
leafs in the current transaction. And it's precisely copy_items() where
we currenly detect and log holes, which works as long as the holes are
between file extent items in the input leaf or between the beginning of
input leaf and the previous leaf or between the last item in the leaf
and the next leaf.
First example where we miss a hole:
*) The extent items of the inode span multiple leafs;
*) The punched hole covers a range that affects only the extent items of
the first leaf;
*) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC
is set in the inode's runtime flags).
That results in the hole not existing after replaying the log tree.
For example, if the fs/subvolume tree has the following layout for a
particular inode:
If at transaction 11 we punch a hole coverting the range [0, 128K[, we end
up dropping the two extent items from leaf N, but we don't touch the other
leaf, so we end up in the following state:
Leaf N, generation 11:
[ ... INODE_ITEM INODE_REF ]
Leaf N + 1, generation 10:
[ EXTENT_ITEM (128K 64K) ... ]
A full fsync after punching the hole will only process leaf N because it
was modified in the current transaction, but not leaf N + 1, since it
was not modified in the current transaction (generation 10 and not 11).
As a result the fsync will not log any holes, because it didn't process
any leaf with extent items.
Second example where we will miss a hole:
*) An inode as its items spanning 5 (or more) leafs;
*) A hole is punched and it covers only the extents items of the 3rd
leaf. This resulsts in deleting the entire leaf and not touching any
of the other leafs.
So the only leaf that is modified in the current transaction, when
punching the hole, is the first leaf, which contains the inode item.
During the full fsync, the only leaf that is passed to copy_items()
is that first leaf, and that's not enough for the hole detection
code in copy_items() to determine there's a hole between the last
file extent item in the 2nd leaf and the first file extent item in
the 3rd leaf (which was the 4th leaf before punching the hole).
Fix this by scanning all leafs and punch holes as necessary when doing a
full fsync (less common than a non-full fsync) when the NO_HOLES feature
is enabled. The lack of explicit file extent items to mark holes makes it
necessary to scan existing extents to determine if holes exist.
A test case for fstests follows soon.
Fixes: 16e7549f045d33 ("Btrfs: incompatible format change to remove hole extents") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a NFS directory page cache page is removed from the page cache,
its contents are freed through a call to nfs_readdir_clear_array().
To prevent the removal of the page cache entry until after we've
finished reading it, we must take the page lock.
Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nfs_readdir_xdr_to_array() must not exit without having initialised
the array, so that the page cache deletion routines can safely
call nfs_readdir_clear_array().
Furthermore, we should ensure that if we exit nfs_readdir_filler()
with an error, we free up any page contents to prevent a leak
if we try to fill the page again.
Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On certain cases when response length is less than 32, NVME response data
is supplied inline in IOCB. This is indicated by some combination of state
flags. There was an instance when a high, and incorrect, response length
was indicated causing driver to overrun buffers. Fix this by checking and
limiting the response payload length.
The function crypto_spawn_alg is racy because it drops the lock
before shooting the dying algorithm. The algorithm could disappear
altogether before we shoot it.
This patch fixes it by moving the shooting into the locked section.
AMD Seattle incorporates a non-PCI version of the v3 CCP crypto
accelerator, and this version was left behind when the maximum
RSA modulus size was parameterized in order to support v5 hardware
which supports larger moduli than v3 hardware does. Due to this
oversight, RSA acceleration no longer works at all on these systems.
Fix this by setting the .rsamax property to the appropriate value
for v3 platform hardware.
Fixes: e28c190db66830c0 ("csrypto: ccp - Expand RSA support for a v5 ccp") Cc: Gary R Hook <gary.hook@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 'clean' rule in the samples/bpf Makefile tries to remove backup
files (ending in ~). However, if no such files exist, it will instead try
to remove the user's home directory. While the attempt is mostly harmless,
it does lead to a somewhat scary warning like this:
rm: cannot remove '~': Is a directory
Fix this by using find instead of shell expansion to locate any actual
backup files that need to be removed.
Fixes: b62a796c109c ("samples/bpf: allow make to be run from samples/bpf/ directory") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/157952560126.1683545.7273054725976032511.stgit@toke.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.
Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72 Cc: stable@vger.kernel.org Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison
Use rcu_dereference_protected to dereference the newly annotated pointer.
Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison
Use rcu_dereference_protected to access the __rcu annotated pointer.
Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function padata_flush_queues is fundamentally broken because
it cannot force padata users to complete the request that is
underway. IOW padata has to passively wait for the completion
of any outstanding work.
As it stands flushing is used in two places. Its use in padata_stop
is simply unnecessary because nothing depends on the queues to
be flushed afterwards.
The other use in padata_replace is more substantial as we depend
on it to free the old pd structure. This patch instead uses the
pd->refcnt to dynamically free the pd structure once all requests
are complete.
Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
When committing state, the function writecache_flush does the following:
1. write metadata (writecache_commit_flushed)
2. flush disk cache (writecache_commit_flushed)
3. wait for data writes to complete (writecache_wait_for_ios)
4. increase superblock seq_count
5. write the superblock
6. flush disk cache
It may happen that at step 3, when we wait for some write to finish, the
disk may report the write as finished, but the write only hit the disk
cache and it is not yet stored in persistent storage. At step 5 we write
the superblock - it may happen that the superblock is written before the
write that we waited for in step 3. If the machine crashes, it may result
in incorrect data being returned after reboot.
In order to fix the bug, we must swap steps 2 and 3 in the above sequence,
so that we first wait for writes to complete and then flush the disk
cache.
Move blk_queue_make_request() to dm.c:alloc_dev() so that
q->make_request_fn is never NULL during the lifetime of a DM device
(even one that is created without a DM table).
Otherwise generic_make_request() will crash simply by doing:
dmsetup create -n test
mount /dev/dm-N /mnt
While at it, move ->congested_data initialization out of
dm.c:alloc_dev() and into the bio-based specific init method.
Reported-by: Stefan Bader <stefan.bader@canonical.com> BugLink: https://bugs.launchpad.net/bugs/1860231 Fixes: ff36ab34583a ("dm: remove request-based logic from make_request_fn wrapper")
Depends-on: c12c9a3c3860c ("dm: various cleanups to md->queue initialization code") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If benbi IV is used in AEAD construction, for example:
cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256
the constructor uses wrong skcipher function and crashes:
The space-maps track the reference counts for disk blocks allocated by
both the thin-provisioning and cache targets. There are variants for
tracking metadata blocks and data blocks.
Transactionality is implemented by never touching blocks from the
previous transaction, so we can rollback in the event of a crash.
When allocating a new block we need to ensure the block is free (has
reference count of 0) in both the current and previous transaction.
Prior to this fix we were doing this by searching for a free block in
the previous transaction, and relying on a 'begin' counter to track
where the last allocation in the current transaction was. This
'begin' field was not being updated in all code paths (eg, increment
of a data block reference count due to breaking sharing of a neighbour
block in the same btree leaf).
This fix keeps the 'begin' field, but now it's just a hint to speed up
the search. Instead the current transaction is searched for a free
block, and then the old transaction is double checked to ensure it's
free. Much simpler.
This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when
DM thin-provisioning's snapshots are heavily used.
Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net> Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dm-zoned is observed to log failed kernel assertions and not work
correctly when operating against a device with a zone size smaller
than 128MiB (e.g. 32768 bits per 4K block). The reason is that the
bitmap size per zone is calculated as zero with such a small zone
size. Fix this problem and also make the code related to zone bitmap
management be able to handle per zone bitmaps smaller than a single
block.
A dm-zoned-tools patch is required to properly format dm-zoned devices
with zone sizes smaller than 128MiB.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's an OF helper called of_dma_is_coherent(), which checks if a
device has a "dma-coherent" property to see if the device is coherent
for DMA.
But on some platforms devices are coherent by default, and on some
platforms it's not possible to update existing device trees to add the
"dma-coherent" property.
So add a Kconfig symbol to allow arch code to tell
of_dma_is_coherent() that devices are coherent by default, regardless
of the presence of the property.
Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
when the system has a coherent cache.
Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper") Cc: stable@vger.kernel.org # v3.16+ Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a device is deleted by one of its system-wide resume callbacks
(for example, because it does not appear to be present or accessible
any more) along with its children, the resume of the children may
continue leading to use-after-free errors and other issues
(potentially).
Namely, if the device's children are resumed asynchronously, their
resume may have been scheduled already before the device's callback
runs and so the device may be deleted while dpm_wait_for_superior()
is being executed for them. The memory taken up by the parent device
object may be freed then while dpm_wait() is waiting for the parent's
resume callback to complete, which leads to a use-after-free.
Moreover, the resume of the children is really not expected to
continue after they have been unregistered, so it must be terminated
right away in that case.
To address this problem, modify dpm_wait_for_superior() to check
if the target device is still there in the system-wide PM list of
devices and if so, to increment its parent's reference counter, both
under dpm_list_mtx which prevents device_del() running for the child
from dropping the parent's reference counter prematurely.
If the device is not present in the system-wide PM list of devices
any more, the resume of it cannot continue, so check that again after
dpm_wait() returns, which means that the parent's callback has been
completed, and pass the result of that check to the caller of
dpm_wait_for_superior() to allow it to abort the device's resume
if it is not there any more.
Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com Reported-by: Chanho Min <chanho.min@lge.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setting softlimit larger than hardlimit seems meaningless
for disk quota but currently it is allowed. In this case,
there may be a bit of comfusion for users when they run
df comamnd to directory which has project quota.
For example, we set 20M softlimit and 10M hardlimit of
block usage limit for project quota of test_dir(project id 123).
The WARN_ON() that child entry is always on overlay st_dev became wrong
when we allowed this function to update d_ino in non-samefs setup with xino
enabled.
It is not true in case of xino bits overflow on a non-dir inode. Leave the
WARN_ON() only for directories, where assertion is still true.
Fixes: adbf4f7ea834 ("ovl: consistent d_ino for non-samefs with xino") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This driver's remove path calls cancel_delayed_work().
However, that function does not wait until the work function
finishes. This could mean that the work function is still
running after the driver's remove function has finished,
which would result in a use-after-free.
Fix by calling cancel_delayed_work_sync(), which ensures that
that the work is properly cancelled, no longer running, and
unable to re-schedule itself.
This issue was detected with the help of Coccinelle.
Cc: stable <stable@vger.kernel.org> Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
scripts/find-unused-docs.sh invokes scripts/kernel-doc to find out if a
source file contains kerneldoc or not.
However, as it passes the no longer supported "-text" option to
scripts/kernel-doc, the latter prints out its help text, causing all
files to be considered containing kerneldoc.
Get rid of these false positives by removing the no longer supported
"-text" option from the scripts/kernel-doc invocation.
Cc: stable@vger.kernel.org # 4.16+ Fixes: b05142675310d2ac ("scripts: kernel-doc: get rid of unused output formats") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200127093107.26401-1-geert+renesas@glider.be Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to check whether spawn->alg is NULL under lock as otherwise
the algorithm could be removed from under us after we have checked
it and found it to be non-NULL. This could cause us to remove the
spawn from a non-existent list.
Fixes: 7ede5a5ba55a ("crypto: api - Fix crypto_drop_spawn crash...") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On AXP288 and newer PMICs, bit 7 of AXP20X_VBUS_IPSOUT_MGMT can be set
to prevent using the VBUS input. However, when the VBUS unplugged and
plugged back in, the bit automatically resets to zero.
We need to set the register as volatile to prevent regmap from caching
that bit. Otherwise, regcache will think the bit is already set and not
write the register.
Fixes: cd53216625a0 ("mfd: axp20x: Fix axp288 volatile ranges") Cc: stable@vger.kernel.org Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current code has assumption that balloon request memory size aligns
with 2MB. But actually Hyper-V doesn't guarantee such alignment. When
balloon driver receives non-aligned balloon request, it produces warning
and balloon up more memory than requested in order to keep 2MB alignment.
Remove the warning and balloon up memory according to actual requested
memory size.
Fixes: f6712238471a ("hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ks_pcie_stop_link() function does not clear LTSSM_EN_VAL bit so
link training was not triggered more than once after startup.
In configurations where link can be unstable during early boot,
for example, under low temperature, it will never be established.
Fixes: 0c4ffcfe1fbc ("PCI: keystone: Add TI Keystone PCIe driver") Signed-off-by: Yurii Monakov <monakov.y@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Andrew Murray <andrew.murray@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The geode AES driver is heavily broken because it stores per-request
state in the transform context. So it will crash or produce the wrong
result if used by any of the many places in the kernel that issue
concurrent requests for the same transform object.
This driver is also implemented using the deprecated blkcipher API,
which makes it difficult to fix, and puts it among the drivers
preventing that API from being removed.
Convert this driver to use the skcipher API, and change it to not store
per-request state in the transform context.
Fixes: 9fe757b0cfce ("[PATCH] crypto: Add support for the Geode LX AES hardware") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Florian Bezdeka <florian@bezdeka.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ubifs, concurrent execution of writepage and bulk read on the same file
may cause ABBA deadlock, for example (Reproduce method see Link):
Process A(Bulk-read starts from page4) Process B(write page4 back)
vfs_read wb_workfn or fsync
... ...
generic_file_buffered_read write_cache_pages
ubifs_readpage LOCK(page4)
In order to ensure the serialization execution of bulk read, we can't
remove the big lock 'ui->ui_mutex' in ubifs_bulk_read(). Instead, we
allow ubifs_do_bulk_read() to lock page failed by replacing
find_or_create_page(FGP_LOCK) with
pagecache_get_page(FGP_LOCK | FGP_NOWAIT).
UBIFS's implementation of FS_IOC_SETFLAGS fails to preserve existing
inode flags that aren't settable by FS_IOC_SETFLAGS, namely the encrypt
flag. This causes the encrypt flag to be unexpectedly cleared.
Fix it by preserving existing unsettable flags, like ext4 and f2fs do.
If userspace provides an invalid fscrypt no-key filename which encodes a
hash value with any of the UBIFS node type bits set (i.e. the high 3
bits), gracefully report ENOENT rather than triggering ubifs_assert().
The alarmtimer_rtc_add_device() function creates a wakeup source and then
tries to grab a module reference. If that fails the function returns early
with an error code, but fails to remove the wakeup source.
Cleanup this exit path so there is no dangling wakeup source, which is
named 'alarmtime' left allocated which will conflict with another RTC
device that may be registered later.
Fixes: 51218298a25e ("alarmtimer: Ensure RTC module is not unloaded") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200109155910.907-2-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b41901a2cf06 ("ACPI / battery: Do not export energy_full[_design] on
devices without full_charge_capacity") added support for some (broken)
devices which always report 0 for both design_capacity and
full_charge_capacity.
Since the device that commit was written as a fix for is not reporting any
form of "full" capacity we cannot calculate the value for the
POWER_SUPPLY_PROP_CAPACITY, this is worked around by using an alternative
array of available properties which does not contain this property.
This is necessary because userspace (upower) treats us returning -ENODEV
as 0 and then typically will trigger an emergency shutdown because of that.
Userspace does not do this if the capacity sysfs attribute is not present
at all.
There are two potential problems with that commit:
1) It assumes that both full_charge- and design-capacity are broken at the
same time and only checks if full_charge- is broken.
2) It assumes that this only ever happens for devices which report energy
units rather then charge units.
This commit fixes both issues by only using the alternative
array of available properties if both full_charge- and design-capacity are
broken and by also adding an alternative array of available properties for
devices using mA units.
Fixes: b41901a2cf06 ("ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity") Cc: 4.19+ <stable@vger.kernel.org> # 4.19+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ThunderSoft TS178 tablet's _BIX implementation reports design_capacity
but not full_charge_capacity.
Before this commit this would cause us to return -ENODEV for the capacity
attribute, which userspace does not like. Specifically upower does this:
if (sysfs_file_exists (native_path, "capacity")) {
percentage = sysfs_get_double (native_path, "capacity");
Where the sysfs_get_double() helper returns 0 when we return -ENODEV,
so the battery always reads 0% if we return -ENODEV.
This commit fixes this by using the design-capacity instead of the
full-charge-capacity when the full-charge-capacity is not available.
Fixes: b41901a2cf06 ("ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity") Cc: 4.19+ <stable@vger.kernel.org> # 4.19+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b41901a2cf06 ("ACPI / battery: Do not export energy_full[_design]
on devices without full_charge_capacity") added support for some (broken)
devices which always report 0 for both design- and full_charge-capacity.
This assumes that if the capacity is not being reported it is 0. The
ThunderSoft TS178 tablet's _BIX implementation falsifies this assumption.
It reports ACPI_BATTERY_VALUE_UNKNOWN (-1) as full_charge_capacity, which
we treat as a valid value which causes several problems.
This commit fixes this by adding a new ACPI_BATTERY_CAPACITY_VALID() helper
which checks that the value is not 0 and not -1; and using this whenever we
need to test if either design_capacity or full_charge_capacity is valid.
Fixes: b41901a2cf06 ("ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity") Cc: 4.19+ <stable@vger.kernel.org> # 4.19+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Despite our heuristics to not wrongly export a non working ACPI backlight
interface on desktop machines, we still end up exporting one on desktops
using a motherboard from the MSI MS-7721 series.
I've looked at improving the heuristics, but in this case a quirk seems
to be the only way to solve this.
While at it also add a comment to separate the video_detect_force_none
entries in the video_detect_dmi_table from other type of entries, as we
already do for the other entry types.
Cc: All applicable <stable@vger.kernel.org> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1783786 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code in mmc_spi_initsequence() tries to send a burst with
high chipselect and for this reason hardcodes the device into
SPI_CS_HIGH.
This is not good because the SPI_CS_HIGH flag indicates
logical "asserted" CS not always the physical level. In
some cases the signal is inverted in the GPIO library and
in that case SPI_CS_HIGH is already set, and enforcing
SPI_CS_HIGH again will actually drive it low.
Instead of hard-coding this, toggle the polarity so if the
default is LOW it goes high to assert chipselect but if it
is already high then toggle it low instead.
Cc: Phil Elwell <phil@raspberrypi.org> Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20191204152749.12652-1-linus.walleij@linaro.org Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Code cleanup in the 5.1 kernel changed the array
passed into signing verification on large reads leading
to warning messages being logged when copying files to local
systems from remote.
SMB signature verification returned error = -5
This changeset fixes verification of SMB3 signatures of large
reads.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In lmb_is_removable(), if a section is not present, it should continue
to test the rest of the sections in the block. But the current code
fails to do so.
Fixes: 51925fb3c5c9 ("powerpc/pseries: Implement memory hotplug remove in the kernel") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1578632042-12415-1-git-send-email-kernelfans@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number
of page table levels") changed the logic of TASK_SIZE and also removed the
arch_mmap_check() implementation for s390. This combination has a subtle
effect on how get_unmapped_area() for hugetlbfs pages works. It is now
possible that a user process establishes a hugetlbfs mapping at an address
above 4 TB, without triggering a dynamic pagetable upgrade from 3 to 4
levels.
This is because hugetlbfs mappings will not use mm->get_unmapped_area, but
rather file->f_op->get_unmapped_area, which currently is the generic
implementation of hugetlb_get_unmapped_area() that does not know about s390
dynamic pagetable upgrades, but with the new definition of TASK_SIZE, it
will now allow mappings above 4 TB.
Subsequent access to such a mapped address above 4 TB will result in a page
fault loop, because the CPU cannot translate such a large address with 3
pagetable levels. The fault handler will try to map in a hugepage at the
address, but due to the folded pagetable logic it will end up with creating
entries in the 3 level pagetable, possibly overwriting existing mappings,
and then it all repeats when the access is retried.
Apart from the page fault loop, this can have various nasty effects, e.g.
kernel panic from one of the BUG_ON() checks in memory management code,
or even data loss if an existing mapping gets overwritten.
Fix this by implementing HAVE_ARCH_HUGETLB_UNMAPPED_AREA support for s390,
providing an s390 version for hugetlb_get_unmapped_area() with pagetable
upgrade support similar to arch_get_unmapped_area(), which will then be
used instead of the generic version.
Fixes: ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 92b34a976348 ("MIPS: boot: add missing targets for vmlinux.*.its")
fixed constant rebuild of *.its files on every make invocation, but due
to typo ("lzmo") it made no sense for vmlinux.lzma.its.
Fixes: 92b34a976348 ("MIPS: boot: add missing targets for vmlinux.*.its") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
[paulburton@kernel.org: s/invokation/invocation/] Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Rob Herring <robh@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Typo is present in kernel tree since the introduction of relocatable
kernel support in commit e818fac595ab ("MIPS: Generate relocation table
when CONFIG_RELOCATABLE"), but the relocation scripts were moved to
Makefile.postlink later with commit 44079d3509ae ("MIPS: Use
Makefile.postlink to insert relocations into vmlinux").
Fixes: 44079d3509ae ("MIPS: Use Makefile.postlink to insert relocations into vmlinux") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
[paulburton@kernel.org: Fixup commit references in commit message.] Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Rob Herring <robh@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On AArch64 you can do a sign-extended load to either a 32-bit or 64-bit
register, and we should only sign extend the register up to the width of
the register as specified in the operation (by using the 32-bit Wn or
64-bit Xn register specifier).
As it turns out, the architecture provides this decoding information in
the SF ("Sixty-Four" -- how cute...) bit.
Let's take advantage of this with the usual 32-bit/64-bit header file
dance and do the right thing on AArch64 hosts.
Confusingly, there are three SPSR layouts that a kernel may need to deal
with:
(1) An AArch64 SPSR_ELx view of an AArch64 pstate
(2) An AArch64 SPSR_ELx view of an AArch32 pstate
(3) An AArch32 SPSR_* view of an AArch32 pstate
When the KVM AArch32 support code deals with SPSR_{EL2,HYP}, it's either
dealing with #2 or #3 consistently. On arm64 the PSR_AA32_* definitions
match the AArch64 SPSR_ELx view, and on arm the PSR_AA32_* definitions
match the AArch32 SPSR_* view.
However, when we inject an exception into an AArch32 guest, we have to
synthesize the AArch32 SPSR_* that the guest will see. Thus, an AArch64
host needs to synthesize layout #3 from layout #2.
This patch adds a new host_spsr_to_spsr32() helper for this, and makes
use of it in the KVM AArch32 support code. For arm64 we need to shuffle
the DIT bit around, and remove the SS bit, while for arm we can use the
value as-is.
I've open-coded the bit manipulation for now to avoid having to rework
the existing PSR_* definitions into PSR64_AA32_* and PSR32_AA32_*
definitions. I hope to perform a more thorough refactoring in future so
that we can handle pstate view manipulation more consistently across the
kernel tree.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200108134324.46500-4-mark.rutland@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When KVM injects an exception into a guest, it generates the CPSR value
from scratch, configuring CPSR.{M,A,I,T,E}, and setting all other
bits to zero.
This isn't correct, as the architecture specifies that some CPSR bits
are (conditionally) cleared or set upon an exception, and others are
unchanged from the original context.
This patch adds logic to match the architectural behaviour. To make this
simple to follow/audit/extend, documentation references are provided,
and bits are configured in order of their layout in SPSR_EL2. This
layout can be seen in the diagram on ARM DDI 0487E.a page C5-426.
Note that this code is used by both arm and arm64, and is intended to
fuction with the SPSR_EL2 and SPSR_HYP layouts.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200108134324.46500-3-mark.rutland@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When KVM injects an exception into a guest, it generates the PSTATE
value from scratch, configuring PSTATE.{M[4:0],DAIF}, and setting all
other bits to zero.
This isn't correct, as the architecture specifies that some PSTATE bits
are (conditionally) cleared or set upon an exception, and others are
unchanged from the original context.
This patch adds logic to match the architectural behaviour. To make this
simple to follow/audit/extend, documentation references are provided,
and bits are configured in order of their layout in SPSR_EL2. This
layout can be seen in the diagram on ARM DDI 0487E.a page C5-429.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200108134324.46500-2-mark.rutland@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the driver has disabled interrupt support for Tangier but
actually interrupt works just fine if the command is not written twice
in a row. Also we need to ack the interrupt in the handler.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
boundary->width and boundary->height are sizes relative to
boundary->left and boundary->top coordinates, but they were not being
taken into consideration to adjust r->left and r->top, leading to the
following error:
Consider the follow as initial values for boundary and r:
Fix this by considering top/left coordinates from boundary.
Fixes: ac49de8c49d7 ("[media] v4l2-rect.h: new header with struct v4l2_rect helper functions") Signed-off-by: Helen Koike <helen.koike@collabora.com> Cc: <stable@vger.kernel.org> # for v4.7 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The do_video_ioctl() compat handler converts the compat command
codes into the native ones before processing further, but this
causes problems for 32-bit user applications that pass a command
code that matches a 64-bit native number, which will then be
handled the same way.
Specifically, this breaks VIDIOC_DQEVENT_TIME from user space
applications with 64-bit time_t, as the structure layout is
the same as the native 64-bit layout on many architectures
(x86 being the notable exception).
Change the handler to use the converted command code only for
passing into the native ioctl handler, not for deciding on the
conversion, in order to make the compat behavior match the
native behavior.
Actual support for the 64-bit time_t version of VIDIOC_DQEVENT_TIME
and other commands still needs to be added in a separate patch.
After DMA is complete, and the device and CPU caches are synchronized,
it's still required to mark the CPU pages as dirty, if the data was
coming from the device. However, this driver was just issuing a bare
put_page() call, without any set_page_dirty*() call.
Fix the problem, by calling set_page_dirty_lock() if the CPU pages were
potentially receiving data from the device.
Link: http://lkml.kernel.org/r/20200107224558.2362728-11-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: <stable@vger.kernel.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"), the
semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons (usually a
busy page).
This was an unintentional change that hasn't been noticed except for LTP
tests which checked for the documented behavior.
There are two ways to go around this change. We can even get back to
the original behavior and return -EAGAIN whenever migrate_pages is not
able to migrate pages due to non-fatal reasons. Another option would be
to simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g. -ENOMEM, -EBUSY) or the
number of pages that couldn't have been migrated due to ephemeral
reasons (e.g. page is pinned or locked for other reasons).
This patch implements the second option because this behavior is in
place for some time without anybody complaining and possibly new users
depending on it. Also it allows to have a slightly easier error
handling as the caller knows that it is worth to retry when err > 0.
But since the new semantic would be aborted immediately if migration is
failed due to ephemeral reasons, need include the number of
non-attempted pages in the return value too.
Link: http://lkml.kernel.org/r/1580160527-109104-1-git-send-email-yang.shi@linux.alibaba.com Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Cc: <stable@vger.kernel.org> [4.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The daxctl unit test for the dax_kmem driver currently triggers the
(false positive) lockdep splat below. It results from the fact that
remove_memory_block_devices() is invoked under the mem_hotplug_lock()
causing lockdep entanglements with cpu_hotplug_lock() and sysfs (kernfs
active state tracking). It is a false positive because the sysfs
attribute path triggering the memory remove is not the same attribute
path associated with memory-block device.
sysfs_break_active_protection() is not applicable since there is no real
deadlock conflict, instead move memory-block device removal outside the
lock. The mem_hotplug_lock() is not needed to synchronize the
memory-block device removal vs the page online state, that is already
handled by lock_device_hotplug(). Specifically, lock_device_hotplug()
is sufficient to allow try_remove_memory() to check the offline state of
the memblocks and be assured that any in progress online attempts are
flushed / blocked by kernfs_drain() / attribute removal.
The add_memory() path safely creates memblock devices under the
mem_hotplug_lock(). There is no kernfs active state synchronization in
the memblock device_register() path, so nothing to fix there.
This change is only possible thanks to the recent change that refactored
memory block device removal out of arch_remove_memory() (commit 4c4b7f9ba948 "mm/memory_hotplug: remove memory block devices before
arch_remove_memory()"), and David's due diligence tracking down the
guarantees afforded by kernfs_drain(). Not flagged for -stable since
this only impacts ongoing development and lockdep validation, not a
runtime issue.
======================================================
WARNING: possible circular locking dependency detected
5.5.0-rc3+ #230 Tainted: G OE
------------------------------------------------------
lt-daxctl/6459 is trying to acquire lock: ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
but task is already holding lock: ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
I overlooked that some fields are words and need the converts from
LE in the recently added USB descriptor validation code.
This patch fixes those with the proper macro usages.
Currently ecm->notify_req is used to flag when a request is in-flight.
ecm->notify_req is set to NULL and when a request completes it is
subsequently reset.
This is fundamentally buggy in that the unbind logic of the ECM driver will
unconditionally free ecm->notify_req leading to a NULL pointer dereference.
Currently ncm->notify_req is used to flag when a request is in-flight.
ncm->notify_req is set to NULL and when a request completes it is
subsequently reset.
This is fundamentally buggy in that the unbind logic of the NCM driver will
unconditionally free ncm->notify_req leading to a NULL pointer dereference.
Fixes: 40d133d7f542 ("usb: gadget: f_ncm: convert to new function interface with backward compatibility") Cc: stable <stable@vger.kernel.org> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the implementation of brcmf_usbdev_qinit() the allocated memory for
reqs is leaking if usb_alloc_urb() fails. Release reqs in the error
handling path.
Fixes: 71bb244ba2fd ("brcm80211: fmac: add USB support for bcm43235/6/8 chipsets") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
read to 0xffffffff85a7f190 of 8 bytes by task 10 on cpu 1:
rcu_gp_fqs_check_wake kernel/rcu/tree.c:1556 [inline]
rcu_gp_fqs_check_wake+0x93/0xd0 kernel/rcu/tree.c:1546
rcu_gp_fqs_loop+0x36c/0x580 kernel/rcu/tree.c:1611
rcu_gp_kthread+0x143/0x220 kernel/rcu/tree.c:1768
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10 Comm: rcu_preempt Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com>
[ paulmck: Added another READ_ONCE() for RCU CPU stall warnings. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>