livepatch module author can pass module name/old function name with more
than the defined character limit. With obj->name length greater than
MODULE_NAME_LEN, the livepatch module gets loaded but waits forever on
the module specified by obj->name to be loaded. It also populates a /sys
directory with an untruncated object name.
In the case of funcs->old_name length greater then KSYM_NAME_LEN, it
would not match against any of the symbol table entries. Instead loop
through the symbol table comparing them against a nonexisting function,
which can be avoided.
The same issues apply, to misspelled/incorrect names. At least gatekeep
the modules with over the limit string length, by checking for their
length during livepatch module registration.
------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
do_idle+0x33/0x202
cpu_startup_entry+0x61/0x63
start_secondary+0x18e/0x1ed
startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---
After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
do_idle {
[interrupts enabled]
<interrupt> [interrupts disabled]
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET
test if pt_regs say return to interrupts enabled [yes]
TRACE_IRQS_ON [lockdep says irqs are on]
<nmi>
nmi_enter() {
printk_nmi_enter() [traced by ftrace]
[ hit ftrace breakpoint ]
<breakpoint exception>
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET [return from breakpoint]
test if pt_regs say interrupts enabled [no]
[iret back to interrupt]
[iret back to code]
tick_nohz_idle_enter() {
lockdep_assert_irqs_enabled() [lockdep say no!]
Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.
Cc: stable@vger.kernel.org Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.
Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de Cc: stable@vger.kernel.org Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file") Reported-by: Erica Bugden <erica.bugden@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not set the system power-off callback and omap power-off rtc pointer
until we're done setting up our device to avoid leaving stale pointers
around after a late probe error.
Fixes: 97ea1906b3c2 ("rtc: omap: Support ext_wakeup configuration") Cc: stable <stable@vger.kernel.org> # 4.9 Cc: Marcin Niestroj <m.niestroj@grinn-global.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Marcin Niestroj <m.niestroj@grinn-global.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, when all modules, including VMCI and VMware balloon are built
into the kernel, the initialization of the balloon happens before the
VMCI is probed. As a result, the balloon fails to initialize the VMCI
doorbell, which it uses to get asynchronous requests for balloon size
changes.
The problem can be seen in the logs, in the form of the following
message:
"vmw_balloon: failed to initialize vmci doorbell"
The driver would work correctly but slightly less efficiently, probing
for requests periodically. This patch changes the balloon to be
initialized using late_initcall() instead of module_init() to address
this issue. It does not address a situation in which VMCI is built as a
module and the balloon is built into the kernel.
When vmballoon_vmci_init() sets a doorbell using VMCI_DOORBELL_SET, for
some reason it does not consider the status and looks at the result.
However, the hypervisor does not update the result - it updates the
status. This might cause VMCI doorbell not to be enabled, resulting in
degraded performance.
If the hypervisor sets 2MB batching is on, while batching is cleared,
the balloon code breaks. In this case the legacy mechanism is used with
2MB page. The VM would report a 2MB page is ballooned, and the
hypervisor would only take the first 4KB.
While the hypervisor should not report such settings, make the code more
robust by not enabling 2MB support without batching.
When balloon batching is not supported by the hypervisor, the guest
frame number (GFN) must fit in 32-bit. However, due to a bug, this check
was mistakenly ignored. In practice, when total RAM is greater than
16TB, the balloon does not work currently, making this bug unlikely to
happen.
When importing the latest copy of the kernel headers into Bionic,
Christpher and Elliott noticed that the eventpoll.h casts were not
wrapped in (). As it is, clang complains about macros without
surrounding (), so this makes it a pain for userspace tools.
So fix it up by adding another () pair, and make them line up purty by
using tabs.
Fixes: 65aaf87b3aa2 ("add EPOLLNVAL, annotate EPOLL... and event_poll->event") Reported-by: Christopher Ferris <cferris@google.com> Reported-by: Elliott Hughes <enh@google.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, extcon used the spinlock before calling the notifier_call_chain
to prevent the scheduled out of task and to prevent the notification delay.
When spinlock is locked for sending the notification, deadlock issue
occured on the side of extcon consumer device. To fix this issue,
extcon consumer device should always use the work. it is always not
reasonable to use work.
To fix this issue on extcon consumer device, release locking when sending
the notification of connector state.
Fixes: ab11af049f88 ("extcon: Add the synchronization extcon APIs to support the notification") Cc: stable@vger.kernel.org Cc: Roger Quadros <rogerq@ti.com> Cc: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A sysfs write callback function needs to either return the number of
consumed characters or an error.
The ad952x_store() function currently returns 0 if the input value was "0",
this will signal that no characters have been consumed and the function
will be called repeatedly in a loop indefinitely. Fix this by returning
number of supplied characters to indicate that the whole input string has
been consumed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Fixes: cd1678f96329 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator") Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the displayed phase for the ad9523 driver. Currently the most
significant decimal place is dropped and all other digits are shifted one
to the left. This is due to a multiplication by 10, which is not necessary,
so remove it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Fixes: cd1678f9632 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator") Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY case is missing a
return and will fall through to the default case and errorenously
return -EINVAL.
Fix this by adding in missing *return ret*.
Fixes: 626f971b5b07 ("staging:iio:accel:sca3000 Add write support to the low pass filter control") Reported-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before setting channel->rescind in vmbus_rescind_cleanup(), we should make
sure the channel callback won't run any more, otherwise a high-level
driver like pci_hyperv, which may be infinitely waiting for the host VSP's
response and notices the channel has been rescinded, can't safely give
up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's
unsafe to exit from wait_for_response() and proceed with the on-stack
variable "comp_pkt" popped. The issue was originally spotted by
Michael Kelley <mikelley@microsoft.com>.
In vmbus_close_internal(), the patch also minimizes the range protected by
disabling/enabling channel->callback_event: we don't really need that for
the whole function.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Michael Kelley <mikelley@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I didn't really hit a real bug, but just happened to spot the bug:
we have decreased the counter at the beginning of vmbus_process_offer(),
so we mustn't decrease it again.
Fixes: 6f3d791f3006 ("Drivers: hv: vmbus: Fix rescind handling issues") Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: stable@vger.kernel.org Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stable <stable@vger.kernel.org> # 4.14 and above Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.
Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.
To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.
v2: switch to locking uport->lock on allocation/deallocation instead of
locking the per-port mutex in uart_put_char. Note that since
uport->lock is a spin lock, we have to switch the allocation to
GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
GFP_KERNEL
dm-crypt should only increase device limits, it should not decrease them.
This fixes a bug where the user could creates a crypt device with 1024
sector size on the top of scsi device that had 4096 logical block size.
The limit 4096 would be lost and the user could incorrectly send
1024-I/Os to the crypt device.
The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly. So we treat it as a hint. In normal
operation it will be written when the dm device is suspended. If the
system crashes all cache blocks will be assumed dirty when restarted.
This got broken in commit f177940a8091 ("dm cache metadata: switch to
using the new cursor api for loading metadata") in 4.9, which removed
the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
flag) when loading cache blocks. This results in data corruption on an
unclean shutdown with dirty cache blocks on the fast device. After the
crash those blocks are considered clean and may get evicted from the
cache at any time. This can be demonstrated by doing a lot of reads
to trigger individual evictions, but uncache is more predictable:
### Disable auto-activation in lvm.conf to be able to do uncache in
### time (i.e. see uncache doing flushing) when the fix is applied.
# vgchange -ay vg_cache
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# lvconvert --uncache vg_cache/lv_slowdev
Flushing 0 blocks for cache vg_cache/lv_slowdev.
Logical volume "lv_cachedev" successfully removed
Logical volume vg_cache/lv_slowdev is not cached.
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ 0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
This is the case with both v1 and v2 cache pool metatata formats.
Cc: stable@vger.kernel.org Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
policy_hint_size starts as 0 during __write_initial_superblock(). It
isn't until the policy is loaded that policy_hint_size is set in-core
(cmd->policy_hint_size). But it never got recorded in the on-disk
superblock because __commit_transaction() didn't deal with transfering
the in-core cmd->policy_hint_size to the on-disk superblock.
The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
__begin_transaction_flags() which re-reads all superblock fields.
Because the superblock's policy_hint_size was never properly stored, when
the cache was created, hints_array_available() would always return false
when re-activating a previously created cache. This means
__load_mappings() always considered the hints invalid and never made use
of the hints (these hints served to optimize).
Another detremental side-effect of this oversight is the cache_check
utility would fail with: "invalid hint width: 0"
Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now both check_for_space() and do_no_space_timeout() will read & write
pool->pf.error_if_no_space. If these functions run concurrently, as
shown in the following case, the default setting of "queue_if_no_space"
can get lost.
precondition:
* error_if_no_space = false (aka "queue_if_no_space")
* pool is in Out-of-Data-Space (OODS) mode
* no_space_timeout worker has been queued
CPU 0: CPU 1:
// delete a thin device
process_delete_mesg()
// check_for_space() invoked by commit()
set_pool_mode(pool, PM_WRITE)
pool->pf.error_if_no_space = \
pt->requested_pf.error_if_no_space
// timeout, pool is still in OODS mode
do_no_space_timeout
// "queue_if_no_space" config is lost
pool->pf.error_if_no_space = true
pool->pf.mode = new_mode
Fix it by stopping no_space_timeout worker when switching to write mode.
Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The patch adds the flush in p9_mux_poll_stop() as it the function used by
p9_conn_destroy(), in turn called by p9_fd_close() to stop the async
polling associated with the data regarding the connection.
Link: http://lkml.kernel.org/r/20180720092730.27104-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+39749ed7d9ef6dfb23f6@syzkaller.appspotmail.com
To: Eric Van Hensbergen <ericvh@gmail.com>
To: Ron Minnich <rminnich@sandia.gov>
To: Latchesar Ionkov <lucho@ionkov.net> Cc: Yiwen Jiang <jiangyiwen@huwei.com> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The p9_client_version() does not initialize the version pointer. If the
call to p9pdu_readf() returns an error and version has not been allocated
in p9pdu_readf(), then the program will jump to the "error" label and will
try to free the version pointer. If version is not initialized, free()
will be called with uninitialized, garbage data and will provoke a crash.
Link: http://lkml.kernel.org/r/20180709222943.19503-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+65c6b72f284a39d416b4@syzkaller.appspotmail.com Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In my testing, v9fs_fid_xattr_set will return successfully even if the
backend ext4 filesystem has no space to store xattr key-value. That will
cause inconsistent behavior between front end and back end. The reason is
that lsetxattr will be triggered by p9_client_clunk, and unfortunately we
did not catch the error. This patch will catch the error to notify upper
caller.
p9_client_clunk (in 9p)
p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);
v9fs_clunk (in qemu)
put_fid
free_fid
v9fs_xattr_fid_clunk
v9fs_co_lsetxattr
s->ops->lsetxattr
ext4_xattr_user_set (in host ext4 filesystem)
Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Every function that returns COMPST_ERROR must set wqe->status to another
value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
path for which this is not yet the case.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: <stable@vger.kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since there are adapters that have four ports, increase the size of
the srpt_device.port[] array. This patch avoids that the following
warning is hit with quad port Chelsio adapters:
Once a target session has been allocated, if an error occurs, the session
must be freed. Since it is not safe to call blocking code from the context
of an connection manager callback, trigger target session release in this
case by calling srpt_close_ch().
It is incorrect to depend on set_id value to know if counters were
allocated or not. set_id_valid field is set to true when counters
were allocated. Therefore, use set_id_valid while deciding to
free counters.
Cc: <stable@vger.kernel.org> # 4.15 Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a process exits without doing proper cleanup, there's a window
where an opencapi device can try to access the memory of the dying
process and may trigger a page fault. That's an expected scenario and
the ocxl driver holds a reference on the mm_struct of the process
until the opencapi device is notified of the process exiting.
However, if mm_users is already at 0, i.e. the address space of the
process has already been destroyed, the driver shouldn't try resolving
the page fault, as it will fail, but it can also try accessing already
freed data.
It is fixed by only calling the bottom half of the page fault handler
if mm_users is greater than 0 and get a reference on mm_users instead
of mm_count. Otherwise, we can safely return a translation fault to
the device, as its associated memory context is being removed. The
opencapi device will be properly cleaned up shortly after when closing
the file descriptors.
Function atomic_inc_unless_negative() returns a bool to indicate
success/failure. However cxl_adapter_context_get() wrongly compares
the return value against '>=0' which will always be true. The patch
fixes this comparison to '==0' there by also fixing this compile time
warning:
drivers/misc/cxl/main.c:290 cxl_adapter_context_get()
warn: 'atomic_inc_unless_negative(&adapter->contexts_num)' is unsigned
Fixes: 70b565bbdb91 ("cxl: Prevent adapter reset if an active context exists") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.
This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.
There is work going on to fix that properly in the PCI core but it
will take some time.
x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.
This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper
for other platforms") replaced generic pte_access_permitted() by an
arch specific one.
The generic one is defined as
(pte_present(pte) && (!(write) || pte_write(pte)))
The arch specific one is open coded checking that _PAGE_USER and
_PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that
_PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test
on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0.
Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted
for hugetlb access check") replaced some tests performed with
pte helpers by a call to pte_access_permitted(), leading to the same
issue.
This patch rewrites powerpc/nohash pte_access_permitted()
using pte helpers.
Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
execute-only key is allocated dynamically. This is a problem. When a
thread implicitly creates an execute-only key, and resets the UAMOR
for that key, the UAMOR value does not percolate to all the other
threads. Any other thread may ignorantly change the permissions on the
key. This can cause the key to be not execute-only for that thread.
Preallocate the execute-only key and ensure that no thread can change
the permission of the key, by resetting the corresponding bit in
UAMOR.
Total number of pkeys calculation is off by 1. Fix it.
Fixes: 4fb158f65ac5 ("powerpc: track allocation status of all pkeys") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Key allocation and deallocation has the side effect of programming the
UAMOR/AMR/IAMR registers. This is wrong, since its the responsibility of
the application and not that of the kernel, to modify the permission on
the key.
Do not modify the pkey registers at key allocation/deallocation.
This patch also fixes a bug where a sys_pkey_free() resets the UAMOR
bits of the key, thus making its permissions unmodifiable from user
space. Later if the same key gets reallocated from a different thread
this thread will no longer be able to change the permissions on the key.
Deny all permissions on all keys, with some exceptions. pkey-0 must
allow all permissions, or else everything comes to a screaching halt.
Execute-only key must allow execute permission.
Currently in a multithreaded application, a key allocated by one
thread is not usable by other threads. By "not usable" we mean that
other threads are unable to change the access permissions for that
key for themselves.
When a new key is allocated in one thread, the corresponding UAMOR
bits for that thread get enabled, however the UAMOR bits for that key
for all other threads remain disabled.
Other threads have no way to set permissions on the key, and the
current default permissions are that read/write is enabled for all
keys, which means the key has no effect for other threads. Although
that may be the desired behaviour in some circumstances, having all
threads able to control their permissions for the key is more
flexible.
The current behaviour also differs from the x86 behaviour, which is
problematic for users.
To fix this, enable the UAMOR bits for all keys, at process
creation (in start_thread(), ie exec time). Since the contents of
UAMOR are inherited at fork, all threads are capable of modifying the
permissions on any key.
This is technically an ABI break on powerpc, but pkey support is fairly
new on powerpc and not widely used, and this brings us into
line with x86.
Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Tested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Reword some of the changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.
Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.
Severe Machine check interrupt [Recovered]
NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
vscnprintf+0x20/0x60
vprintk_emit+0xb4/0x4b0
vprintk_func+0x5c/0xd0
printk+0x38/0x4c
init_module+0x1c0/0x338 [bork_kernel]
do_one_initcall+0x54/0x230
do_init_module+0x8c/0x248
load_module+0x12b8/0x15b0
sys_finit_module+0xa8/0x110
system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e22 ("memblock: Add array resizing support").
On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:
as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.
Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.
Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().
Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically") Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me> Signed-off-by: David Howells <dhowells@redhat.com> Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a potential execution path in which function
platform_get_resource() returns NULL. If this happens,
we will end up having a NULL pointer dereference.
Fix this by replacing devm_ioremap with devm_ioremap_resource,
which has the NULL check and the memory region request.
This code was detected with the help of Coccinelle.
Cc: stable@vger.kernel.org Fixes: f700e84f417b ("mailbox: Add support for APM X-Gene platform mailbox driver") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The v4l uAPI documentation [0] makes clear that in the case of interlaced
video (i.e: field is V4L2_FIELD_ALTERNATE) the height refers to the number
of lines in the field and not the number of lines in the full frame (which
is twice the field height for interlaced formats).
So the original height calculation was correct, and it shouldn't had been
changed by the mentioned commit.
Fixes: 0866df8dffd5 ("[media] tvp5150: fix pad format frame height") Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Cc: <stable@vger.kernel.org> # for v4.12 and up Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prior to commit 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM
callbacks from the sdio bus"), the MMC core used to call into the power
management functions of SDIO clients itself and removed the card if the
return code was non-zero. IOW, the mmc handled errors gracefully and didn't
upchain them to the pm core.
Since this change, the mmc core relies on generic power management
functions which treat all errors as a reason to cancel the suspend
immediately. This causes suspend attempts to fail when the libertas
driver is loaded.
To fix this, power down the card explicitly in if_sdio_suspend() when we
know we're about to lose power and return success. Also set a flag in these
cases, and power up the card again in if_sdio_resume().
Fixes: 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus") Cc: <stable@vger.kernel.org> Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Chris Ball <chris@printf.net> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes the BUG_ON spuriously triggering under the following
circumstances:
* reservation_object_reserve_shared is called with shared_count ==
shared_max - 1, so obj->staged is freed in preparation of an in-place
update.
* reservation_object_add_shared_fence is called with the first fence,
after which shared_count == shared_max.
* reservation_object_add_shared_fence is called with a follow-up fence
from the same context.
In the second reservation_object_add_shared_fence call, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since both fences are from the
same context, so they only occupy a single slot.
Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).
apparmor_secid_to_secctx() has a bad debug statement tripping on a
condition handle by the code. When kconfig SECURITY_APPARMOR_DEBUG is
enabled the debug WARN_ON will trip when **secdata is NULL resulting
in the following trace.
------------[ cut here ]------------
AppArmor WARN apparmor_secid_to_secctx: ((!secdata)):
WARNING: CPU: 0 PID: 14826 at security/apparmor/secid.c:82 apparmor_secid_to_secctx+0x2b5/0x2f0 security/apparmor/secid.c:82
Kernel panic - not syncing: panic_on_warn set ...
CC: <stable@vger.kernel.org> #4.18 Fixes: c092921219d2 ("apparmor: add support for mapping secids and using secctxes") Reported-by: syzbot+21016130b0580a9de3b5@syzkaller.appspotmail.com Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several block drivers call alloc_disk() followed by put_disk() if
something fails before device_add_disk() is called without calling
blk_cleanup_queue(). Make sure that also for this scenario a request
queue is dissociated from the cgroup controller. This patch avoids
that loading the parport_pc, paride and pf drivers triggers the
following kernel crash:
BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
Read of size 4 at addr 0000000000000008 by task modprobe/744
Call Trace:
dump_stack+0x9a/0xeb
kasan_report+0x139/0x350
pi_init+0x42e/0x580 [paride]
pf_init+0x2bb/0x1000 [pf]
do_one_initcall+0x8e/0x405
do_init_module+0xd9/0x2f2
load_module+0x3ab4/0x4700
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Alexandru Moise <00moses.alexander00@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Runtime PM isn't ready for blk-mq yet, and commit 765e40b675a9 ("block:
disable runtime-pm for blk-mq") tried to disable it. Unfortunately,
it can't take effect in that way since user space still can switch
it on via 'echo auto > /sys/block/sdN/device/power/control'.
This patch disables runtime-pm for blk-mq really by pm_runtime_disable()
and fixes all kinds of PM related kernel crash.
Cc: Tomas Janousek <tomi@nomi.cz> Cc: Przemek Socha <soprwa@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We find the memory use-after-free issue in __blk_drain_queue()
on the kernel 4.14. After read the latest kernel 4.18-rc6 we
think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue().
If the elevator init function called with error return, it will
run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free
of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery") Cc: <stable@vger.kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: xiao jin <jin.xiao@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If __blkdev_issue_discard is in progress and a device mapper device is
reloaded with a table that doesn't support discard,
q->limits.max_discard_sectors is set to zero. This results in infinite
loop in __blkdev_issue_discard.
This patch checks if max_discard_sectors is zero and aborts with
-EOPNOTSUPP.
ondemand_readahead() checks bdi->io_pages to cap the maximum pages
that need to be processed. This works until the readit section. If
we would do an async only readahead (async size = sync size) and
target is at beginning of window we expand the pages by another
get_next_ra_size() pages. Btrace for large reads shows that kernel
always issues a doubled size read at the beginning of processing.
Add an additional check for io_pages in the lower part of the func.
The fix helps devices that hard limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.
Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting") Cc: stable@vger.kernel.org Signed-off-by: Markus Stockhausen stockhausen@collogia.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I have encountered an interrupt storm during the eMMC chip probing (and
the chip finally didn't get detected). It turned out that U-Boot left
the SDHI DMA interrupts enabled while the Linux driver didn't use those.
Masking those interrupts in renesas_sdhi_internal_dmac_request_dma() gets
rid of both issues...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Fixes: 2a68ea7896e3 ("mmc: renesas-sdhi: add support for R-Car Gen3 SDHI DMAC") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mmc block driver does not support parallel dispatch of requests. In
normal circumstances, all requests are anyway funneled through a single
work item, so parallel dispatch never happens. However it can happen if
there is no elevator.
Fix that by detecting if a dispatch is in progress and returning busy
(BLK_STS_RESOURCE) in that case
The path "spi: cadence: Add usleep_range() for
cdns_spi_fill_tx_fifo()" added a usleep_range() function call,
which cannot be used in atomic context.
However the cdns_spi_fill_tx_fifo() function can be called during
an interrupt which may result in a kernel panic:
This patch replaces the function call with udelay() which can be
used in an atomic context, like an interrupt.
Signed-off-by: Jan Kotas <jank@cadence.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Registers of DSPI should not be accessed before enabling its clock. On
Toradex Colibri VF50 on Iris carrier board this could be seen during
bootup as imprecise abort:
Unhandled fault: imprecise external abort (0x1c06) at 0x00000000
Internal error: : 1c06 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.39-dirty #97
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
Backtrace:
[<804166a8>] (regmap_write) from [<80466b5c>] (dspi_probe+0x1f0/0x8dc)
[<8046696c>] (dspi_probe) from [<8040107c>] (platform_drv_probe+0x54/0xb8)
[<80401028>] (platform_drv_probe) from [<803ff53c>] (driver_probe_device+0x280/0x2f8)
[<803ff2bc>] (driver_probe_device) from [<803ff674>] (__driver_attach+0xc0/0xc4)
[<803ff5b4>] (__driver_attach) from [<803fd818>] (bus_for_each_dev+0x70/0xa4)
[<803fd7a8>] (bus_for_each_dev) from [<803fee74>] (driver_attach+0x24/0x28)
[<803fee50>] (driver_attach) from [<803fe980>] (bus_add_driver+0x1a0/0x218)
[<803fe7e0>] (bus_add_driver) from [<803fffe8>] (driver_register+0x80/0x100)
[<803fff68>] (driver_register) from [<80400fdc>] (__platform_driver_register+0x48/0x50)
[<80400f94>] (__platform_driver_register) from [<8091cf7c>] (fsl_dspi_driver_init+0x1c/0x20)
[<8091cf60>] (fsl_dspi_driver_init) from [<8010195c>] (do_one_initcall+0x4c/0x174)
[<80101910>] (do_one_initcall) from [<80900e8c>] (kernel_init_freeable+0x144/0x1d8)
[<80900d48>] (kernel_init_freeable) from [<805ff6a8>] (kernel_init+0x10/0x114)
[<805ff698>] (kernel_init) from [<80107be8>] (ret_from_fork+0x14/0x2c)
Cc: <stable@vger.kernel.org> Fixes: 5ee67b587a2b ("spi: dspi: clear SPI_SR before enable interrupt") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Intel Ice Lake SPI host controller follows the Intel Cannon Lake but the
PCI IDs are different. Add the new PCI IDs to the driver supported
devices list.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The zero-copy optimization when reading or writing large chunks of data
is quite useful. However, the 9p messages created through the zero-copy
write path have an incorrect message size: it should be the size of the
header + size of the data being written but instead it's just the size
of the header.
This only works if the server ignores the size field of the message and
otherwise breaks the framing of the protocol. Fix this by re-writing the
message size field with the correct value.
Tested by running `dd if=/dev/zero of=out bs=4k count=1` inside a
virtio-9p mount.
This patch fixes patch add handling to take care tail and headroom for
single 6lowpan frames. We need to be sure we have a skb with the right
head and tailroom for single frames. This patch do it by using
skb_copy_expand() if head and tailroom is not enough allocated by upper
layer.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195059 Reported-by: David Palma <david.palma@ntnu.no> Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline. This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.
Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field. This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ] Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.
Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()"
There is another cast from unsigned long to int which causes
a bounds check to fail with specially crafted input. The value is
then used as an index in the slot array in cdrom_slot_status().
Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.
Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations. This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.
Fix it by adding the necessary notrace annotations.
Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: oleg@redhat.com Cc: tj@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We did have sporadic problems in the pinctrl framework during boot
where a pin group name unexpectedly became NULL leading to a NULL
dereference in strcmp.
Detailled analysis of the failing cases did reveal that there were
two devm allocated objects close to each other. The second one was
the affected group_desc in pinmux and the first one was the
psy_desc->properties buffer of the gab driver.
Review of the gab code showed that the address calculation for
one memcpy() is wrong. It does
properties + sizeof(type) * index
but C is defined to do the index multiplication already for
pointer + integer additions. Hence the factor was applied twice
and the memcpy() does write outside of the properties buffer.
Sometimes it happened to be the pinctrl and triggered the strcmp(NULL).
Anyways, it is overkill to use a memcpy() here instead of a simple
assignment, which is easier to read and has less risk for wrong
address calculations. So we change code to a simple assignment.
If we initialize the index to the first free location, we can even
remove the local variable 'properties'.
This bug seems to exist right from the beginning in 3.7-rc1 in
commit e60fea794e6e ("power: battery: Generic battery driver using IIO")
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: stable@vger.kernel.org Fixes: e60fea794e6e ("power: battery: Generic battery driver using IIO") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
"count" needs to be signed for the error handling to work. I made "i"
signed as well so they match.
Fixes: 02113ba93ea4 (PM / clk: Add support for obtaining clocks from device-tree) Cc: 4.6+ <stable@vger.kernel.org> # 4.6+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
The BAM has 3 channels - tx, rx and command. command channel
is used for register read/writes, tx channel for data writes
and rx channel for data reads. Currently, the driver assumes the
transfer completion once it gets all the command descriptors
completed. Sometimes, there is race condition between data channel
(tx/rx) and command channel completion. In these cases,
the data present in buffer is not valid during small window
between command descriptor completion and data descriptor
completion.
This patch generates NAND transfer completion when both
(Data and Command) DMA channels have completed all its DMA
descriptors. It assigns completion callback in last
DMA descriptors of that channel and wait for completion.
This patch restores the suspend and resume hooks that the old driver used
to have. Apart from stopping and starting the clocks, the resume callback
also nullifies the selected_chip pointer, so the next command that is issued
will re-select the chip and thereby restore the timing registers.
Factor out some code from marvell_nfc_init() into a new function
marvell_nfc_reset() and also call it at resume time to reset some registers
that don't retain their contents during low-power mode.
Without this patch, a PXA3xx based system would cough up an error similar to
the one below after resume.
[ 44.660162] marvell-nfc 43100000.nand-controller: Timeout waiting for RB signal
[ 44.671492] ubi0 error: ubi_io_write: error -110 while writing 2048 bytes to PEB 102:38912, written 0 bytes
[ 44.682887] CPU: 0 PID: 1417 Comm: remote-control Not tainted 4.18.0-rc2+ #344
[ 44.691197] Hardware name: Marvell PXA3xx (Device Tree Support)
[ 44.697111] Backtrace:
[ 44.699593] [<c0106458>] (dump_backtrace) from [<c0106718>] (show_stack+0x18/0x1c)
[ 44.708931] r7:00000800 r6:00009800 r5:00000066 r4:c6139000
[ 44.715833] [<c0106700>] (show_stack) from [<c0678a60>] (dump_stack+0x20/0x28)
[ 44.724206] [<c0678a40>] (dump_stack) from [<c0456cbc>] (ubi_io_write+0x3d4/0x630)
[ 44.732925] [<c04568e8>] (ubi_io_write) from [<c0454428>] (ubi_eba_write_leb+0x690/0x6fc)
...
chip->read_buf is left unassigned since commit 4da712e70294 ("mtd: nand:
fsmc: use ->exec_op()"), leading to a NULL pointer dereference when it's
called from fsmc_read_page_hwecc(). Fix that by using the appropriate
helper to read data out of the NAND.
Modern NAND controller drivers implement ->exec_op() instead of
->cmdfunc(), make sure we don't end up with a NULL pointer dereference
when hynix_nand_reg_write_op() is called.
The problem is that iscsi_login_zero_tsih_s1 sets conn->sess early in
iscsi_login_set_conn_values. If the function fails later like when we
alloc the idr it does kfree(sess) and leaves the conn->sess pointer set.
iscsi_login_zero_tsih_s1 then returns -Exyz and we then call
iscsi_target_login_sess_out and access the freed memory.
This patch has iscsi_login_zero_tsih_s1 either completely setup the
session or completely tear it down, so later in
iscsi_target_login_sess_out we can just check for it being set to the
connection.
Cc: stable@vger.kernel.org Fixes: 0957627a9960 ("iscsi-target: Fix sess allocation leak in...") Signed-off-by: Mike Christie <mchristi@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A long time ago the unfortunate decision was taken to add a self-deletion
attribute to the sysfs SCSI device directory. That decision was unfortunate
because self-deletion is really tricky. We can't drop that attribute
because widely used user space software depends on it, namely the
rescan-scsi-bus.sh script. Hence this patch that avoids that writing into
that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH]
add sysfs attributes to scan and delete scsi_devices").
This patch avoids that self-removal triggers the following deadlock:
======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc2-dbg+ #5 Not tainted
------------------------------------------------------
modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90
but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
This patch avoids that smatch complains about a double unlock on
ioc->transport_cmds.mutex.
Fixes: 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Chaitra P B <chaitra.basappa@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As a part of host reset operation, driver will flushout all IOs outstanding
at driver level with "DID_RESET" result. To find which are all commands
outstanding at the driver level, driver loops with smid starting from one
to HBA queue depth and calls mpt3sas_scsih_scsi_lookup_get() to get scmd as
shown below
for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
if (!scmd)
continue;
But in mpt3sas_scsih_scsi_lookup_get() function, driver returns some scsi
cmnds which are not outstanding at the driver level (possibly request is
constructed at block layer since QUEUE_FLAG_QUIESCED is not set. Even if
driver uses scsi_block_requests and scsi_unblock_requests, issue still
persists as they will be just blocking further IO from scsi layer and not
from block layer) and these commands are flushed with DID_RESET host bytes
thus resulting into above kernel BUG.
This issue got introduced by commit dbec4c9040ed ("scsi: mpt3sas: lockless
command submission").
To fix this issue, we have modified the mpt3sas_scsih_scsi_lookup_get() to
check for smid equals to zero (note: whenever any scsi cmnd is processing
at the driver level then smid for that scsi cmnd will be non-zero, always
it starts from one) before it returns the scmd pointer to the caller. If
smid is zero then this function returns scmd pointer as NULL and driver
won't flushout those scsi cmnds at driver level with DID_RESET host byte
thus this issue will not be observed.
[mkp: amended with updated fix from Sreekanth]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Fixes: dbec4c9040ed ("scsi: mpt3sas: lockless command submission") Cc: stable@vger.kernel.org # v4.16+ Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix tpm ptt initialization error:
tpm tpm0: A TPM error (378) occurred get tpm pcr allocation.
We cannot use go_idle cmd_ready commands via runtime_pm handles
as with the introduction of localities this is no longer an optional
feature, while runtime pm can be not enabled.
Though cmd_ready/go_idle provides a power saving, it's also a part of
TPM2 protocol and should be called explicitly.
This patch exposes cmd_read/go_idle via tpm class ops and removes
runtime pm support as it is not used by any driver.
When calling from nested context always use both flags:
TPM_TRANSMIT_UNLOCKED and TPM_TRANSMIT_RAW. Both are needed to resolve
tpm spaces and locality request recursive calls to tpm_transmit().
TPM_TRANSMIT_RAW should never be used standalone as it will fail
on double locking. While TPM_TRANSMIT_UNLOCKED standalone should be
called from non-recursive locked contexts.
New wrappers are added tpm_cmd_ready() and tpm_go_idle() to
streamline tpm_try_transmit code.
tpm_crb no longer needs own power saving functions and can drop using
tpm_pm_suspend/resume.
This patch cannot be really separated from the locality fix. Fixes: 888d867df441 (tpm: cmd_ready command can be issued only after granting locality) Cc: stable@vger.kernel.org Fixes: 888d867df441 (tpm: cmd_ready command can be issued only after granting locality) Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The userpace expects to read the number of bytes stated in the header.
Returning the size of the buffer instead would be unexpected.
Cc: stable@vger.kernel.org Fixes: 095531f891e6 ("tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented") Signed-off-by: Ricardo Schwarzmeier <Ricardo.Schwarzmeier@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some versions of GCC suboptimally generate calls to the __multi3()
intrinsic for MIPS64r6 builds, resulting in link failures due to the
missing function:
LD vmlinux.o
MODPOST vmlinux.o
kernel/bpf/verifier.o: In function `kmalloc_array':
include/linux/slab.h:631: undefined reference to `__multi3'
fs/select.o: In function `kmalloc_array':
include/linux/slab.h:631: undefined reference to `__multi3'
...
We already have a workaround for this in which we provide the
instrinsic, but we do so selectively for GCC 7 only. Unfortunately the
issue occurs with older GCC versions too - it has been observed with
both GCC 5.4.0 & GCC 6.4.0.
MIPSr6 support was introduced in GCC 5, so all major GCC versions prior
to GCC 8 are affected and we extend our workaround accordingly to all
MIPS64r6 builds using GCC versions older than GCC 8.
Signed-off-by: Paul Burton <paul.burton@mips.com> Reported-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com> Fixes: ebabcf17bcd7 ("MIPS: Implement __multi3 for GCC7 MIPS64r6 builds")
Patchwork: https://patchwork.linux-mips.org/patch/20297/ Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux expects that if a CPU modifies a memory location, then that
modification will eventually become visible to other CPUs in the system.
Loongson 3 CPUs include a Store Fill Buffer (SFB) which sits between a
core & its L1 data cache, queueing memory accesses & allowing for faster
forwarding of data from pending stores to younger loads from the core.
Unfortunately the SFB prioritizes loads such that a continuous stream of
loads may cause a pending write to be buffered indefinitely. This is
problematic if we end up with 2 CPUs which each perform a store that the
other polls for - one or both CPUs may end up with their stores buffered
in the SFB, never reaching cache due to the continuous reads from the
poll loop. Such a deadlock condition has been observed whilst running
qspinlock code.
This patch changes the definition of cpu_relax() to smp_mb() for
Loongson-3, forcing a flush of the SFB on SMP systems which will cause
any pending writes to make it as far as the L1 caches where they will
become visible to other CPUs. If the kernel is not compiled for SMP
support, this will expand to a barrier() as before.
This workaround matches that currently implemented for ARM when
CONFIG_ARM_ERRATA_754327=y, which was introduced by commit 534be1d5a2da
("ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore").
Although the workaround is only required when the Loongson 3 SFB
functionality is enabled, and we only began explicitly enabling that
functionality in v4.7 with commit 1e820da3c9af ("MIPS: Loongson-3:
Introduce CONFIG_LOONGSON3_ENHANCEMENT"), existing or future firmware
may enable the SFB which means we may need the workaround backported to
earlier kernels too.
[paul.burton@mips.com:
- Reword commit message & comment.
- Limit stable backport to v3.15+ where we support Loongson 3 CPUs.]