Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in
the pinned physical page", 2018-07-17) added some checks to ensure
that guest DMA mappings don't attempt to map more than the guest is
entitled to access. However, errors in the logic mean that legitimate
guest requests to map pages for DMA are being denied in some
situations. Specifically, if the first page of the range passed to
mm_iommu_get() is mapped with a normal page, and subsequent pages are
mapped with transparent huge pages, we end up with mem->pageshift ==
0. That means that the page size checks in mm_iommu_ua_to_hpa() and
mm_iommu_up_to_hpa_rm() will always fail for every page in that
region, and thus the guest can never map any memory in that region for
DMA, typically leading to a flood of error messages like this:
(a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
call (meaning that find_linux_pte() returns the pte for the
first address in the range, not the address we are currently up
to);
(b) use of 'pageshift' as the variable to receive the hugepage shift
returned by find_linux_pte() - for a normal page this gets set
to 0, leading to us setting mem->pageshift to 0 when we conclude
that the pte returned by find_linux_pte() didn't match the page
we were looking at;
(c) comparing 'compshift', which is a page order, i.e. log base 2 of
the number of pages, with 'pageshift', which is a log base 2 of
the number of bytes.
To fix these problems, this patch introduces 'cur_ua' to hold the
current user address and uses that in the find_linux_pte() call;
introduces 'pteshift' to hold the hugepage shift found by
find_linux_pte(); and compares 'pteshift' with 'compshift +
PAGE_SHIFT' rather than 'compshift'.
The patch also moves the local_irq_restore to the point after the PTE
pointer returned by find_linux_pte() has been dereferenced because
otherwise the PTE could change underneath us, and adds a check to
avoid doing the find_linux_pte() call once mem->pageshift has been
reduced to PAGE_SHIFT, as an optimization.
Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1) missing entries in the l1d_param array; this can cause a host crash
if an access attempts to reach the missing entry. Future-proof the get
function against any overflows as well. However, the two entries
VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
not be accepted by the parse function, so disable them there.
2) invalid values must be rejected even if the CPU does not have the
bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
... and a small refactoring, since the .cmd field is redundant with
the index in the array.
Reported-by: Bandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Fixes: a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Bay Trail and Cherry Trail devices we set the pm_disabled flag for I2C
busses which the OS shares with the PUNIT as these need special handling.
Until now we called dev_pm_syscore_device(dev, true) for I2C controllers
with this flag set to keep these I2C controllers always on.
After commit 12864ff8545f ("ACPI / LPSS: Avoid PM quirks on suspend and
resume from hibernation"), this no longer works. This commit modifies
lpss_iosf_exit_d3_state() to only run if lpss_iosf_enter_d3_state() has ran
before it, so that it does not run on a resume from hibernate (or from S3).
On these systems the conditions for lpss_iosf_enter_d3_state() to run
never become true, so lpss_iosf_exit_d3_state() never gets called and
the 2 LPSS DMA controllers never get forced into D0 mode, instead they
are left in their default automatic power-on when needed mode.
The not forcing of D0 mode for the DMA controllers enables these systems
to properly enter S0ix modes, which is a good thing.
But after entering S0ix modes the I2C controller connected to the PMIC
no longer works, leading to e.g. broken battery monitoring.
The _PS3 method for this I2C controller looks like this:
Method (_PS3, 0, NotSerialized) // _PS3: Power State 3
{
If ((((PMID == 0x04) || (PMID == 0x05)) || (PMID == 0x06)))
{
Return (Zero)
}
Where PMID = 0x05, so we enter the Return (Zero) path on these systems.
So even if we were to not call dev_pm_syscore_device(dev, true) the
I2C controller will be left in D0 rather then be switched to D3.
Yet on other Bay and Cherry Trail devices S0ix is not entered unless *all*
I2C controllers are in D3 mode. This combined with the I2C controller no
longer working now that we reach S0ix states on these systems leads to me
believing that the PUNIT itself puts the I2C controller in D3 when all
other conditions for entering S0ix states are true.
Since now the I2C controller is put in D3 over a suspend/resume we must
re-initialize it afterwards and that does indeed fix it no longer working.
This commit implements this fix by:
1) Making the suspend_late callback a no-op if pm_disabled is set and
making the resume_early callback skip the clock re-enable (since it now was
not disabled) while still doing the necessary I2C controller re-init.
2) Removing the dev_pm_syscore_device(dev, true) call, so that the suspend
and resume callbacks are actually called. Normally this would cause the
ACPI pm code to call _PS3 putting the I2C controller in D3, wreaking havoc
since it is shared with the PUNIT, but in this special case the _PS3 method
is a no-op so we can safely allow a "fake" suspend / resume.
Fixes: 12864ff8545f ("ACPI / LPSS: Avoid PM quirks on suspend and resume ...") Link: https://bugzilla.kernel.org/show_bug.cgi?id=200861 Cc: 4.15+ <stable@vger.kernel.org> # 4.15+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in
drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
select CONFIG_SRCU in Kconfig, which leads to the following build
error if CONFIG_SRCU is not selected somewhere else:
drivers/built-in.o: In function `wakeup_source_remove':
(.text+0x3c6fc): undefined reference to `synchronize_srcu'
drivers/built-in.o: In function `pm_print_active_wakeup_sources':
(.text+0x3c7a8): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `pm_print_active_wakeup_sources':
(.text+0x3c84c): undefined reference to `__srcu_read_unlock'
drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
(.text+0x3d1d8): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
(.text+0x3d228): undefined reference to `__srcu_read_unlock'
drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
(.text+0x3d24c): undefined reference to `__srcu_read_lock'
drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
(.text+0x3d29c): undefined reference to `__srcu_read_unlock'
drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'
Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.
Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU) Cc: 4.2+ <stable@vger.kernel.org> # 4.2+ Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[ rjw: Minor subject/changelog fixups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If cppc_cpufreq.ko is deleted at the same time that tuned-adm is
changing profiles, there is a small chance that a race can occur
between cpufreq_dbs_governor_exit() and cpufreq_dbs_governor_limits()
resulting in a system failure when the latter tries to use
policy->governor_data that has been freed by the former.
This patch uses gov_dbs_data_mutex to synchronize access.
Fixes: e788892ba3cc (cpufreq: governor: Get rid of governor events) Signed-off-by: Henry Willard <henry.willard@oracle.com>
[ rjw: Subject, minor white space adjustment ] Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.
Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already. Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.
In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.
Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick) Reported-by: Leo Yan <leo.yan@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the tick has been stopped already, but the governor has not asked to
stop it (which it can do sometimes), the idle loop should invoke
tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care
of this case properly.
Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick) Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The call to strlcpy in backing_dev_store is incorrect. It should take
the size of the destination buffer instead of the size of the source
buffer. Additionally, ignore the newline character (\n) when reading
the new file_name buffer. This makes it possible to set the backing_dev
as follows:
echo /dev/sdX > /sys/block/zram0/backing_dev
The reason it worked before was the fact that strlcpy() copies 'len - 1'
bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
copy the trailing new line symbol. Which also means that "echo -n
/dev/sdX" most likely was broken.
Signed-off-by: Peter Kalauskas <peskal@google.com> Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit fa85015c0d95 (ACPICA: Clear status of all events when entering
S5) made the sleep state entry code in ACPICA clear the status of all
ACPI events when entering S5 to fix a functional regression reported
against commit 18996f2db918 (ACPICA: Events: Stop unconditionally
clearing ACPI IRQs during suspend/resume). However, it is reported
now that the regression also affects system states other than S5 on
some systems and causes them to wake up from sleep prematurely.
For this reason, make the code in question clear the status of all
ACPI events when entering all sleep states (in addition to S5) to
avoid the premature wakeups (this may cause some wakeup events to
be missed in theory, but the likelihood of that is small and the
change here simply restores the previous behavior of the code).
Fixes: 18996f2db918 (ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume) Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+: fa85015c0d95 ACPICA: Clear status ... Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This change skips the entire length of opcodes that open a scope
(Device, Scope, Processor, etc) if the creation of the op fails. The
failure could be caused by various errors including AE_ALREADY_EXISTS
and AE_NOT_FOUND.
Reported-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.
Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():
docker run --rm drupal:8.5.4-fpm-alpine \
sh -c 'cd /var/www/html/vendor/symfony && \
chown -R www-data:www-data . && ls -l .'
Reported-by: Aditya Kali <adityakali@google.com> Tested-by: Aditya Kali <adityakali@google.com> Fixes: 4edb83bb1041 ('ovl: constant d_ino for non-merge dirs') Cc: <stable@vger.kernel.org> # v4.14 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LTP Functional tests have caused a bad paging request when triggering
the regmap_read_debugfs() logic of the device PMIC Hi6553 (reading
regmap/f8000000.pmic/registers file during read_all test):
Investigations have showed that, when triggered by debugfs read()
handler, the mmio regmap logic was reading a bigger (16k) register area
than the one mapped by devm_ioremap_resource() during hi655x-pmic probe
time (4k).
This commit changes hi655x's max register, according to HW specs, to be
the same as the one declared in the pmic device in hi6220's dts, fixing
the issue.
Cc: <stable@vger.kernel.org> #v4.9 #v4.14 #v4.16 #v4.17 Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While debugging another bug, I was looking at all the synchronize*()
functions being used in kernel/trace, and noticed that trace_uprobes was
using synchronize_sched(), with a comment to synchronize with
{u,ret}_probe_trace_func(). When looking at those functions, the data is
protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
is using the wrong synchronize_*() function.
Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer") Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
livepatch module author can pass module name/old function name with more
than the defined character limit. With obj->name length greater than
MODULE_NAME_LEN, the livepatch module gets loaded but waits forever on
the module specified by obj->name to be loaded. It also populates a /sys
directory with an untruncated object name.
In the case of funcs->old_name length greater then KSYM_NAME_LEN, it
would not match against any of the symbol table entries. Instead loop
through the symbol table comparing them against a nonexisting function,
which can be avoided.
The same issues apply, to misspelled/incorrect names. At least gatekeep
the modules with over the limit string length, by checking for their
length during livepatch module registration.
------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
do_idle+0x33/0x202
cpu_startup_entry+0x61/0x63
start_secondary+0x18e/0x1ed
startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---
After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
do_idle {
[interrupts enabled]
<interrupt> [interrupts disabled]
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET
test if pt_regs say return to interrupts enabled [yes]
TRACE_IRQS_ON [lockdep says irqs are on]
<nmi>
nmi_enter() {
printk_nmi_enter() [traced by ftrace]
[ hit ftrace breakpoint ]
<breakpoint exception>
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET [return from breakpoint]
test if pt_regs say interrupts enabled [no]
[iret back to interrupt]
[iret back to code]
tick_nohz_idle_enter() {
lockdep_assert_irqs_enabled() [lockdep say no!]
Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.
Cc: stable@vger.kernel.org Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.
Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de Cc: stable@vger.kernel.org Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file") Reported-by: Erica Bugden <erica.bugden@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not set the system power-off callback and omap power-off rtc pointer
until we're done setting up our device to avoid leaving stale pointers
around after a late probe error.
Fixes: 97ea1906b3c2 ("rtc: omap: Support ext_wakeup configuration") Cc: stable <stable@vger.kernel.org> # 4.9 Cc: Marcin Niestroj <m.niestroj@grinn-global.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Marcin Niestroj <m.niestroj@grinn-global.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, when all modules, including VMCI and VMware balloon are built
into the kernel, the initialization of the balloon happens before the
VMCI is probed. As a result, the balloon fails to initialize the VMCI
doorbell, which it uses to get asynchronous requests for balloon size
changes.
The problem can be seen in the logs, in the form of the following
message:
"vmw_balloon: failed to initialize vmci doorbell"
The driver would work correctly but slightly less efficiently, probing
for requests periodically. This patch changes the balloon to be
initialized using late_initcall() instead of module_init() to address
this issue. It does not address a situation in which VMCI is built as a
module and the balloon is built into the kernel.
When vmballoon_vmci_init() sets a doorbell using VMCI_DOORBELL_SET, for
some reason it does not consider the status and looks at the result.
However, the hypervisor does not update the result - it updates the
status. This might cause VMCI doorbell not to be enabled, resulting in
degraded performance.
If the hypervisor sets 2MB batching is on, while batching is cleared,
the balloon code breaks. In this case the legacy mechanism is used with
2MB page. The VM would report a 2MB page is ballooned, and the
hypervisor would only take the first 4KB.
While the hypervisor should not report such settings, make the code more
robust by not enabling 2MB support without batching.
When balloon batching is not supported by the hypervisor, the guest
frame number (GFN) must fit in 32-bit. However, due to a bug, this check
was mistakenly ignored. In practice, when total RAM is greater than
16TB, the balloon does not work currently, making this bug unlikely to
happen.
When importing the latest copy of the kernel headers into Bionic,
Christpher and Elliott noticed that the eventpoll.h casts were not
wrapped in (). As it is, clang complains about macros without
surrounding (), so this makes it a pain for userspace tools.
So fix it up by adding another () pair, and make them line up purty by
using tabs.
Fixes: 65aaf87b3aa2 ("add EPOLLNVAL, annotate EPOLL... and event_poll->event") Reported-by: Christopher Ferris <cferris@google.com> Reported-by: Elliott Hughes <enh@google.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, extcon used the spinlock before calling the notifier_call_chain
to prevent the scheduled out of task and to prevent the notification delay.
When spinlock is locked for sending the notification, deadlock issue
occured on the side of extcon consumer device. To fix this issue,
extcon consumer device should always use the work. it is always not
reasonable to use work.
To fix this issue on extcon consumer device, release locking when sending
the notification of connector state.
Fixes: ab11af049f88 ("extcon: Add the synchronization extcon APIs to support the notification") Cc: stable@vger.kernel.org Cc: Roger Quadros <rogerq@ti.com> Cc: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A sysfs write callback function needs to either return the number of
consumed characters or an error.
The ad952x_store() function currently returns 0 if the input value was "0",
this will signal that no characters have been consumed and the function
will be called repeatedly in a loop indefinitely. Fix this by returning
number of supplied characters to indicate that the whole input string has
been consumed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Fixes: cd1678f96329 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator") Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the displayed phase for the ad9523 driver. Currently the most
significant decimal place is dropped and all other digits are shifted one
to the left. This is due to a multiplication by 10, which is not necessary,
so remove it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Fixes: cd1678f9632 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator") Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY case is missing a
return and will fall through to the default case and errorenously
return -EINVAL.
Fix this by adding in missing *return ret*.
Fixes: 626f971b5b07 ("staging:iio:accel:sca3000 Add write support to the low pass filter control") Reported-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before setting channel->rescind in vmbus_rescind_cleanup(), we should make
sure the channel callback won't run any more, otherwise a high-level
driver like pci_hyperv, which may be infinitely waiting for the host VSP's
response and notices the channel has been rescinded, can't safely give
up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's
unsafe to exit from wait_for_response() and proceed with the on-stack
variable "comp_pkt" popped. The issue was originally spotted by
Michael Kelley <mikelley@microsoft.com>.
In vmbus_close_internal(), the patch also minimizes the range protected by
disabling/enabling channel->callback_event: we don't really need that for
the whole function.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Michael Kelley <mikelley@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I didn't really hit a real bug, but just happened to spot the bug:
we have decreased the counter at the beginning of vmbus_process_offer(),
so we mustn't decrease it again.
Fixes: 6f3d791f3006 ("Drivers: hv: vmbus: Fix rescind handling issues") Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: stable@vger.kernel.org Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stable <stable@vger.kernel.org> # 4.14 and above Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.
Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.
To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.
v2: switch to locking uport->lock on allocation/deallocation instead of
locking the per-port mutex in uart_put_char. Note that since
uport->lock is a spin lock, we have to switch the allocation to
GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
GFP_KERNEL
dm-crypt should only increase device limits, it should not decrease them.
This fixes a bug where the user could creates a crypt device with 1024
sector size on the top of scsi device that had 4096 logical block size.
The limit 4096 would be lost and the user could incorrectly send
1024-I/Os to the crypt device.
The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly. So we treat it as a hint. In normal
operation it will be written when the dm device is suspended. If the
system crashes all cache blocks will be assumed dirty when restarted.
This got broken in commit f177940a8091 ("dm cache metadata: switch to
using the new cursor api for loading metadata") in 4.9, which removed
the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
flag) when loading cache blocks. This results in data corruption on an
unclean shutdown with dirty cache blocks on the fast device. After the
crash those blocks are considered clean and may get evicted from the
cache at any time. This can be demonstrated by doing a lot of reads
to trigger individual evictions, but uncache is more predictable:
### Disable auto-activation in lvm.conf to be able to do uncache in
### time (i.e. see uncache doing flushing) when the fix is applied.
# vgchange -ay vg_cache
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# lvconvert --uncache vg_cache/lv_slowdev
Flushing 0 blocks for cache vg_cache/lv_slowdev.
Logical volume "lv_cachedev" successfully removed
Logical volume vg_cache/lv_slowdev is not cached.
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ 0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
This is the case with both v1 and v2 cache pool metatata formats.
Cc: stable@vger.kernel.org Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
policy_hint_size starts as 0 during __write_initial_superblock(). It
isn't until the policy is loaded that policy_hint_size is set in-core
(cmd->policy_hint_size). But it never got recorded in the on-disk
superblock because __commit_transaction() didn't deal with transfering
the in-core cmd->policy_hint_size to the on-disk superblock.
The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
__begin_transaction_flags() which re-reads all superblock fields.
Because the superblock's policy_hint_size was never properly stored, when
the cache was created, hints_array_available() would always return false
when re-activating a previously created cache. This means
__load_mappings() always considered the hints invalid and never made use
of the hints (these hints served to optimize).
Another detremental side-effect of this oversight is the cache_check
utility would fail with: "invalid hint width: 0"
Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now both check_for_space() and do_no_space_timeout() will read & write
pool->pf.error_if_no_space. If these functions run concurrently, as
shown in the following case, the default setting of "queue_if_no_space"
can get lost.
precondition:
* error_if_no_space = false (aka "queue_if_no_space")
* pool is in Out-of-Data-Space (OODS) mode
* no_space_timeout worker has been queued
CPU 0: CPU 1:
// delete a thin device
process_delete_mesg()
// check_for_space() invoked by commit()
set_pool_mode(pool, PM_WRITE)
pool->pf.error_if_no_space = \
pt->requested_pf.error_if_no_space
// timeout, pool is still in OODS mode
do_no_space_timeout
// "queue_if_no_space" config is lost
pool->pf.error_if_no_space = true
pool->pf.mode = new_mode
Fix it by stopping no_space_timeout worker when switching to write mode.
Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The patch adds the flush in p9_mux_poll_stop() as it the function used by
p9_conn_destroy(), in turn called by p9_fd_close() to stop the async
polling associated with the data regarding the connection.
Link: http://lkml.kernel.org/r/20180720092730.27104-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+39749ed7d9ef6dfb23f6@syzkaller.appspotmail.com
To: Eric Van Hensbergen <ericvh@gmail.com>
To: Ron Minnich <rminnich@sandia.gov>
To: Latchesar Ionkov <lucho@ionkov.net> Cc: Yiwen Jiang <jiangyiwen@huwei.com> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The p9_client_version() does not initialize the version pointer. If the
call to p9pdu_readf() returns an error and version has not been allocated
in p9pdu_readf(), then the program will jump to the "error" label and will
try to free the version pointer. If version is not initialized, free()
will be called with uninitialized, garbage data and will provoke a crash.
Link: http://lkml.kernel.org/r/20180709222943.19503-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+65c6b72f284a39d416b4@syzkaller.appspotmail.com Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In my testing, v9fs_fid_xattr_set will return successfully even if the
backend ext4 filesystem has no space to store xattr key-value. That will
cause inconsistent behavior between front end and back end. The reason is
that lsetxattr will be triggered by p9_client_clunk, and unfortunately we
did not catch the error. This patch will catch the error to notify upper
caller.
p9_client_clunk (in 9p)
p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);
v9fs_clunk (in qemu)
put_fid
free_fid
v9fs_xattr_fid_clunk
v9fs_co_lsetxattr
s->ops->lsetxattr
ext4_xattr_user_set (in host ext4 filesystem)
Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Every function that returns COMPST_ERROR must set wqe->status to another
value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
path for which this is not yet the case.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: <stable@vger.kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since there are adapters that have four ports, increase the size of
the srpt_device.port[] array. This patch avoids that the following
warning is hit with quad port Chelsio adapters:
Once a target session has been allocated, if an error occurs, the session
must be freed. Since it is not safe to call blocking code from the context
of an connection manager callback, trigger target session release in this
case by calling srpt_close_ch().
It is incorrect to depend on set_id value to know if counters were
allocated or not. set_id_valid field is set to true when counters
were allocated. Therefore, use set_id_valid while deciding to
free counters.
Cc: <stable@vger.kernel.org> # 4.15 Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a process exits without doing proper cleanup, there's a window
where an opencapi device can try to access the memory of the dying
process and may trigger a page fault. That's an expected scenario and
the ocxl driver holds a reference on the mm_struct of the process
until the opencapi device is notified of the process exiting.
However, if mm_users is already at 0, i.e. the address space of the
process has already been destroyed, the driver shouldn't try resolving
the page fault, as it will fail, but it can also try accessing already
freed data.
It is fixed by only calling the bottom half of the page fault handler
if mm_users is greater than 0 and get a reference on mm_users instead
of mm_count. Otherwise, we can safely return a translation fault to
the device, as its associated memory context is being removed. The
opencapi device will be properly cleaned up shortly after when closing
the file descriptors.
Function atomic_inc_unless_negative() returns a bool to indicate
success/failure. However cxl_adapter_context_get() wrongly compares
the return value against '>=0' which will always be true. The patch
fixes this comparison to '==0' there by also fixing this compile time
warning:
drivers/misc/cxl/main.c:290 cxl_adapter_context_get()
warn: 'atomic_inc_unless_negative(&adapter->contexts_num)' is unsigned
Fixes: 70b565bbdb91 ("cxl: Prevent adapter reset if an active context exists") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.
This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.
There is work going on to fix that properly in the PCI core but it
will take some time.
x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.
This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper
for other platforms") replaced generic pte_access_permitted() by an
arch specific one.
The generic one is defined as
(pte_present(pte) && (!(write) || pte_write(pte)))
The arch specific one is open coded checking that _PAGE_USER and
_PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that
_PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test
on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0.
Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted
for hugetlb access check") replaced some tests performed with
pte helpers by a call to pte_access_permitted(), leading to the same
issue.
This patch rewrites powerpc/nohash pte_access_permitted()
using pte helpers.
Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
execute-only key is allocated dynamically. This is a problem. When a
thread implicitly creates an execute-only key, and resets the UAMOR
for that key, the UAMOR value does not percolate to all the other
threads. Any other thread may ignorantly change the permissions on the
key. This can cause the key to be not execute-only for that thread.
Preallocate the execute-only key and ensure that no thread can change
the permission of the key, by resetting the corresponding bit in
UAMOR.
Total number of pkeys calculation is off by 1. Fix it.
Fixes: 4fb158f65ac5 ("powerpc: track allocation status of all pkeys") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Key allocation and deallocation has the side effect of programming the
UAMOR/AMR/IAMR registers. This is wrong, since its the responsibility of
the application and not that of the kernel, to modify the permission on
the key.
Do not modify the pkey registers at key allocation/deallocation.
This patch also fixes a bug where a sys_pkey_free() resets the UAMOR
bits of the key, thus making its permissions unmodifiable from user
space. Later if the same key gets reallocated from a different thread
this thread will no longer be able to change the permissions on the key.
Deny all permissions on all keys, with some exceptions. pkey-0 must
allow all permissions, or else everything comes to a screaching halt.
Execute-only key must allow execute permission.
Currently in a multithreaded application, a key allocated by one
thread is not usable by other threads. By "not usable" we mean that
other threads are unable to change the access permissions for that
key for themselves.
When a new key is allocated in one thread, the corresponding UAMOR
bits for that thread get enabled, however the UAMOR bits for that key
for all other threads remain disabled.
Other threads have no way to set permissions on the key, and the
current default permissions are that read/write is enabled for all
keys, which means the key has no effect for other threads. Although
that may be the desired behaviour in some circumstances, having all
threads able to control their permissions for the key is more
flexible.
The current behaviour also differs from the x86 behaviour, which is
problematic for users.
To fix this, enable the UAMOR bits for all keys, at process
creation (in start_thread(), ie exec time). Since the contents of
UAMOR are inherited at fork, all threads are capable of modifying the
permissions on any key.
This is technically an ABI break on powerpc, but pkey support is fairly
new on powerpc and not widely used, and this brings us into
line with x86.
Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Tested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Reword some of the changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.
Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.
Severe Machine check interrupt [Recovered]
NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
vscnprintf+0x20/0x60
vprintk_emit+0xb4/0x4b0
vprintk_func+0x5c/0xd0
printk+0x38/0x4c
init_module+0x1c0/0x338 [bork_kernel]
do_one_initcall+0x54/0x230
do_init_module+0x8c/0x248
load_module+0x12b8/0x15b0
sys_finit_module+0xa8/0x110
system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e22 ("memblock: Add array resizing support").
On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:
as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.
Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.
Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().
Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically") Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me> Signed-off-by: David Howells <dhowells@redhat.com> Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a potential execution path in which function
platform_get_resource() returns NULL. If this happens,
we will end up having a NULL pointer dereference.
Fix this by replacing devm_ioremap with devm_ioremap_resource,
which has the NULL check and the memory region request.
This code was detected with the help of Coccinelle.
Cc: stable@vger.kernel.org Fixes: f700e84f417b ("mailbox: Add support for APM X-Gene platform mailbox driver") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The v4l uAPI documentation [0] makes clear that in the case of interlaced
video (i.e: field is V4L2_FIELD_ALTERNATE) the height refers to the number
of lines in the field and not the number of lines in the full frame (which
is twice the field height for interlaced formats).
So the original height calculation was correct, and it shouldn't had been
changed by the mentioned commit.
Fixes: 0866df8dffd5 ("[media] tvp5150: fix pad format frame height") Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Cc: <stable@vger.kernel.org> # for v4.12 and up Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prior to commit 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM
callbacks from the sdio bus"), the MMC core used to call into the power
management functions of SDIO clients itself and removed the card if the
return code was non-zero. IOW, the mmc handled errors gracefully and didn't
upchain them to the pm core.
Since this change, the mmc core relies on generic power management
functions which treat all errors as a reason to cancel the suspend
immediately. This causes suspend attempts to fail when the libertas
driver is loaded.
To fix this, power down the card explicitly in if_sdio_suspend() when we
know we're about to lose power and return success. Also set a flag in these
cases, and power up the card again in if_sdio_resume().
Fixes: 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus") Cc: <stable@vger.kernel.org> Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Chris Ball <chris@printf.net> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes the BUG_ON spuriously triggering under the following
circumstances:
* reservation_object_reserve_shared is called with shared_count ==
shared_max - 1, so obj->staged is freed in preparation of an in-place
update.
* reservation_object_add_shared_fence is called with the first fence,
after which shared_count == shared_max.
* reservation_object_add_shared_fence is called with a follow-up fence
from the same context.
In the second reservation_object_add_shared_fence call, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since both fences are from the
same context, so they only occupy a single slot.
Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).
apparmor_secid_to_secctx() has a bad debug statement tripping on a
condition handle by the code. When kconfig SECURITY_APPARMOR_DEBUG is
enabled the debug WARN_ON will trip when **secdata is NULL resulting
in the following trace.
------------[ cut here ]------------
AppArmor WARN apparmor_secid_to_secctx: ((!secdata)):
WARNING: CPU: 0 PID: 14826 at security/apparmor/secid.c:82 apparmor_secid_to_secctx+0x2b5/0x2f0 security/apparmor/secid.c:82
Kernel panic - not syncing: panic_on_warn set ...
CC: <stable@vger.kernel.org> #4.18 Fixes: c092921219d2 ("apparmor: add support for mapping secids and using secctxes") Reported-by: syzbot+21016130b0580a9de3b5@syzkaller.appspotmail.com Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several block drivers call alloc_disk() followed by put_disk() if
something fails before device_add_disk() is called without calling
blk_cleanup_queue(). Make sure that also for this scenario a request
queue is dissociated from the cgroup controller. This patch avoids
that loading the parport_pc, paride and pf drivers triggers the
following kernel crash:
BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
Read of size 4 at addr 0000000000000008 by task modprobe/744
Call Trace:
dump_stack+0x9a/0xeb
kasan_report+0x139/0x350
pi_init+0x42e/0x580 [paride]
pf_init+0x2bb/0x1000 [pf]
do_one_initcall+0x8e/0x405
do_init_module+0xd9/0x2f2
load_module+0x3ab4/0x4700
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Alexandru Moise <00moses.alexander00@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Runtime PM isn't ready for blk-mq yet, and commit 765e40b675a9 ("block:
disable runtime-pm for blk-mq") tried to disable it. Unfortunately,
it can't take effect in that way since user space still can switch
it on via 'echo auto > /sys/block/sdN/device/power/control'.
This patch disables runtime-pm for blk-mq really by pm_runtime_disable()
and fixes all kinds of PM related kernel crash.
Cc: Tomas Janousek <tomi@nomi.cz> Cc: Przemek Socha <soprwa@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We find the memory use-after-free issue in __blk_drain_queue()
on the kernel 4.14. After read the latest kernel 4.18-rc6 we
think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue().
If the elevator init function called with error return, it will
run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free
of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery") Cc: <stable@vger.kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: xiao jin <jin.xiao@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If __blkdev_issue_discard is in progress and a device mapper device is
reloaded with a table that doesn't support discard,
q->limits.max_discard_sectors is set to zero. This results in infinite
loop in __blkdev_issue_discard.
This patch checks if max_discard_sectors is zero and aborts with
-EOPNOTSUPP.
ondemand_readahead() checks bdi->io_pages to cap the maximum pages
that need to be processed. This works until the readit section. If
we would do an async only readahead (async size = sync size) and
target is at beginning of window we expand the pages by another
get_next_ra_size() pages. Btrace for large reads shows that kernel
always issues a doubled size read at the beginning of processing.
Add an additional check for io_pages in the lower part of the func.
The fix helps devices that hard limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.
Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting") Cc: stable@vger.kernel.org Signed-off-by: Markus Stockhausen stockhausen@collogia.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I have encountered an interrupt storm during the eMMC chip probing (and
the chip finally didn't get detected). It turned out that U-Boot left
the SDHI DMA interrupts enabled while the Linux driver didn't use those.
Masking those interrupts in renesas_sdhi_internal_dmac_request_dma() gets
rid of both issues...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Fixes: 2a68ea7896e3 ("mmc: renesas-sdhi: add support for R-Car Gen3 SDHI DMAC") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mmc block driver does not support parallel dispatch of requests. In
normal circumstances, all requests are anyway funneled through a single
work item, so parallel dispatch never happens. However it can happen if
there is no elevator.
Fix that by detecting if a dispatch is in progress and returning busy
(BLK_STS_RESOURCE) in that case
The path "spi: cadence: Add usleep_range() for
cdns_spi_fill_tx_fifo()" added a usleep_range() function call,
which cannot be used in atomic context.
However the cdns_spi_fill_tx_fifo() function can be called during
an interrupt which may result in a kernel panic:
This patch replaces the function call with udelay() which can be
used in an atomic context, like an interrupt.
Signed-off-by: Jan Kotas <jank@cadence.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Registers of DSPI should not be accessed before enabling its clock. On
Toradex Colibri VF50 on Iris carrier board this could be seen during
bootup as imprecise abort:
Unhandled fault: imprecise external abort (0x1c06) at 0x00000000
Internal error: : 1c06 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.39-dirty #97
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
Backtrace:
[<804166a8>] (regmap_write) from [<80466b5c>] (dspi_probe+0x1f0/0x8dc)
[<8046696c>] (dspi_probe) from [<8040107c>] (platform_drv_probe+0x54/0xb8)
[<80401028>] (platform_drv_probe) from [<803ff53c>] (driver_probe_device+0x280/0x2f8)
[<803ff2bc>] (driver_probe_device) from [<803ff674>] (__driver_attach+0xc0/0xc4)
[<803ff5b4>] (__driver_attach) from [<803fd818>] (bus_for_each_dev+0x70/0xa4)
[<803fd7a8>] (bus_for_each_dev) from [<803fee74>] (driver_attach+0x24/0x28)
[<803fee50>] (driver_attach) from [<803fe980>] (bus_add_driver+0x1a0/0x218)
[<803fe7e0>] (bus_add_driver) from [<803fffe8>] (driver_register+0x80/0x100)
[<803fff68>] (driver_register) from [<80400fdc>] (__platform_driver_register+0x48/0x50)
[<80400f94>] (__platform_driver_register) from [<8091cf7c>] (fsl_dspi_driver_init+0x1c/0x20)
[<8091cf60>] (fsl_dspi_driver_init) from [<8010195c>] (do_one_initcall+0x4c/0x174)
[<80101910>] (do_one_initcall) from [<80900e8c>] (kernel_init_freeable+0x144/0x1d8)
[<80900d48>] (kernel_init_freeable) from [<805ff6a8>] (kernel_init+0x10/0x114)
[<805ff698>] (kernel_init) from [<80107be8>] (ret_from_fork+0x14/0x2c)
Cc: <stable@vger.kernel.org> Fixes: 5ee67b587a2b ("spi: dspi: clear SPI_SR before enable interrupt") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Intel Ice Lake SPI host controller follows the Intel Cannon Lake but the
PCI IDs are different. Add the new PCI IDs to the driver supported
devices list.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The zero-copy optimization when reading or writing large chunks of data
is quite useful. However, the 9p messages created through the zero-copy
write path have an incorrect message size: it should be the size of the
header + size of the data being written but instead it's just the size
of the header.
This only works if the server ignores the size field of the message and
otherwise breaks the framing of the protocol. Fix this by re-writing the
message size field with the correct value.
Tested by running `dd if=/dev/zero of=out bs=4k count=1` inside a
virtio-9p mount.
This patch fixes patch add handling to take care tail and headroom for
single 6lowpan frames. We need to be sure we have a skb with the right
head and tailroom for single frames. This patch do it by using
skb_copy_expand() if head and tailroom is not enough allocated by upper
layer.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195059 Reported-by: David Palma <david.palma@ntnu.no> Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline. This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.
Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field. This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ] Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.
Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()"
There is another cast from unsigned long to int which causes
a bounds check to fail with specially crafted input. The value is
then used as an index in the slot array in cdrom_slot_status().
Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.
Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations. This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.
Fix it by adding the necessary notrace annotations.
Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: oleg@redhat.com Cc: tj@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We did have sporadic problems in the pinctrl framework during boot
where a pin group name unexpectedly became NULL leading to a NULL
dereference in strcmp.
Detailled analysis of the failing cases did reveal that there were
two devm allocated objects close to each other. The second one was
the affected group_desc in pinmux and the first one was the
psy_desc->properties buffer of the gab driver.
Review of the gab code showed that the address calculation for
one memcpy() is wrong. It does
properties + sizeof(type) * index
but C is defined to do the index multiplication already for
pointer + integer additions. Hence the factor was applied twice
and the memcpy() does write outside of the properties buffer.
Sometimes it happened to be the pinctrl and triggered the strcmp(NULL).
Anyways, it is overkill to use a memcpy() here instead of a simple
assignment, which is easier to read and has less risk for wrong
address calculations. So we change code to a simple assignment.
If we initialize the index to the first free location, we can even
remove the local variable 'properties'.
This bug seems to exist right from the beginning in 3.7-rc1 in
commit e60fea794e6e ("power: battery: Generic battery driver using IIO")
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: stable@vger.kernel.org Fixes: e60fea794e6e ("power: battery: Generic battery driver using IIO") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>