This commit makes it possible to use devlink to split the 100G CXP
Netronome into two 40G interfaces. Currently when you ask for 2
interfaces, the math in src/nfp_devlink.c:nfp_devlink_port_split
calculates that you want 5 lanes per port because for some reason
eth_port.port_lanes=10 (shouldn't this be 12 for CXP?). What we really
want when asking for 2 breakout interfaces is 4 lanes per port. This
commit makes that happen by calculating based on 8 lanes if 10 are
present.
Signed-off-by: Ryan C Goodfellow <rgoodfel@isi.edu> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Greg Weeks <greg.weeks@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The matches() routine for a capability must honor the "scope"
passed to it and return the proper results.
i.e, when passed with SCOPE_LOCAL_CPU, it should check the
status of the capability on the current CPU. This is used by
verify_local_cpu_capabilities() on a late secondary CPU to make
sure that it's compliant with the established system features.
However, ARM64_HAS_CACHE_{IDC/DIC} always checks the system wide
registers and this could mean that a late secondary CPU could return
"true" (since the CPU hasn't updated the system wide registers yet)
and thus lead the system in an inconsistent state, where
the system assumes it has IDC/DIC feature, while the new CPU
doesn't.
Fixes: commit 6ae4b6e0578886eb36 ("arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC") Cc: Philip Elcan <pelcan@codeaurora.org> Cc: Shanker Donthineni <shankerd@codeaurora.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we fail to allocate the request queue for a disk, we still need to
free that disk, not just the previous ones. Additionally, we need to
cleanup the previous request queues.
Move queue allocation next to disk allocation to fix a couple of issues:
- If add_disk() hasn't been called, we should clear disk->queue before
calling put_disk().
- If we fail to allocate a request queue, we still need to put all of
the disks, not just the ones that we allocated queues for.
The commit 2eb0f624b709 ("netfilter: add NAT support for shifted
portmap ranges") did not set the checkentry/destroy callbacks for
the newly added DNAT target. As a result, rulesets using only
such nat targets are not effective, as the relevant conntrack hooks
are not enabled.
The above affect also nft_compat rulesets.
Fix the issue adding the missing initializers.
Fixes: 2eb0f624b709 ("netfilter: add NAT support for shifted portmap ranges") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.
Further analysis into the cause of the slowdown traced back to the
frequent call to debug_locks_off() from the __lock_acquired() function
probably due to some inconsistent lockdep states with debug_locks
off. The debug_locks_off() function did an unconditional atomic xchg
to write a 0 value into debug_locks which had already been set to 0.
This led to severe cacheline contention in the cacheline that held
debug_locks. As debug_locks is being referenced in quite a few different
places in the kernel, this greatly slow down the system performance.
To prevent that trashing of debug_locks cacheline, lock_acquired()
and lock_contended() now checks the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().
Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We observed that packets and bytes count are not reset
when user performs interface down. Eventually, tx queue is
exhausted and packets will not be sent out.
To avoid this problem, resets tx queue in ndo_stop.
Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver") Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org> Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.
Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:
...
1. W does flip work
2. W runs commit_hw_done()
3. W waits for flip_done on CRTC 1
4. > flip_done for CRTC 1 completes
5. W finishes waiting for CRTC 1
6. W waits for flip_done on CRTC 2
7. > Preempted by X
8. > flip_done for CRTC 2 completes
9. X atomic_check: hw_done and flip_done are complete on all CRTCs
10. X updates cursor on both CRTCs
11. X destroys atomic state
12. X done
13. > Switch back to W
14. W waits for flip_done on CRTC 2
15. W raises general protection fault
Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.
v2: The reference on the commit object needs to be obtained before
hw_done() is signaled, since that's the point where another commit
is allowed to modify the state. Assuming that the
new_crtc_state->commit object still exists within flip_done() is
incorrect.
Fix by getting a reference in setup_commit(), and releasing it
during default_clear().
Similar to d49c88d7677b ("r8169: Enable MSI-X on RTL8106e") after e9d0ba506ea8 ("PCI: Reprogram bridge prefetch registers on resume")
we can safely assume that this also fixes the root cause of
the issue worked around by 7c53a722459c ("r8169: don't use MSI-X on
RTL8168g"). So let's revert it.
Fixes: 7c53a722459c ("r8169: don't use MSI-X on RTL8168g") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift
result (0xF00000000) requires 37 bits to represent, but 'int' only has
32 bits [-Wshift-overflow]
((ISP_NVRAM_MASK << 16) | qdev->eeprom_cmd_data));
~~~~~~~~~~~~~~ ^ ~~
1 warning generated.
The warning is certainly accurate since ISP_NVRAM_MASK is defined as
(0x000F << 16) which is then shifted by 16, resulting in 64424509440,
well above UINT_MAX.
Given that this is the only location in this driver where ISP_NVRAM_MASK
is shifted again, it seems likely that ISP_NVRAM_MASK was originally
defined without a shift and during the move of the shift to the
definition, this statement wasn't properly removed (since ISP_NVRAM_MASK
is used in the statenent right above this). Only the maintainers can
confirm this since this statment has been here since the driver was
first added to the kernel.
Link: https://github.com/ClangBuiltLinux/linux/issues/127 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:
#0 0x0000555555895c01 in bfd_demangle ()
#1 0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
#2 dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
#3 0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89
#4 inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264
#5 0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0,
line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313
#6 0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf")
at util/srcline.c:358
So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.
While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:
$ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
main
/usr/include/c++/8.2.1/complex:610
??
/usr/include/c++/8.2.1/complex:618
??
/usr/include/c++/8.2.1/complex:675
??
/usr/include/c++/8.2.1/complex:685
main
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
I've reported this bug upstream and also attached a patch there which
should fix this issue:
Reported-by: Hadrien Grasland <grasland@lal.in2p3.fr> Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: a64489c56c30 ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
introduced, i.e. using a2l->funcname without checking if it is NULL,
but this current patch fixes the current codebase, i.e. multiple csets
were applied after a64489c56c30 before the problem was reported by Hadrien ] Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The size of the resulting cpu map can be smaller than a multiple of
sizeof(u64), resulting in SIGBUS on cpus like Sparc as the next event
will not be aligned properly.
Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Fixes: 6c872901af07 ("perf cpu_map: Add cpu_map event synthesize function") Link: http://lkml.kernel.org/r/20181011.224655.716771175766946817.davem@davemloft.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a build is run from something like a cron job, the user's $PATH is
rather minimal, of note, not including /usr/sbin in my own case. Because
of that, an automated rpm package build ultimately fails to find
libperf-jvmti.so, because somewhere within the build, this happens...
/bin/sh: alternatives: command not found
/bin/sh: alternatives: command not found
Makefile.config:849: No openjdk development package found, please install
JDK package, e.g. openjdk-8-jdk, java-1.8.0-openjdk-devel
...and while the build continues, libperf-jvmti.so isn't built, and
things fall down when rpm tries to find all the %files specified. Exact
same system builds everything just fine when the job is launched from a
login shell instead of a cron job, since alternatives is in $PATH, so
openjdk is actually found.
The test required to get into this section of code actually specifies
the full path, as does a block just above it, so let's do that here too.
Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: William Cohen <wcohen@redhat.com> Fixes: d4dfdf00d43e ("perf jvmti: Plug compilation into perf build") Link: http://lkml.kernel.org/r/20180906221812.11167-1-jarod@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We synthesize an update event that needs to touch the evsel id array, which is
not defined at that time. Fixing this by forcing the id allocation for events
with their own cpus.
Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@huawei.com Fixes: bfd8f72c2778 ("perf record: Synthesize unit/scale/... in event update") Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously when populating the set ipv6 address action, we incorrectly
made use of pedit's key index to determine which 32bit word should be
set. We now calculate which word has been selected based on the offset
provided by the pedit action.
Fixes: 354b82bb320e ("nfp: add set ipv6 source and destination address") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously we did not correctly change headers when using multiple
pedit actions with partial masks. We now take this into account and
no longer just commit the last pedit action.
Fixes: c0b1bd9a8b8a ("nfp: add set ipv4 header action flower offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Originally, we have an issue where r8169 MSI-X interrupt is broken after
S3 suspend/resume on RTL8106e of ASUS X441UAR.
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8169
Kernel modules: r8169
We found the all of the values in PCI BAR=4 of the ethernet adapter
become 0xFF after system resumes. That breaks the MSI-X interrupt.
Therefore, we can only fall back to MSI interrupt to fix the issue at
that time.
However, there is a commit which resolves the drivers getting nothing in
PCI BAR=4 after system resumes. It is 04cb3ae895d7 "PCI: Reprogram
bridge prefetch registers on resume" by Daniel Drake.
After apply the patch, the ethernet adapter works fine before suspend
and after resume. So, we can revert the workaround after the commit
"PCI: Reprogram bridge prefetch registers on resume" is merged into main
tree.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201181 Fixes: 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael reported that he could not stat following event:
$ perf stat -e unc_p_freq_ge_1200mhz_cycles -a -- ls
event syntax error: '..e_1200mhz_cycles'
\___ value too big for format, maximum is 255
Run 'perf list' for a list of valid events
The event is unwrapped into:
uncore_pcu/event=0xb,filter_band0=1200/
where filter_band0 format says it's one byte only:
The XSKMAP update and delete functions called synchronize_net(), which
can sleep. It is not allowed to sleep during an RCU read section.
Instead we need to make sure that the sock sk_destruct (xsk_destruct)
function is asynchronously called after an RCU grace period. Setting
the SOCK_RCU_FREE flag for XDP sockets takes care of this.
Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael reported an issue with oversized terms values assignment
and I noticed there was actually a misunderstanding of the max
value check in the past.
The above commit's changelog says:
If bit 21 is set, there is parsing issues as below.
$ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
event syntax error: '..pi_0/event=0x200002,umask=0x8/'
\___ value too big for format, maximum is 511
But there's no issue there, because the event value is distributed
along the value defined by the format. Even if the format defines
separated bit, the value is treated as a continual number, which
should follow the format definition.
In above case it's 9-bit value with last bit separated:
$ cat uncore_qpi_0/format/event
config:0-7,21
Hence the value 0x200002 is correctly reported as format violation,
because it exceeds 9 bits. It should have been 0x102 instead, which
sets the 9th bit - the bit 21 of the format.
$ perf stat -vv -a -e uncore_qpi_0/event=0x102,umask=0x8/
Using CPUID GenuineIntel-6-2D
...
------------------------------------------------------------
perf_event_attr:
type 10
size 112
config 0x200802
sample_type IDENTIFIER
...
Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: ac0e2cd55537 ("perf tools: Fix PMU term format max value calculation") Link: http://lkml.kernel.org/r/20181003072046.29276-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code had been clearing a namespace being deleted as the current
path while that namespace was still in the path siblings list. It is
possible a new IO could set that namespace back to the current path
since it appeared to be an eligable path to select, which may result in
a use-after-free error.
This patch ensures a namespace being removed is not eligable to be reset
as a current path prior to clearing it as the current path.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drm_mode_setcrtc() retries modesetting in case one of the functions it
calls returns -EDEADLK. connector_set, mode and fb are freed before
retrying, but they are not set to NULL. This can cause
drm_mode_setcrtc() to use those variables.
For example: On the first try __drm_mode_set_config_internal() returns
-EDEADLK. connector_set, mode and fb are freed. Next retry starts, and
drm_modeset_lock_all_ctx() returns -EDEADLK, and we jump to 'out'. The
code will happily try to release all three again.
This leads to crashes of different kinds, depending on the sequence the
EDEADLKs happen.
Fix this by setting the three variables to NULL at the start of the
retry loop.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180917110054.4053-1-tomi.valkeinen@ti.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PMIC_IRQB and PMIC_KEYINB lines on Exynos4210-based Origen board have
external pull-up resistors, so disable any pull control for those lines
in respective pin controller node. This fixes support for MAX8997
interrupts and enables operation of wakeup from MAX8997 RTC alarm.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 17419726aaa1 ("ARM: dts: add max8997 device node for exynos4210-origen board") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.
From the specification [1]:
"With enhanced IBRS, the predicted targets of indirect branches
executed cannot be controlled by software that was executed in a less
privileged predictor mode or on another logical processor. As a
result, software operating on a processor with enhanced IBRS need not
use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
privileged predictor mode. Software can isolate predictor modes
effectively simply by setting the bit once. Software need not disable
enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:
"Retpoline is known to be an effective branch target injection (Spectre
variant 2) mitigation on Intel processors belonging to family 6
(enumerated by the CPUID instruction) that do not have support for
enhanced IBRS. On processors that support enhanced IBRS, it should be
used for mitigation instead of retpoline."
The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.
If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.
Enhanced IBRS still requires IBPB for full mitigation.
[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Originally-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tim C Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Disable preemption during CR3 reset / __flush_tlb_all() and add a comment
why preemption has to be disabled so it won't be removed accidentaly.
Add another preemptible() check in __flush_tlb_all() to catch callers with
enabled preemption when PGE is enabled, because PGE enabled does not
trigger the warning in __native_flush_tlb(). Suggested by Andy Lutomirski.
Fixes: decab0888e6e ("x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181017103432.zgv46nlu3hc7k4rq@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
memory_corruption_check[{_period|_size}]()'s handlers do not check input
argument before passing it to kstrtoul() or simple_strtoull(). The argument
would be a NULL pointer if each of the kernel parameters, without its
value, is set in command line and thus cause the following panic.
The boot loader version reported via sysfs is wrong in case of the
kernel being booted via the Xen PVH boot entry. it should be 2.12
(0x020c), but it is reported to be 2.18 (0x0212).
As the current way to set the version is error prone use the more
readable variant (2 << 8) | 12.
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.
Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.
The Creative Audigy SE (SB0570) card currently exhibits an audible pop
whenever playback is stopped or resumed, or during silent periods of an
audio stream. Initialise the IZD bit to the 0 to eliminate these pops.
The Infinite Zero Detection (IZD) feature on the DAC causes the output
to be shunted to Vcap after 2048 samples of silence. This discharges the
AC coupling capacitor through the output and causes the aforementioned
pop/click noise.
The behaviour of the IZD bit is described on page 15 of the WM8768GEDS
datasheet: "With IZD=1, applying MUTE for 1024 consecutive input samples
will cause all outputs to be connected directly to VCAP. This also
happens if 2048 consecutive zero input samples are applied to all 6
channels, and IZD=0. It will be removed as soon as any channel receives
a non-zero input". I believe the second sentence might be referring to
IZD=1 instead of IZD=0 given the observed behaviour of the card.
This change should make the DAC initialisation consistent with
Creative's Windows driver, as this popping persists when initialising
the card in Linux and soft rebooting into Windows, but is not present on
a cold boot to Windows.
The front MIC on the Lenovo M715 can't record sound, after applying
the ALC294_FIXUP_LENOVO_MIC_LOCATION, the problem is fixed. So add
the pin configuration of this machine to the pin quirk table.
BIOS on ASUS G751 doesn't seem to map the headphone pin (NID 0x16)
correctly. Add a quirk to address it, as well as chaining to the
previous fix for the microphone.
In the C-code we need to put the physical address of the hpmc handler in
the interrupt vector table (IVA) in order to get HPMCs working. Since
on parisc64 function pointers are indirect (in fact they are function
descriptors) we instead export the address as variable and not as
function.
This reverts a small part of commit f39cce654f9a ("parisc: Add
cfi_startproc and cfi_endproc to assembly code").
Fix a long-existing small nasty bug in the map_pages() implementation which
leads to overwriting already written pte entries with zero, *if* map_pages() is
called a second time with an end address which isn't aligned on a pmd boundry.
This happens for example if we want to remap only the text segment read/write
in order to run alternative patching on the code. Exiting the loop when we
reach the end address fixes this.
Helge noticed that the address of the os_hpmc handler was not being
correctly calculated in the hpmc macro. As a result, PDCE_CHECK would
fail to call os_hpmc:
The Address Range Scrub implementation tried to skip running scrubs
against ranges that were already scrubbed by the BIOS. Unfortunately
that support also resulted in early scrub completions as evidenced by
this debug output from nfit_test:
nd_region region9: ARS: range 1 short complete
nd_region region3: ARS: range 1 short complete
nd_region region4: ARS: range 2 ARS start (0)
nd_region region4: ARS: range 2 short complete
...i.e. completions without any indications that the scrub was started.
This state of affairs was hard to see in the code due to the
proliferation of state bits and mistakenly trying to track done state
per-range when the completion is a global property of the bus.
So, kill the four ARS state bits (ARS_REQ, ARS_REQ_REDO, ARS_DONE, and
ARS_SHORT), and replace them with just 2 request flags ARS_REQ_SHORT and
ARS_REQ_LONG. The implementation will still complete and reap the
results of BIOS initiated ARS, but it will not attempt to use that
information to affect the completion status of scrubbing the ranges from
a Linux perspective.
Instead, try to synchronously run a short ARS per range at init time and
schedule a long scrub in the background. If ARS is busy with an ARS
request, schedule both a short and a long scrub for when ARS returns to
idle. This logic also satisfies the intent of what ARS_REQ_REDO was
trying to achieve. The new rule is that the REQ flag stays set until the
next successful ars_start() for that range.
With the new policy that the REQ flags are not cleared until the next
start, the implementation no longer loses requests as can be seen from
the following log:
nd_region region3: ARS: range 1 ARS start short (0)
nd_region region9: ARS: range 1 ARS start short (0)
nd_region region3: ARS: range 1 complete
nd_region region4: ARS: range 2 ARS start short (0)
nd_region region9: ARS: range 1 complete
nd_region region9: ARS: range 1 ARS start long (0)
nd_region region4: ARS: range 2 complete
nd_region region3: ARS: range 1 ARS start long (0)
nd_region region9: ARS: range 1 complete
nd_region region3: ARS: range 1 complete
nd_region region4: ARS: range 2 ARS start long (0)
nd_region region4: ARS: range 2 complete
...note that the nfit_test emulated driver provides 2 buses, that is why
some of the range indices are duplicated. Notice that each range
now successfully completes a short and long scrub.
Cc: <stable@vger.kernel.org> Fixes: 14c73f997a5e ("nfit, address-range-scrub: introduce nfit_spa->ars_state") Fixes: cc3d3458d46f ("acpi/nfit: queue issuing of ars when an uc error...") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
acpi_pcc_probe() calls acpi_table_parse_entries_array() but fails
to check for an error return. This in turn can result in calling
kcalloc() with a negative count as well as emitting the following
misleading erorr message:
[ 2.642015] Could not allocate space for PCC mbox channels
Fixes: 8f8027c5f935 (mailbox: PCC: erroneous error message when parsing ACPI PCCT) Signed-off-by: David Arcari <darcari@redhat.com> Reviewed-by: Al Stone <ahs3@redhat.com> Cc: 4.18+ <stable@vger.kernel.org> # 4.18+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”)
removed local_irq_save/restore() from optimized_callback(), the handler
might be interrupted by the rescheduling interrupt and might be
rescheduled - so we must not use the preempt_enable_no_resched() macro.
Use preempt_enable() instead, to not lose preemption events.
AML opcodes come in two lengths: 1-byte opcodes and 2-byte, extended opcodes.
If an error occurs due to illegal opcodes during table load, the AML parser
needs to continue loading the table. In order to do this, it needs to skip
parsing of the offending opcode and operands associated with that opcode.
This change fixes the AML parse loop to correctly skip parsing of incorrect
extended opcodes. Previously, only the short opcodes were skipped correctly.
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The table load process omitted adding the operation region address
range to the global list. This omission is problematic because the OS
queries the global list to check for address range conflicts before
deciding which drivers to load. This commit may result in warning
messages that look like the following:
[ 7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts with op_region 0x00000400-0x0000047F (\PMIO) (20180531/utaddress-213)
[ 7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
However, these messages do not signify regressions. It is a result of
properly adding address ranges within the global address list.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200011 Tested-by: Jean-Marc Lenoir <archlinux@jihemel.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since acpi_os_get_timer() may be called after the timer subsystem has
been suspended, use the jiffies counter instead of ktime_get(). This
patch avoids that the following warning is reported during hibernation:
Currently, "disable_clkrun" yenta_socket module parameter is only
implemented for TI CardBus bridges.
Add also an implementation for Ricoh bridges that have the necessary
setting documented in publicly available datasheets.
Tested on a RL5C476II with a Sunrich C-160 CardBus NIC that doesn't work
correctly unless the CLKRUN protocol is disabled.
Let's also make it clear in its description that the "disable_clkrun"
module parameter only works on these two previously mentioned brands of
CardBus bridges.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Cc: stable@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
early_cma does not check input argument before passing it to
simple_strtoull. The argument would be a NULL pointer if "cma", without
its value, is set in command line and thus causes the following panic.
If the policy limits change between invocations of cs_dbs_update(),
the requested frequency value stored in dbs_info may not be updated
and the function may use a stale value of it next time. Moreover, if
idle periods are takem into account by cs_dbs_update(), the requested
frequency value stored in dbs_info may be below the min policy limit,
which is incorrect.
To fix these problems, always update the requested frequency value
in dbs_info along with the local copy of it when the previous
requested frequency is beyond the policy limits and avoid decreasing
the requested frequency below the min policy limit when taking
idle periods into account.
Fixes: abb6627910a1 (cpufreq: conservative: Fix next frequency selection) Fixes: 00bfe05889e9 (cpufreq: conservative: Decrease frequency faster for deferred updates) Reported-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).
More importantly, this patch fixes one issue introduced in a22c4d7e34402cc
("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.
Fixes: a22c4d7e34402ccdf3 ("block: re-add discard_granularity and alignment checks") Cc: stable@vger.kernel.org Cc: Ming Lin <ming.l@ssi.samsung.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Tested-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an invalid mount option is passed to jffs2, jffs2_parse_options()
will fail and jffs2_sb_info will be freed, but then jffs2_sb_info will
be used (use-after-free) and freeed (double-free) in jffs2_kill_sb().
Fix it by removing the buggy invocation of kfree() when getting invalid
mount options.
Fixes: 92abc475d8de ("jffs2: implement mount option parsing and compression overriding") Cc: stable@kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Devices with compatible="pmbus" field have zero initial page count,
and pmbus_clear_faults() being called before the page count auto-
detection does not actually clear faults because it depends on the
page count. Non-cleared faults in its turn may fail the subsequent
page count auto-detection.
This patch fixes this problem by calling pmbus_clear_fault_page()
for currently set page and calling pmbus_clear_faults() after the
page count was detected.
refill->end record the last key of writeback, for example, at the first
time, keys (1,128K) to (1,1024K) are flush to the backend device, but
the end key (1,1024K) is not included, since the bellow code:
if (bkey_cmp(k, refill->end) >= 0) {
ret = MAP_DONE;
goto out;
}
And in the next time when we refill writeback keybuf again, we searched
key start from (1,1024K), and got a key bigger than it, so the key
(1,1024K) missed.
This patch modify the above code, and let the end key to be included to
the writeback key buffer.
When bcache device is clean, dirty keys may still exist after
journal replay, so we need to count these dirty keys even
device in clean status, otherwise after writeback, the amount
of dirty data would be incorrect.
Missed reading IOs are identified by s->cache_missed, not the
s->cache_miss, so in trace_bcache_read() using trace_bcache_read
to identify whether the IO is missed or not.
During implementation of the new API bcm_qspi_bspi_set_flex_mode() has
been modified breaking calculation of address length. An unnecessary
multiplication was added breaking flash reads.
Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixing/optimizing bcm_qspi_bspi_read() performance introduced two
changes:
1) It added a loop to read all requested data using multiple BSPI ops.
2) It bumped max size of a single BSPI block request from 256 to 512 B.
The later change resulted in occasional BSPI timeouts causing a
regression.
For some unknown reason hardware doesn't always handle reads as expected
when using 512 B chunks. In such cases it may happen that BSPI returns
amount of requested bytes without the last 1-3 ones. It provides the
remaining bytes later but doesn't raise an interrupt until another LR
start.
Switching back to 256 B reads fixes that problem and regression.
We need that to adjust the len of the 2nd transfer (called data in
spi-mem) if it's too long to fit in a SPI message or SPI transfer.
Fixes: c36ff266dc82 ("spi: Extend the core to ease integration of SPI memory controllers") Cc: <stable@vger.kernel.org> Signed-off-by: Chuanhua Han <chuanhua.han@nxp.com> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fsl_qspi_get_seqid() may return -EINVAL, but fsl_qspi_init_ahb_read()
doesn't check for error codes with the result that -EINVAL could find
itself signalled over the bus.
In conjunction with the LS1046A SoC's A-009283 errata
("Illegal accesses to SPI flash memory can result in a system hang")
this illegal access to SPI flash memory results in a system hang
if userspace attempts reading later on.
Avoid this by always checking fsl_qspi_get_seqid()'s return value
and bail out otherwise.
Intel Ice Lake exposes the SPI serial flash controller as a PCI device
in the same way than Intel Denverton. Add Ice Lake SPI serial flash PCI
ID to the driver list of supported devices.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Marek Vasut <marek.vasut@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the size of spi-nor flash is larger than 16MB, the read_opcode
is set to SPINOR_OP_READ_1_1_4_4B, and fsl_qspi_get_seqid() will
return -EINVAL when cmd is SPINOR_OP_READ_1_1_4_4B. This can
cause read operation fail.
With the current implementation, the complete() in the IRQ handler is
supposed to be called only if the register status has one or the other
RDY bit set. Other events might trigger an interrupt as well if
enabled, but should not end-up with a complete() call.
For this purpose, the code was checking if the other bits were set, in
this case complete() was not called. This is wrong as two events might
happen in a very tight time-frame and if the NDSR status read reports
two bits set (eg. RDY(0) and RDDREQ) at the same time, complete() was
not called.
This logic would lead to timeouts in marvell_nfc_wait_op() and has
been observed on PXA boards (NFCv1) in the Hamming write path.
We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.
Before Linux v4.18 we paper over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering scheme.
[ mingo: Changelog fixes. ]
Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers") Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma") Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Matthew Whitehead <tedheadster@gmail.com> Cc: konrad.wilk@oracle.com Cc: iommu@lists.linux-foundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181014075208.2715-1-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clang warns that the declaration of jiffies in include/linux/jiffies.h
doesn't match the definition in arch/x86/time/kernel.c:
arch/x86/kernel/time.c:29:42: warning: section does not match previous declaration [-Wsection]
__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
^
./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
__section__(".data..cacheline_aligned")))
^
./include/linux/jiffies.h:81:31: note: previous attribute is here
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies;
^
./arch/x86/include/asm/cache.h:20:2: note: expanded from macro '__cacheline_aligned_in_smp'
__page_aligned_data
^
./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
#define __page_aligned_data __section(.data..page_aligned) __aligned(PAGE_SIZE)
^
./include/linux/compiler_attributes.h:233:56: note: expanded from macro '__section'
#define __section(S) __attribute__((__section__(#S)))
^
1 warning generated.
The declaration was changed in commit 7c30f352c852 ("jiffies.h: declare
jiffies and jiffies_64 with ____cacheline_aligned_in_smp") but wasn't
updated here. Make them match so Clang no longer warns.
Fixes: 7c30f352c852 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181013005311.28617-1-natechancellor@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric reported that a sequence count loop using this_cpu_read() got
optimized out. This is wrong, this_cpu_read() must imply READ_ONCE()
because the interface is IRQ-safe, therefore an interrupt can have
changed the per-cpu value.
Fixes: 7c3576d261ce ("[PATCH] i386: Convert PDA into the percpu section") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Eric Dumazet <edumazet@google.com> Cc: hpa@zytor.com Cc: eric.dumazet@gmail.com Cc: bp@alien8.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181011104019.748208519@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On 32bit systems, nosave_regions(non RAM areas) located between
max_low_pfn and max_pfn are not excluded from hibernation snapshot
currently, which may result in a machine check exception when
trying to access these unsafe regions during hibernation:
[ 612.800453] Disabling lock debugging due to kernel taint
[ 612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136
[ 612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560}
[ 612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086
[ 612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24
[ 612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 612.853380] Kernel panic - not syncing: Fatal machine check
[ 612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff)
This is because on 32bit systems, pages above max_low_pfn are regarded
as high memeory, and accessing unsafe pages might cause expected MCE.
On the problematic 32bit system, there are reserved memory above low
memory, which triggered the MCE:
Fix this problem by changing pfn limit from max_low_pfn to max_pfn.
This fix does not impact 64bit system because on 64bit max_low_pfn
is the same as max_pfn.
Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looking at the asm for native_sched_clock() I noticed we don't inline
enough. Mostly caused by sharing code with cyc2ns_read_begin(), which
we didn't used to do. So mark all that __force_inline to make it DTRT.
Fixes: 59eaef78bfea ("x86/tsc: Remodel cyc2ns to use seqcount_latch()") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: eric.dumazet@gmail.com Cc: bp@alien8.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181011104019.695196158@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
distribute_cfs_runtime may not empty the throttled_list before it runs
out of runtime to distribute. In that case, due to the change from c06f04c7048 to put throttled entries at the head of the list, later entries
on the list will starve. Essentially, the same X processes will get pulled
off the list, given CPU time and then, when expired, get put back on the
head of the list where distribute_cfs_runtime will give runtime to the same
set of processes leaving the rest.
Fix the issue by setting a bit in struct cfs_bandwidth when
distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
decide to put the throttled entry on the tail or the head of the list. The
bit is set/cleared by the callers of distribute_cfs_runtime while they hold
cfs_bandwidth->lock.
This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
the live system. In some cases you can simply look at the throttled list and
see the later entries are not changing:
The calculated ideal rate can easily overflow an unsigned long, thus
making the best div selection buggy as soon as no ideal match is found
before the overflow occurs.
Fixes: 4731a72df273 ("drm/sun4i: request exact rates to our parents") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181018100250.12565-1-boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 7a68d9fb8510 ("USB: usbdevfs: sanitize flags more") checks the
transfer flags for URBs submitted from userspace via usbfs. However,
the check for whether the USBDEVFS_URB_SHORT_NOT_OK flag should be
allowed for a control transfer was added in the wrong place, before
the code has properly determined the direction of the control
transfer. (Control transfers are special because for them, the
direction is set by the bRequestType byte of the Setup packet rather
than direction bit of the endpoint address.)
This patch moves code which sets up the allow_short flag for control
transfers down after is_in has been set to the correct value.
Fix this by sanitizing num before using it to index
fsg_opts->common->luns
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
vhci_hub_control() accesses port_status array with out of bounds port
value. Fix it to reference port_status[] only with a valid rhport value
when invalid_rhport flag is true.
The invalid_rhport flag is set early on after detecting in port value
is within the bounds or not.
The following is used reproduce the problem and verify the fix:
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14ed8ab6400000
The usb standard ("Universal Serial Bus Class Definitions for Communication
Devices") distiguishes between "consistent signals" (DSR, DCD), and
"irregular signals" (break, ring, parity error, framing error, overrun).
The bits of "irregular signals" are set, if this error/event occurred on
the device side and are immeadeatly unset, if the serial state notification
was sent.
Like other drivers of real serial ports do, just the occurence of those
events should be counted in serial_icounter_struct (but no 1->0
transitions).
Resetting the write index of the notification buffer on urb unlink (e.g.
closing a cdc-acm device from userspace) may lead to wrong interpretation
of further received notifications, in case the index is not 0 when urb
unlink happens (i.e. when parts of a notification already have been
transferred). On the device side there is no "reset" of the notification
transimission and thus we would get out of sync with the device.
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
Fix this by sanitizing pin_index before using it to index
ops->pin_config, and before passing it as an argument to
function ptp_set_pinfunc(), in which it is used to index
info->pin_config.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.
Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.
Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies") Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent. Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc. So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.
The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.
Cc: stable@vger.kernel.org Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The inline key in struct rxrpc_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.
Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them. We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.
This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).
Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.
Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies") Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com Reported-by: Eric Sandeen <sandeen@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default mid-level PLL bias current setting interferes with sigma
delta modulation. This manifests as decreased audio quality at lower
sampling rates, which sounds like radio broadcast quality, and
distortion noises at sampling rates at 48 kHz or above.
Changing the bias current settings to the lowest gets rid of the
noise.
Fixes: de3448519194 ("clk: sunxi-ng: sun4i: Use sigma-delta modulation
for audio PLL") Cc: <stable@vger.kernel.org> # 4.15.x Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>