DMA allocations that can't sleep may return non-remapped addresses, but
we do not properly handle them in the mmap and get_sgtable methods.
Resolve non-vmalloc addresses using virt_to_page to handle this corner
case.
Cc: <stable@vger.kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7290d5809571 ("module: use relative references for __ksymtab entries")
updated the ksymtab handling of some KASLR capable architectures
so that ksymtab entries are emitted as pairs of 32-bit relative
references. This reduces the size of the entries, but more
importantly, it gets rid of statically assigned absolute
addresses, which require fixing up at boot time if the kernel
is self relocating (which takes a 24 byte RELA entry for each
member of the ksymtab struct).
Since ksymtab entries are always part of the same module as the
symbol they export, it was assumed at the time that a 32-bit
relative reference is always sufficient to capture the offset
between a ksymtab entry and its target symbol.
Unfortunately, this is not always true: in the case of per-CPU
variables, a per-CPU variable's base address (which usually differs
from the actual address of any of its per-CPU copies) is allocated
in the vicinity of the ..data.percpu section in the core kernel
(i.e., in the per-CPU reserved region which follows the section
containing the core kernel's statically allocated per-CPU variables).
Since we randomize the module space over a 4 GB window covering
the core kernel (based on the -/+ 4 GB range of an ADRP/ADD pair),
we may end up putting the core kernel out of the -/+ 2 GB range of
32-bit relative references of module ksymtab entries that refer to
per-CPU variables.
So reduce the module randomization range a bit further. We lose
1 bit of randomization this way, but this is something we can
tolerate.
Jeff discovered that performance improves from ~375K iops to ~519K iops
on a simple psync-write fio workload when moving the location of 'struct
page' from the default PMEM location to DRAM. This result is surprising
because the expectation is that 'struct page' for dax is only needed for
third party references to dax mappings. For example, a dax-mapped buffer
passed to another system call for direct-I/O requires 'struct page' for
sending the request down the driver stack and pinning the page. There is
no usage of 'struct page' for first party access to a file via
read(2)/write(2) and friends.
However, this "no page needed" expectation is violated by
CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
check_heap_object() helper routine assumes the buffer is backed by a
slab allocator (DRAM) page and applies some checks. Those checks are
invalid, dax pages do not originate from the slab, and redundant,
dax_iomap_actor() has already validated that the I/O is within bounds.
Specifically that routine validates that the logical file offset is
within bounds of the file, then it does a sector-to-pfn translation
which validates that the physical mapping is within bounds of the block
device.
Bypass additional hardened usercopy overhead and call the 'no check'
versions of the copy_{to,from}_iter operations directly.
Fixes: 0aed55af8834 ("x86, uaccess: introduce copy_from_iter_flushcache...") Cc: <stable@vger.kernel.org> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <willy@infradead.org> Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current logic does not allow VCPU to be loaded onto CPU with
APIC ID 255. This should be allowed since the host physical APIC ID
field in the AVIC Physical APIC table entry is an 8-bit value,
and APIC ID 255 is valid in system with x2APIC enabled.
Instead, do not allow VCPU load if the host APIC ID cannot be
represented by an 8-bit value.
Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
instead of AVIC_MAX_PHYSICAL_ID_COUNT.
The iproc host eMMC/SD controller hold time does not meet the
specification in the HS50 mode. This problem can be mitigated
by disabling the HISPD bit; thus forcing the controller output
data to be driven on the falling clock edges rather than the
rising clock edges.
Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
the change does not produce merge conflicts backporting to older kernel
versions. In reality, the timing bug existed since the driver was first
introduced but there is no need for this driver to be supported in kernel
versions that old.
The iproc host eMMC/SD controller hold time does not meet the
specification in the HS50 mode. This problem can be mitigated
by disabling the HISPD bit; thus forcing the controller output
data to be driven on the falling clock edges rather than the
rising clock edges.
This change applies only to the Cygnus platform.
Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
the change does not produce merge conflicts backporting to older kernel
versions. In reality, the timing bug existed since the driver was first
introduced but there is no need for this driver to be supported in kernel
versions that old.
The kernel self-tests picked up an issue with CTR mode:
alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
In the aesp8-ppc code from OpenSSL, there are two paths that
increment IVs: the bulk (8 at a time) path, and the individual
path which is used when there are fewer than 8 AES blocks to
process.
In the bulk path, the IV is incremented with vadduqm: "Vector
Add Unsigned Quadword Modulo", which does 128-bit addition.
In the individual path, however, the IV is incremented with
vadduwm: "Vector Add Unsigned Word Modulo", which instead
does 4 32-bit additions. Thus the IV would instead become FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
Use vadduqm.
This was probably a typo originally, what with q and w being
adjacent. It is a pretty narrow edge case: I am really
impressed by the quality of the kernel self-tests!
Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM") Cc: stable@vger.kernel.org Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "hmac(sha3-224-generic)" algorithm has a descsize of 368 bytes,
which is greater than HASH_MAX_DESCSIZE (360) which is only enough for
sha3-224-generic. The check in shash_prepare_alg() doesn't catch this
because the HMAC template doesn't set descsize on the algorithms, but
rather sets it on each individual HMAC transform.
This causes a stack buffer overflow when SHASH_DESC_ON_STACK() is used
with hmac(sha3-224-generic).
Fix it by increasing HASH_MAX_DESCSIZE to the real maximum. Also add a
sanity check to hmac_init().
This was detected by the improved crypto self-tests in v5.2, by loading
the tcrypt module with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y enabled. I
didn't notice this bug when I ran the self-tests by requesting the
algorithms via AF_ALG (i.e., not using tcrypt), probably because the
stack layout differs in the two cases and that made a difference here.
KASAN report:
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:359 [inline]
BUG: KASAN: stack-out-of-bounds in shash_default_import+0x52/0x80 crypto/shash.c:223
Write of size 360 at addr ffff8880651defc8 by task insmod/3689
This patch introduced regressions for devices that come online in
read-only state and subsequently switch to read-write.
Given how the partition code is currently implemented it is not
possible to persist the read-only flag across a device revalidate
call. This may need to get addressed in the future since it is common
for user applications to proactively call BLKRRPART.
Reverting this commit will re-introduce a regression where a
device-initiated revalidate event will cause the admin state to be
forgotten. A separate patch will address this issue.
Fixes: 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading partition") Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This barrier only applies to the read-modify-write operations; in
particular, it does not apply to the atomic_set() primitive.
Replace the barrier with an smp_mb().
Fixes: dac56212e8127 ("bio: skip atomic inc/dec of ->bi_cnt for most use cases") Cc: stable@vger.kernel.org Reported-by: "Paul E. McKenney" <paulmck@linux.ibm.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <ming.lei@redhat.com> Cc: linux-block@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for
host-initiated writes", 2019-04-02) introduced a "return false" in a
function returning int, and anyway set_efer has a "nonzero on error"
conventon so it should be returning 1.
Reported-by: Pavel Machek <pavel@denx.de> Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes") Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We didn't wait for outstanding direct IO during truncate in nojournal
mode (as we skip orphan handling in that case). This can lead to fs
corruption or stale data exposure if truncate ends up freeing blocks
and these get reallocated before direct IO finishes. Fix the condition
determining whether the wait is necessary.
CC: stable@vger.kernel.org Fixes: 1c9114f9c0f1 ("ext4: serialize unlocked dio reads with truncate") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is possible that unlinked inode enters ext4_setattr() (e.g. if
somebody calls ftruncate(2) on unlinked but still open file). In such
case we should not delete the inode from the orphan list if truncate
fails. Note that this is mostly a theoretical concern as filesystem is
corrupted if we reach this path anyway but let's be consistent in our
orphan handling.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
User Mode Linux does not have access to the ip or sp fields of the pt_regs,
and accessing them causes UML to fail to build. Hide the int3_emulate_jmp()
and int3_emulate_call() instructions from UML, as it doesn't need them
anyway.
Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A fallthrough in switch/case was introduced in f627caf55b8e ("fbdev:
sm712fb: fix crashes and garbled display during DPMS modesetting"),
due to my copy-paste error, which would cause the memory clock frequency
for SM720 to be programmed to SM712.
Since it only reprograms the clock to a different frequency, it's only
a benign issue without visible side-effect, so it also evaded Sudip
Mukherjee's code review and regression tests. scripts/checkpatch.pl
also failed to discover the issue, possibly due to nested switch
statements.
This issue was found by Stephen Rothwell by building linux-next with
-Wimplicit-fallthrough.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: f627caf55b8e ("fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting") Signed-off-by: Yifeng Li <tomli@tomli.me> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One of the biggest issues we face right now with picking LRU map over
regular hash table is that a map walk out of user space, for example,
to just dump the existing entries or to remove certain ones, will
completely mess up LRU eviction heuristics and wrong entries such
as just created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
system call lookup side as well. Thus upon walk, all entries are
being marked, so information of actual least recently used ones
are "lost".
In case of Cilium where it can be used (besides others) as a BPF
based connection tracker, this current behavior causes disruption
upon control plane changes that need to walk the map from user space
to evict certain entries. Discussion result from bpfconf [0] was that
we should simply just remove marking from system call side as no
good use case could be found where it's actually needed there.
Therefore this patch removes marking for regular LRU and per-CPU
flavor. If there ever should be a need in future, the behavior could
be selected via map creation flag, but due to mentioned reason we
avoid this here.
[0] http://vger.kernel.org/bpfconf.html
Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH") Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a callback map_lookup_elem_sys_only() that map implementations
could use over map_lookup_elem() from system call side in case the
map implementation needs to handle the latter differently than from
the BPF data path. If map_lookup_elem_sys_only() is set, this will
be preferred pick for map lookups out of user space. This hook is
used in a follow-up fix for LRU map, but once development window
opens, we can convert other map types from map_lookup_elem() (here,
the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
the callback to simplify and clean up the latter.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For iptable module to load a bpf program from a pinned location, it
only retrieve a loaded program and cannot change the program content so
requiring a write permission for it might not be necessary.
Also when adding or removing an unrelated iptable rule, it might need to
flush and reload the xt_bpf related rules as well and triggers the inode
permission check. It might be better to remove the write premission
check for the inode so we won't need to grant write access to all the
processes that flush and restore iptables rules.
In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after
devres release"), we changed the ordering of tearing down the device DMA
ops and releasing all the device's resources; this was because the DMA ops
should be maintained until we release the device's managed DMA memories.
However, we have seen another crash on an arm64 system when a
device driver probe fails:
hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
scsi host1: hisi_sas_v3_hw
BUG: Bad page state in process swapper/0 pfn:313f5
page:ffff7e0000c4fd40 count:1 mapcount:0
mapping:0000000000000000 index:0x0
flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000ffff7e0000c4fd48ffff7e0000c4fd48 0000000000000000
raw: 0000000000000000000000000000000000000001ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1000(reserved)
Modules linked in:
CPU: 49 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-43081-g22d97fd-dirty #1433
Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
RC0 - V1.12.01 01/29/2019
Call trace:
dump_backtrace+0x0/0x118
show_stack+0x14/0x1c
dump_stack+0xa4/0xc8
bad_page+0xe4/0x13c
free_pages_check_bad+0x4c/0xc0
__free_pages_ok+0x30c/0x340
__free_pages+0x30/0x44
__dma_direct_free_pages+0x30/0x38
dma_direct_free+0x24/0x38
dma_free_attrs+0x9c/0xd8
dmam_release+0x20/0x28
release_nodes+0x17c/0x220
devres_release_all+0x34/0x54
really_probe+0xc4/0x2c8
driver_probe_device+0x58/0xfc
device_driver_attach+0x68/0x70
__driver_attach+0x94/0xdc
bus_for_each_dev+0x5c/0xb4
driver_attach+0x20/0x28
bus_add_driver+0x14c/0x200
driver_register+0x6c/0x124
__pci_register_driver+0x48/0x50
sas_v3_pci_driver_init+0x20/0x28
do_one_initcall+0x40/0x25c
kernel_init_freeable+0x2b8/0x3c0
kernel_init+0x10/0x100
ret_from_fork+0x10/0x18
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper/0 pfn:313f6
page:ffff7e0000c4fd80 count:1 mapcount:0
mapping:0000000000000000 index:0x0
[ 89.322983] flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000ffff7e0000c4fd88ffff7e0000c4fd88 0000000000000000
raw: 0000000000000000000000000000000000000001ffffffff 0000000000000000
The crash occurs for the same reason.
In this case, on the really_probe() failure path, we are still clearing
the DMA ops prior to releasing the device's managed memories.
This patch fixes this issue by reordering the DMA ops teardown and the
call to devres_release_all() on the failure path.
Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is that any 'uptodate' vs 'disks' check is not precise
in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
device that might try to kick off writes and then skip the action.
Better to prevent the raid driver from taking unexpected action *and* keep
the system alive vs killing the machine with BUG_ON.
Note: fixed warning reported by kbuild test robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Nigel Croxon <ncroxon@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com> Cc: Nigel Croxon <ncroxon@redhat.com> Cc: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:
3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
The race causes panic for PEBS events with enabled callchains:
The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:
Robert Walker reported a segmentation fault is observed when process
CoreSight trace data; this issue can be easily reproduced by the command
'perf report --itrace=i1000i' for decoding tracing data.
If neither the 'b' flag (synthesize branches events) nor 'l' flag
(synthesize last branch entries) are specified to option '--itrace',
cs_etm_queue::prev_packet will not been initialised. After merging the
code to support exception packets and sample flags, there introduced a
number of uses of cs_etm_queue::prev_packet without checking whether it
is valid, for these cases any accessing to uninitialised prev_packet
will cause crash.
As cs_etm_queue::prev_packet is used more widely now and it's already
hard to follow which functions have been called in a context where the
validity of cs_etm_queue::prev_packet has been checked, this patch
always allocates memory for cs_etm_queue::prev_packet.
Reported-by: Robert Walker <robert.walker@arm.com> Suggested-by: Robert Walker <robert.walker@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception packet") Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush packet") Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:
CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in
[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$
Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.
So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.
So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.
Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.
Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-snps-arc@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There are two problems with dev_err() here. One: It is not ratelimited.
Two: We don't see which driver tried to transfer something with a
suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
is that we don't see if multiple drivers are trying to transfer while
suspended. They need to be discovered one after the other now. This is
better than a high CPU load because a really broken driver might try to
resend endlessly.
Link: https://bugs.archlinux.org/task/62391 Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reported-by: skidnik <skidnik@gmail.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: skidnik <skidnik@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
To choose whether to pick the GID from the old (16bit) or new (32bit)
field, we should check if the old gid field is set to 0xffff. Mainline
checks the old *UID* field instead - cut'n'paste from the corresponding
code in ufs_get_inode_uid().
Fixes: 252e211e90ce Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a memory slot's size is not a multiple of 64 pages (256K), then
the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
either requires the requested page range to go beyond memslot->npages,
or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
requires log->num_pages to be both in range and aligned.
To allow this case, allow log->num_pages not to be a multiple of 64 if
it ends exactly on the last page of the slot.
Reported-by: Peter Xu <peterx@redhat.com> Fixes: 98938aa8edd6 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Enablement of AMD's Secure Memory Encryption feature is determined very
early after start_kernel() is entered. Part of this procedure involves
scanning the command line for the parameter 'mem_encrypt'.
To determine intended state, the function sme_enable() uses library
functions cmdline_find_option() and strncmp(). Their use occurs early
enough such that it cannot be assumed that any instrumentation subsystem
is initialized.
For example, making calls to a KASAN-instrumented function before KASAN
is set up will result in the use of uninitialized memory and a boot
failure.
When AMD's SME support is enabled, conditionally disable instrumentation
of these dependent functions in lib/string.c and arch/x86/lib/cmdline.c.
[ bp: Get rid of intermediary nostackp var and cleanup whitespace. ]
Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Reported-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boris Brezillon <bbrezillon@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: "luto@kernel.org" <luto@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "mingo@redhat.com" <mingo@redhat.com> Cc: "peterz@infradead.org" <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/155657657552.7116.18363762932464011367.stgit@sosrh3.amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
We don't check for the validity of the lengths in the packet received
from the firmware. If the MPDU length received in the rx descriptor
is too short to contain the header length and the crypt length
together, we may end up trying to copy a negative number of bytes
(headlen - hdrlen < 0) which will underflow and cause us to try to
copy a huge amount of data. This causes oopses such as this one:
MI_00 ZTE Mobile Broadband Diagnostics Port
MI_01 ZTE Mobile Broadband AT Port
MI_02 ZTE Mobile Broadband Modem
MI_03 ZTE Mobile Broadband NDIS Port (qmi_wwan)
MI_04 ZTE Mobile Broadband ADB Port
Reported-by: Lars Melin <larsm17@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Cc: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
"bpftool map create" has an infinite loop on "while (argc)". The error
case is missing.
Symptoms: when forgetting to type the keyword 'type' in front of 'hash':
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
(infinite loop, taking all the CPU)
^C
After the patch:
$ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128
Error: unknown arg hash
Fixes: 0b592b5a01be ("tools: bpftool: add map create command") Signed-off-by: Alban Crequy <alban@kinvolk.io> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a similar endless event loop as was done in commit 8dcf32175b4e ("i2c: prevent endless uevent loop with
CONFIG_I2C_DEBUG_CORE"):
The culprit is the dev_dbg printk in the i2c uevent handler. If
this is activated (for instance by CONFIG_I2C_DEBUG_CORE) it results
in an endless loop with systemd-journald.
This happens if user-space scans the system log and reads the uevent
file to get information about a newly created device, which seems
fair use to me. Unfortunately reading the "uevent" file uses the
same function that runs for creating the uevent for a new device,
generating the next syslog entry
Both CONFIG_I2C_DEBUG_CORE and CONFIG_POWER_SUPPLY_DEBUG were reported
in https://bugs.freedesktop.org/show_bug.cgi?id=76886 but only former
seems to have been fixed. Drop debug prints as it was done in I2C
subsystem to resolve the issue.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Chris Healy <cphealy@gmail.com> Cc: linux-pm@vger.kernel.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A failed KVM_ARM_VCPU_INIT should not set the vcpu target,
as the vcpu target is used by kvm_vcpu_initialized() to
determine if other vcpu ioctls may proceed. We need to set
the target before calling kvm_reset_vcpu(), but if that call
fails, we should then unset it and clear the feature bitmap
while we're at it.
Signed-off-by: Andrew Jones <drjones@redhat.com>
[maz: Simplified patch, completed commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The txq of vif is added to active_txqs list for ATF TXQ scheduling
in the function ieee80211_queue_skb(), but it was not properly removed
before freeing the txq object. It was causing use after free of the txq
objects from the active_txqs list, result was kernel panic
due to invalid memory access.
Fix kernel invalid memory access by properly removing txq object
from active_txqs list before free the object.
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
It was reported that with some special Multi Processor Group configuration,
e.g:
bcdedit.exe /set groupsize 1
bcdedit.exe /set maxgroup on
bcdedit.exe /set groupaware on
for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
is in use.
Tracing kvm_hv_flush_tlb immediately reveals the issue:
The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
We don't flush anything and apparently it's not what Windows expects.
TLFS doesn't say anything about such requests and newer Windows versions
seem to be unaffected. This all feels like a WS2012 bug, which is, however,
easy to workaround in KVM: let's flush everything when we see an empty
flush request, over-flushing doesn't hurt.
In most cases, kmalloc() will not be available early in boot when
pci_setup() is called. Thus, the kstrdup() call that was added to fix the
__initdata bug with the disable_acs_redir parameter usually returns NULL,
so the parameter is discarded and has no effect.
To fix this, store the string that's in initdata until an initcall function
can allocate the memory appropriately. This way we don't need any
additional static memory.
symlink body shouldn't be freed without an RCU delay. Switch apparmorfs
to ->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
symlink body shouldn't be freed without an RCU delay. Switch securityfs
to ->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Guest physical to user address translation uses KVM memslots and reading
these requires holding the kvm->srcu lock. However recently introduced
kvmppc_tce_validate() broke the rule (see the lockdep warning below).
This moves srcu_read_lock(&vcpu->kvm->srcu) earlier to protect
kvmppc_tce_validate() as well.
There is a hardware bug in some POWER9 processors where a treclaim in
fake suspend mode can cause an inconsistency in the XER[SO] bit across
the threads of a core, the workaround being to force the core into SMT4
when doing the treclaim.
The FAKE_SUSPEND bit (bit 10) in the PSSCR is used to control whether a
thread is in fake suspend or real suspend. The important difference here
being that thread reconfiguration is blocked in real suspend but not
fake suspend mode.
When we exit a guest which was in fake suspend mode, we force the core
into SMT4 while we do the treclaim in kvmppc_save_tm_hv().
However on the new exit path introduced with the function
kvmhv_run_single_vcpu() we restore the host PSSCR before calling
kvmppc_save_tm_hv() which means that if we were in fake suspend mode we
put the thread into real suspend mode when we clear the
PSSCR[FAKE_SUSPEND] bit. This means that we block thread reconfiguration
and the thread which is trying to get the core into SMT4 before it can
do the treclaim spins forever since it itself is blocking thread
reconfiguration. The result is that that core is essentially lost.
To fix this we preserve the PSSCR[FAKE_SUSPEND] bit until we call
kvmppc_save_tm_hv() which will mean the core can get into SMT4 and
perform the treclaim. Note kvmppc_save_tm_hv() clears the
PSSCR[FAKE_SUSPEND] bit again so there is no need to explicitly do that.
Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sometimes one of the nkmp factors is unused. This means that one of the
factors shift and width values are set to 0. Current nkmp clock code
generates a mask for each factor with GENMASK(width + shift - 1, shift).
For unused factor this translates to GENMASK(-1, 0). This code is
further expanded by C preprocessor to final version:
(((~0UL) - (1UL << (0)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (-1))))
or a bit simplified:
(~0UL & (~0UL >> BITS_PER_LONG))
It turns out that result of the second part (~0UL >> BITS_PER_LONG) is
actually undefined by C standard, which clearly specifies:
"If the value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior is
undefined."
Additionally, compiling kernel with aarch64-linux-gnu-gcc 8.3.0 gave
different results whether literals or variables with same values as
literals were used. GENMASK with literals -1 and 0 gives zero and with
variables gives 0xFFFFFFFFFFFFFFF (~0UL). Because nkmp driver uses
GENMASK with variables as parameter, expression calculates mask as ~0UL
instead of 0. This has further consequences that LSB in register is
always set to 1 (1 is neutral value for a factor and shift is 0).
For example, H6 pll-de clock is set to 600 MHz by sun4i-drm driver, but
due to this bug ends up being 300 MHz. Additionally, 300 MHz seems to be
too low because following warning can be found in dmesg:
This commit solves the issue by first checking value of the factor
width. If it is equal to 0 (unused factor), mask is set to 0, otherwise
GENMASK() macro is used as before.
HSDK currently panics when built for HIGHMEM/ARC_HAS_PAE40 because ioc
is enabled with default which doesn't work for the 2 non contiguous
memory nodes. So get PAE working by disabling ioc instead.
Tested with !PAE40 by forcing @ioc_enable=0 and running the glibc
testsuite over ssh
We currently don't reload pointers pointing into skb header
after doing pskb_may_pull() in _decode_session4(). So in case
pskb_may_pull() changed the pointers, we read from random
memory. Fix this by putting all the needed infos on the
stack, so that we don't need to access the header pointers
after doing pskb_may_pull().
If an xfrmi is associated to a vrf layer 3 master device,
xfrm_policy_check() fails after traffic decapsulation. The input
interface is replaced by the layer 3 master device, and hence
xfrmi_decode_session() can't match the xfrmi anymore to satisfy
policy checking.
Extend ingress xfrmi lookup to honor the original layer 3 slave
device, allowing xfrm interfaces to operate within a vrf domain.
esp_output_udp_encap can produce a length that doesn't fit in the 16
bits of a UDP header's length field. In that case, we'll send a
fragmented packet whose length is larger than IP_MAX_MTU (resulting in
"Oversized IP packet" warnings on receive) and with a bogus UDP
length.
To prevent this, add a length check to esp_output_udp_encap and return
-EMSGSIZE on failure.
In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()") Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If tunnel registration failed during module initialization, the module
would fail to deregister the IPPROTO_COMP protocol and would attempt to
deregister the tunnel.
The tunnel was not deregistered during module-exit.
Fixes: dd9ee3444014e ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When unloading xfrm6_tunnel module, xfrm6_tunnel_fini directly
frees the xfrm6_tunnel_spi_kmem. Maybe someone has gotten the
xfrm6_tunnel_spi, so need to wait it.
Fixes: 91cc3bb0b04ff("xfrm6_tunnel: RCU conversion") Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In esp4_gro_receive() and esp6_gro_receive(), secpath can be allocated
without adding xfrm state to xvec. Then, sp->xvec[sp->len - 1] would
fail and result in dereferencing invalid pointer in esp4_gso_segment()
and esp6_gso_segment(). Reset secpath if xfrm function returns error.
xfrm_add_policy
-->verify_newpolicy_info //check the index provided by user with XFRM_POLICY_MAX
//In my case, the index is 0x6E6BB6, so it pass the check.
-->xfrm_policy_construct //copy the user's policy and set xfrm_policy_timer
-->xfrm_policy_insert
--> __xfrm_policy_link //use the orgin dir, in my case is 2
--> xfrm_gen_index //generate policy index, there is 0x6E6BB6
then xfrm_policy_timer be fired
xfrm_policy_timer
--> xfrm_policy_id2dir //get dir from (policy index & 7), in my case is 6
--> xfrm_policy_delete
--> __xfrm_policy_unlink //access policy_count[dir], trigger out of range access
Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the computed dir is
valid, to fix the issue.
Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e682adf021be ("xfrm: Try to honor policy index if it's supplied by user") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
Commit b592211c33f7 ("dm mpath: fix attached_handler_name leak and
dangling hw_handler_name pointer") fixed a memory leak for the case
where setup_scsi_dh() returns failure. But setup_scsi_dh may return
success and not "use" attached_handler_name if the
retain_attached_hwhandler flag is not set on the map. As setup_scsi_sh
properly "steals" the pointer by nullifying it, freeing it
unconditionally in parse_path() is safe.
Fixes: b592211c33f7 ("dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer") Cc: stable@vger.kernel.org Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we use separate devices for data and metadata, dm-integrity would
incorrectly calculate the size of the metadata device as if it had
512-byte block size - and it would refuse activation with larger block
size and smaller metadata device.
Fix this so that it takes actual block size into account, which fixes
the following reported issue:
https://gitlab.com/cryptsetup/cryptsetup/issues/450
The information about tag size should not be printed without debug info
set. Also print device major:minor in the error message to identify the
device instance.
Also use rate limiting and debug level for info about used crypto API
implementaton. This is important because during online reencryption
the existing message saturates syslog (because we are moving hotzone
across the whole device).
Cc: stable@vger.kernel.org Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the target line contains an invalid device, delay_ctr() will call
delay_dtr() with NULL workqueue. Attempting to destroy the NULL
workqueue causes a crash.
The function blkdev_report_zones() returns success even if no zone
information is reported (empty report). Empty zone reports can only
happen if the report start sector passed exceeds the device capacity.
The conditions for this to happen are either a bug in the caller code,
or, a change in the device that forced the low level driver to change
the device capacity to a value that is lower than the report start
sector. This situation includes a failed disk revalidation resulting in
the disk capacity being changed to 0.
If this change happens while dm-zoned is in its initialization phase
executing dmz_init_zones(), this function may enter an infinite loop
and hang the system. To avoid this, add a check to disallow empty zone
reports and bail out early. Also fix the function dmz_update_zone() to
make sure that the report for the requested zone was correctly obtained.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Shaun Tancheff <shaun@tancheff.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode
(conventional PCI on primary side, PCIe on downstream side), the Retrain
Link bit needs to be cleared manually to allow the link training to
complete successfully.
If it is not cleared manually, the link training is continuously restarted
and no devices below the PCI-to-PCIe bridge can be accessed. That means
drivers for devices below the bridge will be loaded but won't work and may
even crash because the driver is only reading 0xffff.
See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for
details. Devices known as affected so far are: PI7C9X110, PI7C9X111SL,
PI7C9X130.
Add a new flag, clear_retrain_link, in struct pci_dev. Quirks for affected
devices set this bit.
Note that pcie_retrain_link() lives in aspm.c because that's currently the
only place we use it, but this erratum is not specific to ASPM, and we may
retrain links for other reasons in the future.
Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
[bhelgaas: apply regardless of CONFIG_PCIEASPM] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reestablish the PCIe link very early in the resume process in case it
went down to prevent PCI accesses from hanging the bus. Such accesses
can happen early in the PCI resume process, as early as the
SUSPEND_RESUME_NOIRQ step, thus the link must be reestablished in the
driver resume_noirq() callback.
Fixes: e015f88c368d ("PCI: rcar: Add support for R-Car H3 to pcie-rcar") Signed-off-by: Kazufumi Ikeda <kaz-ikeda@xc.jp.nec.com> Signed-off-by: Gaku Inami <gaku.inami.xw@bp.renesas.com> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
[lorenzo.pieralisi@arm.com: reformatted commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: stable@vger.kernel.org Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: linux-renesas-soc@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 60ed982a4e78 ("PCI/AER: Move internal declarations to
drivers/pci/pci.h") changed pci_aer_init() to return "void", but didn't
change the stub for when CONFIG_PCIEAER isn't enabled. Change the stub to
match.
Two functions allocate a host bridge: devm_pci_alloc_host_bridge() and
pci_alloc_host_bridge(). At the moment, only the unmanaged one initializes
the PCIe feature bits, which prevents from using features such as hotplug
or AER on some systems, when booting with device tree. Make the
initialization code common.
On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode. The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:
1. Boot up the laptop normally in Hybrid Graphics mode
2. Make sure nouveau is loaded and that the GPU is awake
3. Allow the Nvidia GPU to runtime suspend itself after being idle
4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
5. If nouveau loads up properly, reboot the machine again and go back to
step 2 until you reproduce the issue
This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot. This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:
This causes a timeout trying to bring up the GR ctx:
nouveau 0000:01:00.0: timeout
WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
...
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]
The GPU never manages to recover. Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other device's IRQs to get disabled by the kernel:
irq 16: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<000000007faa9e99>] i801_isr [i2c_i801]
Disabling IRQ #16
...
serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
This causes the touchpad and sometimes other things to get disabled.
Since this happens without nouveau, we can't fix this problem from nouveau
itself.
Add a PCI quirk for the specific P50 variant of this GPU. Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected). Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot. Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.
When using PCI passthrough with this device, the host machine locks up
completely when starting the VM, requiring a hard reboot. Add a quirk to
avoid bus resets on this device.
Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Link: https://lore.kernel.org/linux-pci/20190107213248.3034-1-james.prestwood@linux.intel.com Signed-off-by: James Prestwood <james.prestwood@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ATS is broken on the Radeon R7 GPU (at least for Stoney Ridge based laptop)
and causes IOMMU stalls and system failure. Disable ATS on these devices
to make them usable again with IOMMU enabled.
Thanks to Joerg Roedel <jroedel@suse.de> for help.
[bhelgaas: In the email thread mentioned below, Alex suspects the real
problem is in sbios or iommu, so it may affect only certain systems, and it
may affect other devices in those systems as well. However, per Joerg we
lack the ability to debug further, so this quirk is the best we can do for
now.]
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), blanking the display
or starting the X server will crash and freeze the system, or garble the
display.
Experiments showed this problem can mostly be solved by adjusting the
order of register writes. Also, sm712fb failed to consider the difference
of clock frequency when unblanking the display, and programs the clock for
SM712 to SM720.
Fix them by adjusting the order of register writes, and adding an
additional check for SM720 for programming the clock frequency.
Loongson MIPS netbooks use 1024x600 LCD panels, which is the original
target platform of this driver, but nearly all old x86 laptops have
1024x768. Lighting 768 panels using 600's timings would partially
garble the display. Since it's not possible to distinguish them reliably,
we change the default to 768, but keep 600 as-is on MIPS.
Further, earlier laptops, such as IBM Thinkpad 240X, has a 800x600 LCD
panel, this driver would probably garbled those display. As we don't
have one for testing, the original behavior of the driver is kept as-is,
but the problem has been documented is the comments.
In order to support the 1024x600 panel on Yeeloong Loongson MIPS
laptop, the original 1024x768-16 table was modified to 1024x600-16,
without leaving the original. It causes problem on x86 laptop as
the 1024x768-16 support was still claimed but not working.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), running fbtest or X
will crash the machine instantly, because the VRAM/framebuffer is not
mapped correctly.
On SM712, the framebuffer starts at the beginning of address space, but
SM720's framebuffer starts at the 1 MiB offset from the beginning. However,
sm712fb fails to take this into account, as a result, writing to the
framebuffer will destroy all the registers and kill the system immediately.
Another problem is the driver assumes 8 MiB of VRAM for SM720, but some
SM720 system, such as this IBM Thinkpad, only has 4 MiB of VRAM.
Fix this problem by removing the hardcoded VRAM size, adding a function to
query the amount of VRAM from register MCR76 on SM720, and adding proper
framebuffer offset.
Please note that the memory map may have additional problems on Big-Endian
system, which is not available for testing by myself. But I highly suspect
that the original code is also broken on Big-Endian machines for SM720, so
at least we are not making the problem worse. More, the driver also assumed
SM710/SM712 has 4 MiB of VRAM, but it has a 2 MiB version as well, and used
in earlier laptops, such as IBM Thinkpad 240X, the driver would probably
crash on them. I've never seen one of those machines and cannot fix it, but
I have documented these problems in the comments.
When the machine is booted in VGA mode, loading sm712fb would cause
a glitch of random pixels shown on the screen. To prevent it from
happening, we first clear the entire framebuffer, and we also need
to stop calling smtcfb_setmode() during initialization, the fbdev
layer will call it for us later when it's ready.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), rebooting with
sm712fb framebuffer driver would cause a white screen of death on
the next POST, presumably the proper timings for the LCD panel was
not reprogrammed properly by the BIOS.
Experiments showed a few CRTC Scratch Registers, including CRT3D,
CRT3E and CRT3F may be used internally by BIOS as some flags. CRT3B is
a hardware testing register, we shouldn't mess with it. CRT3C has
blanking signal and line compare control, which is not needed for this
driver.
Stop writing to CR3B-CR3F (a.k.a CRT3B-CRT3F) registers. Even if these
registers don't have side-effect on other systems, writing to them is
also highly questionable.
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), the amount of Video
RAM is not detected correctly by the xf86-video-siliconmotion driver.
This is because sm712fb overwrites the GPR71 Scratch Pad Register, which
is set by BIOS on x86 and used to indicate amount of VRAM.
Other Scratch Pad Registers, including GPR70/74/75, don't have the same
side-effect, but overwriting to them is still questionable, as they are
not related to modesetting.
Stop writing to SR70/71/74/75 (a.k.a GPR70/71/74/75).
On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), rebooting with
sm712fb framebuffer driver would cause the role of brightness up/down
button to swap.
Experiments showed the FPR30 register caused this behavior. Moreover,
even if this register don't have side-effect on other systems, over-
writing it is also highly questionable, since it was originally
configurated by the motherboard manufacturer by hardwiring pull-down
resistors to indicate the type of LCD panel. We should not mess with
it.
38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB")
updated the EFI framebuffer code to use memory mappings for the linear
framebuffer that are permitted by the memory attributes described by the
EFI memory map for the particular region, if the framebuffer happens to
be covered by the EFI memory map (which is typically only the case for
framebuffers in shared memory). This is required since non-x86 systems
may require cacheable attributes for memory mappings that are shared
with other masters (such as GPUs), and this information cannot be
described by the Graphics Output Protocol (GOP) EFI protocol itself,
and so we rely on the EFI memory map for this.
As reported by James, this breaks some x86 systems:
[ 1.173368] efifb: probing for efifb
[ 1.173386] efifb: abort, cannot remap video memory 0x1d5000 @ 0xcf800000
[ 1.173395] Trying to free nonexistent resource <00000000cf800000-00000000cf9d4bff>
[ 1.173413] efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
The problem turns out to be that the memory map entry that describes the
framebuffer has no memory attributes listed at all, and so we end up with
a mem_flags value of 0x0.
So work around this by ensuring that the memory map entry's attribute field
has a sane value before using it to mask the set of usable attributes.
Reported-by: James Hilliard <james.hilliard1@gmail.com> Tested-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> # v4.19+ Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when ...") Link: http://lkml.kernel.org/r/20190516213159.3530-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have showed up in 4.20 but wasn't noticed
until now, because nobody uses MPX.
MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.
See the "long story" further below for the gory details.
To fix this, call arch_unmap() before __do_unmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
== UML / unicore32 impact ==
Remove unused 'vma' argument to arch_unmap(). No functional
change.
I compile tested this on UML but not unicore32.
== powerpc impact ==
powerpc uses arch_unmap() well to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.
powerpc does not use the 'vma' argument and is unaffected by
its removal.
I compile-tested a 64-bit powerpc defconfig.
== x86 impact ==
For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.
Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s)
2. Split the start/end one(s) if neceesary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fix up some of the accounting (like fput()) and actually
free the VMA itself.
This specific ordering was actually introduced by:
dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
during the 4.20 merge window. The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list(). arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.
Richard Biener reported a test that shows this in dmesg:
[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
[1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that has not been
properly zapped.
But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.
I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart. That spiralled out of control in complexity pretty
fast.
Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.
This was also reported in the following kernel bugzilla entry:
There are some reports that this commit triggered this bug:
dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap")
While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.
[ mingo: Minor edits to the changelog and the patch. ]
Reported-by: Richard Biener <rguenther@suse.de> Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-um@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@vger.kernel.org Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sample timestamp is updated to ensure that the timestamp represents
the time of the sample and not a branch that the decoder is still
walking towards. The sample timestamp is updated when the decoder
returns, but the decoder does not return for non-taken branches. Update
the sample timestamp then also.
Note that commit 3f04d98e972b5 ("perf intel-pt: Improve sample
timestamp") was also a stable fix and appears, for example, in v4.4
stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
timestamp").
The decoder uses its current timestamp in samples. Usually that is a
timestamp that has already passed, but in some cases it is a timestamp
for a branch that the decoder is walking towards, and consequently
hasn't reached.
The intel_pt_sample_time() function decides which is which, but was not
handling TNT packets exactly correctly.
In the case of TNT, the timestamp applies to the first branch, so the
decoder must first walk to that branch.
That means intel_pt_sample_time() should return true for TNT, and this
patch makes that change. However, if the first branch is a non-taken
branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false
for subsequent taken branches in the same TNT packet.
To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to
distinguish the cases.
Note that commit 3f04d98e972b5 ("perf intel-pt: Improve sample
timestamp") was also a stable fix and appears, for example, in v4.4
stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
timestamp").
The timestamp used to determine if an instruction sample is made, is an
estimate based on the number of instructions since the last known
timestamp. A consequence is that it might go backwards, which results in
extra samples. Change it so that a sample is only made when the
timestamp goes forwards.
Note this does not affect a sampling period of 0 or sampling periods
specified as a count of instructions.
Example:
Before:
$ perf script --itrace=i10us
ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 10 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 8 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 6 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222667: 4 instructions:u: 7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222728: 16423 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
ls 13812 [003] 2167315.222734: 12731 instructions:u: 7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so)
...
Multiplying the Memory Controller clock rate by the tick count results
in an integer overflow and in result the truncated tick value is being
programmed into hardware, such that the GR3D memory client performance is
reduced by two times.
Fix to make the type of $comm "string". If we set the other type to $comm
argument, it shows meaningless value or wrong data. Currently probe events
allow us to set string array type (e.g. ":string[2]"), or other digit types
like x8 on $comm. But since clearly $comm is just a string data, it should
not be fetched by other types including array.
When reading only part of the id file, the ppos isn't tracked correctly.
This is taken care by simple_read_from_buffer.
Reading a single byte, and then the next byte would result EOF.
While this seems like not a big deal, this breaks abstractions that
reads information from files unbuffered. See for example
https://github.com/golang/go/issues/29399
This code was mentioned as problematic in
commit cd458ba9d5a5
("tracing: Do not (ab)use trace_seq in event_id_read()")
Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
function tracer started tracing the same function that was patched, the
conversion of the fentry call site during the translation of going from
calling the live kernel patch trampoline to the iterator trampoline, would
have as slight window where it didn't call anything.
As live kernel patching depends on ftrace to always call its code (to
prevent the function being traced from being called, as it will redirect
it). This small window would allow the old buggy function to be called, and
this can cause undesirable results.
Nicolai submitted new patches[2] but these were controversial. As this is
similar to the static call emulation issues that came up a while ago[3].
But after some debate[4][5] adding a gap in the stack when entering the
breakpoint handler allows for pushing the return address onto the stack to
easily emulate a call.
In order to allow breakpoints to emulate call instructions, they need to push
the return address onto the stack. The x86_64 int3 handler adds a small gap
to allow the stack to grow some. Use this gap to add the return address to
be able to emulate a call instruction at the breakpoint location.
These helper functions are added:
int3_emulate_jmp(): changes the location of the regs->ip to return there.
(The next two are only for x86_64)
int3_emulate_push(): to push the address onto the gap in the stack
int3_emulate_call(): push the return address and change regs->ip
Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolai Stange <nstange@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: the arch/x86 maintainers <x86@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To allow an int3 handler to emulate a call instruction, it must be able to
push a return address onto the stack. Add a gap to the stack to allow the
int3 handler to push the return address and change the return from int3 to
jump straight to the emulated called function target.
xfstest generic/452 was triggering a "Busy inodes after umount" warning.
ceph was allowing the mount to go read-only without first flushing out
dirty inodes in the cache. Ensure we sync out the filesystem before
allowing a remount to proceed.
Both Tegra30 and Tegra114 have 4 ASID's and the corresponding bitfield of
the TLB_FLUSH register differs from later Tegra generations that have 128
ASID's.
In a result the PTE's are now flushed correctly from TLB and this fixes
problems with graphics (randomly failing tests) on Tegra30.
Use gen_rtx_set instead of gen_rtx_SET. The former is a wrapper macro
that handles the difference between GCC versions implementing
the latter.
This fixes the following error on my system with g++ 5.4.0 as the host
compiler
HOSTCXX -fPIC scripts/gcc-plugins/arm_ssp_per_task_plugin.o
scripts/gcc-plugins/arm_ssp_per_task_plugin.c:42:14: error: macro "gen_rtx_SET" requires 3 arguments, but only 2 given
mask)),
^
scripts/gcc-plugins/arm_ssp_per_task_plugin.c: In function ‘unsigned int arm_pertask_ssp_rtl_execute()’:
scripts/gcc-plugins/arm_ssp_per_task_plugin.c:39:20: error: ‘gen_rtx_SET’ was not declared in this scope
emit_insn_before(gen_rtx_SET
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Fixes: 189af4657186 ("ARM: smp: add support for per-task stack canaries") Cc: stable@vger.kernel.org Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Generalize the helper ovl_open_maybe_copy_up() and use it to copy up file
with data before FS_IOC_SETFLAGS ioctl.
The FS_IOC_SETFLAGS ioctl is a bit of an odd ball in vfs, which probably
caused the confusion. File may be open O_RDONLY, but ioctl modifies the
file. VFS does not call mnt_want_write_file() nor lock inode mutex, but
fs-specific code for FS_IOC_SETFLAGS does. So ovl_ioctl() calls
mnt_want_write_file() for the overlay file, and fs-specific code calls
mnt_want_write_file() for upper fs file, but there was no call for
ovl_want_write() for copy up duration which prevents overlayfs from copying
up on a frozen upper fs.
Rendering calls may be done simultaneously from the workqueue,
dlfb_ops_write, dlfb_ops_ioctl, dlfb_ops_set_par and dlfb_dpy_deferred_io.
The code is robust enough so that it won't crash on concurrent rendering.
However, concurrent rendering may cause display corruption if the same
pixel is simultaneously being rendered. In order to avoid this corruption,
this patch adds a mutex around the rendering calls.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Bernie Thompson <bernie@plugable.com> Cc: Ladislav Michl <ladis@linux-mips.org> Cc: <stable@vger.kernel.org>
[b.zolnierkie: replace "dlfb:" with "uldfb:" in the patch summary] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>