Currently, the drop_caches proc file and sysctl read back the last value
written, suggesting this is somehow a stateful setting instead of a
one-time command. Make it write-only, like e.g. compact_memory.
While mitigating a VM problem at scale in our fleet, there was confusion
about whether writing to this file will permanently switch the kernel into
a non-caching mode. This influences the decision making in a tense
situation, where tens of people are trying to fix tens of thousands of
affected machines: Do we need a rollback strategy? What are the
performance implications of operating in a non-caching state for several
days? It also caused confusion when the kernel team said we may need to
write the file several times to make sure it's effective ("But it already
reads back 3?").
Link: http://lkml.kernel.org/r/20191031221602.9375-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is assumed that the hugetlbfs_vfsmount[] array will contain either a
valid vfsmount pointer or NULL for each hstate after initialization.
Changes made while converting to use fs_context broke this assumption.
While fixing the hugetlbfs_vfsmount issue, it was discovered that
init_hugetlbfs_fs never did correctly clean up when encountering a vfs
mount error.
It was found during code inspection. A small memory allocation failure
would be the most likely cause of taking a error path with the bug.
This is unlikely to happen as this is early init code.
Link: http://lkml.kernel.org/r/94b6244d-2c24-e269-b12c-e3ba694b242d@oracle.com Reported-by: Chengguang Xu <cgxu519@mykernel.net> Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Each SBDT is located at a 4KB page and contains 512 entries.
Each entry of a SDBT points to a SDB, a 4KB page containing
sampled data. The last entry is a link to another SDBT page.
When an event is created the function sequence executed is:
Both functions realloc_sampling_buffers() and
alloc_sample_data_block() allocate pages and the allocation
can fail. This is handled correctly and all allocated
pages are freed and error -ENOMEM is returned to the
top calling function. Finally the event is not created.
Once the event has been created, the amount of initially
allocated SDBT and SDB can be too low. This is detected
during measurement interrupt handling, where the amount
of lost samples is calculated. If the number of lost samples
is too high considering sampling frequency and already allocated
SBDs, the number of SDBs is enlarged during the next execution
of cpumsf_pmu_enable().
are called to allocate more pages. Page allocation may fail
and the returned error is ignored. A SDBT and SDB setup
already exists.
However the modified SDBTs and SDBs might end up in a situation
where the first entry of an SDBT does not point to an SDB,
but another SDBT, basicly an SBDT without payload.
This can not be handled by the interrupt handler, where an SDBT
must have at least one entry pointing to an SBD.
Add a check to avoid SDBTs with out payload (SDBs) when enlarging
the buffer setup.
Currently unwinder unconditionally returns %r14 from the first frame
pointed by %r15 from pt_regs. A task could be interrupted when a function
already allocated this frame (if it needs it) for its callees or to
store local variables. In that case this frame would contain random
values from stack or values stored there by a callee. As we are only
interested in %r14 to get potential return address, skip bogus return
addresses which doesn't belong to kernel text.
This helps to avoid duplicating filtering logic in unwider users, most
of which use unwind_get_return_address() and would choke on bogus 0
address returned by it otherwise.
This patch introduces support for a new architectured reply
code 0x8B indicating that a hypervisor layer (if any) has
rejected an ap message.
Linux may run as a guest on top of a hypervisor like zVM
or KVM. So the crypto hardware seen by the ap bus may be
restricted by the hypervisor for example only a subset like
only clear key crypto requests may be supported. Other
requests will be filtered out - rejected by the hypervisor.
The new reply code 0x8B will appear in such cases and needs
to get recognized by the ap bus and zcrypt device driver zoo.
To avoid breaking the build on arches where this is not wired up, at
least all the other features should be made available and when using
this specific routine, the "unknown" should point the user/developer to
the need to wire this up on this particular hardware architecture.
Detected in a container mipsel debian cross build environment, where it
shows up as:
In file included from /usr/mipsel-linux-gnu/include/stdio.h:867,
from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
from util/session.c:13:
In function 'printf',
inlined from 'regs_dump__printf' at util/session.c:1103:3,
inlined from 'regs__printf' at util/session.c:1131:2:
/usr/mipsel-linux-gnu/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/mips64-linux-gnuabi64/include/stdio.h:867,
from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
from util/session.c:13:
In function 'printf',
inlined from 'regs_dump__printf' at util/session.c:1103:3,
inlined from 'regs__printf' at util/session.c:1131:2,
inlined from 'regs_user__printf' at util/session.c:1139:3,
inlined from 'dump_sample' at util/session.c:1246:3,
inlined from 'machines__deliver_event' at util/session.c:1421:3:
/usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'printf',
inlined from 'regs_dump__printf' at util/session.c:1103:3,
inlined from 'regs__printf' at util/session.c:1131:2,
inlined from 'regs_intr__printf' at util/session.c:1147:3,
inlined from 'dump_sample' at util/session.c:1249:3,
inlined from 'machines__deliver_event' at util/session.c:1421:3:
/usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
brstackinsn must be allowed to be set by the user when AUX area data has
been captured because, in that case, the branch stack might be
synthesized on the fly. This fixes the following error:
Before:
$ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 2.274 MB perf.data ]
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
Display of branch stack assembler requested, but non all-branch filter set
Hint: run 'perf record -b ...'
After:
$ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 2.274 MB perf.data ]
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep)
bmexec+2485: 00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED 00005641d5806bd0 movzxb (%r13,%rdx,1), %eax 00005641d5806bd6 add %rdi, %rax 00005641d5806bd9 movzxb -0x1(%rax), %edx 00005641d5806bdd cmp %rax, %r14 00005641d5806be0 jnb 0x5641d58069c0 # MISPRED
mismatch of LBR data and executable 00005641d58069c0 movzxb (%r13,%rdx,1), %edi
Fixes: 48d02a1d5c13 ("perf script: Add 'brstackinsn' for branch stacks") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191127095322.15417-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
To fix these build errors on a debian mipsel cross build environment:
builtin-diff.c: In function 'block_cycles_diff_cmp':
builtin-diff.c:550:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
550 | l = labs(left->diff.cycles);
| ^~~~
builtin-diff.c:551:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
551 | r = labs(right->diff.cycles);
| ^~~~
Fixes: 99150a1faab2 ("perf diff: Use hists to manage basic blocks per symbol") Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pn7szy5uw384ntjgk6zckh6a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch moves the final part of the cifsFileInfo_put() logic where we
need a write lock on lock_sem to be processed in a separate thread that
holds no other locks.
This is to prevent deadlocks like the one below:
Reported-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In attach_node_and_children memory is allocated for full_name via
kasprintf. If the condition of the 1st if is not met the function
returns early without freeing the memory. Add a kfree() to fix that.
This has been detected with kmemleak: Link: https://bugzilla.kernel.org/show_bug.cgi?id=205327
It looks like the leak was introduced by this commit: Fixes: 5babefb7f7ab ("of: unittest: allow base devicetree to have symbol metadata") Signed-off-by: Erhard Furtner <erhard_f@mailbox.org> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We currently rely on the ring destroy on cleaning things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only parts of it fails.
Be nice and return with either everything setup in success, or return an
error with things nicely cleaned up.
When we get an interrupt from the socket getting readable,
and start reading, there's a possibility for a race. This
depends on the implementation of the device, but e.g. with
qemu's libvhost-user, we can see:
device virtio_uml
---------------------------------------
write header
get interrupt
read header
read body -> returns -EAGAIN
write body
The -EAGAIN return is because the socket is non-blocking,
and then this leads us to abandon this message.
In fact, we've already read the header, so when the get
another signal/interrupt for the body, we again read it
as though it's a new message header, and also abandon it
for the same reason (wrong size etc.)
This essentially breaks things, and if that message was
one that required a response, it leads to a deadlock as
the device is waiting for the response but we'll never
reply.
Fix this by spinning on -EAGAIN as well when we read the
message body. We need to handle -EAGAIN as "no message"
while reading the header, since we share an interrupt.
Note that this situation is highly unlikely to occur in
normal usage, since there will be very few messages and
only in the startup phase. With the inband call feature
this does tend to happen (eventually) though.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ensure we grab an active reference in cifs superblock while doing
failover to prevent automounts (DFS links) of expiring and then
destroying the superblock pointer.
This patch fixes the following KASAN report:
[ 464.301462] BUG: KASAN: use-after-free in
cifs_reconnect+0x6ab/0x1350
[ 464.303052] Read of size 8 at addr ffff888155e580d0 by task
cifsd/1107
[ 464.367005] Memory state around the buggy address:
[ 464.367787] ffff888155e57f80: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 464.368877] ffff888155e58000: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 464.369967] >ffff888155e58080: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 464.371111] ^
[ 464.371775] ffff888155e58100: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 464.372893] ffff888155e58180: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 464.373983] ==================================================================
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When building pseries_defconfig, building vdso32 errors out:
error: unknown target ABI 'elfv1'
This happens because -m32 in clang changes the target to 32-bit,
which does not allow the ABI to be changed.
Commit 4dc831aa8813 ("powerpc: Fix compiling a BE kernel with a
powerpc64le toolchain") added these flags to fix building big endian
kernels with a little endian GCC.
Clang doesn't need -mabi because the target triple controls the
default value. -mlittle-endian and -mbig-endian manipulate the triple
into either powerpc64-* or powerpc64le-*, which properly sets the
default ABI.
Adding a debug print out in the PPC64TargetInfo constructor after line
383 above shows this:
Don't specify -mabi when building with clang to avoid the build error
with -m32 and not change any code generation.
-mcall-aixdesc is not an implemented flag in clang so it can be safely
excluded as well, see commit 238abecde8ad ("powerpc: Don't use gcc
specific options on clang").
pseries_defconfig successfully builds after this patch and
powernv_defconfig and ppc44x_defconfig don't regress.
Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
[mpe: Trim clang links in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191119045712.39633-2-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
build_initial_tok_table() overwrites unused sym_entry to shrink the
table size. Before the entry is overwritten, table[i].sym must be freed
since it is malloc'ed data.
This fixes the 'definitely lost' report from valgrind. I ran valgrind
against x86_64_defconfig of v5.4-rc8 kernel, and here is the summary:
[Before the fix]
LEAK SUMMARY:
definitely lost: 53,184 bytes in 2,874 blocks
[After the fix]
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
find_vma() must be called under the mmap_sem, reorganize this code to
do the vma check after entering the lock.
Further, fix the unlocked use of struct task_struct's mm, instead use
the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
must be converted to a mm_get before acquiring mmap_sem or calling
find_vma().
Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers") Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads") Link: https://lore.kernel.org/r/20191112202231.3856-11-jgg@ziepe.ca Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The sanity check in macro update_for_len checks to see if len
is less than zero, however, len is a size_t so it can never be
less than zero, so this sanity check is a no-op. Fix this by
making len a ssize_t so the comparison will work and add ulen
that is a size_t copy of len so that the min() macro won't
throw warnings about comparing different types.
Addresses-Coverity: ("Macro compares unsigned to 0") Fixes: f1bd904175e8 ("apparmor: add the base fns() for domain labels") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The crash handler calls hv_synic_cleanup() to shutdown the
Hyper-V synthetic interrupt controller. But if the CPU
that calls hv_synic_cleanup() has a VMbus channel interrupt
assigned to it (which is likely the case in smaller VM sizes),
hv_synic_cleanup() returns an error and the synthetic
interrupt controller isn't shutdown. While the lack of
being shutdown hasn't caused a known problem, it still
should be fixed for highest reliability.
So directly call hv_synic_disable_regs() instead of
hv_synic_cleanup(), which ensures that the synic is always
shutdown.
Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is possible that certain config levels are not available, even
if the max level includes the level. There can be missing levels in
some platforms. So ignore the level when called for information dump
for all levels and fail if specifically ask for the missing level.
Here the changes is to continue reading information about other levels
even if we fail to get information for the current level. But use the
"processed" flag to indicate the failure. When the "processed" flag is
not set, don't dump information about that level.
When commit 75e99bf5ed8f ("gpio: lynxpoint: set default handler to be
handle_bad_irq()") switched default handler to be handle_bad_irq() the
lp_irq_type() function remained untouched. It means that even request_irq()
can't change the handler and we are not able to handle IRQs properly anymore.
Fix it by setting correct handlers in the lp_irq_type() callback.
Fixes: 75e99bf5ed8f ("gpio: lynxpoint: set default handler to be handle_bad_irq()") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20191118180251.31439-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The per-SoC devtype structures can contain their own callbacks that
overwrite mpc8xxx_gpio_devtype_default.
The clear intention is that mpc8xxx_irq_set_type is used in case the SoC
does not specify a more specific callback. But what happens is that if
the SoC doesn't specify one, its .irq_set_type is de-facto NULL, and
this overwrites mpc8xxx_irq_set_type to a no-op. This means that the
following SoCs are affected:
On these boards, the irq_set_type does exactly nothing, and the GPIO
controller keeps its GPICR register in the hardware-default state. On
the LS1028A, that is ACTIVE_BOTH, which means 2 interrupts are raised
even if the IRQ client requests LEVEL_HIGH. Another implication is that
the IRQs are not checked (e.g. level-triggered interrupts are not
rejected, although they are not supported).
Fixes: 82e39b0d8566 ("gpio: mpc8xxx: handle differences between incarnations at a single place") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20191115125551.31061-1-olteanv@gmail.com Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Intel's SoCs follow a naming convention which spells out the SoC name as
two words instead of one word (E.g: Cannon Lake vs Cannonlake). Thus fix
the naming inconsistency across the intel_pmc_core driver, so future
SoCs can follow the naming consistency as below.
Cometlake -> Comet Lake
Tigerlake -> Tiger Lake
Elkhartlake -> Elkhart Lake
Cc: Mario Limonciello <mario.limonciello@dell.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: David E. Box <david.e.box@intel.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Cc: Tony Luck <tony.luck@intel.com> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Reduce context close time by skipping the VA block free list update in
order to avoid hard reset with open contexts.
Reset with open contexts can potentially lead to a kernel crash as the
generic pool of the MMU hops is destroyed while it is not empty because
some unmap operations are not done.
The commit affect mainly when running on simulator.
The root cause for this issue is there is a potential infinite loop
in f2fs_drop_inmem_pages_all() for the case where gc_failure is true
and when there an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is
not set. Fix this by keeping track of the total atomic files
currently opened and using that to exit from this condition.
The iSCSI target driver is the only target driver that does not wait for
ongoing commands to finish before freeing a session. Make the iSCSI target
driver wait for ongoing commands to finish before freeing a session. This
patch fixes the following KASAN complaint:
BUG: KASAN: use-after-free in __lock_acquire+0xb1a/0x2710
Read of size 8 at addr ffff8881154eca70 by task kworker/0:2/247
Memory state around the buggy address: ffff8881154ec900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881154ec980: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff8881154eca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8881154eca80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881154ecb00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
Cc: Mike Christie <mchristi@redhat.com> Link: https://lore.kernel.org/r/20191113220508.198257-3-bvanassche@acm.org Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a faulty initiator fails to bind the socket to the iSCSI connection
before emitting a command, for instance, a subsequent send_pdu, it will
crash the kernel due to a null pointer dereference in sock_sendmsg(), as
shown in the log below. This patch makes sure the bind succeeded before
trying to use the socket.
Fix up possible unclocked register access to auto hibern8 register in
resume path and through sysfs entry. Meanwhile, enable auto hibern8 only
after device is fully initialized in probe path.
Memory state around the buggy address: ffff88802ecd1700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88802ecd1780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88802ecd1800: fb fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb
^ ffff88802ecd1880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88802ecd1900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Cc: Mike Christie <mchristi@redhat.com> Link: https://lore.kernel.org/r/20191113220508.198257-2-bvanassche@acm.org Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a module parameter to inhibit disconnect/reselect for individual
targets. This gains compatibility with Aztec PowerMonster SCSI/SATA
adapters with buggy firmware. (No fix is available from the vendor.)
Apparently these adapters pass-through the product/vendor of the attached
SATA device. Since they can't be identified from the response to an INQUIRY
command, a device blacklist flag won't work.
During clock gating (ufshcd_gate_work()), we first put the link hibern8 by
calling ufshcd_uic_hibern8_enter() and if ufshcd_uic_hibern8_enter()
returns success (0) then we gate all the clocks. Now let’s zoom in to what
ufshcd_uic_hibern8_enter() does internally: It calls
__ufshcd_uic_hibern8_enter() and if failure is encountered, link recovery
shall put the link back to the highest HS gear and returns success (0) to
ufshcd_uic_hibern8_enter() which is the issue as link is still in active
state due to recovery! Now ufshcd_uic_hibern8_enter() returns success to
ufshcd_gate_work() and hence it goes ahead with gating the UFS clock while
link is still in active state hence I believe controller would raise UIC
error interrupts. But when we service the interrupt, clocks might have
already been disabled!
This change fixes for this by returning failure from
__ufshcd_uic_hibern8_enter() if recovery succeeds as link is still not in
hibern8, upon receiving the error ufshcd_hibern8_enter() would initiate
retry to put the link state back into hibern8.
Link: https://lore.kernel.org/r/1573798172-20534-8-git-send-email-cang@codeaurora.org Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Driver was missing complete() call in mpi_sata_completion which result in
SATA abort error handling timing out. That causes the device to be left in
the in_recovery state so subsequent commands sent to the device fail and
the OS removes access to it.
Link: https://lore.kernel.org/r/20191114100910.6153-2-deepak.ukey@microchip.com Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: peter chang <dpf@google.com> Signed-off-by: Deepak Ukey <deepak.ukey@microchip.com> Signed-off-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Modify back __set_fixmap() to using __fix_to_virt() instead
of fix_to_virt() otherwise the following happens because it
seems GCC doesn't see idx as a builtin const.
CC mm/early_ioremap.o
In file included from ./include/linux/kernel.h:11:0,
from mm/early_ioremap.c:11:
In function ‘fix_to_virt’,
inlined from ‘__set_fixmap’ at ./arch/powerpc/include/asm/fixmap.h:87:2,
inlined from ‘__early_ioremap’ at mm/early_ioremap.c:156:4:
./include/linux/compiler.h:350:38: error: call to ‘__compiletime_assert_32’ declared with attribute error: BUILD_BUG_ON failed: idx >= __end_of_fixed_addresses
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^
./include/linux/compiler.h:331:4: note: in definition of macro ‘__compiletime_assert’
prefix ## suffix(); \
^
./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^
./include/asm-generic/fixmap.h:32:2: note: in expansion of macro ‘BUILD_BUG_ON’
BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
^
The struct cdev is embedded in the struct watchdog_core_data. In the
current code, we manage the watchdog_core_data with a kref, but the
cdev is manged by a kobject. There is no any relationship between
this kref and kobject. So it is possible that the watchdog_core_data is
freed before the cdev is entirely released. We can easily get the
following call trace with CONFIG_DEBUG_KOBJECT_RELEASE and
CONFIG_DEBUG_OBJECTS_TIMERS enabled.
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x38
WARNING: CPU: 23 PID: 1028 at lib/debugobjects.c:481 debug_print_object+0xb0/0xf0
Modules linked in: softdog(-) deflate ctr twofish_generic twofish_common camellia_generic serpent_generic blowfish_generic blowfish_common cast5_generic cast_common cmac xcbc af_key sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
CPU: 23 PID: 1028 Comm: modprobe Not tainted 5.3.0-next-20190924-yoctodev-standard+ #180
Hardware name: Marvell OcteonTX CN96XX board (DT)
pstate: 00400009 (nzcv daif +PAN -UAO)
pc : debug_print_object+0xb0/0xf0
lr : debug_print_object+0xb0/0xf0
sp : ffff80001cbcfc70
x29: ffff80001cbcfc70 x28: ffff800010ea2128
x27: ffff800010bad000 x26: 0000000000000000
x25: ffff80001103c640 x24: ffff80001107b268
x23: ffff800010bad9e8 x22: ffff800010ea2128
x21: ffff000bc2c62af8 x20: ffff80001103c600
x19: ffff800010e867d8 x18: 0000000000000060
x17: 0000000000000000 x16: 0000000000000000
x15: ffff000bd7240470 x14: 6e6968207473696c
x13: 5f72656d6974203a x12: 6570797420746365
x11: 6a626f2029302065 x10: 7461747320657669
x9 : 7463612820657669 x8 : 3378302f3078302b
x7 : 0000000000001d7a x6 : ffff800010fd5889
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 0000000000000000 x2 : ffff000bff948548
x1 : 276a1c9e1edc2300 x0 : 0000000000000000
Call trace:
debug_print_object+0xb0/0xf0
debug_check_no_obj_freed+0x1e8/0x210
kfree+0x1b8/0x368
watchdog_cdev_unregister+0x88/0xc8
watchdog_dev_unregister+0x38/0x48
watchdog_unregister_device+0xa8/0x100
softdog_exit+0x18/0xfec4 [softdog]
__arm64_sys_delete_module+0x174/0x200
el0_svc_handler+0xd0/0x1c8
el0_svc+0x8/0xc
This is a common issue when using cdev embedded in a struct.
Fortunately, we already have a mechanism to solve this kind of issue.
Please see commit 233ed09d7fda ("chardev: add helper function to
register char devs with a struct device") for more detail.
In this patch, we choose to embed the struct device into the
watchdog_core_data, and use the API provided by the commit 233ed09d7fda
to make sure that the release of watchdog_core_data and cdev are
in sequence.
When PREEMPT_RT is enabled, all hrtimer expiry functions are
deferred for execution into the context of ksoftirqd unless otherwise
annotated.
Deferring the expiry of the hrtimer used by the watchdog core, however,
is a waste, as the callback does nothing but queue a kthread work item
and wakeup watchdogd.
It's worst then that, too: the deferral through ksoftirqd also means
that for correct behavior a user must adjust the scheduling parameters
of both watchdogd _and_ ksoftirqd, which is unnecessary and has other
side effects (like causing unrelated expiry functions to execute at
potentially elevated priority).
Instead, mark the hrtimer used by the watchdog core as being _HARD to
allow it's execution directly from hardirq context. The work done in
this expiry function is well-bounded and minimal.
A user still must adjust the scheduling parameters of the watchdogd
to be correct w.r.t. their application needs.
Link: https://lkml.kernel.org/r/0e02d8327aeca344096c246713033887bc490dd7.1538089180.git.julia@ni.com Cc: Guenter Roeck <linux@roeck-us.net> Reported-and-tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de> Reported-by: Tim Sander <tim@krieglstein.org> Signed-off-by: Julia Cartwright <julia@ni.com> Acked-by: Guenter Roeck <linux@roeck-us.net>
[bigeasy: use only HRTIMER_MODE_REL_HARD] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20191105144506.clyadjbvnn7b7b2m@linutronix.de Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The following hang is observed when a 'reboot' command is issued:
# reboot
# Stopping network: OK
Stopping klogd: OK
Stopping syslogd: OK
umount: devtmpfs busy - remounted read-only
[ 8.612079] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[ 10.694753] reboot: Restarting system
[ 11.699008] Reboot failed -- System halted
In the event that the RMI device is unreachable, the calls to rmi_set_mode() or
rmi_set_page() will fail before registering the RMI transport device. When the
device is removed, rmi_remove() will call rmi_unregister_transport_device()
which will attempt to access the rmi_dev pointer which was not set.
This patch adds a check of the RMI_STARTED bit before calling
rmi_unregister_transport_device(). The RMI_STARTED bit is only set
after rmi_register_transport_device() completes successfully.
The kernel oops was reported in this message:
https://www.spinics.net/lists/linux-input/msg58433.html
[jkosina@suse.cz: reworded changelog as agreed with Andrew] Signed-off-by: Andrew Duggan <aduggan@synaptics.com> Reported-by: Federico Cerutti <federico@ceres-c.it> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
However, some devices, namely Microsoft's Surface line of products
instead implement a "segmented device certification report" (usage 0xC6)
which returns the same report, but in smaller chunks.
drivers/nvdimm/btt.c: In function 'btt_read_pg':
drivers/nvdimm/btt.c:1264:8: warning: variable 'rc' set but not used
[-Wunused-but-set-variable]
int rc;
^~
Add a ratelimited message in case a storm of errors is encountered.
Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/1572530719-32161-1-git-send-email-cai@lca.pw Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the default processor handling was added to the function
cpu_v7_spectre_init() it only excluded other ARM implemented processor
cores. The Broadcom Brahma B53 core is not implemented by ARM so it
ended up falling through into the set of processors that attempt to use
the ARM_SMCCC_ARCH_WORKAROUND_1 service to harden the branch predictor.
Since this workaround is not necessary for the Brahma-B53 this commit
explicitly checks for it and prevents it from applying a branch
predictor hardening workaround.
Fixes: 10115105cb3a ("ARM: spectre-v2: add firmware based hardening") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
My Logitech M185 (PID:4038) 2.4 GHz wireless HID++ mouse is causing
intermittent errors like these in the log:
[11091.034857] logitech-hidpp-device 0003:046D:4038.0006: hidpp20_batterylevel_get_battery_capacity: received protocol error 0x09
[12388.031260] logitech-hidpp-device 0003:046D:4038.0006: hidpp20_batterylevel_get_battery_capacity: received protocol error 0x09
[16613.718543] logitech-hidpp-device 0003:046D:4038.0006: hidpp20_batterylevel_get_battery_capacity: received protocol error 0x09
[23529.938728] logitech-hidpp-device 0003:046D:4038.0006: hidpp20_batterylevel_get_battery_capacity: received protocol error 0x09
We are already silencing error-code 0x09 (HIDPP_ERROR_RESOURCE_ERROR)
errors in other places, lets do the same in
hidpp20_batterylevel_get_battery_capacity to remove these harmless,
but scary looking errors from the dmesg output.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
Schema errors can cause make to exit before useful information is
printed. This leaves developers wondering what's wrong. It can be
overcome passing '-k' to make, but that's not an obvious solution.
There's 2 scenarios where this happens.
When using DT_SCHEMA_FILES to validate with a single schema, any error
in the schema results in processed-schema.yaml being empty causing a
make error. The result is the specific errors in the schema are never
shown because processed-schema.yaml is the first target built. Simply
making processed-schema.yaml last in extra-y ensures the full schema
validation with detailed error messages happen first.
The 2nd problem is while schema errors are ignored for
processed-schema.yaml, full validation of the schema still runs in
parallel and any schema validation errors will still stop the build when
running validation of dts files. The fix is to not add the schema
examples to extra-y in this case. This means 'dtbs_check' is no longer a
superset of 'dt_binding_check'. Update the documentation to make this
clear.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Jeffrey Hugo <jhugo@codeaurora.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The PixArt OEM mouse disconnets/reconnects every minute on
Linux. All contents of dmesg are repetitive:
[ 1465.810014] usb 1-2.2: USB disconnect, device number 20
[ 1467.431509] usb 1-2.2: new low-speed USB device number 21 using xhci_hcd
[ 1467.654982] usb 1-2.2: New USB device found, idVendor=03f0,idProduct=1f4a, bcdDevice= 1.00
[ 1467.654985] usb 1-2.2: New USB device strings: Mfr=1, Product=2,SerialNumber=0
[ 1467.654987] usb 1-2.2: Product: HP USB Optical Mouse
[ 1467.654988] usb 1-2.2: Manufacturer: PixArt
[ 1467.699722] input: PixArt HP USB Optical Mouse as /devices/pci0000:00/0000:00:07.1/0000:05:00.3/usb1/1-2/1-2.2/1-2.2:1.0/0003:03F0:1F4A.0012/input/input19
[ 1467.700124] hid-generic 0003:03F0:1F4A.0012: input,hidraw0: USB HID v1.11 Mouse [PixArt HP USB Optical Mouse] on usb-0000:05:00.3-2.2/input0
So add HID_QUIRK_ALWAYS_POLL for this one as well.
Test the patch, the mouse is no longer disconnected and there are no
duplicate logs in dmesg.
In bch_mca_scan(), the number of shrinking btree node is calculated
by code like this,
unsigned long nr = sc->nr_to_scan;
nr /= c->btree_pages;
nr = min_t(unsigned long, nr, mca_can_free(c));
variable sc->nr_to_scan is number of objects (here is bcache B+tree
nodes' number) to shrink, and pointer variable sc is sent from memory
management code as parametr of a callback.
If sc->nr_to_scan is smaller than c->btree_pages, after the above
calculation, variable 'nr' will be 0 and nothing will be shrunk. It is
frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make
nr to be zero. Then bch_mca_scan() will do nothing more then acquiring
and releasing mutex c->bucket_lock.
This patch checkes whether nr is 0 after the above calculation, if 0
is the result then set 1 to variable 'n'. Then at least bch_mca_scan()
will try to shrink a single B+tree node.
Link: https://lore.kernel.org/r/4567bcae94523b47d6f3b77450ba305823bca479.1572656814.git.fthain@telegraphics.com.au Reported-and-tested-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
References: commit 68ab2d76e4be ("scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONE") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Avoids confusion when printing Oops message like below
Faulting instruction address: 0xc00000000008bdb4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
This was because we never clear the MMU_FTR_HPTE_TABLE feature flag
even if we run with radix translation. It was discussed that we should
look at this feature flag as an indication of the capability to run
hash translation and we should not clear the flag even if we run in
radix translation. All the code paths check for radix_enabled() check and
if found true consider we are running with radix translation. Follow the
same sequence for finding the MMU translation string to be used in Oops
message.
This looks like a bug, but in fact the messages are from different
functions and mean slightly different things. So keep both but change
one of the messages slightly, so that it's clear they are different:
Output before this fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: L1D private per thread
Output after fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Vulnerable: L1D private per thread
Signed-off-by: Gustavo L. F. Walbon <gwalbon@linux.ibm.com> Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190502210907.42375-1-gwalbon@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The newer ibm,drc-info property is a condensed description of the old
ibm,drc-* properties (ie. names, types, indexes, and power-domains).
When matching a drc-index to a drc-name we need to verify that the
index is within the start and last drc-index range and map it to a
drc-name using the drc-name-prefix and logical index.
Fix the mapping by checking that the index is within the range of the
current drc-info entry, and build the name from the drc-name-prefix
concatenated with the starting drc-name-suffix value and the sequential
index obtained by subtracting ibm,my-drc-index from this entries
drc-start-index.
The device tree is in big endian format and any properties directly
retrieved using OF helpers that don't explicitly byte swap should
be annotated. In particular there are several places where we grab
the opaque property value for the old ibm,drc-* properties and the
ibm,my-drc-index property.
Fix this for better static checking by annotating values we know to
explicitly big endian, and byte swap where appropriate.
In the event that the partition is migrated to a platform with older
firmware that doesn't support the ibm,drc-info property the device
tree is modified to remove the ibm,drc-info property and replace it
with the older style ibm,drc-* properties for types, names, indexes,
and power-domains. One of the requirements of the drc-info firmware
feature is that the client is able to handle both the new property,
and old style properties at runtime. Therefore we can't rely on the
firmware feature alone to dictate which property is currently
present in the device tree.
Fix this short coming by checking explicitly for the ibm,drc-info
property, and falling back to the older ibm,drc-* properties if it
doesn't exist.
When unloading the module, one gets
------------[ cut here ]------------
Device 'cmm0' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt.
WARNING: CPU: 0 PID: 19308 at drivers/base/core.c:1244 .device_release+0xcc/0xf0
...
We only have one static fake device. There is nothing to do when
releasing the device (via cmm_exit()).
In function __ufshcd_query_descriptor(), in the event of an error
happening, we directly goto out_unlock and forget to invaliate
hba->dev_cmd.query.descriptor pointer. This results in this pointer still
valid in ufshcd_copy_query_response() for other query requests which go
through ufshcd_exec_raw_upiu_cmd(). This will cause __memcpy() crash and
system hangs. Log as shown below:
Unable to handle kernel paging request at virtual address ffff000012233c40
Mem abort info:
ESR = 0x96000047
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000047
CM = 0, WnR = 1
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000028cc735c
[ffff000012233c40] pgd=00000000bffff003, pud=00000000bfffe003,
pmd=00000000ba8b8003, pte=0000000000000000
Internal error: Oops: 96000047 [#2] PREEMPT SMP
...
Call trace:
__memcpy+0x74/0x180
ufshcd_issue_devman_upiu_cmd+0x250/0x3c0
ufshcd_exec_raw_upiu_cmd+0xfc/0x1a8
ufs_bsg_request+0x178/0x3b0
bsg_queue_rq+0xc0/0x118
blk_mq_dispatch_rq_list+0xb0/0x538
blk_mq_sched_dispatch_requests+0x18c/0x1d8
__blk_mq_run_hw_queue+0xb4/0x118
blk_mq_run_work_fn+0x28/0x38
process_one_work+0x1ec/0x470
worker_thread+0x48/0x458
kthread+0x130/0x138
ret_from_fork+0x10/0x1c
Code: 540000aba8c12027a88120c7a8c12027 (a88120c7)
---[ end trace 793e1eb5dff69f2d ]---
note: kworker/0:2H[2054] exited with preempt_count 1
This patch is to move "descriptor = NULL" down to below the label
"out_unlock".
Fixes: d44a5f98bb49b2(ufs: query descriptor API) Link: https://lore.kernel.org/r/20191112223436.27449-3-huobean@gmail.com Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The first entry of the ibm,drc-info property is an int encoded count
of the number of drc-info entries that follow. The "value" pointer
returned by of_prop_next_u32() is still pointing at the this value
when we call of_read_drc_info_cell(), but the helper function
expects that value to be pointing at the first element of an entry.
Fix up by incrementing the "value" pointer to point at the first
element of the first drc-info entry prior.
When using this driver on a Blizzard 1260, there were failures whenever DMA
transfers from the SCSI bus to memory of 65535 bytes were followed by a DMA
transfer of 1 byte. This caused the byte at offset 65535 to be overwritten
with 0xff. The Blizzard hardware can't handle single byte DMA transfers.
Besides this issue, limiting the DMA length to something that is not a
multiple of the page size is very inefficient on most file systems.
It seems this limit was chosen because the DMA transfer counter of the ESP
by default is 16 bits wide, thus limiting the length to 65535 bytes.
However, the value 0 means 65536 bytes, which is handled by the ESP and the
Blizzard just fine. It is also the default maximum used by esp_scsi when
drivers don't provide their own dma_length_limit() function.
The limit of 65536 bytes can be used by all boards except the Fastlane. The
old driver used a limit of 65532 bytes (0xfffc), which is reintroduced in
this patch.
Fixes: b7ded0e8b0d1 ("scsi: zorro_esp: Limit DMA transfers to 65535 bytes") Link: https://lore.kernel.org/r/20191112175523.23145-1-jongk@linux-m68k.org Signed-off-by: Kars de Jong <jongk@linux-m68k.org> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
*** CID 101747: Null pointer dereferences (FORWARD_NULL)
/drivers/scsi/lpfc/lpfc_els.c: 4439 in lpfc_cmpl_els_rsp()
4433 kfree(mp);
4434 }
4435 mempool_free(mbox, phba->mbox_mem_pool);
4436 }
4437 out:
4438 if (ndlp && NLP_CHK_NODE_ACT(ndlp)) {
vvv CID 101747: Null pointer dereferences (FORWARD_NULL)
vvv Dereferencing null pointer "shost".
4439 spin_lock_irq(shost->host_lock);
4440 ndlp->nlp_flag &= ~(NLP_ACC_REGLOGIN | NLP_RM_DFLT_RPI);
4441 spin_unlock_irq(shost->host_lock);
4442
4443 /* If the node is not being used by another discovery thread,
4444 * and we are sending a reject, we are done with it.
Fix by adding a check for non-null shost in line 4438.
The scenario when shost is set to null is when ndlp is null.
As such, the ndlp check present was sufficient. But better safe
than sorry so add the shost check.
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 101747 ("Null pointer dereferences") Fixes: 2e0fef85e098 ("[SCSI] lpfc: NPIV: split ports") CC: James Bottomley <James.Bottomley@SteelEye.com> CC: "Gustavo A. R. Silva" <gustavo@embeddedor.com> CC: linux-next@vger.kernel.org Link: https://lore.kernel.org/r/20191111230401.12958-3-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Naresh reported LTP diotest4 failing for 32bit x86 and arm -next
kernels on ext4. Same problem exists in 5.4-rc7 on xfs.
The failure comes down to:
openat(AT_FDCWD, "testdata-4.5918", O_RDWR|O_DIRECT) = 4
mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7b000
read(4, 0xb7f7b000, 4096) = 0 // expects -EFAULT
Problem is conversion at iomap_dio_bio_actor() return. Ternary
operator has a return type and an attempt is made to convert each
of operands to the type of the other. In this case "ret" (int)
is converted to type of "copied" (unsigned long). Both have size
of 4 bytes:
size_t copied = 0;
int ret = -14;
long long actor_ret = copied ? copied : ret;
On x86_64: actor_ret == -14;
On x86 : actor_ret == 4294967282
Replace ternary operator with 2 return statements to avoid this
unwanted conversion.
Fixes: 4721a6010990 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add missing dma channels free calls in case of error during probe
and reorder the remove function so that dma channels are freed after
the i2c adapter is deleted.
Overall, reorder the remove function so that probe error handling order
and remove function order are same.
Since commit 7723f4c5ecdb ("driver core: platform: Add an error message
to platform_get_irq*()"), platform_get_irq_byname() displays an error
when the IRQ isn't found. Since the SMMUv3 driver uses that function to
query which interrupt method is available, the message is now displayed
during boot for any SMMUv3 that doesn't implement the combined
interrupt, or that implements MSIs.
[ 20.700337] arm-smmu-v3 arm-smmu-v3.7.auto: IRQ combined not found
[ 20.706508] arm-smmu-v3 arm-smmu-v3.7.auto: IRQ eventq not found
[ 20.712503] arm-smmu-v3 arm-smmu-v3.7.auto: IRQ priq not found
[ 20.718325] arm-smmu-v3 arm-smmu-v3.7.auto: IRQ gerror not found
Use platform_get_irq_byname_optional() to avoid displaying a spurious
error.
Fixes: 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Quota statistics counted as 64-bit per-cpu counter. Reading sums per-cpu
fractions as signed 64-bit int, filters negative values and then reports
lower half as signed 32-bit int.
As seen on the new Raspberry Pi 4 and sta2x11's DMA implementation it is
possible for a device configured with 32 bit DMA addresses and a partial
DMA mapping located at the end of the address space to overflow. It
happens when a higher physical address, not DMAable, is translated to
it's DMA counterpart.
For example the Raspberry Pi 4, configurable up to 4 GB of memory, has
an interconnect capable of addressing the lower 1 GB of physical memory
with a DMA offset of 0xc0000000. It transpires that, any attempt to
translate physical addresses higher than the first GB will result in an
overflow which dma_capable() can't detect as it only checks for
addresses bigger then the maximum allowed DMA address.
Fix this by verifying in dma_capable() if the DMA address range provided
is at any point lower than the minimum possible DMA address on the bus.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
For an external clock source, which is gated via a GPIO, the
rate change should typically be propagated to the parent clock.
The situation where we are requiring this propagation, is when an
external clock is connected to override an internal clock (which typically
has a fixed rate). The external clock can have a different rate than the
internal one, and may also be variable, thus requiring the rate
propagation.
This rate change wasn't propagated until now, and it's unclear about cases
where this shouldn't be propagated. Thus, it's unclear whether this is
fixing a bug, or extending the current driver behavior. Also, it's unsure
about whether this may break any existing setups; in the case that it does,
a device-tree property may be added to disable this flag.
Some RCGs (the gfx_3d_src_clk in msm8998 for example) are basically just
some constant ratio from the input across the entire frequency range. It
would be great if we could specify the frequency table as a single entry
constant ratio instead of a long list, ie:
{ .src = P_GPUPLL0_OUT_EVEN, .pre_div = 3 },
{ }
So, lets support that.
We need to fix a corner case in qcom_find_freq() where if the freq table
is non-null, but has no frequencies, we end up returning an "entry" before
the table array, which is bad. Then, we need ignore the freq from the
table, and instead base everything on the requested freq.
When MSM8998 support was added, and analysis was done to determine what
clocks would be consumed. That analysis had a flaw, which caused the
pnoc to be skipped. The pnoc clock needs to be on to access the uart
for the console. The clock is on from boot, but has no consumer votes
in the RPM. When we attempt to boot the modem, it causes the RPM to
turn off pnoc, which kills our access to the console and causes CPU hangs.
We need pnoc to be defined, so that clk_smd_rpm_handoff() will put in
an implicit vote for linux and prevent issues when booting modem.
Hopefully pnoc can be consumed by the interconnect framework in future
so that Linux can rely on explicit votes.
Fixes: 6131dc81211c ("clk: qcom: smd: Add support for MSM8998 rpm clocks") Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Link: https://lkml.kernel.org/r/20191107190615.5656-1-jeffrey.l.hugo@gmail.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is causing xfstest generic/579 to fail due to fsck.f2fs reporting errors.
I'm not sure what the problem is, but it still happens even with all the
fs-verity stuff in the test commented out, so that the test just runs fsstress.
generic/579 23s ... [10:02:25]
[ 7.745370] run fstests generic/579 at 2019-11-04 10:02:25
_check_generic_filesystem: filesystem on /dev/vdc is inconsistent
(see /results/f2fs/results-default/generic/579.full for details)
[10:02:47]
Ran: generic/579
Failures: generic/579
Failed 1 of 1 tests
Xunit report: /results/f2fs/results-default/result.xml
Here's the contents of 579.full:
_check_generic_filesystem: filesystem on /dev/vdc is inconsistent
*** fsck.f2fs output ***
[ASSERT] (__chk_dots_dentries:1378) --> Bad inode number[0x24] for '..', parent parent ino is [0xd10]
The root cause is that we forgot to update directory's i_pino during
cross_rename, fix it.
Fixes: 32f9bc25cbda0 ("f2fs: support ->rename2()") Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the driver receives a login that is later then LOGO'd by the remote port
(aka ndlp), the driver, upon the completion of the LOGO ACC transmission,
will logout the node and unregister the rpi that is being used for the
node. As part of the unreg, the node's rpi value is replaced by the
LPFC_RPI_ALLOC_ERROR value. If the port is subsequently offlined, the
offline walks the nodes and ensures they are logged out, which possibly
entails unreg'ing their rpi values. This path does not validate the node's
rpi value, thus doesn't detect that it has been unreg'd already. The
replaced rpi value is then used when accessing the rpi bitmask array which
tracks active rpi values. As the LPFC_RPI_ALLOC_ERROR value is not a valid
index for the bitmask, it may fault the system.
Revise the rpi release code to detect when the rpi value is the replaced
RPI_ALLOC_ERROR value and ignore further release steps.
Link: https://lore.kernel.org/r/20191105005708.7399-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
During heavy RCN activity and log_verbose = 0 we see these messages:
2754 PRLI failure DID:521245 Status:x9/xb2c00, data: x0
0231 RSCN timeout Data: x0 x3
0230 Unexpected timeout, hba link state x5
This is due to delayed RSCN activity.
Correct by avoiding the timeout thus the messages by restarting the
discovery timeout whenever an rscn is received.
Filter PRLI responses such that severity depends on whether expected for
the configuration or not. For example, PRLI errors on a fabric will be
informational (they are expected), but Point-to-Point errors are not
necessarily expected so they are raised to an error level.
Link: https://lore.kernel.org/r/20191105005708.7399-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to SBC-2 a TRANSFER LENGTH field of zero means that 256 logical
blocks must be transferred. Make the SCSI tracing code follow SBC-2.
Fixes: bf8162354233 ("[SCSI] add scsi trace core functions and put trace points") Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Link: https://lore.kernel.org/r/20191105215553.185018-1-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
jbd2 statistics counting number of blocks logged in a transaction was
wrong. It didn't count the commit block and more importantly it didn't
count revoke descriptor blocks. Make sure these get properly counted.
This patch addresses what Dave Chinner had discovered and fixed within
commit: 7684e2c4384d. This changes does not have any user visible
impact for ext4 as none of the current users of ext4_iomap_begin()
that extend files depend on IOMAP_F_DIRTY.
When doing a direct IO that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.
However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there is IO completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in IO completion when
O_DSYNC writes are being used and the hardware supports FUA.
Hence when setting IOMAP_F_DIRTY, we need to also take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that IO completion will call generic_write_sync() to
flush the inode size update to stable storage correctly.
This patch updates the lock pattern in ext4_direct_IO_read() to not
block on inode lock in cases of IOCB_NOWAIT direct I/O reads. The
locking condition implemented here is similar to that of 942491c9e6d6
("xfs: fix AIM7 regression").
With large memory (8TB and more) hotplug, we can get soft lockup
warnings as below. These were caused by a long loop without any
explicit cond_resched which is a problem for !PREEMPT kernels.
Avoid this using cond_resched() while inserting hash page table
entries. We already do similar cond_resched() in __add_pages(), see
commit f64ac5e6e306 ("mm, memory_hotplug: add scheduling point to
__add_pages").
Using Makefile's wildcard with absolute path to detect
the presence of libyaml results in false-positive
detection when cross-compiling e.g. in yocto environment.
The latter results in build error:
| scripts/dtc/yamltree.o: In function `yaml_propval_int':
| yamltree.c: undefined reference to `yaml_sequence_start_event_initialize'
| yamltree.c: undefined reference to `yaml_emitter_emit'
| yamltree.c: undefined reference to `yaml_scalar_event_initialize'
...
Use pkg-config to locate libyaml to address this scenario.
Signed-off-by: Pavel Modilaynen <pavel.modilaynen@axis.com>
[robh: silence stderr] Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a hardware-specific driver does not provide a name, the timer-of core
falls back to device_node.name. Due to generic DT node naming policies,
that name is almost always "timer", and thus doesn't identify the actual
timer used.
Fix this by using device_node.full_name instead, which includes the unit
addrees.
The NETDEV_CHANGENAME code is not "unneeded" like it is stated in commit 4cb6560514fa ("leds: trigger: netdev: fix refcnt leak on interface
rename").
The event was accidentally misinterpreted equivalent to
NETDEV_UNREGISTER, but should be equivalent to NETDEV_REGISTER.
This was the case in the original code from the openwrt project.
Otherwise, you are unable to set netdev led triggers for (non-existent)
netdevices, which has to be renamed. This is the case, for example, for
ppp interfaces in openwrt.
Fixes: 06f502f57d0d ("leds: trigger: Introduce a NETDEV trigger") Fixes: 4cb6560514fa ("leds: trigger: netdev: fix refcnt leak on interface rename") Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
platform_get_irq_byname() might return -errno which later would be cast
to an unsigned int and used in IRQ handling code leading to usage of
wrong ID and errors about wrong irq_base.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Peng Ma <peng.ma@nxp.com> Tested-by: Peng Ma <peng.ma@nxp.com> Link: https://lore.kernel.org/r/20191004150826.6656-1-krzk@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Daniele reported that issue previously fixed in c41f9ea998f3
("drivers: dma-coherent: Account dma_pfn_offset when used with device
tree") reappear shortly after 43fc509c3efb ("dma-coherent: introduce
interface for default DMA pool") where fix was accidentally dropped.
Lets put fix back in place and respect dma-ranges for reserved memory.
As we've seen from USB and other areas[1], we need to always do runtime
checks for DMA operating on memory regions that might be remapped. This
adds vmap checks (similar to those already in USB but missing in other
places) into dma_map_single() so all callers benefit from the checking.
Some of our scripts are passed $objdump and then call it as
"$objdump". This doesn't work if it contains spaces because we're
using ccache, for example you get errors such as:
./arch/powerpc/tools/relocs_check.sh: line 48: ccache ppc64le-objdump: No such file or directory
./arch/powerpc/tools/unrel_branch_check.sh: line 26: ccache ppc64le-objdump: No such file or directory
Fix it by not quoting the string when we expand it, allowing the shell
to do the right thing for us.
Fixes: a71aa05e1416 ("powerpc: Convert relocs_check to a shell script using grep") Fixes: 4ea80652dc75 ("powerpc/64s: Tool to flag direct branches from unrelocated interrupt vectors") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024004730.32135-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
Some of our TM (Transactional Memory) tests, list "r1" (the stack
pointer) as a clobbered register.
GCC >= 9 doesn't accept this, and the build breaks:
ptrace-tm-spd-tar.c: In function 'tm_spd_tar':
ptrace-tm-spd-tar.c:31:2: error: listing the stack pointer register 'r1' in a clobber list is deprecated [-Werror=deprecated]
31 | asm __volatile__(
| ^~~
ptrace-tm-spd-tar.c:31:2: note: the value of the stack pointer after an 'asm' statement must be the same as it was before the statement
We do have some fairly large inline asm blocks in these tests, and
some of them do change the value of r1. However they should all return
to C with the value in r1 restored, so I think it's legitimate to say
r1 is not clobbered.
As Segher points out, the r1 clobbers may have been added because of
the use of `or 1,1,1`, however that doesn't actually clobber r1.
Segher also points out that some of these tests do clobber LR, because
they call functions, and that is not listed in the clobbers, so add
that where appropriate.